Quick Facts
- Category: Linux & DevOps
- Published: 2026-05-01 02:02:53
Overview
Page migration is a critical operation in modern Linux memory management, especially on systems with heterogeneous memory (e.g., NUMA nodes, disaggregated memory, or CXL-attached devices). When a process accesses memory on a remote node, performance degrades due to the added latency. Migration moves pages to the accessing node, but traditional single-page migration is CPU-intensive and incurs high per-page overhead. The new patch series, initially proposed by an NVIDIA engineer in early 2025 and now advanced by AMD engineers, introduces accelerated page migration using batched copies and hardware offloading. This guide will help you understand, apply, and test these patches to boost system performance.
Prerequisites
- Linux kernel development environment: You need a recent kernel source tree (mainline or linux-next) and build tools (gcc, make, binutils).
- Hardware with migration acceleration: Typically AMD EPYC processors with certain features (check PCIe ATS/PRI, CXL.mem support).
- Familiarity with memory management concepts: NUMA, page tables, direct memory access (DMA), and memory failure handling.
- Access to the kernel mailing list (LKML): To fetch the latest patch revision (v4 or later).
- Test workload: A program that stresses NUMA migrations (e.g., modified stream benchmark or sysbench memory test).
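Since the benchmark depends on cross-node traffic, it is worth confirming up front that the machine exposes more than one NUMA node; sysfs shows this without any extra packages:

```shell
#!/bin/sh
# Count NUMA nodes via sysfs (present on any modern Linux kernel).
nodes=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
echo "NUMA nodes: $nodes"
if [ "$nodes" -lt 2 ]; then
    echo "warning: a single-node system cannot exercise cross-node migration"
fi
```

If the numactl package is installed, numactl --hardware reports the same information in more detail, including per-node memory sizes and distances.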
Step-by-Step Instructions
1. Understand the Patch Series
The patches extend the existing migrate_pages() system call and the internal migrate_vma mechanism. The key innovation is turning single-page copy operations into batch requests sent to hardware DMA engines (like the AMD IOMMU or CXL.mem controllers). This reduces per-page overhead and leverages dedicated copy engines. The series also adds a new MIGRATE_BATCH flag and modifies the kernel's page migration path to aggregate multiple pages before offloading.
Read the cover letter on LKML. Focus on the changes to mm/migrate.c, include/linux/migrate.h, and the architecture-specific DMA setup (e.g., arch/x86/kernel/amd_iommu.c).
2. Apply the Patches to Your Kernel
- Clone the latest linux-next tree:
git clone https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
- Download the patch series from LKML (usually as an mbox file or via a git pull request), for example by fetching it with the b4 tool, then applying it with git am.
- Apply the patches in order:
git am *.patch
- Configure the kernel: enable CONFIG_MIGRATION (on by default) and the new CONFIG_MIGRATE_BATCH (optional, experimental). Also ensure CONFIG_AMD_IOMMU is enabled for hardware offloading.
- Build the kernel:
make -j$(nproc)
- Install the modules and the new kernel:
sudo make modules_install install
- Reboot into the new kernel.
3. Verify Feature Availability
After booting, check kernel messages for batch migration support:
dmesg | grep -i "migrate_batch"
You should see something like: "migrate_batch: acceleration enabled via IOMMU". Also examine /sys/kernel/debug/migration if debugfs is mounted. The directory may contain a batch_stats file.
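On a kernel without the patches, the grep above simply returns nothing; a small guard makes the outcome explicit (the message text and debugfs path are the ones quoted above and may change between patch revisions):

```shell
#!/bin/sh
# Hedged sketch: the "migrate_batch" message text and the debugfs layout
# are taken from the patch description and may differ in the final series.
if dmesg 2>/dev/null | grep -qi "migrate_batch"; then
    echo "kernel reports batch migration support"
else
    echo "no migrate_batch messages (unpatched kernel or restricted dmesg)"
fi

STATS=/sys/kernel/debug/migration/batch_stats
[ -r "$STATS" ] && cat "$STATS" || true
```

Note that reading dmesg and debugfs may require root, depending on kernel.dmesg_restrict and how debugfs is mounted.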
4. Configure Hardware Offloading (Optional)
By default, batch offloading may be disabled. To enable, echo to sysfs:
echo 1 > /sys/module/migrate_batch/parameters/enable_offload
To set batch size (number of pages per batch, default 32):
echo 64 > /sys/module/migrate_batch/parameters/batch_size
Note: Larger batches may reduce overhead but increase latency for synchronous ops. Tune based on workload.
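Both writes require root, and on an unpatched kernel the parameter directory does not exist, so a guarded version of the two commands above avoids confusing write errors (the paths follow this series' description; later revisions may rename them):

```shell
#!/bin/sh
# Hedged sketch: parameter paths follow the patch description above.
P=/sys/module/migrate_batch/parameters
if [ -d "$P" ]; then
    echo 1  > "$P/enable_offload"   # turn on hardware offload
    echo 64 > "$P/batch_size"       # pages per batch
    echo "offload enabled, batch_size=64"
else
    echo "migrate_batch parameters not found; is the patched kernel booted?"
fi
```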
5. Run Benchmark and Compare Performance
- Test workload: write a simple program that allocates memory on node 0, then binds to node 1 and accesses the memory repeatedly, triggering page migration.
- Compile with (install libnuma-dev first):
gcc -o migrate_test migrate_test.c -lnuma
- Run with and without batch/hardware offloading. Disable offloading by writing 0 to the sysfs parameter.
- Measure migration time: use perf stat or add internal timing. Example command:
sudo numactl --cpunodebind=1 --membind=0 timeout 10 ./migrate_test
Collect results and compare. Expected improvement: 2x-5x reduction in migration latency for large data sets.
Common Mistakes
- Applying patches out of order: the patch series interleaves dependencies. Use git am --patch-format=mbox or a tool like b4 to apply them correctly.
- Missing kernel configuration symbols: ensure CONFIG_MIGRATE_BATCH and CONFIG_AMD_IOMMU are enabled. The kernel may compile without errors, but batch support will be a no-op.
- Not enabling hardware offloading: the sysfs parameter defaults to off. Forgetting to enable it means you still use software batch copies (which still benefit from reduced per-page overhead, but not DMA).
- Overlooking hardware requirements: IOMMU v2 and CXL.mem hardware are required for true offloading. Check lspci -v | grep -i iommu and verify BIOS settings.
- Using an incompatible workload: very small pages (4 KB) may not benefit; test with 2 MB huge pages for better batch packing.
- Ignoring kernel warnings: if dmesg shows "migrate_batch: no DMA engine found", fall back to software batch mode.
Summary
The AMD batch migration patches represent a significant step toward reducing page migration overhead in Linux. By grouping migrations into large batches and offloading copy operations to hardware DMA engines, the kernel can improve performance for memory-intensive workloads on NUMA and CXL systems. This guide provided an overview, prerequisites, step-by-step instructions for applying and testing the patches, and common pitfalls to avoid. With careful tuning, developers can achieve substantial latency reductions. Keep an eye on LKML for future revisions that might extend support to other architectures or improve batch scheduling.