* [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
@ 2026-04-14 14:23 Kiryl Shutsemau (Meta)
  2026-04-14 14:23 ` [RFC, PATCH 01/12] userfaultfd: define UAPI constants for anonymous minor faults Kiryl Shutsemau (Meta)
                   ` (13 more replies)
  0 siblings, 14 replies; 18+ messages in thread
From: Kiryl Shutsemau (Meta) @ 2026-04-14 14:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Xu, David Hildenbrand, Lorenzo Stoakes, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Liam R . Howlett, Zi Yan,
	Jonathan Corbet, Shuah Khan, Sean Christopherson, Paolo Bonzini,
	linux-mm, linux-kernel, linux-doc, linux-kselftest, kvm,
	Kiryl Shutsemau (Meta)

This series adds userfaultfd support for tracking the working set of
VM guest memory, enabling VMMs to identify cold pages and evict them
to tiered or remote storage.

== Problem ==

VMMs managing guest memory need to:
1. Track which pages are actively used (working set detection)
2. Safely evict cold pages to slower storage
3. Fetch pages back on demand when accessed again

For shmem-backed guest memory, working set tracking partially works
today: MADV_DONTNEED zaps PTEs while pages stay in page cache, and
re-access auto-resolves from cache. But safe eviction still requires
synchronous fault interception to prevent data loss races.

For anonymous guest memory (needed for KSM cross-VM deduplication),
there is no mechanism at all: clearing a PTE loses the page.
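
The difference between the two cases can be seen with interfaces that
already exist in mainline. The sketch below is illustrative only and is
not part of the series; the helper name byte_after_dontneed() and the
memfd name "guest-ram" are made up for this example. It writes a
pattern to one page, calls MADV_DONTNEED on it, and returns the first
byte read back.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 0xAB to one page, drop its PTE with
 * MADV_DONTNEED, then read the first byte back.
 * use_shmem != 0: MAP_SHARED mapping of a memfd (shmem-backed).
 * use_shmem == 0: MAP_PRIVATE | MAP_ANONYMOUS mapping.
 * Returns the byte read back, or -1 on setup failure.
 */
int byte_after_dontneed(int use_shmem)
{
	long page = sysconf(_SC_PAGESIZE);
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	unsigned char *p;
	int fd = -1;
	int v;

	if (use_shmem) {
		fd = memfd_create("guest-ram", 0);
		if (fd < 0)
			return -1;
		if (ftruncate(fd, page) < 0) {
			close(fd);
			return -1;
		}
		flags = MAP_SHARED;
	}

	p = mmap(NULL, page, PROT_READ | PROT_WRITE, flags, fd, 0);
	if (fd >= 0)
		close(fd);	/* the mapping keeps the file alive */
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAB, page);
	if (madvise(p, page, MADV_DONTNEED) < 0) {
		munmap(p, page);
		return -1;
	}

	/* shmem: refault finds the page in page cache.
	 * anon: the page was freed, refault gives a zero page. */
	v = p[0];
	munmap(p, page);
	return v;
}
```

Calling byte_after_dontneed(1) exercises the shmem case, where the
data survives in page cache; byte_after_dontneed(0) exercises the
anonymous case, where the contents are gone after the zap.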

== Solution ==

The series introduces a unified userfaultfd interface that works
across both anonymous and shmem-backed memory:

UFFD_FEATURE_MINOR_ANON: extends MODE_MINOR registration to anonymous
private memory. Uses the PROT_NONE hinting mechanism (same as NUMA
balancing) to make pages inaccessible without freeing them.

UFFD_FEATURE_MINOR_ASYNC: auto-resolves minor faults without handler
involvement. The kernel restores PTE permissions immediately and the
faulting thread continues. Works for anonymous, shmem, and hugetlbfs.

UFFDIO_DEACTIVATE: marks pages as deactivated. For anonymous memory,
sets PROT_NONE on PTEs (pages stay resident). For shmem/hugetlbfs,
zaps PTEs (pages stay in page cache).

UFFDIO_SET_MODE: toggles MINOR_ASYNC at runtime, synchronized via
mmap_write_lock. Enables the VMM workflow: async mode for lightweight
detection, sync mode for race-free eviction.

PAGE_IS_UFFD_DEACTIVATED: PAGEMAP_SCAN category flag for efficient
batch detection of cold (still-deactivated) anonymous pages.

== VMM Workflow ==

    UFFDIO_DEACTIVATE(all)            -- async, no vCPU stalls
    sleep(interval)
    PAGEMAP_SCAN                      -- find cold pages
    UFFDIO_SET_MODE(sync)             -- block faults for eviction
    pwrite + MADV_DONTNEED cold pages -- safe, faults block
    UFFDIO_SET_MODE(async)            -- resume tracking

The same workflow applies to shmem, with a different PAGEMAP_SCAN mask
(!PAGE_IS_PRESENT instead of PAGE_IS_UFFD_DEACTIVATED).
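
For the shmem variant, the scan step can be sketched with the
PAGEMAP_SCAN interface as it exists in mainline (Linux 6.7 and later).
This is only an illustration under assumptions: the function names
count_not_present() and demo_shmem_cold_pages() are invented here, and
the PAGE_IS_UFFD_DEACTIVATED flag proposed by this series is
deliberately not used. Setting PAGE_IS_PRESENT in both
category_inverted and category_mask selects pages whose PTE is not
present.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/fs.h>	/* PAGEMAP_SCAN, struct pm_scan_arg (6.7+ headers) */

/*
 * Hypothetical helper: count pages in [addr, addr + len) whose PTE is
 * not present. Returns -1 if PAGEMAP_SCAN is unavailable at build or
 * run time.
 */
long count_not_present(void *addr, size_t len)
{
#ifdef PAGEMAP_SCAN
	struct page_region regions[32];
	long page = sysconf(_SC_PAGESIZE);
	uint64_t start = (uintptr_t)addr, end = start + len;
	long total = 0;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return -1;
	while (start < end) {
		struct pm_scan_arg arg;
		long i, n;

		memset(&arg, 0, sizeof(arg));
		arg.size = sizeof(arg);
		arg.start = start;
		arg.end = end;
		arg.vec = (uintptr_t)regions;
		arg.vec_len = 32;
		/* Invert PAGE_IS_PRESENT so that non-present pages match. */
		arg.category_inverted = PAGE_IS_PRESENT;
		arg.category_mask = PAGE_IS_PRESENT;

		n = ioctl(fd, PAGEMAP_SCAN, &arg);
		if (n < 0) {
			close(fd);
			return -1;
		}
		for (i = 0; i < n; i++)
			total += (regions[i].end - regions[i].start) / page;
		if (arg.walk_end <= start)
			break;		/* no progress, stop scanning */
		start = arg.walk_end;	/* resume if the vector filled up */
	}
	close(fd);
	return total;
#else
	(void)addr;
	(void)len;
	return -1;
#endif
}

/* Touch four shmem pages, zap two, and report how many scan as cold. */
long demo_shmem_cold_pages(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 4 * (size_t)page;
	int fd = memfd_create("guest-ram", 0);
	unsigned char *p;
	long cold;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, len) < 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAB, len);			/* fault in all four pages */
	madvise(p, 2 * page, MADV_DONTNEED);	/* zap PTEs of the first two */
	cold = count_not_present(p, len);
	munmap(p, len);
	return cold;
}
```

A VMM would run such a scan over the guest memory range after the
sleep interval; pages still reported as not present were never
re-accessed and are candidates for eviction.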

== NUMA Balancing ==

NUMA balancing scanning is skipped on anonymous VM_UFFD_MINOR VMAs to
avoid protnone conflicts. NUMA locality stats are fed from the uffd
fault path via task_numa_fault() so the scheduler retains placement
data. Shmem VMAs are unaffected (UFFDIO_DEACTIVATE zaps PTEs there,
no protnone involved).

== Testing ==

The series includes 6 new selftests covering async/sync modes,
PAGEMAP_SCAN cold detection, GUP through protnone, UFFDIO_SET_MODE
toggling, and cleanup on close. All 73 uffd unit tests pass
(including hugetlb) across defconfig, allnoconfig, allmodconfig,
and randomized configs.

Kiryl Shutsemau (Meta) (12):
  userfaultfd: define UAPI constants for anonymous minor faults
  userfaultfd: add UFFD_FEATURE_MINOR_ANON registration support
  userfaultfd: implement UFFDIO_DEACTIVATE ioctl
  userfaultfd: UFFDIO_CONTINUE for anonymous memory
  mm: intercept protnone faults on VM_UFFD_MINOR anonymous VMAs
  userfaultfd: auto-resolve shmem and hugetlbfs minor faults in async
    mode
  sched/numa: skip scanning anonymous VM_UFFD_MINOR VMAs
  userfaultfd: enable UFFD_FEATURE_MINOR_ANON
  mm/pagemap: add PAGE_IS_UFFD_DEACTIVATED to PAGEMAP_SCAN
  userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
  selftests/mm: add userfaultfd anonymous minor fault tests
  Documentation/userfaultfd: document working set tracking

 Documentation/admin-guide/mm/userfaultfd.rst | 141 ++++-
 fs/proc/task_mmu.c                           |  11 +-
 fs/userfaultfd.c                             | 184 +++++-
 include/linux/huge_mm.h                      |   6 +
 include/linux/mm.h                           |   2 +
 include/linux/sched/numa_balancing.h         |   1 +
 include/linux/userfaultfd_k.h                |  21 +-
 include/trace/events/sched.h                 |   3 +-
 include/uapi/linux/fs.h                      |   1 +
 include/uapi/linux/userfaultfd.h             |  40 +-
 kernel/sched/fair.c                          |  13 +
 mm/huge_memory.c                             |  33 +-
 mm/hugetlb.c                                 |   3 +-
 mm/memory.c                                  |  51 +-
 mm/mprotect.c                                |   9 +-
 mm/shmem.c                                   |   3 +-
 mm/userfaultfd.c                             | 164 +++++-
 tools/testing/selftests/mm/uffd-unit-tests.c | 458 +++++++++++++++
 18 files changed, 1096 insertions(+), 48 deletions(-)

-- 
2.51.2



end of thread, other threads:[~2026-04-14 17:46 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2026-04-14 14:23 [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 01/12] userfaultfd: define UAPI constants for anonymous minor faults Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 02/12] userfaultfd: add UFFD_FEATURE_MINOR_ANON registration support Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 03/12] userfaultfd: implement UFFDIO_DEACTIVATE ioctl Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 04/12] userfaultfd: UFFDIO_CONTINUE for anonymous memory Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 05/12] mm: intercept protnone faults on VM_UFFD_MINOR anonymous VMAs Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 06/12] userfaultfd: auto-resolve shmem and hugetlbfs minor faults in async mode Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 07/12] sched/numa: skip scanning anonymous VM_UFFD_MINOR VMAs Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 08/12] userfaultfd: enable UFFD_FEATURE_MINOR_ANON Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 09/12] mm/pagemap: add PAGE_IS_UFFD_DEACTIVATED to PAGEMAP_SCAN Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 10/12] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 11/12] selftests/mm: add userfaultfd anonymous minor fault tests Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 12/12] Documentation/userfaultfd: document working set tracking Kiryl Shutsemau (Meta)
2026-04-14 15:28 ` [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory Peter Xu
2026-04-14 17:08   ` Kiryl Shutsemau
2026-04-14 17:45     ` Peter Xu
2026-04-14 15:37 ` David Hildenbrand (Arm)
2026-04-14 17:10   ` Kiryl Shutsemau
