linux-mm.kvack.org archive mirror
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org, "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Subject: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Tue, 14 Apr 2026 15:23:34 +0100
Message-ID: <20260414142354.1465950-1-kas@kernel.org>

This series adds userfaultfd support for tracking the working set of
VM guest memory, enabling VMMs to identify cold pages and evict them
to tiered or remote storage.

== Problem ==

VMMs managing guest memory need to:
1. Track which pages are actively used (working set detection)
2. Safely evict cold pages to slower storage
3. Fetch pages back on demand when accessed again

For shmem-backed guest memory, working set tracking partially works
today: MADV_DONTNEED zaps PTEs while pages stay in page cache, and
re-access auto-resolves from cache. But safe eviction still requires
synchronous fault interception to prevent data loss races.

For anonymous guest memory (needed for KSM cross-VM deduplication),
there is no mechanism at all: zapping the PTE frees the page, so its
contents are lost.

== Solution ==

The series introduces a unified userfaultfd interface that works
across both anonymous and shmem-backed memory:

UFFD_FEATURE_MINOR_ANON: extends MODE_MINOR registration to anonymous
private memory. Uses the PROT_NONE hinting mechanism (same as NUMA
balancing) to make pages inaccessible without freeing them.

UFFD_FEATURE_MINOR_ASYNC: auto-resolves minor faults without handler
involvement. The kernel restores PTE permissions immediately and the
faulting thread continues. Works for anonymous, shmem, and hugetlbfs.

UFFDIO_DEACTIVATE: marks pages as deactivated. For anonymous memory,
sets PROT_NONE on PTEs (pages stay resident). For shmem/hugetlbfs,
zaps PTEs (pages stay in page cache).

UFFDIO_SET_MODE: toggles MINOR_ASYNC at runtime, synchronized via
mmap_write_lock. Enables the VMM workflow: async mode for lightweight
detection, sync mode for race-free eviction.

PAGE_IS_UFFD_DEACTIVATED: PAGEMAP_SCAN category flag for efficient
batch detection of cold (still-deactivated) anonymous pages.

== VMM Workflow ==

    UFFDIO_DEACTIVATE(all)            -- async, no vCPU stalls
    sleep(interval)
    PAGEMAP_SCAN                      -- find cold pages
    UFFDIO_SET_MODE(sync)             -- block faults for eviction
    pwrite + MADV_DONTNEED cold pages -- safe, faults block
    UFFDIO_SET_MODE(async)            -- resume tracking

The same workflow applies to shmem, with a different PAGEMAP_SCAN mask
(!PAGE_IS_PRESENT instead of PAGE_IS_UFFD_DEACTIVATED).

== NUMA Balancing ==

NUMA balancing scanning is skipped on anonymous VM_UFFD_MINOR VMAs to
avoid protnone conflicts. NUMA locality stats are fed from the uffd
fault path via task_numa_fault() so the scheduler retains placement
data. Shmem VMAs are unaffected (UFFDIO_DEACTIVATE zaps PTEs there,
no protnone involved).

== Testing ==

The series includes 6 new selftests covering async/sync modes,
PAGEMAP_SCAN cold detection, GUP through protnone, UFFDIO_SET_MODE
toggling, and cleanup on close. All 73 uffd unit tests pass
(including hugetlb) across defconfig, allnoconfig, allmodconfig,
and randomized configs.

Kiryl Shutsemau (Meta) (12):
  userfaultfd: define UAPI constants for anonymous minor faults
  userfaultfd: add UFFD_FEATURE_MINOR_ANON registration support
  userfaultfd: implement UFFDIO_DEACTIVATE ioctl
  userfaultfd: UFFDIO_CONTINUE for anonymous memory
  mm: intercept protnone faults on VM_UFFD_MINOR anonymous VMAs
  userfaultfd: auto-resolve shmem and hugetlbfs minor faults in async
    mode
  sched/numa: skip scanning anonymous VM_UFFD_MINOR VMAs
  userfaultfd: enable UFFD_FEATURE_MINOR_ANON
  mm/pagemap: add PAGE_IS_UFFD_DEACTIVATED to PAGEMAP_SCAN
  userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
  selftests/mm: add userfaultfd anonymous minor fault tests
  Documentation/userfaultfd: document working set tracking

 Documentation/admin-guide/mm/userfaultfd.rst | 141 ++++-
 fs/proc/task_mmu.c                           |  11 +-
 fs/userfaultfd.c                             | 184 +++++-
 include/linux/huge_mm.h                      |   6 +
 include/linux/mm.h                           |   2 +
 include/linux/sched/numa_balancing.h         |   1 +
 include/linux/userfaultfd_k.h                |  21 +-
 include/trace/events/sched.h                 |   3 +-
 include/uapi/linux/fs.h                      |   1 +
 include/uapi/linux/userfaultfd.h             |  40 +-
 kernel/sched/fair.c                          |  13 +
 mm/huge_memory.c                             |  33 +-
 mm/hugetlb.c                                 |   3 +-
 mm/memory.c                                  |  51 +-
 mm/mprotect.c                                |   9 +-
 mm/shmem.c                                   |   3 +-
 mm/userfaultfd.c                             | 164 +++++-
 tools/testing/selftests/mm/uffd-unit-tests.c | 458 +++++++++++++++
 18 files changed, 1096 insertions(+), 48 deletions(-)

-- 
2.51.2



Thread overview: 18+ messages
2026-04-14 14:23 Kiryl Shutsemau (Meta) [this message]
2026-04-14 14:23 ` [RFC, PATCH 01/12] userfaultfd: define UAPI constants for anonymous minor faults Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 02/12] userfaultfd: add UFFD_FEATURE_MINOR_ANON registration support Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 03/12] userfaultfd: implement UFFDIO_DEACTIVATE ioctl Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 04/12] userfaultfd: UFFDIO_CONTINUE for anonymous memory Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 05/12] mm: intercept protnone faults on VM_UFFD_MINOR anonymous VMAs Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 06/12] userfaultfd: auto-resolve shmem and hugetlbfs minor faults in async mode Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 07/12] sched/numa: skip scanning anonymous VM_UFFD_MINOR VMAs Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 08/12] userfaultfd: enable UFFD_FEATURE_MINOR_ANON Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 09/12] mm/pagemap: add PAGE_IS_UFFD_DEACTIVATED to PAGEMAP_SCAN Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 10/12] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 11/12] selftests/mm: add userfaultfd anonymous minor fault tests Kiryl Shutsemau (Meta)
2026-04-14 14:23 ` [RFC, PATCH 12/12] Documentation/userfaultfd: document working set tracking Kiryl Shutsemau (Meta)
2026-04-14 15:28 ` [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory Peter Xu
2026-04-14 17:08   ` Kiryl Shutsemau
2026-04-14 17:45     ` Peter Xu
2026-04-14 15:37 ` David Hildenbrand (Arm)
2026-04-14 17:10   ` Kiryl Shutsemau