linux-mm.kvack.org archive mirror
* Suppress pte soft-dirty bit with UFFDIO_COPY?
@ 2025-05-05 16:37 Kyle Huey
  2025-05-05 20:05 ` Peter Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Kyle Huey @ 2025-05-05 16:37 UTC
  To: Andrew Morton, Peter Xu; +Cc: open list, linux-mm, criu, Robert O'Callahan

tl;dr I'd like to add a UFFDIO_COPY_MODE_DONTSOFTDIRTY mode that does not set
the _PAGE_SOFT_DIRTY bit in the relevant pte flags. Any
thoughts/objections?

The kernel has a "soft-dirty" bit on ptes which tracks whether they've been
written to since the last time /proc/pid/clear_refs was used to clear
the soft-dirty bit. CRIU uses this to track which pages have been
modified since a previous checkpoint and reduce the size of the
checkpoints taken. I would like to use this in my debugger[0] to track
which pages a program function dirties when that function is invoked
from the debugger.
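For readers unfamiliar with the interface, here is a minimal C sketch of the two pieces involved. Per the kernel's pagemap documentation, bit 55 of each 64-bit /proc/pid/pagemap entry is the soft-dirty bit, and writing "4" to /proc/pid/clear_refs clears it on every pte of the process. The helper names are illustrative, and the sketch assumes a kernel built with CONFIG_MEM_SOFT_DIRTY.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit positions as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_SOFT_DIRTY ((uint64_t)1 << 55)
#define PM_PRESENT    ((uint64_t)1 << 63)

/* /proc/<pid>/pagemap holds one 8-byte entry per virtual page. */
off_t pagemap_offset(uintptr_t vaddr, size_t page_size)
{
    return (off_t)(vaddr / page_size) * (off_t)sizeof(uint64_t);
}

int entry_is_soft_dirty(uint64_t entry)
{
    return (entry & PM_SOFT_DIRTY) != 0;
}

/* Read the pagemap entry covering vaddr. Returns 0 on success, -1 on error. */
int read_pagemap_entry(int pagemap_fd, uintptr_t vaddr, size_t page_size,
                       uint64_t *entry)
{
    ssize_t n = pread(pagemap_fd, entry, sizeof(*entry),
                      pagemap_offset(vaddr, page_size));
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* Writing "4" to clear_refs clears the soft-dirty bit on every pte of the
 * process; there is no per-page variant. Returns 0 on success, -1 on error. */
int clear_soft_dirty(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```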

However, the runtime environment for this function is rather unusual.
In my debugger, the process being debugged doesn't actually exist
while it's being debugged. Instead, we have a database of all program
state (including registers and memory values) from when the process
was executed. It's in some sense a giant core dump that spans multiple
points in time. To execute a program function from the debugger we
rematerialize the program state at the desired point in time from our
database.

For performance reasons, we fill in the memory lazily[1] via
userfaultfd. This makes it difficult to use the soft-dirty bit to
track the writes the function triggers, because UFFDIO_COPY (and
friends) mark every page they touch as soft-dirty. Because we have the
canonical source of truth for the pages we materialize via UFFDIO_COPY,
we're only interested in what happens after the userfaultfd operation.
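As a rough illustration of the lazy fill, the sketch below answers one missing-page fault with UFFDIO_COPY. It assumes the range was already registered with UFFDIO_REGISTER_MODE_MISSING and that `src` holds the page contents fetched from the database; `resolve_fault` is a hypothetical helper name. Each page installed this way comes back marked soft-dirty, which is the problem described above.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* UFFDIO_COPY needs a page-aligned destination. */
uintptr_t page_align_down(uintptr_t addr, size_t page_size)
{
    return addr & ~((uintptr_t)page_size - 1);
}

/* Install one page of contents at the page containing fault_addr.
 * With dontwake set, the faulting thread stays blocked until a later
 * UFFDIO_WAKE. Returns 0 on success or -errno (e.g. -EEXIST if the page
 * is already populated, -EAGAIN if only part of the range was copied). */
int resolve_fault(int uffd, uintptr_t fault_addr, const void *src,
                  size_t page_size, int dontwake)
{
    struct uffdio_copy copy = {
        .dst = page_align_down(fault_addr, page_size),
        .src = (uintptr_t)src,
        .len = page_size,
        .mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0,
    };

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -errno;
    return 0;
}
```

Passing `dontwake` leaves the faulting thread blocked until a separate UFFDIO_WAKE, which is what step 2 of the workaround relies on.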

Clearing the soft-dirty bit is complicated by two things:
1. There's no way to clear the soft-dirty bit on a single pte, so
instead we have to clear the soft-dirty bits for the entire process.
That requires us to read and record the soft-dirty bits of every other
pte immediately beforehand; otherwise we'd lose track of which pages
were already dirty.
2. We need to clear the soft-dirty bits after the userfaultfd
operation, but to avoid racing with the task that triggered the page
fault (which could write to the page before we clear the bits) we have
to do a non-waking copy, then clear the bits, and then separately wake
up the task.

To work around all of this, we currently have a 4-step process:
1. Read /proc/pid/pagemap and note all ptes that are soft-dirty.
2. Do the UFFDIO_COPY with UFFDIO_COPY_MODE_DONTWAKE.
3. Write to /proc/pid/clear_refs to clear soft-dirty bits across the process.
4. Do a UFFDIO_WAKE.
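The four steps might map onto the kernel interfaces roughly as in this sketch. It is not the debugger's actual code: it scans a single address range for brevity (a real tool would walk every mapping in /proc/pid/maps), and `dirty_cb` is a hypothetical callback standing in for wherever the soft-dirty state of the other pages gets saved.

```c
#define _GNU_SOURCE
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_SOFT_DIRTY ((uint64_t)1 << 55)

/* Hypothetical callback that records a page the debuggee already dirtied. */
typedef void (*dirty_cb)(uintptr_t vaddr, void *ctx);

/* Step 1: report every soft-dirty page in [start, end). This per-page
 * pagemap scan is the expensive part. Returns 0 on success, -1 on error. */
int scan_soft_dirty(int pagemap_fd, uintptr_t start, uintptr_t end,
                    size_t page_size, dirty_cb cb, void *ctx)
{
    for (uintptr_t va = start; va < end; va += page_size) {
        uint64_t entry;
        off_t off = (off_t)(va / page_size) * (off_t)sizeof(entry);
        if (pread(pagemap_fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
            return -1;
        if (entry & PM_SOFT_DIRTY)
            cb(va, ctx);
    }
    return 0;
}

/* Steps 1-4 for one faulting page at dst (page aligned). clear_refs_fd is an
 * open, writable /proc/<pid>/clear_refs. Returns 0 on success, -1 on error. */
int materialize_page(int uffd, int pagemap_fd, int clear_refs_fd,
                     uintptr_t scan_start, uintptr_t scan_end,
                     uintptr_t dst, const void *src, size_t page_size,
                     dirty_cb cb, void *ctx)
{
    if (scan_soft_dirty(pagemap_fd, scan_start, scan_end, page_size, cb, ctx))
        return -1;

    /* Step 2: install the page but keep the faulting task asleep. */
    struct uffdio_copy copy = {
        .dst = dst,
        .src = (uintptr_t)src,
        .len = page_size,
        .mode = UFFDIO_COPY_MODE_DONTWAKE,
    };
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -1;

    /* Step 3: clear soft-dirty process-wide, including the new pte. */
    if (write(clear_refs_fd, "4", 1) != 1)
        return -1;

    /* Step 4: only now let the faulting task run and write to the page. */
    struct uffdio_range range = { .start = dst, .len = page_size };
    return ioctl(uffd, UFFDIO_WAKE, &range) == -1 ? -1 : 0;
}
```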

The overhead of all of this (particularly step 1) is a millisecond or
two *per page* that we lazily materialize, and while that's not
crippling for our purposes, it is rather undesirable. What I would
like to have instead is a UFFDIO_COPY mode that leaves the soft-dirty
bit unchanged, i.e. a UFFDIO_COPY_MODE_DONTSOFTDIRTY. Since we clear
all the soft-dirty bits once after setting up all the mmaps in the
process, the relevant ptes would then "just do the right thing" from
our perspective.

But I do want to get some feedback on this before I spend time writing
any code. Is there a reason not to do this? Or an alternate way to
achieve the same goal?

If this is generally sensible, then a couple of questions:
1. Do I need a UFFD_FEATURE flag for this, or is it enough for a
program to be able to detect the existence of a
UFFDIO_COPY_MODE_DONTSOFTDIRTY by whether the ioctl accepts the flag
or returns EINVAL? I would tend to think the latter.
2. Should I add this mode for the other UFFDIO variants (ZEROPAGE,
MOVE, etc.) at the same time even if I don't have any use for them?
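To make question 1 concrete, this sketch tries the proposed mode and falls back to the existing workaround when the kernel rejects it. UFFDIO_COPY_MODE_DONTSOFTDIRTY does not exist in any kernel header; the name comes from this proposal and the bit value is an arbitrary assumption. Current kernels reject unknown UFFDIO_COPY mode bits with EINVAL, which is what the probe relies on.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* HYPOTHETICAL: proposed in this thread, not defined by any kernel.
 * The bit value is an assumption (bits 0 and 1 are DONTWAKE and WP). */
#ifndef UFFDIO_COPY_MODE_DONTSOFTDIRTY
#define UFFDIO_COPY_MODE_DONTSOFTDIRTY ((__u64)1 << 2)
#endif

/* -1: not probed yet, 0: kernel rejected the flag, 1: flag accepted. */
static int dontsoftdirty_supported = -1;

__u64 copy_mode(int dontwake, int dontsoftdirty)
{
    __u64 mode = 0;
    if (dontwake)
        mode |= UFFDIO_COPY_MODE_DONTWAKE;
    if (dontsoftdirty)
        mode |= UFFDIO_COPY_MODE_DONTSOFTDIRTY;
    return mode;
}

/* Returns 1 if the page was installed without touching soft-dirty (and the
 * faulting task woken), 0 if the fallback non-waking copy was used (caller
 * must still clear soft-dirty and UFFDIO_WAKE), or -1 on error. */
int copy_page(int uffd, __u64 dst, const void *src, __u64 len)
{
    struct uffdio_copy copy = {
        .dst = dst,
        .src = (uintptr_t)src,
        .len = len,
    };

    if (dontsoftdirty_supported != 0) {
        copy.mode = copy_mode(0, 1);
        if (ioctl(uffd, UFFDIO_COPY, &copy) == 0) {
            dontsoftdirty_supported = 1;
            return 1;
        }
        if (errno != EINVAL)
            return -1;
    }

    copy.mode = copy_mode(1, 0);
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -1;
    /* The same arguments work without the flag, so any EINVAL from the
     * probe above was about the flag rather than a bad range. */
    dontsoftdirty_supported = 0;
    return 0;
}
```

Because EINVAL is also returned for misaligned or out-of-range copies, the sketch only caches "unsupported" after the same copy succeeds without the flag. A UFFD_FEATURE_* bit negotiated through UFFDIO_API would avoid that ambiguity, at the cost of another feature flag.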

- Kyle

[0] https://pernos.co/
[1] Conceptually this is similar to CRIU's `restore --lazy-pages`. We
set up all the mappings at the beginning but we don't back them.
Instead we UFFDIO_REGISTER them all and when they're touched for the
first time we go get the pages from our database and then UFFDIO_COPY
them into the address space.
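A minimal sketch of the lazy setup described in [1], assuming the process is allowed to create a userfaultfd (this may require privileges or the vm.unprivileged_userfaultfd sysctl). `open_uffd`, `register_lazy`, and `next_fault` are illustrative names; a handler loop would poll the descriptor and answer each fault address with UFFDIO_COPY.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a non-blocking userfaultfd (so a handler can poll() it) and
 * complete the UFFDIO_API handshake. Returns the fd or -1. */
int open_uffd(void)
{
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return -1;
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) == -1) {
        close(uffd);
        return -1;
    }
    return uffd;
}

/* Reserve an unbacked anonymous mapping and ask to be notified of
 * missing-page faults in it. Returns the mapping or MAP_FAILED. */
void *register_lazy(int uffd, size_t len)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return MAP_FAILED;
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1) {
        munmap(addr, len);
        return MAP_FAILED;
    }
    return addr;
}

/* Read one event; returns the faulting address, or 0 if no page fault
 * event was available. */
uintptr_t next_fault(int uffd)
{
    struct uffd_msg msg;
    ssize_t n = read(uffd, &msg, sizeof(msg));
    if (n != (ssize_t)sizeof(msg) || msg.event != UFFD_EVENT_PAGEFAULT)
        return 0;
    return (uintptr_t)msg.arg.pagefault.address;
}
```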


Thread overview: 8+ messages
2025-05-05 16:37 Suppress pte soft-dirty bit with UFFDIO_COPY? Kyle Huey
2025-05-05 20:05 ` Peter Xu
2025-05-05 22:15   ` Kyle Huey
2025-05-12  3:06     ` Kyle Huey
2025-05-12 15:54       ` Peter Xu
2025-05-12 17:16         ` Kyle Huey
2025-05-13 13:24           ` Peter Xu
2025-05-23 20:32             ` Axel Rasmussen
