Subject: [RFC] mm: MAP_POPULATE on writable anonymous mappings marks PTEs dirty - is this necessary?
From: wuyifeng (C) @ 2025-09-22  6:19 UTC
To: david, akpm
Cc: linux-mm

Hi all,

While reviewing the memory management code, I noticed a potential
inefficiency related to MAP_POPULATE on writable anonymous mappings.
I verified the behavior on the mainline kernel and wanted to share it
for discussion.

Test Environment:
Kernel version: 6.17.0-rc4-00083-gb9a10f876409
Architecture: aarch64

Background:
For anonymous mappings with PROT_READ | PROT_WRITE, MAP_POPULATE is
intended to pre-fault pages so that subsequent accesses do not trigger
page faults. However, I observed that when MAP_POPULATE is used on such
writable anonymous mappings, all pre-faulted pages are immediately
marked dirty, even though the user program has not written to them.

Minimal Reproduction:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    size_t len = 100*1024*1024; // 100MB
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    pause(); // keep the process alive so /proc/<pid>/smaps can be inspected
    return 0;
}

Observed Output (/proc/<pid>/smaps):
ffff7a600000-ffff80a00000 rw-p 00000000 00:00 0
Size:             102400 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:              102400 kB
Pss:              102400 kB
Pss_Dirty:        102400 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    102400 kB
Referenced:       102400 kB
Anonymous:        102400 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:    102400 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd wr mr mw me ac
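
For completeness, the same check can be scripted from inside the test
program. The sketch below is my addition for illustration (not part of
the original report): it re-creates the mapping and then parses
/proc/self/smaps, printing only the Rss / Private_Clean / Private_Dirty
lines of the VMA that starts at the returned address. It assumes the
usual smaps text layout (a "start-end perms ..." header line followed
by "Field:   value kB" lines).

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int main() {
    size_t len = 100*1024*1024; // 100MB, same as the reproducer
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char line[256];
    unsigned long start, end;
    int in_vma = 0;

    while (fgets(line, sizeof(line), f)) {
        // A new VMA entry starts with a "start-end perms ..." line.
        if (sscanf(line, "%lx-%lx", &start, &end) == 2)
            in_vma = (start == (unsigned long)p);
        else if (in_vma && (!strncmp(line, "Rss:", 4) ||
                            !strncmp(line, "Private_Clean:", 14) ||
                            !strncmp(line, "Private_Dirty:", 14)))
            fputs(line, stdout); // smaps lines already end with '\n'
    }

    fclose(f);
    return 0;
}

Given the behavior shown above, Rss and Private_Dirty are expected to
match (102400 kB each) even though the program never writes to p.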

Code Path Analysis:
The behavior can be traced through the following kernel code path:
populate_vma_page_range() (reached from __mm_populate() for
MAP_POPULATE) is invoked to pre-fault pages for the VMA. Inside it:

if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE)
        gup_flags |= FOLL_WRITE;

This sets FOLL_WRITE for writable, non-shared (i.e. private) VMAs such
as the anonymous mapping in the reproducer.

Later, in faultin_page():

if (*flags & FOLL_WRITE)
        fault_flags |= FAULT_FLAG_WRITE;

This effectively turns the pre-fault into a write fault. The fault is
then handled through handle_mm_fault(); for a not-yet-present PTE in an
anonymous VMA it ends up in do_anonymous_page():

if (vma->vm_flags & VM_WRITE)
        entry = pte_mkwrite(pte_mkdirty(entry), vma);

Here, the PTE is made writable and immediately marked dirty.
As a result, all pre-faulted pages are marked dirty, even though the
user program has not performed any writes.
For large anonymous mappings, this can trigger unnecessary swap-out
writebacks, generating avoidable I/O.
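
To make the writeback concern concrete, here is another illustrative
sketch (my addition, not from the original report). Assuming swap is
configured and the libc exposes MADV_PAGEOUT (available since Linux
5.4), forcing reclaim of the populated range makes the kernel write all
of the pre-faulted-but-never-written pages to swap, which is the
avoidable I/O mentioned above.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    size_t len = 100*1024*1024; // 100MB, same as the reproducer
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Ask the kernel to reclaim the range right away. All of the
    // populated-but-never-written pages are dirty, so (with swap
    // available) they have to be written out.
    if (madvise(p, len, MADV_PAGEOUT))
        perror("madvise(MADV_PAGEOUT)");

    // Keep the process alive so Swap:/Rss: can be read from smaps.
    pause();
    return 0;
}

After the madvise() call, the Swap: field of the mapping in
/proc/<pid>/smaps should reflect how much had to be written out.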

Discussion:
Would it be possible to optimize this behavior, for example by
populating the PTE as writable but deferring the dirty bit until the
user actually writes to the page?
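
To make the question concrete, here is a purely illustrative sketch of
the idea at the do_anonymous_page() site quoted above (my addition, not
a proposed patch). How a populate-driven pre-fault would be told apart
from a real write fault, and how this interacts with reclaim and with
architectures lacking hardware dirty-bit management, are exactly the
open questions:

/*
 * Illustrative only, NOT a patch: install the pre-faulted anonymous
 * PTE writable but clean, and let the dirty bit be set only when the
 * application actually writes to the page.
 */
if (vma->vm_flags & VM_WRITE)
        entry = pte_mkwrite(entry, vma);        /* no pte_mkdirty() */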

Thanks,
wuyifeng

