linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	akpm@linux-foundation.org, linux-mm@kvack.org, axboe@kernel.dk,
	bala.seshasayee@linux.intel.com, chrisl@kernel.org,
	kanchana.p.sridhar@intel.com, kasong@tencent.com,
	nphamcs@gmail.com, ryan.roberts@arm.com,
	senozhatsky@chromium.org, terrelln@fb.com,
	usamaarif642@gmail.com, v-songbaohua@oppo.com,
	wajdi.k.feghali@intel.com, willy@infradead.org,
	ying.huang@linux.alibaba.com, yosryahmed@google.com,
	baolin.wang@linux.alibaba.com
Subject: Re: [PATCH RFC] mm: map zero-filled pages to zero_pfn while doing swap-in
Date: Thu, 12 Dec 2024 08:25:08 -0800
Message-ID: <20241212162508.GA4712@cmpxchg.org>
In-Reply-To: <CAGsJ_4zYNh_prasHkPbd1We-OaQEmdT7bbxLSpECnS=nSRPQ7Q@mail.gmail.com>

On Thu, Dec 12, 2024 at 10:16:22PM +1300, Barry Song wrote:
> On Thu, Dec 12, 2024 at 9:51 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 12.12.24 09:46, Barry Song wrote:
> > > On Thu, Dec 12, 2024 at 9:29 PM Christoph Hellwig <hch@infradead.org> wrote:
> > >>
> > >> On Thu, Dec 12, 2024 at 08:37:11PM +1300, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> While developing the zeromap series, Usama observed that certain
> > >>> workloads may contain over 10% zero-filled pages. This may present
> > >>> an opportunity to save memory by mapping zero-filled pages to zero_pfn
> > >>> in do_swap_page(). If a write occurs later, do_wp_page() can
> > >>> allocate a new page using the Copy-on-Write mechanism.
> > >>
> > >> Shouldn't this be done during, or rather instead of, swap-out?
> > >> Swapping all zero pages out just to optimize the in-memory
> > >> representation seems rather backwards.
> > >
> > > I’m having trouble understanding your point—it seems like you might
> > > not have fully read the code. :-)
> > >
> > > The situation is as follows: for a zero-filled page, we currently
> > > allocate a new page unconditionally. By mapping this zero-filled page
> > > to zero_pfn, we could save the memory used by this page.
> > >
> > > We don't need to allocate the memory until the page is written (which
> > > may never happen).
> >
> > I think what Christoph means is that you would determine that at PTE
> > unmap time, and directly place the zero page in there. So there would be
> > no need to have the page fault at all.
> >
> > I suspect at PTE unmap time might be problematic, because we might still
> > have other (i.e., GUP) references modifying that page, and we can only
> > rely on the page content being stable after we flushed the TLB as well.
> > (I recall some deferred flushing optimizations)
> 
> Yes, we need to follow a strict sequence:
> 
> 1. try_to_unmap - unmap PTEs in all processes;
> 2. try_to_unmap_flush_dirty - flush deferred TLB shootdown;
> 3. pageout - zeromap sets the bit in its bitmap if the page is zero-filled
> 
> At the moment of pageout(), we can be confident that the page is zero-filled.
> 
> Mapping to the zero page during unmap seems quite risky.

You have to unmap and flush to stop modifications, but I don't think
that has to happen in all processes before it's safe to decide. Shared
anon pages have COW semantics; when you enter try_to_unmap() with a
page and rmap gives you a pte, it's one of these:

  a) never forked, no sibling ptes
  b) cow broken into private copy, no sibling ptes
  c) cow/WP; any writes to this or another pte will go to a new page.

In cases a and b you need to unmap and flush the current pte, but then
it's safe to check contents and set the zero pte right away, even
before finishing the rmap walk.

In case c, modifications to the page are impossible due to WP, so you
don't even need to unmap and flush before checking the contents. The
pte lock holds up COW breaking to a new page until you're done.
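
To make the three cases concrete, here is a minimal userspace sketch in
C of the decision they imply. It is a toy model, not kernel code: enum
anon_pte_state and needs_unmap_and_flush() are hypothetical names made
up for illustration, and the model ignores the GUP references David
mentioned.

```c
#include <stdbool.h>

/* Hypothetical model of the three cases; none of these names are kernel API. */
enum anon_pte_state {
	PTE_NEVER_FORKED,	/* a) never forked, no sibling ptes */
	PTE_COW_BROKEN,		/* b) cow broken into private copy, no sibling ptes */
	PTE_COW_WRPROTECT,	/* c) cow/WP: any write goes to a new page */
};

/*
 * Returns true if this pte has to be unmapped and flushed before the
 * page contents can be checked for zeroes.
 */
static bool needs_unmap_and_flush(enum anon_pte_state state)
{
	switch (state) {
	case PTE_NEVER_FORKED:
	case PTE_COW_BROKEN:
		/* The pte may be writable, so writes must be stopped first. */
		return true;
	case PTE_COW_WRPROTECT:
		/* Write-protected: the page cannot change under us. */
		return false;
	}
	return true;	/* assumption: be conservative for anything unexpected */
}
```

In this model only needs_unmap_and_flush(PTE_COW_WRPROTECT) returns
false; the other two cases return true.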

It's definitely more complicated than the current implementation, but
if it can be made to work, we could get rid of the bitmap.
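
For context on what would be removed, here is a rough userspace sketch
of the bitmap idea as it is described in this thread: one bit per swap
slot, set at pageout time when the page is zero-filled, and consulted
at swap-in. All names (zeromap, swap_out(), swap_in(), NR_SLOTS) and
the 4096-byte page size are assumptions for illustration; the kernel
implementation is not shown here and differs in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES	4096u	/* assumed page size */
#define NR_SLOTS	64u	/* hypothetical number of swap slots */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per swap slot; a set bit means "this slot held a zero-filled page". */
static unsigned long zeromap[(NR_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD];

static bool page_is_zero_filled(const unsigned char *page)
{
	for (size_t i = 0; i < PAGE_SIZE_BYTES; i++)
		if (page[i] != 0)
			return false;
	return true;
}

/* Pageout side: returns true if the write to the swap device can be skipped. */
static bool swap_out(size_t slot, const unsigned char *page)
{
	size_t word = slot / BITS_PER_WORD;
	unsigned long bit = 1UL << (slot % BITS_PER_WORD);

	assert(slot < NR_SLOTS);
	if (page_is_zero_filled(page)) {
		zeromap[word] |= bit;
		return true;
	}
	zeromap[word] &= ~bit;	/* clear any stale bit from an earlier user */
	return false;
}

/* Swap-in side: returns true if the page was filled without reading the device. */
static bool swap_in(size_t slot, unsigned char *page)
{
	assert(slot < NR_SLOTS);
	if (zeromap[slot / BITS_PER_WORD] & (1UL << (slot % BITS_PER_WORD))) {
		memset(page, 0, PAGE_SIZE_BYTES);
		return true;
	}
	return false;	/* caller would have to read the slot from the device */
}
```

Placing the zero pte directly at unmap time would make both halves of
this bookkeeping unnecessary, which is the simplification meant above.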

You might also reduce faults, but I'm a bit skeptical. Presumably
zero-filled regions are mostly considered invalid by the application,
not useful data, so a populating write that will COW-break seems more
likely to happen next than a faultless read from the zero page.


Thread overview: 10+ messages
2024-12-12  7:37 Barry Song
2024-12-12  8:29 ` Christoph Hellwig
2024-12-12  8:46   ` Barry Song
2024-12-12  8:50     ` Christoph Hellwig
2024-12-12  8:54       ` Barry Song
2024-12-12  8:50     ` David Hildenbrand
2024-12-12  9:16       ` Barry Song
2024-12-12 16:25         ` Johannes Weiner [this message]
2024-12-13  1:47           ` Barry Song
2024-12-13  2:27             ` Barry Song
