From: Yosry Ahmed <yosryahmed@google.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Usama Arif <usamaarif642@gmail.com>,
akpm@linux-foundation.org, chengming.zhou@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com, Hugh Dickins <hughd@google.com>,
Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH 1/2] mm: store zero pages to be swapped out in a bitmap
Date: Thu, 30 May 2024 12:49:35 -0700
Message-ID: <CAJD7tkYDjmMLnH_2sQuuMLE0FE5YqZEppNsprCnm5RdaSkGEBQ@mail.gmail.com>
In-Reply-To: <CAKEwX=NX-4dbietxy-25F-OotuGGL0F9h+hwV76b9Ap5nSy9uw@mail.gmail.com>
On Thu, May 30, 2024 at 12:18 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Thu, May 30, 2024 at 9:24 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Thu, May 30, 2024 at 5:27 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Thu, May 30, 2024 at 11:19:07AM +0100, Usama Arif wrote:
> > > > Approximately 10-20% of pages to be swapped out are zero pages [1].
> > > > Rather than reading/writing these pages to flash resulting
> > > > in increased I/O and flash wear, a bitmap can be used to mark these
> > > > pages as zero at write time, and the pages can be filled at
> > > > read time if the bit corresponding to the page is set.
> > > > With this patch, NVMe writes in Meta server fleet decreased
> > > > by almost 10% with conventional swap setup (zswap disabled).
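To make the write-time/read-time behaviour described above concrete, here is a minimal userspace sketch in C. It assumes 4 KiB pages and a single-threaded caller. Everything in it (struct toy_swap, toy_swap_out(), toy_swap_in(), the in-memory "device") is hypothetical and is not the code from the patch. The patch keeps its zeromap in struct swap_info_struct and, per the review below, relies on atomic bitops, whereas this toy uses plain ones.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096 /* assumed page size */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical toy swap device: a zeromap (one bit per slot) plus RAM
 * standing in for the backing flash/NVMe device. */
struct toy_swap {
	size_t max;             /* number of slots */
	unsigned long *zeromap; /* bit set => slot holds an all-zero page */
	unsigned char *storage; /* stand-in for the device */
	size_t device_writes;   /* pages actually written to "the device" */
	size_t device_reads;    /* pages actually read from "the device" */
};

static int toy_swap_init(struct toy_swap *s, size_t max)
{
	size_t longs = (max + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;

	s->max = max;
	s->device_writes = s->device_reads = 0;
	s->zeromap = calloc(longs, sizeof(unsigned long));
	s->storage = calloc(max, TOY_PAGE_SIZE);
	if (!s->zeromap || !s->storage) {
		free(s->zeromap);
		free(s->storage);
		return -1;
	}
	return 0;
}

static void toy_swap_free(struct toy_swap *s)
{
	free(s->zeromap);
	free(s->storage);
}

static bool page_is_zero(const unsigned char *page)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

static void zeromap_assign(struct toy_swap *s, size_t slot, bool zero)
{
	unsigned long mask = 1UL << (slot % TOY_BITS_PER_LONG);

	if (zero)
		s->zeromap[slot / TOY_BITS_PER_LONG] |= mask;
	else
		s->zeromap[slot / TOY_BITS_PER_LONG] &= ~mask;
}

static bool zeromap_test(const struct toy_swap *s, size_t slot)
{
	return (s->zeromap[slot / TOY_BITS_PER_LONG] >> (slot % TOY_BITS_PER_LONG)) & 1UL;
}

/* Write time: a zero page only sets its bit; any other page clears the bit
 * (the slot may previously have held a zero page) and goes to the device. */
static void toy_swap_out(struct toy_swap *s, size_t slot, const unsigned char *page)
{
	bool zero = page_is_zero(page);

	zeromap_assign(s, slot, zero);
	if (!zero) {
		memcpy(s->storage + slot * TOY_PAGE_SIZE, page, TOY_PAGE_SIZE);
		s->device_writes++;
	}
}

/* Read time: a set bit means "fill with zeroes", skipping device I/O. */
static void toy_swap_in(struct toy_swap *s, size_t slot, unsigned char *page)
{
	if (zeromap_test(s, slot)) {
		memset(page, 0, TOY_PAGE_SIZE);
		return;
	}
	memcpy(page, s->storage + slot * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
	s->device_reads++;
}
```

Swapping out an all-zero page in this toy leaves device_writes unchanged, and swapping it back in leaves device_reads unchanged; only non-zero pages touch the "device".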
> > > >
> > > > [1] https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/
> > > >
> > > > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > >
> > > This is awesome.
> > >
> > > > ---
> > > > include/linux/swap.h | 1 +
> > > > mm/page_io.c | 86 ++++++++++++++++++++++++++++++++++++++++++--
> > > > mm/swapfile.c | 10 ++++++
> > > > 3 files changed, 95 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > > index a11c75e897ec..e88563978441 100644
> > > > --- a/include/linux/swap.h
> > > > +++ b/include/linux/swap.h
> > > > @@ -299,6 +299,7 @@ struct swap_info_struct {
> > > > signed char type; /* strange name for an index */
> > > > unsigned int max; /* extent of the swap_map */
> > > > unsigned char *swap_map; /* vmalloc'ed array of usage counts */
> > > > + unsigned long *zeromap; /* vmalloc'ed bitmap to track zero pages */
> > >
> > > One bit per swap slot, so 1 / (4096 * 8) = 0.003% static memory
> > > overhead for configured swap space. That seems reasonable for what
> > > appears to be a fairly universal 10% reduction in swap IO.
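The overhead figure above can be checked with a small calculation. This is a sketch that assumes 4 KiB pages (one swap slot per page) and a bitmap sized in whole unsigned longs, as an unsigned long * bitmap typically is. zeromap_bytes() and zeromap_overhead_pct() are hypothetical helpers, not kernel functions.

```c
#include <limits.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL /* assumed 4 KiB pages: one swap slot per page */

/* Bytes needed for a one-bit-per-slot bitmap covering swap_bytes of swap,
 * rounded up to whole unsigned longs. */
static uint64_t zeromap_bytes(uint64_t swap_bytes)
{
	uint64_t slots = swap_bytes / TOY_PAGE_SIZE;
	uint64_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	uint64_t longs = (slots + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/* Bitmap size as a percentage of the swap space it describes. */
static double zeromap_overhead_pct(uint64_t swap_bytes)
{
	return 100.0 * (double)zeromap_bytes(swap_bytes) / (double)swap_bytes;
}
```

For a 16 GiB swap device this gives 4,194,304 slots and a 512 KiB bitmap, i.e. 1/32768 of the swap size (about 0.003%).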
> > >
> > > An alternative implementation would be to reserve a bit in
> > > swap_map. This would be no overhead at idle, but would force
> > > continuation counts earlier on heavily shared page tables, and AFAICS
> > > would get complicated in terms of locking, whereas this one is pretty
> > > simple (atomic ops protect the map, swapcache lock protects the bit).
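To illustrate the "continuation counts earlier" point: each swap_map entry is one byte. SWAP_HAS_CACHE (0x40) and COUNT_CONTINUED (0x80) already take two bits, and SWAP_MAP_MAX (0x3e) is the largest count kept inline. The sketch below assumes those mainline values and compares that limit with a HYPOTHETICAL layout in which 0x20 is reserved as a zero-page flag. HYPO_SWAP_ZERO, HYPO_SWAP_MAP_MAX, needs_continuation() and count_bits() are invented for illustration only.

```c
#include <stdbool.h>

/* Values as found in include/linux/swap.h around mid-2024 (assumed). */
#define SWAP_HAS_CACHE  0x40 /* folio is in the swap cache */
#define COUNT_CONTINUED 0x80 /* count continues in a continuation page */
#define SWAP_MAP_MAX    0x3e /* largest count kept inline in a swap_map byte */

/* HYPOTHETICAL layout: steal 0x20 for a "zero page" flag. The count field
 * shrinks to 5 bits; keeping the all-ones value reserved (as 0x3f is today
 * for SWAP_MAP_BAD) leaves 0x1e as the inline maximum. */
#define HYPO_SWAP_ZERO    0x20
#define HYPO_SWAP_MAP_MAX 0x1e

/* True if a count this large no longer fits in the swap_map byte itself. */
static bool needs_continuation(unsigned int count, unsigned int inline_max)
{
	return count > inline_max;
}

/* Mask of the bits left for the inline count once the given flags are taken. */
static unsigned int count_bits(unsigned int flags)
{
	return 0xffu & ~flags;
}
```

Under that assumed layout, a swap entry with a count of 40 fits inline today but would already need a continuation page.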
> > >
> > > So I prefer this version. But a few comments below:
> >
> > I am wondering if it's even possible to take this one step further and
> > avoid reclaiming zero-filled pages in the first place. Can we just
> > unmap them and let the first read fault allocate a zero'd page like
> > uninitialized memory, or point them at the zero page and make them
> > read-only, or something? Then we could free them directly without
> > going into the swap code to begin with.
> >
> > That's how I thought about it initially when I attempted to support
> > only zero-filled pages in zswap. It could be a more complex
> > implementation though.
>
> We can aim for this eventually, but yeah the implementation will be
> more complex. We'll need to be careful in handling shared zero pages,
> synchronizing accesses and maintaining reference counts. I think we
> will need to special-case swap cache and swap map for these zero pages
> (a ghost zero swap device perhaps), or reinvent the wheel to manage
> these pieces of information.

Isn't there an existing mechanism to have read-only mappings pointing
at the shared zero page, and do COW on write? Can't we just use that?
I think this is already what we do in some cases for mapped areas that
were never written (see do_anonymous_page()), so it would be just like
that (i.e. as if the mappings were never written). Someone with more
familiarity with this would know better though.
>
> Not impossible, but annoying :) For now, I think Usama's approach is
> clean enough and does the job.

Yeah, I am not against Usama's approach at all. I just want us to
consider both options before we commit to one. If they are close
enough in complexity, it may be worth avoiding swap completely.

Thread overview: 12+ messages
2024-05-30 10:19 [PATCH 0/2] " Usama Arif
2024-05-30 10:19 ` [PATCH 1/2] " Usama Arif
2024-05-30 12:27 ` Johannes Weiner
2024-05-30 16:24 ` Yosry Ahmed
2024-05-30 19:18 ` Nhat Pham
2024-05-30 19:49 ` Yosry Ahmed [this message]
2024-05-30 20:04 ` Matthew Wilcox
2024-05-30 20:16 ` Yosry Ahmed
2024-05-31 18:18 ` Usama Arif
2024-05-30 16:20 ` Yosry Ahmed
2024-05-30 19:58 ` Andrew Morton
2024-05-30 10:19 ` [PATCH 2/2] mm: remove code to handle same filled pages Usama Arif