From: John Hubbard <jhubbard@nvidia.com>
To: Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org
Subject: Re: A two-bit folio_mapcount
Date: Thu, 27 Jan 2022 19:05:35 -0800
Message-ID: <229a8846-a413-43c1-47dd-dfbb649db7df@nvidia.com>
In-Reply-To: <YfMVMYmQcT9cJ9Tr@casper.infradead.org>

On 1/27/22 13:57, Matthew Wilcox wrote:
> As promised, here's a half-baked proposal for making folio_mapcount()
> significantly cheaper at the cost of making it less precise.
> I appreciate that folio_mapcount() is not upstream yet, so take a look
> at total_mapcount() if you want to understand what I'm talking about.
>
> For a 2MB folio on a 4k architecture, you have to check 512 cachelines
> to determine how many times a folio is mapped. That's 32kB of memory,
> which is a good chunk of your L1 cache. The problem is that every PTE
> mapping increments the ->mapcount of the individual page it maps (and
> the number of PMD mappings is stored separately). To find out how many
> times the entire folio is mapped, you've got to look at each
> constituent page.
>
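
A simplified sketch of what that per-page walk looks like today, modeled
on total_mapcount() in mm/huge_memory.c (the PageDoubleMap and file-page
corrections are elided):

	/*
	 * One atomic read per subpage: 512 reads from 512 different
	 * cachelines for a 2MB folio.  ->_mapcount is biased by -1,
	 * hence the "+ 1" per page.
	 */
	static int total_mapcount_sketch(struct page *head)
	{
		int i, ret = compound_mapcount(head);	/* PMD mappings */

		for (i = 0; i < compound_nr(head); i++)
			ret += atomic_read(&head[i]._mapcount) + 1;
		return ret;
	}
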
> Added to that, each increment of any of the ->mapcounts bumps the
> refcount on the head page. That's a lot of atomic ops, and we've had
> some problems where the page refcount has been attacked, resulting in
> overflow.
>
> I would like to start counting folio mapcounts in a more Discworld Troll
> manner: Zero, One, Two, Many. That limits the total number of refcount
> increments to 3. Once you reach "Many", you've essentially lost count,
> and you need to walk the interval tree to figure out exactly how many
> mappings there are (this means we can no longer use mapcount to decide
> to stop walking the rmap, but I think that's OK?). You can decrement
> from Two to One and One to Zero, but you can't decrement from Many to
> Two. If you walk the rmap and discover there are fewer than Many
> mappings, you can set mapcount to Two, One or Zero (adjusting the page
> refcount at the same time).
>
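
As a strawman, the saturating counter might look like this (illustrative
only, not from any posted patch; the folio->_mapcount field and the MANY
encoding are assumptions):

	#define FOLIO_MAPCOUNT_MANY	3

	static void folio_mapcount_inc(struct folio *folio)
	{
		int old = atomic_read(&folio->_mapcount);

		while (old < FOLIO_MAPCOUNT_MANY) {
			if (atomic_try_cmpxchg(&folio->_mapcount, &old,
					       old + 1)) {
				folio_get(folio);  /* at most 3 bumps total */
				return;
			}
		}
		/* Saturated at Many: we have lost count. */
	}

	static void folio_mapcount_dec(struct folio *folio)
	{
		int old = atomic_read(&folio->_mapcount);

		/* Many stays Many; only an rmap walk can lower it. */
		while (old > 0 && old < FOLIO_MAPCOUNT_MANY) {
			if (atomic_try_cmpxchg(&folio->_mapcount, &old,
					       old - 1)) {
				folio_put(folio);
				return;
			}
		}
	}
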
> The mapcount would also no longer count the number of individual PTE or
> PMD mappings. Instead, it would be the number of VMAs which contain at
> least one page table reference to this folio.
>
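
The "walk the interval tree" fallback for a file-backed folio might look
roughly like this (illustrative; locking is elided, and a real version
would also have to check each VMA for an actual page table reference,
per the definition above; anonymous folios would walk the anon_vma tree
instead):

	static int folio_recount_vmas(struct folio *folio)
	{
		struct address_space *mapping = folio->mapping;
		pgoff_t first = folio->index;
		pgoff_t last = first + folio_nr_pages(folio) - 1;
		struct vm_area_struct *vma;
		int count = 0;

		vma_interval_tree_foreach(vma, &mapping->i_mmap, first, last)
			count++;	/* one per VMA, per the new rule */
		return count;
	}
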
> One advantage to this scheme is that it makes something like 30 bits
> available in struct page. I'm sure we'll be able to think of some good
> uses for them.

Such as upgrading from:

	page_maybe_dma_pinned(),

to:

	oh_yes_page_is_most_definitely_dma_pinned()  ! :)

...I just can't let that idea go. haha.
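
(For anyone missing the joke: pinning is currently tracked by inflating
the refcount, so the existing check is only a heuristic. The sketch
below is simplified from page_maybe_dma_pinned() in include/linux/mm.h,
omitting the THP pincount case.)

	/*
	 * pin_user_pages() adds GUP_PIN_COUNTING_BIAS (1024) per pin, so
	 * a page with ~1024 ordinary references is a false positive --
	 * hence "maybe".  A spare bit freed by the new scheme could
	 * record pinning exactly.
	 */
	static bool page_maybe_dma_pinned_sketch(struct page *page)
	{
		return page_ref_count(compound_head(page)) >=
			GUP_PIN_COUNTING_BIAS;
	}
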
> PageDoubleMap also goes away (because we no longer care whether the
> folio is mapped with PMDs or PTEs).
>
> So ... what's going to be made catastrophically slower by this scheme?
> Maybe something involving anonymous pages? Those tend to be my blind
> spot.

thanks,
--
John Hubbard
NVIDIA