From: John Hubbard <jhubbard@nvidia.com>
To: Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org
Subject: Re: A two-bit folio_mapcount
Date: Thu, 27 Jan 2022 19:05:35 -0800
Message-ID: <229a8846-a413-43c1-47dd-dfbb649db7df@nvidia.com>
In-Reply-To: <YfMVMYmQcT9cJ9Tr@casper.infradead.org>

On 1/27/22 13:57, Matthew Wilcox wrote:
> As promised, here's a half-baked proposal for making folio_mapcount()
> significantly cheaper at the cost of making it less precise.
> I appreciate that folio_mapcount() is not upstream yet, so take a look
> at total_mapcount() if you want to understand what I'm talking about.
>
> For a 2MB folio on a 4k architecture, you have to check 512 cachelines
> to determine how many times a folio is mapped. That's 32kB of memory,
> which is a good chunk of your L1 cache. The problem is that every PTE
> mapping increments the ->mapcount of the individual page it maps (and
> the number of PMD mappings is stored separately). To find out how many
> times the entire folio is mapped, you've got to look at each
> constituent page.
>
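
A simplified sketch of what that per-page walk looks like today, modeled
on total_mapcount() in mm/huge_memory.c (the PageDoubleMap and file-page
corrections are elided):

	/*
	 * One atomic read per subpage: 512 reads from 512 different
	 * cachelines for a 2MB folio.  ->_mapcount is biased by -1,
	 * hence the "+ 1" per page.
	 */
	static int total_mapcount_sketch(struct page *head)
	{
		int i, ret = compound_mapcount(head);	/* PMD mappings */

		for (i = 0; i < compound_nr(head); i++)
			ret += atomic_read(&head[i]._mapcount) + 1;
		return ret;
	}
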
> Added to that, each increment of any of the ->mapcounts bumps the
> refcount on the head page. That's a lot of atomic ops, and we've had
> some problems where the page refcount has been attacked, resulting in
> overflow.
>
> I would like to start counting folio mapcounts in a more Discworld Troll
> manner: Zero, One, Two, Many. That limits the total number of refcount
> increments to 3. Once you reach "Many", you've essentially lost count,
> and you need to walk the interval tree to figure out exactly how many
> mappings there are (this means we can no longer use mapcount to decide
> to stop walking the rmap, but I think that's OK?). You can decrement
> from Two to One and One to Zero, but you can't decrement from Many to
> Two. If you walk the rmap and discover there are fewer than Many
> mappings, you can set mapcount to Two, One or Zero (adjusting the page
> refcount at the same time).
>
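
As a strawman, the saturating counter might look like this (illustrative
only, not from any posted patch; the folio->_mapcount field and the MANY
encoding are assumptions):

	#define FOLIO_MAPCOUNT_MANY	3

	static void folio_mapcount_inc(struct folio *folio)
	{
		int old = atomic_read(&folio->_mapcount);

		while (old < FOLIO_MAPCOUNT_MANY) {
			if (atomic_try_cmpxchg(&folio->_mapcount, &old,
					       old + 1)) {
				folio_get(folio);  /* at most 3 bumps total */
				return;
			}
		}
		/* Saturated at Many: we have lost count. */
	}

	static void folio_mapcount_dec(struct folio *folio)
	{
		int old = atomic_read(&folio->_mapcount);

		/* Many stays Many; only an rmap walk can lower it. */
		while (old > 0 && old < FOLIO_MAPCOUNT_MANY) {
			if (atomic_try_cmpxchg(&folio->_mapcount, &old,
					       old - 1)) {
				folio_put(folio);
				return;
			}
		}
	}
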
> The mapcount would also no longer count the number of individual PTE or
> PMD mappings. Instead, it would be the number of VMAs which contain at
> least one page table reference to this folio.
>
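
The "walk the interval tree" fallback for a file-backed folio might look
roughly like this (illustrative; locking is elided, and a real version
would also have to check each VMA for an actual page table reference,
per the definition above; anonymous folios would walk the anon_vma tree
instead):

	static int folio_recount_vmas(struct folio *folio)
	{
		struct address_space *mapping = folio->mapping;
		pgoff_t first = folio->index;
		pgoff_t last = first + folio_nr_pages(folio) - 1;
		struct vm_area_struct *vma;
		int count = 0;

		vma_interval_tree_foreach(vma, &mapping->i_mmap, first, last)
			count++;	/* one per VMA, per the new rule */
		return count;
	}
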
> One advantage to this scheme is that it makes something like 30 bits
> available in struct page. I'm sure we'll be able to think of some good
> uses for them.

Such as upgrading from:

	page_maybe_dma_pinned(),

to:

	oh_yes_page_is_most_definitely_dma_pinned()  ! :)

...I just can't let that idea go. haha.
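
(For anyone missing the joke: pinning is currently tracked by inflating
the refcount, so the existing check is only a heuristic. The sketch
below is simplified from page_maybe_dma_pinned() in include/linux/mm.h,
omitting the THP pincount case.)

	/*
	 * pin_user_pages() adds GUP_PIN_COUNTING_BIAS (1024) per pin, so
	 * a page with ~1024 ordinary references is a false positive --
	 * hence "maybe".  A spare bit freed by the new scheme could
	 * record pinning exactly.
	 */
	static bool page_maybe_dma_pinned_sketch(struct page *page)
	{
		return page_ref_count(compound_head(page)) >=
			GUP_PIN_COUNTING_BIAS;
	}
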
> PageDoubleMap also goes away (because we no longer care whether the
> folio is mapped with PMDs or PTEs).
>
> So ... what's going to be made catastrophically slower by this scheme?
> Maybe something involving anonymous pages? Those tend to be my blind
> spot.

thanks,
--
John Hubbard
NVIDIA