From: Matthew Wilcox <willy@infradead.org>
To: linux-mm@kvack.org
Subject: A two-bit folio_mapcount
Date: Thu, 27 Jan 2022 21:57:05 +0000
Message-ID: <YfMVMYmQcT9cJ9Tr@casper.infradead.org>
As promised, here's a half-baked proposal for making folio_mapcount()
significantly cheaper at the cost of making it less precise.
I appreciate that folio_mapcount() is not upstream yet, so take a look
at total_mapcount() if you want to understand what I'm talking about.
For a 2MB folio on a 4k architecture, you have to check 512 cachelines
to determine how many times a folio is mapped. That's 32kB of memory,
which is a good chunk of your L1 cache. The problem is that every PTE
mapping increments the ->mapcount of each individual page (and the number
of PMD mappings is stored separately). To find out how many times the
entire folio is mapped, you've got to look at each constituent page.
Added to that, each increment of any page's ->mapcount bumps the
refcount on the head page. That's a lot of atomic ops, and we've had
some problems where the page refcount has been attacked, resulting in
overflow.
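
To make the cost concrete, here is a rough userspace model of the walk
described above. It is not the kernel's total_mapcount(); the struct and
function names are hypothetical, it assumes a 64-byte struct page (one
cacheline per page), and it uses 0 rather than the kernel's -1 to mean
"unmapped".

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES   4096u                 /* 4k base pages, as in the text */
#define FOLIO_SIZE_BYTES  (2u * 1024u * 1024u)  /* 2MB folio */
#define CACHELINE_BYTES   64u                   /* assumption: 64-byte struct page */
#define PAGES_PER_FOLIO   (FOLIO_SIZE_BYTES / PAGE_SIZE_BYTES)

/* Hypothetical stand-in for struct page: exactly one cacheline. */
struct model_page {
	int mapcount;                            /* PTE mappings; 0 = unmapped here */
	char pad[CACHELINE_BYTES - sizeof(int)];
};

/* Hypothetical stand-in for a large folio and its constituent pages. */
struct model_folio {
	int pmd_mapcount;                        /* PMD mappings, stored separately */
	struct model_page pages[PAGES_PER_FOLIO];
};

/* Exact count: has to read the mapcount of every constituent page. */
static long model_total_mapcount(const struct model_folio *f)
{
	long total = f->pmd_mapcount;

	for (size_t i = 0; i < PAGES_PER_FOLIO; i++)
		total += f->pages[i].mapcount;
	return total;
}

/* Memory read by one exact count: 512 pages * 64 bytes = 32kB. */
static size_t model_bytes_touched(void)
{
	return PAGES_PER_FOLIO * sizeof(struct model_page);
}
```

With these assumptions, model_bytes_touched() works out to 32768, which
is the 32kB figure quoted above.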
I would like to start counting folio mapcounts in a more Discworld Troll
manner. Zero, One, Two, Many. That limits the total number of refcount
increments to 3. Once you reach "Many", you've essentially lost count,
and you need to walk the interval tree to figure out exactly how many
mappings there are (this means we can no longer use mapcount to decide to
stop walking the rmap, but I think that's OK?) You can decrement from
Two to One and One to Zero, but you can't decrement from Many to Two.
If you walk the rmap and discover there are fewer than Many mappings,
you can set mapcount to Two, One or Zero (adjusting page refcount at
the same time).
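
The counting rules above can be sketched as a small saturating state
machine. This is only a userspace illustration under my reading of the
proposal; every name here (troll_folio, troll_map, troll_unmap,
troll_resync) is hypothetical, there is no locking or atomics, and the
rmap walk is represented by the caller passing in the exact count it
found.

```c
#include <assert.h>
#include <stdbool.h>

/* Zero, One, Two, Many: fits in two bits. */
enum troll_count { TROLL_ZERO, TROLL_ONE, TROLL_TWO, TROLL_MANY };

/* Hypothetical model of a folio; not a kernel structure. */
struct troll_folio {
	enum troll_count mapcount;
	int refcount;
};

/* Add a mapping. Saturates at Many, so at most 3 refcount bumps. */
static void troll_map(struct troll_folio *f)
{
	if (f->mapcount == TROLL_MANY)
		return;                 /* already lost count; nothing changes */
	f->mapcount++;
	f->refcount++;
}

/*
 * Remove a mapping. Returns false when the count is Many: we cannot
 * decrement from Many to Two, so the caller would need to walk the
 * rmap and call troll_resync() to learn the real number.
 */
static bool troll_unmap(struct troll_folio *f)
{
	assert(f->mapcount != TROLL_ZERO);
	if (f->mapcount == TROLL_MANY)
		return false;
	f->mapcount--;
	f->refcount--;
	return true;
}

/* After an rmap walk found 'actual' mappings, reset count and refcount. */
static void troll_resync(struct troll_folio *f, int actual)
{
	enum troll_count new_count;

	assert(actual >= 0);
	new_count = actual >= TROLL_MANY ? TROLL_MANY : (enum troll_count)actual;
	f->refcount += (int)new_count - (int)f->mapcount;
	f->mapcount = new_count;
}
```

In this model, five calls to troll_map() on a fresh folio leave the
count at Many with only three refcount increments, and a later
troll_resync(f, 2) drops one of those references again.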
The mapcount would also no longer count the number of individual PTE or
PMD mappings. Instead, it would be the number of VMAs which contain at
least one page table reference to this folio.
One advantage to this scheme is that it makes something like 30 bits
available in struct page. I'm sure we'll be able to think of some good
uses for them. PageDoubleMap also goes away (because we no longer care
whether the folio is mapped with PMDs or PTEs).
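
For the bit accounting, a minimal sketch, assuming the mapcount lives in
a 32-bit word today and would shrink to its low two bits. The layout and
helper names are hypothetical; nothing here says where the bits would
actually go in struct page.

```c
#include <stdint.h>

#define MAPCOUNT_BITS  2u                              /* Zero, One, Two, Many */
#define MAPCOUNT_MASK  ((1u << MAPCOUNT_BITS) - 1u)    /* 0x3 */
#define SPARE_BITS     (32u - MAPCOUNT_BITS)           /* 30 bits left over */

/* Read the two-bit mapcount from the low bits of the word. */
static inline uint32_t word_get_mapcount(uint32_t w)
{
	return w & MAPCOUNT_MASK;
}

/* Replace the two-bit mapcount, leaving the spare bits untouched. */
static inline uint32_t word_set_mapcount(uint32_t w, uint32_t count)
{
	return (w & ~MAPCOUNT_MASK) | (count & MAPCOUNT_MASK);
}

/* Read the 30 spare bits. */
static inline uint32_t word_get_spare(uint32_t w)
{
	return w >> MAPCOUNT_BITS;
}

/* Replace the 30 spare bits, leaving the mapcount untouched. */
static inline uint32_t word_set_spare(uint32_t w, uint32_t value)
{
	return (w & MAPCOUNT_MASK) | (value << MAPCOUNT_BITS);
}
```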
So ... what's going to be made catastrophically slower by this scheme?
Maybe something involving anonymous pages? Those tend to be my blind
spot.