From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "Yin, Fengwei" <fengwei.yin@intel.com>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, Vishal Moola <vishal.moola@gmail.com>,
Hugh Dickins <hughd@google.com>, Rik van Riel <riel@surriel.com>
Subject: Re: Folio mapcount
Date: Mon, 3 Jul 2023 23:09:10 +0200
Message-ID: <b1e13544-8ebb-f227-cacb-7f8ec65c1915@redhat.com>
In-Reply-To: <CDE63894-8530-48EF-B72D-F8050433217D@nvidia.com>

On 02.07.23 21:51, Zi Yan wrote:
> On 2 Jul 2023, at 7:45, David Hildenbrand wrote:
>
>> On 02.07.23 11:50, Yin, Fengwei wrote:
>>>
>>>
>>> On 7/1/2023 9:17 AM, Zi Yan wrote:
>>>> In the kernel, almost all code only cares: 1) if a page/folio has extra pins
>>>> by checking if mapcount is equal to refcount + extra, and 2)
>>>> if a page/folio is mapped multiple times. A single mapcount can meet
>>>> these two needs.
>>> For 2, how can we know whether a page/folio is mapped multiple times from
>>> a single mapcount? My understanding is we need two counts as a folio could be
>>> partially mapped.
>>
>> Yes, a single mapcount is most probably insufficient. I started analyzing all existing users and use cases, trying to avoid walking page tables.
>
> From my understanding, a single mapcount is sufficient for kernel users that
> call page_mapcount(), because they either check mapcount against refcount to
> see if a page has extra pins or check mapcount to see if a page is mapped more
> than once.
>
There are cases where we want to know "do we have PTE mappings", but I
have yet to write it all up.
>>
>> If we want to get rid of all of (most) sub-page mapcounts, we'd probably want:
>>
>> (1) Total mapcount (compound + any sub-page): page_mapped(), mapcount
>> vs. refcount games, ...
>
> a single mapcount is sufficient in this case.
Well, that's what I describe here: 1) covers exactly these cases.
>
>>
>> (2) Compound mapcount (for PMD/PUD-mappable THP only): (2) - (1) tells
>> you if it's only PMD mapped or also PTE-mapped. For example, for
>> statistics but also swapout code.
>
> For statistics, it is for NR_{ANON,FILE}_MAPPED and NR_ANON_THP. I wonder
> if we can use the number of anonymous/file pages and THPs instead, without
> caring about if it is mapped or not.
>
> For swapout, folio_entire_mapcount() is used to estimate if a THP is fully
> mapped or not. I wonder if we can get away with another estimation like
> total_mapcount() > folio_nr_pages().
What do we gain by that? Again, I don't see a reason to degrade the current
state just to get down to 1 mapcount when it barely matters
whether we have 2 or 3 instead. Right now we have 513, and with any larger
folios significantly more than 2 or 3.
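
To make the 513 concrete, here is a minimal sketch of the arithmetic. It assumes one mapcount per base page plus one compound mapcount, 4 KiB base pages, and a 2 MiB PMD-sized THP (order 9). mapcounts_per_folio() is a hypothetical helper for illustration, not a kernel function.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): number of mapcount fields a
 * folio of the given order carries if every base page keeps its own
 * mapcount and the folio keeps one additional compound mapcount.
 */
static size_t mapcounts_per_folio(unsigned int order)
{
	size_t nr_pages = (size_t)1 << order;	/* base pages in the folio */

	return nr_pages + 1;			/* per-page counts + compound count */
}
```

Under these assumptions, mapcounts_per_folio(9) gives the 513 mentioned above, and the number roughly doubles with every additional order.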
>
>>
>> (3) Mapcount of first (or any other) subpage (compound+subpage): for
>> folio_estimated_sharers().
>
> This is another estimation. I wonder if we can use a different estimation
> like total_mapcount() > folio_nr_pages() instead.
At least not for PMD-mapped THP. Maybe we could do with (2). But I
recall some cases where it got ugly, will try to remember them.
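
A rough userspace model of why that estimate falls short for PMD-mapped THP. It assumes the total mapcount counts each PMD mapping once and each PTE mapping once. struct folio_model and the helpers are made up for illustration and are not kernel code.

```c
#include <stdbool.h>

/* Userspace model only: the two folio-wide counters discussed here. */
struct folio_model {
	unsigned int nr_pages;		/* base pages in the folio */
	unsigned int entire_mapcount;	/* PMD mappings of the whole folio */
	unsigned int pte_mapcount;	/* sum of all per-page PTE mappings */
};

/* (1) in this model: every PMD mapping and every PTE mapping counts once. */
static unsigned int total_mapcount(const struct folio_model *f)
{
	return f->entire_mapcount + f->pte_mapcount;
}

/* Comparing (1) and (2): anything beyond the PMD mappings is a PTE mapping. */
static bool also_pte_mapped(const struct folio_model *f)
{
	return total_mapcount(f) > f->entire_mapcount;
}

/* The proposed estimate: total_mapcount() > folio_nr_pages(). */
static bool estimate_shared(const struct folio_model *f)
{
	return total_mapcount(f) > f->nr_pages;
}
```

With a 512-page THP that two processes each map via a PMD, the model has a total mapcount of 2, so estimate_shared() says "not shared" although the folio clearly is. The same folio fully PTE-mapped by two processes has a total of 1024 and is reported as shared.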
>
>>
>> For anon pages, I'm thinking about remembering an additional
>>
>> (1) Page/folio creator (MM pointer/identification)
>> (2) Page/folio creator mapcount
>>
>> When optimizing a PTE-mapped THP (especially a non-PMD-mappable one) for the fork()+exec() case, we'd have to walk page tables to see if all folio references come from this MM. Remembering the page/folio creator avoids that completely. We might need a mechanism to synchronize against the creator concurrently mapping/unmapping this folio (relevant when mapped into multiple page tables).
>
> creator_mapcount < total_mapcount means multiple MMs map this folio? And this is for
> page exclusive check? Sorry I have not checked the code in detail yet. The sync

Right now we essentially do, if !PageAnonExclusive:

	if (page_count() != 1)
		copy
	reuse

to see if we really hold the only reference to that folio.

If we could stabilize the creator's mapcount, it would be something like

	if (f->creator != mm || page_count(f) != f->creators_mapcount)
		copy
	reuse

So we wouldn't have to scan page tables to identify if we're responsible
for all of the page references via our page tables.

But that's so far only an idea I had when thinking about how to avoid
page table scans for the simple fork()+exec() case, not matured yet.

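
The two checks side by side, as a self-contained userspace sketch. The struct layouts and the creator / creators_mapcount fields are hypothetical and mirror the pseudocode above. Nothing like them exists in struct folio today, and the sketch ignores the synchronization problem entirely.

```c
#include <stdbool.h>

struct mm_model { int id; };		/* stand-in for struct mm_struct */

struct folio_model {
	int refcount;			/* what page_count() would return */
	const struct mm_model *creator;	/* hypothetical: MM that created the folio */
	int creators_mapcount;		/* hypothetical: mappings held by the creator */
};

/* Today's check: reuse only if we hold the single reference. */
static bool can_reuse_today(const struct folio_model *f)
{
	return f->refcount == 1;
}

/*
 * The idea: reuse if the faulting MM created the folio and every
 * reference is accounted for by the creator's own mappings.
 */
static bool can_reuse_with_creator(const struct folio_model *f,
				   const struct mm_model *mm)
{
	return f->creator == mm && f->refcount == f->creators_mapcount;
}
```

For a 16-page folio that its creator maps 16 times via PTEs and nobody else references, can_reuse_today() fails (refcount is 16, not 1) while can_reuse_with_creator() succeeds without any page table walk. A single extra reference, or a fault from another MM, makes it fall back to copying.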
> of creator_mapcount with total_mapcount might have some extra cost. I wonder if
> this can be solved by checking num_active_vmas in anon_vma of a folio.
As we nowadays match the actual references (i.e., page_count() != 1),
that's most probably insufficient and, from what I recall, easily less precise.
--
Cheers,
David / dhildenb