* Folio mapcount
@ 2023-01-24 18:13 Matthew Wilcox
From: Matthew Wilcox @ 2023-01-24 18:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand, Yin,
	Fengwei

Once we get to the part of the folio journey where we have 
one-pointer-per-page, we can't afford to maintain per-page state.
Currently we maintain a per-page mapcount, and that will have to go. 
We can maintain extra state for a multi-page folio, but it has to be a
constant amount of extra state no matter how many pages are in the folio.

My proposal is that we maintain a single mapcount per folio, defined
as the number of (vma, page table) tuples which have a reference to
any page in this folio.
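
In pseudocode, the rule would look something like this (these helpers
are made up, just to pin down the accounting; they are not existing
kernel API):

        /* Hypothetical: one atomic op per (vma, page table) tuple, no
         * matter how many pages of the folio are mapped there. */
        static void folio_map_one_table(struct folio *folio)
        {
                /* first entry for this folio installed in this table */
                atomic_inc(&folio->_mapcount);
        }

        static void folio_unmap_one_table(struct folio *folio)
        {
                /* last entry for this folio removed from this table */
                atomic_dec(&folio->_mapcount);
        }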

I think there's a good performance win and simplification to be had
here, so I think it's worth doing for 6.4.

Examples
--------

In the simple and common case where every page in a folio is mapped
once by a single vma and single page table, mapcount would be 1 [1].
If the folio is mapped across a page table boundary by a single VMA,
after we take a page fault on it in one page table, it gets a mapcount
of 1.  After taking a page fault on it in the other page table, its
mapcount increases to 2.
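
The two cases differ only in whether the folio's mapping crosses a PMD
boundary.  A throwaway helper, just to illustrate the arithmetic:

        /* How many PTE page tables a mapping of 'nr' pages starting at
         * 'addr' touches: 1 when it fits in one table, 2 when it
         * straddles a PMD boundary.  Illustration only. */
        static unsigned int nr_page_tables(unsigned long addr, unsigned int nr)
        {
                return ((addr + nr * PAGE_SIZE - 1) >> PMD_SHIFT) -
                       (addr >> PMD_SHIFT) + 1;
        }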

For a naturally aligned, PMD-sized THP, mapcount is 1.  Splitting the
PMD into PTEs would not change the mapcount; the folio remains order-9
but it still has a reference from only one page table (a different page
table, but still just one).

Implementation sketch
---------------------

When we take a page fault, we can/should map every page in the folio
that fits in this VMA and this page table.  We do this at present in
filemap_map_pages() by looping over each page in the folio and calling
do_set_pte() on each.  We should have a:

                do_set_pte_range(vmf, folio, addr, first_page, n);

and then change the API to page_add_new_anon_rmap() / page_add_file_rmap()
to pass in (folio, first, n) instead of page.  That gives us one call to
page_add_*_rmap() per (vma, page table) tuple.
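
The shape would be roughly this (a sketch only; the rmap call below
uses the changed (folio, first, n) signature proposed above, and
details like where vmf->pte points are glossed over):

        static void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
                        unsigned long addr, unsigned int first, unsigned int n)
        {
                struct vm_area_struct *vma = vmf->vma;
                unsigned int i;

                /* One rmap call -> one mapcount increment for this
                 * (vma, page table) tuple. */
                page_add_file_rmap(folio, first, n, vma);

                for (i = 0; i < n; i++, addr += PAGE_SIZE)
                        set_pte_at(vma->vm_mm, addr, vmf->pte + i,
                                   mk_pte(folio_page(folio, first + i),
                                          vma->vm_page_prot));
        }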

In try_to_unmap_one(), page_vma_mapped_walk() currently calls us for
each pfn.  We'll want a function like
        page_vma_mapped_walk_skip_to_end_of_ptable()
in order to persuade it to only call us once or twice if the folio
is mapped across a page table boundary.
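
Roughly (the unmap helper is hypothetical, and the rmap call is the
folio-based form this proposal implies):

        while (page_vma_mapped_walk(&pvmw)) {
                /* Zap every PTE of the folio that lives in the current
                 * page table, then drop the mapcount once for this
                 * (vma, page table) tuple. */
                unmap_folio_in_this_ptable(&pvmw, folio, vma);
                page_remove_rmap(folio, vma);
                /* Skip the remaining PTEs so the walk calls us at most
                 * once more, when the folio straddles a PMD boundary. */
                page_vma_mapped_walk_skip_to_end_of_ptable(&pvmw);
        }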

Concerns
--------

We'll have to be careful to always zap all the PTEs for a given (vma,
pt) tuple at the same time, otherwise the mapcount will get out of sync
(eg map three pages, unmap two; we shouldn't decrement the mapcount,
but I don't think we can know that).  But does this ever happen?  I
think we always unmap the entire folio, as in try_to_unmap_one().

I haven't got my head around SetPageAnonExclusive() yet.  I think it can
be a per-folio bit, but handling a folio split across two page tables
may be tricky.

Notes
-----

[1] Ignoring the bias by -1 to let us detect transitions that we care
about more efficiently; I'm talking about the value returned from
page_mapcount(), not the value stored in page->_mapcount.



* folio mapcount
@ 2021-12-15 21:55 Matthew Wilcox
From: Matthew Wilcox @ 2021-12-15 21:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Hugh Dickins, David Hildenbrand, Mike Kravetz

I've been trying to understand whether we can simplify the mapcount
handling for folios from the current situation with THPs.  Let me
quote the commit message from 53f9263baba6:

> mm: rework mapcount accounting to enable 4k mapping of THPs
>
> We're going to allow mapping of individual 4k pages of THP compound.  It
> means we need to track mapcount on per small page basis.
>
> Straight-forward approach is to use ->_mapcount in all subpages to track
> how many time this subpage is mapped with PMDs or PTEs combined.  But
> this is rather expensive: mapping or unmapping of a THP page with PMD
> would require HPAGE_PMD_NR atomic operations instead of single we have
> now.
>
> The idea is to store separately how many times the page was mapped as
> whole -- compound_mapcount.  This frees up ->_mapcount in subpages to
> track PTE mapcount.
>
> We use the same approach as with compound page destructor and compound
> order to store compound_mapcount: use space in first tail page,
> ->mapping this time.
>
> Any time we map/unmap whole compound page (THP or hugetlb) -- we
> increment/decrement compound_mapcount.  When we map part of compound
> page with PTE we operate on ->_mapcount of the subpage.
>
> page_mapcount() counts both: PTE and PMD mappings of the page.
>
> Basically, we have mapcount for a subpage spread over two counters.  It
> makes tricky to detect when last mapcount for a page goes away.
>
> We introduced PageDoubleMap() for this.  When we split THP PMD for the
> first time and there's other PMD mapping left we offset up ->_mapcount
> in all subpages by one and set PG_double_map on the compound page.
> These additional references go away with last compound_mapcount.
>
> This approach provides a way to detect when last mapcount goes away on
> per small page basis without introducing new overhead for most common
> cases.

What breaks if we simply track any mapping (whether by PMD or PTE)
as an increment to the head page's (aka the folio's) mapcount?

Essentially, we make the head mapcount 'the number of VMAs which contain
a reference to any page in this folio'.  We can remove PageDoubleMap.
The tail mapcounts will all be 0.  If it's useful, we could introduce a
'partial_mapcount' which would be <= mapcount (but I don't know if it's
useful).  Splitting a PMD would not change ->_mapcount.  Splitting the
folio already causes the folio to be unmapped, so page faults will
naturally re-increment ->_mapcount of each subpage.
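
The read side would then collapse to something like this (a sketch;
the +1 just undoes the usual -1 bias of ->_mapcount):

        static inline int folio_mapcount(struct folio *folio)
        {
                /* Single counter in the head page; tail ->_mapcount
                 * stays 0. */
                return atomic_read(&folio->page._mapcount) + 1;
        }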

We might need some additional logic to treat a large folio (aka compound
page) as a single unit; that is, when we fault on one page, we place
entries for all pages in this folio (that fit ...) into the page tables,
so that we only account it once, even if it's not compatible with using
a PMD.



