* [PATCH v2 0/2] mm: remove total_mapcount() @ 2024-02-26 14:13 David Hildenbrand 2024-02-26 14:13 ` [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() David Hildenbrand 2024-02-26 14:13 ` [PATCH v2 2/2] mm: remove total_mapcount() David Hildenbrand 0 siblings, 2 replies; 7+ messages in thread From: David Hildenbrand @ 2024-02-26 14:13 UTC (permalink / raw) To: linux-kernel; +Cc: linux-mm, David Hildenbrand, Andrew Morton, Matthew Wilcox Let's remove the remaining user from mm/memfd.c so we can get rid of total_mapcount(). v1 -> v2: * Adjust mm/memfd.c separately, and as suggested by Willy, clean it up properly. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> David Hildenbrand (2): mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() mm: remove total_mapcount() include/linux/mm.h | 9 +-------- mm/memfd.c | 47 ++++++++++++++++++---------------------------- 2 files changed, 19 insertions(+), 37 deletions(-) -- 2.43.2 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-02-26 14:13 [PATCH v2 0/2] mm: remove total_mapcount() David Hildenbrand @ 2024-02-26 14:13 ` David Hildenbrand 2024-02-26 16:07 ` Matthew Wilcox 2024-02-26 14:13 ` [PATCH v2 2/2] mm: remove total_mapcount() David Hildenbrand 1 sibling, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2024-02-26 14:13 UTC (permalink / raw) To: linux-kernel; +Cc: linux-mm, David Hildenbrand, Andrew Morton, Matthew Wilcox Both functions are the remaining users of total_mapcount(). Let's get rid of the calls by converting the code to folios. As it turns out, the code is unnecessarily complicated, especially: 1) We can query the number of pagecache references for a folio simply via folio_nr_pages(). This will handle other folio sizes in the future correctly. 2) The xas_set(xas, page->index + cache_count) call to increment the iterator for large folios is not required. Remove it. Further, simplify the XA_CHECK_SCHED check, counting each entry exactly once. Memfd pages can be swapped out when using shmem; leave xa_is_value() checks in place. Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/memfd.c | 47 ++++++++++++++++++----------------------------- 1 file changed, 18 insertions(+), 29 deletions(-) diff --git a/mm/memfd.c b/mm/memfd.c index d3a1ba4208c90..7d8d3ab3fa378 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -29,29 +29,25 @@ #define MEMFD_TAG_PINNED PAGECACHE_TAG_TOWRITE #define LAST_SCAN 4 /* about 150ms max */ +static bool memfd_folio_has_extra_refs(struct folio *folio) +{ + return folio_ref_count(folio) - folio_mapcount(folio) != + folio_nr_pages(folio); +} + static void memfd_tag_pins(struct xa_state *xas) { - struct page *page; + struct folio *folio; int latency = 0; - int cache_count; lru_add_drain(); xas_lock_irq(xas); - xas_for_each(xas, page, ULONG_MAX) { - cache_count = 1; - if (!xa_is_value(page) && - PageTransHuge(page) && !PageHuge(page)) - cache_count = HPAGE_PMD_NR; - - if (!xa_is_value(page) && - page_count(page) - total_mapcount(page) != cache_count) + xas_for_each(xas, folio, ULONG_MAX) { + if (!xa_is_value(folio) && memfd_folio_has_extra_refs(folio)) xas_set_mark(xas, MEMFD_TAG_PINNED); - if (cache_count != 1) - xas_set(xas, page->index + cache_count); - latency += cache_count; - if (latency < XA_CHECK_SCHED) + if (++latency < XA_CHECK_SCHED) continue; latency = 0; @@ -66,16 +62,16 @@ static void memfd_tag_pins(struct xa_state *xas) /* * Setting SEAL_WRITE requires us to verify there's no pending writer. However, * via get_user_pages(), drivers might have some pending I/O without any active - * user-space mappings (eg., direct-IO, AIO). Therefore, we look at all pages + * user-space mappings (eg., direct-IO, AIO). Therefore, we look at all folios * and see whether it has an elevated ref-count. If so, we tag them and wait for * them to be dropped. * The caller must guarantee that no new user will acquire writable references - * to those pages to avoid races. + * to those folios to avoid races. */ static int memfd_wait_for_pins(struct address_space *mapping) { XA_STATE(xas, &mapping->i_pages, 0); - struct page *page; + struct folio *folio; int error, scan; memfd_tag_pins(&xas); @@ -83,7 +79,6 @@ static int memfd_wait_for_pins(struct address_space *mapping) error = 0; for (scan = 0; scan <= LAST_SCAN; scan++) { int latency = 0; - int cache_count; if (!xas_marked(&xas, MEMFD_TAG_PINNED)) break; @@ -95,20 +90,15 @@ static int memfd_wait_for_pins(struct address_space *mapping) xas_set(&xas, 0); xas_lock_irq(&xas); - xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) { + xas_for_each_marked(&xas, folio, ULONG_MAX, MEMFD_TAG_PINNED) { bool clear = true; - cache_count = 1; - if (!xa_is_value(page) && - PageTransHuge(page) && !PageHuge(page)) - cache_count = HPAGE_PMD_NR; - - if (!xa_is_value(page) && cache_count != - page_count(page) - total_mapcount(page)) { + if (!xa_is_value(folio) && + memfd_folio_has_extra_refs(folio)) { /* * On the last scan, we clean up all those tags * we inserted; but make a note that we still - * found pages pinned. + * found folios pinned. */ if (scan == LAST_SCAN) error = -EBUSY; @@ -118,8 +108,7 @@ static int memfd_wait_for_pins(struct address_space *mapping) if (clear) xas_clear_mark(&xas, MEMFD_TAG_PINNED); - latency += cache_count; - if (latency < XA_CHECK_SCHED) + if (++latency < XA_CHECK_SCHED) continue; latency = 0; -- 2.43.2 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-02-26 14:13 ` [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() David Hildenbrand @ 2024-02-26 16:07 ` Matthew Wilcox 2024-02-26 16:56 ` David Hildenbrand 0 siblings, 1 reply; 7+ messages in thread From: Matthew Wilcox @ 2024-02-26 16:07 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton On Mon, Feb 26, 2024 at 03:13:23PM +0100, David Hildenbrand wrote: > + xas_for_each(xas, folio, ULONG_MAX) { > + if (!xa_is_value(folio) && memfd_folio_has_extra_refs(folio)) > xas_set_mark(xas, MEMFD_TAG_PINNED); ... we decline to tag value entries here ... > @@ -95,20 +90,15 @@ static int memfd_wait_for_pins(struct address_space *mapping) > > xas_set(&xas, 0); > xas_lock_irq(&xas); > - xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) { > + xas_for_each_marked(&xas, folio, ULONG_MAX, MEMFD_TAG_PINNED) { > bool clear = true; > > - cache_count = 1; > - if (!xa_is_value(page) && > - PageTransHuge(page) && !PageHuge(page)) > - cache_count = HPAGE_PMD_NR; > - > - if (!xa_is_value(page) && cache_count != > - page_count(page) - total_mapcount(page)) { > + if (!xa_is_value(folio) && > + memfd_folio_has_extra_refs(folio)) { ... so we don't need to test it here because we'll never see any value entries. No? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-02-26 16:07 ` Matthew Wilcox @ 2024-02-26 16:56 ` David Hildenbrand 2024-02-27 15:27 ` Matthew Wilcox 0 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2024-02-26 16:56 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-kernel, linux-mm, Andrew Morton On 26.02.24 17:07, Matthew Wilcox wrote: > On Mon, Feb 26, 2024 at 03:13:23PM +0100, David Hildenbrand wrote: >> + xas_for_each(xas, folio, ULONG_MAX) { >> + if (!xa_is_value(folio) && memfd_folio_has_extra_refs(folio)) >> xas_set_mark(xas, MEMFD_TAG_PINNED); > > ... we decline to tag value entries here ... > >> @@ -95,20 +90,15 @@ static int memfd_wait_for_pins(struct address_space *mapping) >> >> xas_set(&xas, 0); >> xas_lock_irq(&xas); >> - xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) { >> + xas_for_each_marked(&xas, folio, ULONG_MAX, MEMFD_TAG_PINNED) { >> bool clear = true; >> >> - cache_count = 1; >> - if (!xa_is_value(page) && >> - PageTransHuge(page) && !PageHuge(page)) >> - cache_count = HPAGE_PMD_NR; >> - >> - if (!xa_is_value(page) && cache_count != >> - page_count(page) - total_mapcount(page)) { >> + if (!xa_is_value(folio) && >> + memfd_folio_has_extra_refs(folio)) { > > ... so we don't need to test it here because we'll never see any value > entries. No? I was not able to convince myself that swapout code would clear the mark when replacing the entry. shmem_writepage()->shmem_delete_from_page_cache()->shmem_replace_entry() will perform a xas_store() with swp_to_radix_entry(swap) under xa_lock_irq(). Reading the doc, and staring at the code for a bit too long, I think xas_store() would only clear tags when deleting an entry (passing NULL). But maybe xas_store() will always clear tags? In memfd code, I think we could see swapout between memfd_tag_pins() and the check for tags, where we drop the xa_lock. Unless some other lock (inode lock?) protects us. Thanks! -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-02-26 16:56 ` David Hildenbrand @ 2024-02-27 15:27 ` Matthew Wilcox 0 siblings, 0 replies; 7+ messages in thread From: Matthew Wilcox @ 2024-02-27 15:27 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton On Mon, Feb 26, 2024 at 05:56:09PM +0100, David Hildenbrand wrote: > > > @@ -95,20 +90,15 @@ static int memfd_wait_for_pins(struct address_space *mapping) > > > xas_set(&xas, 0); > > > xas_lock_irq(&xas); > > > - xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) { > > > + xas_for_each_marked(&xas, folio, ULONG_MAX, MEMFD_TAG_PINNED) { > > > bool clear = true; > > > - cache_count = 1; > > > - if (!xa_is_value(page) && > > > - PageTransHuge(page) && !PageHuge(page)) > > > - cache_count = HPAGE_PMD_NR; > > > - > > > - if (!xa_is_value(page) && cache_count != > > > - page_count(page) - total_mapcount(page)) { > > > + if (!xa_is_value(folio) && > > > + memfd_folio_has_extra_refs(folio)) { > > > > ... so we don't need to test it here because we'll never see any value > > entries. No? > > I was not able to convince myself that swapout code would clear the mark > when replacing the entry. > > shmem_writepage()->shmem_delete_from_page_cache()->shmem_replace_entry() > > will perform a xas_store() with swp_to_radix_entry(swap) under > xa_lock_irq(). > > Reading the doc, and staring at the code for a bit too long, I think > xas_store() would only clear tags when deleting an entry (passing NULL). > > But maybe xas_store() will always clear tags? No, xas_store() will leave the tag alone ... this is the right thing to do for the pagecache because we always clear the tags before removing a folio from the cache. > In memfd code, I think we could see swapout between memfd_tag_pins() and the > check for tags, where we drop the xa_lock. Unless some other lock (inode > lock?) protects us. ... and if it does happen, we see the value entry tagged and clear the tag on it. OK. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 2/2] mm: remove total_mapcount() 2024-02-26 14:13 [PATCH v2 0/2] mm: remove total_mapcount() David Hildenbrand 2024-02-26 14:13 ` [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() David Hildenbrand @ 2024-02-26 14:13 ` David Hildenbrand 2024-02-26 16:09 ` Matthew Wilcox 1 sibling, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2024-02-26 14:13 UTC (permalink / raw) To: linux-kernel; +Cc: linux-mm, David Hildenbrand, Andrew Morton, Matthew Wilcox All users of total_mapcount() are gone, let's remove it. Signed-off-by: David Hildenbrand <david@redhat.com> --- include/linux/mm.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6f4825d829656..49e22a2f6cccc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1183,7 +1183,7 @@ static inline int is_vmalloc_or_module_addr(const void *x) * How many times the entire folio is mapped as a single unit (eg by a * PMD or PUD entry). This is probably not what you want, except for * debugging purposes - it does not include PTE-mapped sub-pages; look - * at folio_mapcount() or page_mapcount() or total_mapcount() instead. + * at folio_mapcount() or page_mapcount() instead. */ static inline int folio_entire_mapcount(struct folio *folio) { @@ -1243,13 +1243,6 @@ static inline int folio_mapcount(struct folio *folio) return folio_total_mapcount(folio); } -static inline int total_mapcount(struct page *page) -{ - if (likely(!PageCompound(page))) - return atomic_read(&page->_mapcount) + 1; - return folio_total_mapcount(page_folio(page)); -} - static inline bool folio_large_is_mapped(struct folio *folio) { /* -- 2.43.2 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 2/2] mm: remove total_mapcount() 2024-02-26 14:13 ` [PATCH v2 2/2] mm: remove total_mapcount() David Hildenbrand @ 2024-02-26 16:09 ` Matthew Wilcox 0 siblings, 0 replies; 7+ messages in thread From: Matthew Wilcox @ 2024-02-26 16:09 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton On Mon, Feb 26, 2024 at 03:13:24PM +0100, David Hildenbrand wrote: > All users of total_mapcount() are gone, let's remove it. > > Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-02-27 15:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-02-26 14:13 [PATCH v2 0/2] mm: remove total_mapcount() David Hildenbrand 2024-02-26 14:13 ` [PATCH v2 1/2] mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() David Hildenbrand 2024-02-26 16:07 ` Matthew Wilcox 2024-02-26 16:56 ` David Hildenbrand 2024-02-27 15:27 ` Matthew Wilcox 2024-02-26 14:13 ` [PATCH v2 2/2] mm: remove total_mapcount() David Hildenbrand 2024-02-26 16:09 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox