* [PATCH v3 0/2] Minimize xa_node allocation during xarray split
From: Zi Yan @ 2025-02-26 21:08 UTC
To: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel
Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin,
linux-kernel, Zi Yan
Hi all,
When splitting a multi-index entry in the XArray from order-n to order-m,
the existing xas_split_alloc()+xas_split() approach requires
2^(n % XA_CHUNK_SHIFT) xa_node allocations. But its callers,
__filemap_add_folio() and shmem_split_large_entry(), use at most 1 xa_node.
To minimize xa_node allocation and remove the limitation that an entry of
order-12 (or above) cannot be split to order-0 (or any order between 0 and
5)[1], xas_try_split() was added[2], which allocates
(n / XA_CHUNK_SHIFT - m / XA_CHUNK_SHIFT) xa_nodes. It is used
for non-uniform folio split, but can also be used by __filemap_add_folio()
and shmem_split_large_entry().
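For example, with XA_CHUNK_SHIFT = 6, splitting an order-9 entry down to
order-0 takes 2^(9 % 6) = 8 xa_nodes with the old approach, but only
9/6 - 0/6 = 1 xa_node with xas_try_split(), as the diagrams below show.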
xas_split_alloc() and xas_split() split an order-9 entry to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
| | | |
------- --- --- -------
| | ... | |
V V V V
----------- ----------- ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- ----------- ----------- -----------
xas_try_split() splits an order-9 entry to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
|
|
V
-----------
| xa_node |
-----------
xas_try_split() is designed to be called iteratively with n = m + 1.
xas_try_split_min_order() is added to minimize the number of calls to
xas_try_split() by telling the caller the next minimal order to split to
instead of n - 1. Splitting order-n to order-m requires no xa_node
allocation when m = l * XA_CHUNK_SHIFT, and requires 1 xa_node when
n = l * XA_CHUNK_SHIFT and m = n - 1, so it is OK to use xas_try_split()
with n > m + 1 when no new xa_node is needed; a sketch of the resulting
caller loop is shown below.
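The sketch below is a rough illustration, not the exact kernel code (see
the patches in this thread for the real callers). It assumes the xa_lock
is held and that "entry" is the multi-index entry of order "order" loaded
at "index", to be split down to order "target":

        /* Illustration only: iterate xas_try_split() from order down to target. */
        unsigned int split_order = max(target, xas_try_split_min_order(order));

        while (order > target) {
                xas_set_order(&xas, index, split_order);
                xas_try_split(&xas, entry, order);
                if (xas_error(&xas))
                        break;  /* allocation failed or EINVAL; caller retries */
                order = split_order;
                split_order = max(xas_try_split_min_order(split_order), target);
        }

With XA_CHUNK_SHIFT = 6, splitting order-9 down to order-0 thus proceeds
9 -> 6 -> 5 -> 0, and only the 6 -> 5 step allocates an xa_node.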
The xfstests quick group passed on xfs and tmpfs.

This series is on top of the buddy-allocator-like (or non-uniform)
folio split V9[2], which is on top of mm-everything-2025-02-26-03-56.
Changelog
===
From V2[3]:
1. Fixed shmem_split_large_entry() by setting the swap offset correctly.
(Thanks to Baolin for the detailed review.)
2. Used the updated xas_try_split() to avoid a bug when an xa_node is
allocated by xas_nomem() instead of by xas_try_split() itself.
Let me know your comments.
[1] https://lore.kernel.org/linux-mm/Z6YX3RznGLUD07Ao@casper.infradead.org/
[2] https://lore.kernel.org/linux-mm/20250226210032.2044041-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20250218235444.1543173-1-ziy@nvidia.com/
Zi Yan (2):
mm/filemap: use xas_try_split() in __filemap_add_folio()
mm/shmem: use xas_try_split() in shmem_split_large_entry()
 include/linux/xarray.h |  7 +++++
 lib/xarray.c           | 25 ++++++++++++++++++
 mm/filemap.c           | 45 +++++++++++++-------------------
 mm/shmem.c             | 59 ++++++++++++++++++++----------------------
 4 files changed, 78 insertions(+), 58 deletions(-)
--
2.47.2

* [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio()
From: Zi Yan @ 2025-02-26 21:08 UTC
To: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel
Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel,
    Zi Yan, David Hildenbrand, John Hubbard, Kefeng Wang,
    Kirill A. Shutemov, Ryan Roberts, Yang Shi, Yu Zhao

During __filemap_add_folio(), a shadow entry covering n slots may be
present while a folio covering m slots, with m < n, is to be added.
Instead of splitting all n slots, only the m slots covered by the folio
need to be split; the remaining n-m shadow entries can be retained with
orders ranging from m to n-1. This method only requires

        (n / XA_CHUNK_SHIFT) - (m / XA_CHUNK_SHIFT)

new xa_nodes, instead of the

        2^(n % XA_CHUNK_SHIFT) * ((n / XA_CHUNK_SHIFT) - (m / XA_CHUNK_SHIFT))

new xa_nodes required by the original xas_split_alloc() + xas_split()
approach. For example, to insert an order-0 folio when an order-9 shadow
entry is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed
instead of 8.

xas_try_split_min_order() is introduced to reduce the number of calls to
xas_try_split() during the split.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
---
 include/linux/xarray.h |  7 +++++++
 lib/xarray.c           | 25 +++++++++++++++++++++++
 mm/filemap.c           | 45 +++++++++++++++++-------------------------
 3 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 4010195201c9..78eede109b1a 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1556,6 +1556,7 @@ int xas_get_order(struct xa_state *xas);
 void xas_split(struct xa_state *, void *entry, unsigned int order);
 void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t);
 void xas_try_split(struct xa_state *xas, void *entry, unsigned int order);
+unsigned int xas_try_split_min_order(unsigned int order);
 #else
 static inline int xa_get_order(struct xarray *xa, unsigned long index)
 {
@@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry,
                unsigned int order)
 {
 }
+
+static inline unsigned int xas_try_split_min_order(unsigned int order)
+{
+       return 0;
+}
+
 #endif

 /**
diff --git a/lib/xarray.c b/lib/xarray.c
index bc197c96d171..8067182d3e43 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1133,6 +1133,28 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xas_split);

+/**
+ * xas_try_split_min_order() - Minimal split order xas_try_split() can accept
+ * @order: Current entry order.
+ *
+ * xas_try_split() can split a multi-index entry to smaller than @order - 1 if
+ * no new xa_node is needed. This function provides the minimal order
+ * xas_try_split() supports.
+ *
+ * Return: the minimal order xas_try_split() supports
+ *
+ * Context: Any context.
+ *
+ */
+unsigned int xas_try_split_min_order(unsigned int order)
+{
+       if (order % XA_CHUNK_SHIFT == 0)
+               return order == 0 ? 0 : order - 1;
+
+       return order - (order % XA_CHUNK_SHIFT);
+}
+EXPORT_SYMBOL_GPL(xas_try_split_min_order);
+
 /**
  * xas_try_split() - Try to split a multi-index entry.
  * @xas: XArray operation state.
@@ -1144,6 +1166,9 @@ EXPORT_SYMBOL_GPL(xas_split);
  * needed, the function will use GFP_NOWAIT to get one if xas->xa_alloc is
  * NULL. If more new xa_node are needed, the function gives EINVAL error.
  *
+ * NOTE: use xas_try_split_min_order() to get the next split order instead of
+ * @order - 1 if you want to minimize xas_try_split() calls.
+ *
  * Context: Any context. The caller should hold the xa_lock.
  */
 void xas_try_split(struct xa_state *xas, void *entry, unsigned int order)
diff --git a/mm/filemap.c b/mm/filemap.c
index 2b860b59a521..cfb49ed659a1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -857,11 +857,10 @@ EXPORT_SYMBOL_GPL(replace_page_cache_folio);
 noinline int __filemap_add_folio(struct address_space *mapping,
                struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
 {
-       XA_STATE(xas, &mapping->i_pages, index);
-       void *alloced_shadow = NULL;
-       int alloced_order = 0;
+       XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
        bool huge;
        long nr;
+       unsigned int forder = folio_order(folio);

        VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
        VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
@@ -870,7 +869,6 @@ noinline int __filemap_add_folio(struct address_space *mapping,

        mapping_set_update(&xas, mapping);
        VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
-       xas_set_order(&xas, index, folio_order(folio));
        huge = folio_test_hugetlb(folio);
        nr = folio_nr_pages(folio);

@@ -880,7 +878,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
        folio->index = xas.xa_index;

        for (;;) {
-               int order = -1, split_order = 0;
+               int order = -1;
                void *entry, *old = NULL;

                xas_lock_irq(&xas);
@@ -898,21 +896,25 @@ noinline int __filemap_add_folio(struct address_space *mapping,
                        order = xas_get_order(&xas);
                }

-               /* entry may have changed before we re-acquire the lock */
-               if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
-                       xas_destroy(&xas);
-                       alloced_order = 0;
-               }
-
                if (old) {
-                       if (order > 0 && order > folio_order(folio)) {
+                       if (order > 0 && order > forder) {
+                               unsigned int split_order = max(forder,
+                                               xas_try_split_min_order(order));
+
                                /* How to handle large swap entries? */
                                BUG_ON(shmem_mapping(mapping));
-                               if (!alloced_order) {
-                                       split_order = order;
-                                       goto unlock;
+
+                               while (order > forder) {
+                                       xas_set_order(&xas, index, split_order);
+                                       xas_try_split(&xas, old, order);
+                                       if (xas_error(&xas))
+                                               goto unlock;
+                                       order = split_order;
+                                       split_order =
+                                               max(xas_try_split_min_order(
+                                                           split_order),
+                                                   forder);
                                }
-                               xas_split(&xas, old, order);
                                xas_reset(&xas);
                        }
                        if (shadowp)
@@ -936,17 +938,6 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 unlock:
                xas_unlock_irq(&xas);

-               /* split needed, alloc here and retry. */
-               if (split_order) {
-                       xas_split_alloc(&xas, old, split_order, gfp);
-                       if (xas_error(&xas))
-                               goto error;
-                       alloced_shadow = old;
-                       alloced_order = split_order;
-                       xas_reset(&xas);
-                       continue;
-               }
-
                if (!xas_nomem(&xas, gfp))
                        break;
        }
--
2.47.2

* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio()
From: SeongJae Park @ 2025-03-08 18:14 UTC
To: Zi Yan
Cc: SeongJae Park, Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel,
    Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel,
    David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shutemov,
    Ryan Roberts, Yang Shi, Yu Zhao

Hello,

On Wed, 26 Feb 2025 16:08:53 -0500 Zi Yan <ziy@nvidia.com> wrote:

> During __filemap_add_folio(), a shadow entry covering n slots may be
> present while a folio covering m slots, with m < n, is to be added.
[...]
> --- a/include/linux/xarray.h
> +++ b/include/linux/xarray.h
> @@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry,
>                 unsigned int order)
>  {
>  }
> +
> +static inline unsigned int xas_try_split_min_order(unsigned int order)
> +{
> +       return 0;
> +}
> +
>  #endif
[...]
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
[...]
> +unsigned int xas_try_split_min_order(unsigned int order)
> +{
> +       if (order % XA_CHUNK_SHIFT == 0)
> +               return order == 0 ? 0 : order - 1;
> +
> +       return order - (order % XA_CHUNK_SHIFT);
> +}
> +EXPORT_SYMBOL_GPL(xas_try_split_min_order);
> +

I found this makes the build fail when CONFIG_XARRAY_MULTI is unset, like
below.

    /linux/lib/xarray.c:1251:14: error: redefinition of ‘xas_try_split_min_order’
     1251 | unsigned int xas_try_split_min_order(unsigned int order)
          |              ^~~~~~~~~~~~~~~~~~~~~~~
    In file included from /linux/lib/xarray.c:13:
    /linux/include/linux/xarray.h:1587:28: note: previous definition of ‘xas_try_split_min_order’ with type ‘unsigned int(unsigned int)’
     1587 | static inline unsigned int xas_try_split_min_order(unsigned int order)
          |                            ^~~~~~~~~~~~~~~~~~~~~~~

I think we should have the definition only when CONFIG_XARRAY_MULTI is set?

Thanks,
SJ

[...]

* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio()
From: Zi Yan @ 2025-03-08 18:32 UTC
To: SeongJae Park, Andrew Morton
Cc: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel, Hugh Dickins,
    Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard,
    Kefeng Wang, Kirill A. Shutemov, Ryan Roberts, Yang Shi, Yu Zhao

On 8 Mar 2025, at 13:14, SeongJae Park wrote:

> Hello,
>
> On Wed, 26 Feb 2025 16:08:53 -0500 Zi Yan <ziy@nvidia.com> wrote:
[...]
> I found this makes the build fail when CONFIG_XARRAY_MULTI is unset, like
> below.
>
>     /linux/lib/xarray.c:1251:14: error: redefinition of ‘xas_try_split_min_order’
>      1251 | unsigned int xas_try_split_min_order(unsigned int order)
>           |              ^~~~~~~~~~~~~~~~~~~~~~~
[...]
> I think we should have the definition only when CONFIG_XARRAY_MULTI is set?

I think it might be a merge issue, since my original patch[1] places
xas_try_split_min_order() above xas_try_split(), both of which are inside
#ifdef CONFIG_XARRAY_MULTI ... #endif. But mm-everything-2025-03-08-00-43
seems to move xas_try_split_min_order() below xas_try_split() and out of
the CONFIG_XARRAY_MULTI guard.

[1] https://lore.kernel.org/linux-mm/20250226210854.2045816-2-ziy@nvidia.com/

--
Best Regards,
Yan, Zi
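For reference, a minimal sketch of the layout the original patch[1]
intends in lib/xarray.c, with both helpers inside one CONFIG_XARRAY_MULTI
block (function bodies elided) so that the static inline stub in
include/linux/xarray.h only applies when the option is off:

        #ifdef CONFIG_XARRAY_MULTI
        ...
        unsigned int xas_try_split_min_order(unsigned int order)
        {
                /* body as in the patch above */
        }
        EXPORT_SYMBOL_GPL(xas_try_split_min_order);

        void xas_try_split(struct xa_state *xas, void *entry, unsigned int order)
        {
                /* body unchanged */
        }
        #endif /* CONFIG_XARRAY_MULTI */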

* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio()
From: Zi Yan @ 2025-03-08 18:36 UTC
To: SeongJae Park, Andrew Morton
Cc: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel, Hugh Dickins,
    Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard,
    Kefeng Wang, Kirill A. Shutemov, Ryan Roberts, Yang Shi, Yu Zhao

On 8 Mar 2025, at 13:32, Zi Yan wrote:

> On 8 Mar 2025, at 13:14, SeongJae Park wrote:
>
>> I found this makes the build fail when CONFIG_XARRAY_MULTI is unset, like
>> below.
[...]
> I think it might be a merge issue, since my original patch[1] places
> xas_try_split_min_order() above xas_try_split(), both of which are inside
> #ifdef CONFIG_XARRAY_MULTI ... #endif. But mm-everything-2025-03-08-00-43
> seems to move xas_try_split_min_order() below xas_try_split() and out of
> the CONFIG_XARRAY_MULTI guard.
>
> [1] https://lore.kernel.org/linux-mm/20250226210854.2045816-2-ziy@nvidia.com/

In addition, the new comment for xas_try_split() was added to the
xas_split() comment. See
https://web.git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/tree/lib/xarray.c?h=mm-everything-2025-03-08-00-43#n1084

Something went wrong when this patch was applied.

--
Best Regards,
Yan, Zi

* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio()
From: SeongJae Park @ 2025-03-08 21:34 UTC
To: Zi Yan
Cc: SeongJae Park, Andrew Morton, Baolin Wang, Matthew Wilcox, linux-mm,
    linux-fsdevel, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel,
    David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shutemov,
    Ryan Roberts, Yang Shi, Yu Zhao

On Sat, 08 Mar 2025 13:32:02 -0500 Zi Yan <ziy@nvidia.com> wrote:

> On 8 Mar 2025, at 13:14, SeongJae Park wrote:
>
>> I think we should have the definition only when CONFIG_XARRAY_MULTI is set?
[...]
> I think it might be a merge issue, since my original patch[1] places
> xas_try_split_min_order() above xas_try_split(), both of which are inside
> #ifdef CONFIG_XARRAY_MULTI ... #endif. But mm-everything-2025-03-08-00-43
> seems to move xas_try_split_min_order() below xas_try_split() and out of
> the CONFIG_XARRAY_MULTI guard.

You're right. I was testing this on the mm-unstable tree, more
specifically, commit 2f0c87542d97. I confirmed the build failure goes
away after moving the definition to its original place.

> [1] https://lore.kernel.org/linux-mm/20250226210854.2045816-2-ziy@nvidia.com/
>
> --
> Best Regards,
> Yan, Zi

Thanks,
SJ

* [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry()
From: Zi Yan @ 2025-02-26 21:08 UTC
To: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel
Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel,
    Zi Yan, David Hildenbrand, John Hubbard, Kefeng Wang,
    Kirill A. Shutemov, Ryan Roberts, Yang Shi, Yu Zhao

During shmem_split_large_entry(), a large swap entry covering n slots may
need to be split when an order-0 folio is to be inserted.

Instead of splitting all n slots, only the 1 slot covered by the folio
needs to be split; the remaining n-1 swap entries can be retained with
orders ranging from 0 to n-1. This method only requires

        n / XA_CHUNK_SHIFT

new xa_nodes, instead of the

        2^(n % XA_CHUNK_SHIFT) * (n / XA_CHUNK_SHIFT)

new xa_nodes required by the original xas_split_alloc() + xas_split()
approach.

For example, to split an order-9 large swap entry (assuming XA_CHUNK_SHIFT
is 6), 1 xa_node is needed instead of 8.

xas_try_split_min_order() is used to reduce the number of calls to
xas_try_split() during the split.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
---
 mm/shmem.c | 59 ++++++++++++++++++++++++++----------------------------
 1 file changed, 28 insertions(+), 31 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index b959edf15073..f08ba9718013 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2153,15 +2153,16 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 {
        struct address_space *mapping = inode->i_mapping;
        XA_STATE_ORDER(xas, &mapping->i_pages, index, 0);
-       void *alloced_shadow = NULL;
-       int alloced_order = 0, i;
+       int split_order = 0, entry_order;
+       int i;

        /* Convert user data gfp flags to xarray node gfp flags */
        gfp &= GFP_RECLAIM_MASK;

        for (;;) {
-               int order = -1, split_order = 0;
                void *old = NULL;
+               int cur_order;
+               pgoff_t swap_index;

                xas_lock_irq(&xas);
                old = xas_load(&xas);
@@ -2170,60 +2171,56 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
                        goto unlock;
                }

-               order = xas_get_order(&xas);
+               entry_order = xas_get_order(&xas);

-               /* Swap entry may have changed before we re-acquire the lock */
-               if (alloced_order &&
-                   (old != alloced_shadow || order != alloced_order)) {
-                       xas_destroy(&xas);
-                       alloced_order = 0;
-               }
+               if (!entry_order)
+                       goto unlock;

                /* Try to split large swap entry in pagecache */
-               if (order > 0) {
-                       if (!alloced_order) {
-                               split_order = order;
+               cur_order = entry_order;
+               swap_index = round_down(index, 1 << entry_order);
+
+               split_order = xas_try_split_min_order(cur_order);
+
+               while (cur_order > 0) {
+                       pgoff_t aligned_index =
+                               round_down(index, 1 << cur_order);
+                       pgoff_t swap_offset = aligned_index - swap_index;
+
+                       xas_set_order(&xas, index, split_order);
+                       xas_try_split(&xas, old, cur_order);
+                       if (xas_error(&xas))
                                goto unlock;
-                       }
-                       xas_split(&xas, old, order);

                        /*
                         * Re-set the swap entry after splitting, and the swap
                         * offset of the original large entry must be continuous.
                         */
-                       for (i = 0; i < 1 << order; i++) {
-                               pgoff_t aligned_index = round_down(index, 1 << order);
+                       for (i = 0; i < 1 << cur_order;
+                            i += (1 << split_order)) {
                                swp_entry_t tmp;

-                               tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
+                               tmp = swp_entry(swp_type(swap),
+                                               swp_offset(swap) + swap_offset +
+                                                       i);
                                __xa_store(&mapping->i_pages, aligned_index + i,
                                           swp_to_radix_entry(tmp), 0);
                        }
+                       cur_order = split_order;
+                       split_order = xas_try_split_min_order(split_order);
                }

 unlock:
                xas_unlock_irq(&xas);

-               /* split needed, alloc here and retry. */
-               if (split_order) {
-                       xas_split_alloc(&xas, old, split_order, gfp);
-                       if (xas_error(&xas))
-                               goto error;
-                       alloced_shadow = old;
-                       alloced_order = split_order;
-                       xas_reset(&xas);
-                       continue;
-               }
-
                if (!xas_nomem(&xas, gfp))
                        break;
        }

-error:
        if (xas_error(&xas))
                return xas_error(&xas);

-       return alloced_order;
+       return entry_order;
 }

 /*
--
2.47.2

* Re: [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry()
From: Baolin Wang @ 2025-02-27 3:43 UTC
To: Zi Yan, Matthew Wilcox, linux-mm, linux-fsdevel
Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel,
    David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shutemov,
    Ryan Roberts, Yang Shi, Yu Zhao

On 2025/2/27 05:08, Zi Yan wrote:

> During shmem_split_large_entry(), a large swap entry covering n slots may
> need to be split when an order-0 folio is to be inserted.
[...]

LGTM. Feel free to add:

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>

* Re: [PATCH v3 0/2] Minimize xa_node allocation during xarray split
From: Zi Yan @ 2025-03-07 20:33 UTC
To: Andrew Morton
Cc: Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, Zi Yan,
    Baolin Wang, Matthew Wilcox, linux-fsdevel, linux-mm

On 26 Feb 2025, at 16:08, Zi Yan wrote:

> Hi all,
>
> When splitting a multi-index entry in the XArray from order-n to order-m,
> the existing xas_split_alloc()+xas_split() approach requires
> 2^(n % XA_CHUNK_SHIFT) xa_node allocations.
[...]

Hi Andrew,

Do you want me to resend this? This still applies cleanly on
mm-everything-2025-03-07-07-55 plus V10. Thanks.

--
Best Regards,
Yan, Zi