* [PATCH v3 0/4] mm/filemap: optimize folio adding and splitting
@ 2024-04-15 17:18 Kairui Song
2024-04-15 17:18 ` [PATCH v3 1/4] mm/filemap: return early if failed to allocate memory for split Kairui Song
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Kairui Song @ 2024-04-15 17:18 UTC (permalink / raw)
To: linux-mm; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
Currently, at least 3 tree walks are needed for filemap folio adding if
the folio was previously evicted: one to get the order of the current slot,
one for the ranged conflict check, and one more to retrieve the order again.
If a split is needed, even more walks are required.
This series merges these walks and speeds up filemap_add_folio; I see a
7.5% - 12.5% performance gain in an fio stress test.
So instead of doing multiple tree walks, do one optimistic range check
with the lock held, and exit early if it raced with another insertion. If a
shadow entry exists, check its order with the new xas_get_order helper
before releasing the lock, to avoid a redundant tree walk just for getting
the order. Drop the lock and do the allocation only if a split is needed.
In the best case, only one tree walk is needed. If an allocation and split
are needed, 3 walks are issued (one for the first ranged conflict check and
order retrieval, one for the second check after the allocation, and one for
the insert after the split).
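To make the idea concrete, the new insert path added in patch 4/4 boils
down to the loop sketched below (condensed for the cover letter: shadow
bookkeeping, error reporting and stat updates are trimmed, and
alloced_order starts as 0 outside the loop; see the actual patch for the
real thing):

    for (;;) {
        int order = -1, split_order = 0;
        void *entry, *old = NULL;

        xas_lock_irq(&xas);
        xas_for_each_conflict(&xas, entry) {
            old = entry;
            if (!xa_is_value(entry))
                goto unlock;    /* a real folio already exists: -EEXIST */
            if (order == -1)    /* a larger entry is the only conflict */
                order = xas_get_order(&xas);    /* reuses this walk */
        }
        if (old && order > 0 && order > folio_order(folio)) {
            if (!alloced_order) {
                split_order = order;    /* need memory, drop the lock */
                goto unlock;
            }
            xas_split(&xas, old, order);
            xas_reset(&xas);
        }
        xas_store(&xas, folio);
    unlock:
        xas_unlock_irq(&xas);

        if (split_order) {      /* allocate outside the lock, then retry */
            xas_split_alloc(&xas, old, split_order, gfp);
            alloced_order = split_order;
            xas_reset(&xas);
            continue;
        }
        if (!xas_nomem(&xas, gfp))
            break;
    }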
Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap --rw=randread --time_based \
--ramp_time=30s --runtime=5m --group_reporting
Before:
bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691
After (+7.3%):
bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651
Test result with THP (do a THP randread, then switch to 4K pages in the
hope that it issues a lot of splitting):
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap -thp=1 --readonly \
--rw=randread --time_based --ramp_time=30s --runtime=10m \
--group_reporting
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap \
--rw=randread --time_based --runtime=5s --group_reporting
Before:
bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec
After (+12.5%):
bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec
The performance is better for both 4K (+7.5%) and THP (+12.5%) cached reads.
V2: https://lore.kernel.org/lkml/20240325171405.99971-1-ryncsn@gmail.com/
Updates from V2:
- Fix the misuse of locks in the test module:
https://lore.kernel.org/oe-lkp/202404151046.448e2d6e-lkp@intel.com
V1: https://lore.kernel.org/lkml/20240319092733.4501-1-ryncsn@gmail.com/
Updates from V1:
- Collect Acks.
- Add tests for new xas_get_order and combined usage of xas_get_order with
xas_for_each_conflict.
- Fix a memleak for patch 4/4 and modify the function in place instead
of adding a new helper.
- Update the benchmark: I forgot to drop caches and disable THP for the
previous test, so the result was for a mixed usage of split and add. The
result is even better now.
Kairui Song (4):
mm/filemap: return early if failed to allocate memory for split
mm/filemap: clean up hugetlb exclusion code
lib/xarray: introduce a new helper xas_get_order
mm/filemap: optimize filemap folio adding
include/linux/xarray.h | 6 +++
lib/test_xarray.c | 93 ++++++++++++++++++++++++++++++++++++++++++
lib/xarray.c | 49 ++++++++++++++--------
mm/filemap.c | 74 +++++++++++++++++++++------------
4 files changed, 179 insertions(+), 43 deletions(-)
--
2.44.0
* [PATCH v3 1/4] mm/filemap: return early if failed to allocate memory for split
2024-04-15 17:18 [PATCH v3 0/4] mm/filemap: optimize folio adding and splitting Kairui Song
@ 2024-04-15 17:18 ` Kairui Song
2024-04-15 17:18 ` [PATCH v3 2/4] mm/filemap: clean up hugetlb exclusion code Kairui Song
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Kairui Song @ 2024-04-15 17:18 UTC (permalink / raw)
To: linux-mm; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
xas_split_alloc could fail with -ENOMEM, and in such a case it should
abort early instead of keeping going and failing the xas_split below.
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/filemap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 480ae6589803..12089c24abfb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -880,9 +880,12 @@ noinline int __filemap_add_folio(struct address_space *mapping,
unsigned int order = xa_get_order(xas.xa, xas.xa_index);
void *entry, *old = NULL;
- if (order > folio_order(folio))
+ if (order > folio_order(folio)) {
xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
order, gfp);
+ if (xas_error(&xas))
+ goto error;
+ }
xas_lock_irq(&xas);
xas_for_each_conflict(&xas, entry) {
old = entry;
--
2.44.0
* [PATCH v3 2/4] mm/filemap: clean up hugetlb exclusion code
2024-04-15 17:18 [PATCH v3 0/4] mm/filemap: optimize folio adding and splitting Kairui Song
2024-04-15 17:18 ` [PATCH v3 1/4] mm/filemap: return early if failed to allocate memory for split Kairui Song
@ 2024-04-15 17:18 ` Kairui Song
2024-04-15 17:18 ` [PATCH v3 3/4] lib/xarray: introduce a new helper xas_get_order Kairui Song
2024-04-15 17:18 ` [PATCH v3 4/4] mm/filemap: optimize filemap folio adding Kairui Song
3 siblings, 0 replies; 7+ messages in thread
From: Kairui Song @ 2024-04-15 17:18 UTC (permalink / raw)
To: linux-mm; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
__filemap_add_folio only has two callers: one never passes a hugetlb folio
and one always passes in a hugetlb folio. So move the hugetlb-related
cgroup charging out of it to make the code cleaner.
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/filemap.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 12089c24abfb..17a66ea544e7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -853,20 +853,12 @@ noinline int __filemap_add_folio(struct address_space *mapping,
{
XA_STATE(xas, &mapping->i_pages, index);
bool huge = folio_test_hugetlb(folio);
- bool charged = false;
- long nr = 1;
+ long nr;
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
mapping_set_update(&xas, mapping);
- if (!huge) {
- int error = mem_cgroup_charge(folio, NULL, gfp);
- if (error)
- return error;
- charged = true;
- }
-
VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
xas_set_order(&xas, index, folio_order(folio));
nr = folio_nr_pages(folio);
@@ -931,8 +923,6 @@ noinline int __filemap_add_folio(struct address_space *mapping,
trace_mm_filemap_add_to_page_cache(folio);
return 0;
error:
- if (charged)
- mem_cgroup_uncharge(folio);
folio->mapping = NULL;
/* Leave page->index set: truncation relies upon it */
folio_put_refs(folio, nr);
@@ -946,11 +936,16 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
void *shadow = NULL;
int ret;
+ ret = mem_cgroup_charge(folio, NULL, gfp);
+ if (ret)
+ return ret;
+
__folio_set_locked(folio);
ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
- if (unlikely(ret))
+ if (unlikely(ret)) {
+ mem_cgroup_uncharge(folio);
__folio_clear_locked(folio);
- else {
+ } else {
/*
* The folio might have been evicted from cache only
* recently, in which case it should be activated like
--
2.44.0
* [PATCH v3 3/4] lib/xarray: introduce a new helper xas_get_order
2024-04-15 17:18 [PATCH v3 0/4] mm/filemap: optimize folio adding and splitting Kairui Song
2024-04-15 17:18 ` [PATCH v3 1/4] mm/filemap: return early if failed to allocate memory for split Kairui Song
2024-04-15 17:18 ` [PATCH v3 2/4] mm/filemap: clean up hugetlb exclusion code Kairui Song
@ 2024-04-15 17:18 ` Kairui Song
2024-04-15 19:06 ` Matthew Wilcox
2024-04-15 17:18 ` [PATCH v3 4/4] mm/filemap: optimize filemap folio adding Kairui Song
3 siblings, 1 reply; 7+ messages in thread
From: Kairui Song @ 2024-04-15 17:18 UTC (permalink / raw)
To: linux-mm; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
It can be used after xas_load to check the order of the loaded entry.
Compared to xa_get_order, it saves an XA_STATE and avoids a rewalk.
Also add a new test for xas_get_order; to make the test work, xas_get_order
has to be exported with EXPORT_SYMBOL_GPL.
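A minimal usage sketch (the mapping/index names are placeholders here;
this mirrors what the reworked xa_get_order below and patch 4/4 do):

    XA_STATE(xas, &mapping->i_pages, index);
    void *entry;
    int order = 0;

    rcu_read_lock();
    entry = xas_load(&xas);
    if (entry)
        order = xas_get_order(&xas);    /* no second tree walk */
    rcu_read_unlock();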
Signed-off-by: Kairui Song <kasong@tencent.com>
---
include/linux/xarray.h | 6 ++++++
lib/test_xarray.c | 34 +++++++++++++++++++++++++++++
lib/xarray.c | 49 ++++++++++++++++++++++++++----------------
3 files changed, 71 insertions(+), 18 deletions(-)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index cb571dfcf4b1..d9d479334c9e 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1548,6 +1548,7 @@ void xas_create_range(struct xa_state *);
#ifdef CONFIG_XARRAY_MULTI
int xa_get_order(struct xarray *, unsigned long index);
+int xas_get_order(struct xa_state *xas);
void xas_split(struct xa_state *, void *entry, unsigned int order);
void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t);
#else
@@ -1556,6 +1557,11 @@ static inline int xa_get_order(struct xarray *xa, unsigned long index)
return 0;
}
+static inline int xas_get_order(struct xa_state *xas)
+{
+ return 0;
+}
+
static inline void xas_split(struct xa_state *xas, void *entry,
unsigned int order)
{
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index ebe2af2e072d..0efde8f93490 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -1984,6 +1984,39 @@ static noinline void check_get_order(struct xarray *xa)
}
}
+static noinline void check_xas_get_order(struct xarray *xa)
+{
+ XA_STATE(xas, xa, 0);
+
+ unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1;
+ unsigned int order;
+ unsigned long i, j;
+
+ for (order = 0; order < max_order; order++) {
+ for (i = 0; i < 10; i++) {
+ xas_set_order(&xas, i << order, order);
+ do {
+ xas_lock(&xas);
+ xas_store(&xas, xa_mk_value(i));
+ xas_unlock(&xas);
+ } while (xas_nomem(&xas, GFP_KERNEL));
+
+ for (j = i << order; j < (i + 1) << order; j++) {
+ xas_set_order(&xas, j, 0);
+ rcu_read_lock();
+ xas_load(&xas);
+ XA_BUG_ON(xa, xas_get_order(&xas) != order);
+ rcu_read_unlock();
+ }
+
+ xas_lock(&xas);
+ xas_set_order(&xas, i << order, order);
+ xas_store(&xas, NULL);
+ xas_unlock(&xas);
+ }
+ }
+}
+
static noinline void check_destroy(struct xarray *xa)
{
unsigned long index;
@@ -2035,6 +2068,7 @@ static int xarray_checks(void)
check_multi_store(&array);
check_multi_store_advanced(&array);
check_get_order(&array);
+ check_xas_get_order(&array);
check_xa_alloc();
check_find(&array);
check_find_entry(&array);
diff --git a/lib/xarray.c b/lib/xarray.c
index 39f07bfc4dcc..fbf1d1dd83bc 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1750,39 +1750,52 @@ void *xa_store_range(struct xarray *xa, unsigned long first,
EXPORT_SYMBOL(xa_store_range);
/**
- * xa_get_order() - Get the order of an entry.
- * @xa: XArray.
- * @index: Index of the entry.
+ * xas_get_order() - Get the order of a loaded entry after xas_load.
+ * @xas: XArray operation state.
+ *
+ * Called after xas_load, the xas should not be in an error state.
*
* Return: A number between 0 and 63 indicating the order of the entry.
*/
-int xa_get_order(struct xarray *xa, unsigned long index)
+int xas_get_order(struct xa_state *xas)
{
- XA_STATE(xas, xa, index);
- void *entry;
int order = 0;
- rcu_read_lock();
- entry = xas_load(&xas);
-
- if (!entry)
- goto unlock;
-
- if (!xas.xa_node)
- goto unlock;
+ if (!xas->xa_node)
+ return 0;
for (;;) {
- unsigned int slot = xas.xa_offset + (1 << order);
+ unsigned int slot = xas->xa_offset + (1 << order);
if (slot >= XA_CHUNK_SIZE)
break;
- if (!xa_is_sibling(xas.xa_node->slots[slot]))
+ if (!xa_is_sibling(xas->xa_node->slots[slot]))
break;
order++;
}
- order += xas.xa_node->shift;
-unlock:
+ order += xas->xa_node->shift;
+ return order;
+}
+EXPORT_SYMBOL_GPL(xas_get_order);
+
+/**
+ * xa_get_order() - Get the order of an entry.
+ * @xa: XArray.
+ * @index: Index of the entry.
+ *
+ * Return: A number between 0 and 63 indicating the order of the entry.
+ */
+int xa_get_order(struct xarray *xa, unsigned long index)
+{
+ XA_STATE(xas, xa, index);
+ int order = 0;
+ void *entry;
+
+ rcu_read_lock();
+ entry = xas_load(&xas);
+ if (entry)
+ order = xas_get_order(&xas);
rcu_read_unlock();
return order;
--
2.44.0
* [PATCH v3 4/4] mm/filemap: optimize filemap folio adding
2024-04-15 17:18 [PATCH v3 0/4] mm/filemap: optimize folio adding and splitting Kairui Song
` (2 preceding siblings ...)
2024-04-15 17:18 ` [PATCH v3 3/4] lib/xarray: introduce a new helper xas_get_order Kairui Song
@ 2024-04-15 17:18 ` Kairui Song
3 siblings, 0 replies; 7+ messages in thread
From: Kairui Song @ 2024-04-15 17:18 UTC (permalink / raw)
To: linux-mm; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
Instead of doing multiple tree walks, do one optimistic range check with
the lock held, and exit early if it raced with another insertion. If a
shadow entry exists, check its order with the new xas_get_order helper
before releasing the lock, to avoid a redundant tree walk just for getting
the order. Drop the lock and do the allocation only if a split is needed.
In the best case, only one tree walk is needed. If an allocation and split
are needed, 3 walks are issued (one for the first ranged conflict check and
order retrieval, one for the second check after the allocation, and one for
the insert after the split).
Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap --rw=randread --time_based \
--ramp_time=30s --runtime=5m --group_reporting
Before:
bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691
After (+7.3%):
bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651
Test result with THP (do a THP randread, then switch to 4K pages in the
hope that it issues a lot of splitting):
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap -thp=1 --readonly \
--rw=randread --time_based --ramp_time=30s --runtime=10m \
--group_reporting
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap \
--rw=randread --time_based --runtime=5s --group_reporting
Before:
bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec
After (+12.5%):
bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec
The performance is better for both 4K (+7.5%) and THP (+12.5%) cached reads.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
lib/test_xarray.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
mm/filemap.c | 56 ++++++++++++++++++++++++++++++++------------
2 files changed, 100 insertions(+), 15 deletions(-)
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 0efde8f93490..8732a311f613 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -2017,6 +2017,64 @@ static noinline void check_xas_get_order(struct xarray *xa)
}
}
+static noinline void check_xas_conflict_get_order(struct xarray *xa)
+{
+ XA_STATE(xas, xa, 0);
+
+ void *entry;
+ int only_once;
+ unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1;
+ unsigned int order;
+ unsigned long i, j, k;
+
+ for (order = 0; order < max_order; order++) {
+ for (i = 0; i < 10; i++) {
+ xas_set_order(&xas, i << order, order);
+ do {
+ xas_lock(&xas);
+ xas_store(&xas, xa_mk_value(i));
+ xas_unlock(&xas);
+ } while (xas_nomem(&xas, GFP_KERNEL));
+
+ /*
+ * Ensure xas_get_order works with xas_for_each_conflict.
+ */
+ j = i << order;
+ for (k = 0; k < order; k++) {
+ only_once = 0;
+ xas_set_order(&xas, j + (1 << k), k);
+ xas_lock(&xas);
+ xas_for_each_conflict(&xas, entry) {
+ XA_BUG_ON(xa, entry != xa_mk_value(i));
+ XA_BUG_ON(xa, xas_get_order(&xas) != order);
+ only_once++;
+ }
+ XA_BUG_ON(xa, only_once != 1);
+ xas_unlock(&xas);
+ }
+
+ if (order < max_order - 1) {
+ only_once = 0;
+ xas_set_order(&xas, (i & ~1UL) << order, order + 1);
+ xas_lock(&xas);
+ xas_for_each_conflict(&xas, entry) {
+ XA_BUG_ON(xa, entry != xa_mk_value(i));
+ XA_BUG_ON(xa, xas_get_order(&xas) != order);
+ only_once++;
+ }
+ XA_BUG_ON(xa, only_once != 1);
+ xas_unlock(&xas);
+ }
+
+ xas_set_order(&xas, i << order, order);
+ xas_lock(&xas);
+ xas_store(&xas, NULL);
+ xas_unlock(&xas);
+ }
+ }
+}
+
+
static noinline void check_destroy(struct xarray *xa)
{
unsigned long index;
@@ -2069,6 +2127,7 @@ static int xarray_checks(void)
check_multi_store_advanced(&array);
check_get_order(&array);
check_xas_get_order(&array);
+ check_xas_conflict_get_order(&array);
check_xa_alloc();
check_find(&array);
check_find_entry(&array);
diff --git a/mm/filemap.c b/mm/filemap.c
index 17a66ea544e7..7b0b2229d4ed 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -852,7 +852,9 @@ noinline int __filemap_add_folio(struct address_space *mapping,
struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
{
XA_STATE(xas, &mapping->i_pages, index);
- bool huge = folio_test_hugetlb(folio);
+ void *alloced_shadow = NULL;
+ int alloced_order = 0;
+ bool huge;
long nr;
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -861,6 +863,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
xas_set_order(&xas, index, folio_order(folio));
+ huge = folio_test_hugetlb(folio);
nr = folio_nr_pages(folio);
gfp &= GFP_RECLAIM_MASK;
@@ -868,16 +871,10 @@ noinline int __filemap_add_folio(struct address_space *mapping,
folio->mapping = mapping;
folio->index = xas.xa_index;
- do {
- unsigned int order = xa_get_order(xas.xa, xas.xa_index);
+ for (;;) {
+ int order = -1, split_order = 0;
void *entry, *old = NULL;
- if (order > folio_order(folio)) {
- xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
- order, gfp);
- if (xas_error(&xas))
- goto error;
- }
xas_lock_irq(&xas);
xas_for_each_conflict(&xas, entry) {
old = entry;
@@ -885,19 +882,33 @@ noinline int __filemap_add_folio(struct address_space *mapping,
xas_set_err(&xas, -EEXIST);
goto unlock;
}
+ /*
+ * If a larger entry exists,
+ * it will be the first and only entry iterated.
+ */
+ if (order == -1)
+ order = xas_get_order(&xas);
+ }
+
+ /* entry may have changed before we re-acquire the lock */
+ if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
+ xas_destroy(&xas);
+ alloced_order = 0;
}
if (old) {
- if (shadowp)
- *shadowp = old;
- /* entry may have been split before we acquired lock */
- order = xa_get_order(xas.xa, xas.xa_index);
- if (order > folio_order(folio)) {
+ if (order > 0 && order > folio_order(folio)) {
/* How to handle large swap entries? */
BUG_ON(shmem_mapping(mapping));
+ if (!alloced_order) {
+ split_order = order;
+ goto unlock;
+ }
xas_split(&xas, old, order);
xas_reset(&xas);
}
+ if (shadowp)
+ *shadowp = old;
}
xas_store(&xas, folio);
@@ -913,9 +924,24 @@ noinline int __filemap_add_folio(struct address_space *mapping,
__lruvec_stat_mod_folio(folio,
NR_FILE_THPS, nr);
}
+
unlock:
xas_unlock_irq(&xas);
- } while (xas_nomem(&xas, gfp));
+
+ /* split needed, alloc here and retry. */
+ if (split_order) {
+ xas_split_alloc(&xas, old, split_order, gfp);
+ if (xas_error(&xas))
+ goto error;
+ alloced_shadow = old;
+ alloced_order = split_order;
+ xas_reset(&xas);
+ continue;
+ }
+
+ if (!xas_nomem(&xas, gfp))
+ break;
+ }
if (xas_error(&xas))
goto error;
--
2.44.0
* Re: [PATCH v3 3/4] lib/xarray: introduce a new helper xas_get_order
2024-04-15 17:18 ` [PATCH v3 3/4] lib/xarray: introduce a new helper xas_get_order Kairui Song
@ 2024-04-15 19:06 ` Matthew Wilcox
2024-04-16 3:51 ` Kairui Song
0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2024-04-15 19:06 UTC (permalink / raw)
To: Kairui Song; +Cc: linux-mm, Andrew Morton, linux-kernel
On Tue, Apr 16, 2024 at 01:18:55AM +0800, Kairui Song wrote:
> /**
> - * xa_get_order() - Get the order of an entry.
> - * @xa: XArray.
> - * @index: Index of the entry.
> > + * xas_get_order() - Get the order of a loaded entry after xas_load.
I'd just leave that as "Get the order of an entry." as it doesn't have
to be after calling xas_load(); it could be from any other operation
that moves the xas. Also, this is the short description!
> if (slot >= XA_CHUNK_SIZE)
> break;
> - if (!xa_is_sibling(xas.xa_node->slots[slot]))
> + if (!xa_is_sibling(xas->xa_node->slots[slot]))
Could you fold in this change I have locally?
- if (!xa_is_sibling(xas->xa_node->slots[slot]))
+ if (!xa_is_sibling(xa_entry(xas->xa, xas->xa_node, slot)))
That fixes a sparse warning which existed before your modifications.
* Re: [PATCH v3 3/4] lib/xarray: introduce a new helper xas_get_order
2024-04-15 19:06 ` Matthew Wilcox
@ 2024-04-16 3:51 ` Kairui Song
0 siblings, 0 replies; 7+ messages in thread
From: Kairui Song @ 2024-04-16 3:51 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-mm, Andrew Morton, linux-kernel
On Tue, Apr 16, 2024 at 3:06 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Apr 16, 2024 at 01:18:55AM +0800, Kairui Song wrote:
> > /**
> > - * xa_get_order() - Get the order of an entry.
> > - * @xa: XArray.
> > - * @index: Index of the entry.
> > + * xas_get_order() - Get the order of a loaded entry after xas_load.
>
> I'd just leave that as "Get the order of an entry." as it doesn't have
> to be after calling xas_load(), it could be from any other operation
> that moves xas. Also this is the short description!
Good suggestion!
>
> > if (slot >= XA_CHUNK_SIZE)
> > break;
> > - if (!xa_is_sibling(xas.xa_node->slots[slot]))
> > + if (!xa_is_sibling(xas->xa_node->slots[slot]))
>
> Could you fold in this change I have locally?
>
> - if (!xa_is_sibling(xas->xa_node->slots[slot]))
> + if (!xa_is_sibling(xa_entry(xas->xa, xas->xa_node, slot)))
>
> That fixes a sparse warning which existed before your modifications.
>
OK.