linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
@ 2025-02-06  6:15 SeongJae Park
  2025-02-06  6:15 ` [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise() SeongJae Park
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: SeongJae Park @ 2025-02-06  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm

process_madvise() calls do_madvise() for each address range.  Then, each
do_madvise() invocation holds and releases same mmap_lock.  Optimize the
redundant lock operations by splitting do_madvise() internal logics
including the mmap_lock operations, and calling the small logics
directly from process_madvise() in a sequence that removes the redundant
locking.  As a result of this change, process_madvise() becomes more
efficient and less racy in terms of its results and latency.

Note that the potential downside of this series is that other mmap_lock
holders may take more time due to the increased length of mmap_lock
critical section for process_madvise() calls.  But there is maximum
limit in the kernel space (IOV_MAX), and the user space can control the
critical section length by setting the request size.  Hence, the
downside would be limited and controllable.

Evaluation
==========

I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
using multiple madvise() calls, 4 KiB per each call.  I also do the same
with process_madvise(), but with varying batch size (vlen) from 1 to
1024.  The source code for the measurement is available at GitHub[1].
Because the microbenchmark result is not that stable, I ran each
configuration five times and use the average.

The measurement results are as below.  'sz_batches' column shows the
batch size of process_madvise() calls.  '0' batch size is for madvise()
calls case.  'before' and 'after' columns are the measured time to apply
MADV_DONTNEED to the 256 MiB memory buffer in nanoseconds, on kernels
that built without and with the last patch of this series, respectively.
So lower value means better efficiency.  'after/before' column is the
ratio of 'after' to 'before'.

    sz_batches  before       after        after/before
    0           146294215.2  121280536.2  0.829017989769427
    1           165851018.8  136305598.2  0.821855658085351
    2           129469321.2  103740383.6  0.801273866569094
    4           110369232.4  87835896.2   0.795836795182785
    8           102906232.4  77420920.2   0.752344327397609
    16          97551017.4   74959714.4   0.768415506038587
    32          94809848.2   71200848.4   0.750985786305689
    64          96087575.6   72593180     0.755489765942227
    128         96154163.8   68517055.4   0.712575022154163
    256         92901257.6   69054216.6   0.743307662177439
    512         93646170.8   67053296.2   0.716028168874151
    1024        92663219.2   70168196.8   0.75723892830177

In despite of the unstable nature of the tet program, the trend is
somewhat we can expect.  The measurement shows this patch reduces the
process_madvise() latency, proportional to the batching size.  The
latency gain was about 20% with the batch size 2, and it has increased
to about 28% with the batch size 512, since more number of mmap locking
is reduced with larger batch size.

Note that the standard devitation of the measurements for each
sz_batches configuration was ranging from 1.9% to 7.2%.  That is, this
result is still not very stable.  The average of the standard deviations
for different batch sizes were 4.62% and 4.70% for the 'before' and
'after' kernel measurements.

Also note that this patch has somehow decreased latencies of madvise()
and single batch size process_madvise().  Seems this code path is small
enough to significantly be affected by compiler optimizations including
inlining of split-out functions.  Please focus on only the improvement
amount that changed by the batch size.

Changelog
=========

Changes from RFC v2
(https://lore.kernel.org/20250117013058.1843-1-sj@kernel.org)
- Release and acquire mmap lock again when a race-caused failure happens
  (Lorenzo Stoakes)
- Collected Reviewed-by: tags from Shakeel, Lorenzo and Davidlohr.

Changes from RFC v1
(https://lore.kernel.org/20250111004618.1566-1-sj@kernel.org)
- Split out do_madvise() and use those from vector_madvise(), instead of
  adding a flag to do_madvise() (Liam R. Howlett)

[1] https://github.com/sjp38/eval_proc_madvise

SeongJae Park (4):
  mm/madvise: split out mmap locking operations for madvise()
  mm/madvise: split out madvise input validity check
  mm/madvise: split out madvise() behavior execution
  mm/madvise: remove redundant mmap_lock operations from
    process_madvise()

 mm/madvise.c | 154 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 107 insertions(+), 47 deletions(-)


base-commit: f104b8534d19f31443a4fe6cb701bdb15fd931eb
-- 
2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise()
  2025-02-06  6:15 [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
@ 2025-02-06  6:15 ` SeongJae Park
  2025-02-06 20:27   ` Liam R. Howlett
  2025-02-06  6:15 ` [PATCH 2/4] mm/madvise: split out madvise input validity check SeongJae Park
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2025-02-06  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm

Split out the madvise behavior-dependent mmap_lock operations from
do_madvise(), for easier reuse of the logic in an upcoming change.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/madvise.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index c8e28d51978a..df5a87a1846a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1567,6 +1567,33 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 				 madvise_vma_anon_name);
 }
 #endif /* CONFIG_ANON_VMA_NAME */
+
+static int madvise_lock(struct mm_struct *mm, int behavior)
+{
+
+#ifdef CONFIG_MEMORY_FAILURE
+	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
+		return 0;
+#endif
+
+	if (madvise_need_mmap_write(behavior)) {
+		if (mmap_write_lock_killable(mm))
+			return -EINTR;
+	} else {
+		mmap_read_lock(mm);
+	}
+	return 0;
+
+}
+
+static void madvise_unlock(struct mm_struct *mm, int behavior)
+{
+	if (madvise_need_mmap_write(behavior))
+		mmap_write_unlock(mm);
+	else
+		mmap_read_unlock(mm);
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1643,7 +1670,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 {
 	unsigned long end;
 	int error;
-	int write;
 	size_t len;
 	struct blk_plug plug;
 
@@ -1665,19 +1691,15 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	if (end == start)
 		return 0;
 
+	error = madvise_lock(mm, behavior);
+	if (error)
+		return error;
+
 #ifdef CONFIG_MEMORY_FAILURE
 	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
 		return madvise_inject_error(behavior, start, start + len_in);
 #endif
 
-	write = madvise_need_mmap_write(behavior);
-	if (write) {
-		if (mmap_write_lock_killable(mm))
-			return -EINTR;
-	} else {
-		mmap_read_lock(mm);
-	}
-
 	start = untagged_addr_remote(mm, start);
 	end = start + len;
 
@@ -1694,10 +1716,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	}
 	blk_finish_plug(&plug);
 
-	if (write)
-		mmap_write_unlock(mm);
-	else
-		mmap_read_unlock(mm);
+	madvise_unlock(mm, behavior);
 
 	return error;
 }
-- 
2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/4] mm/madvise: split out madvise input validity check
  2025-02-06  6:15 [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
  2025-02-06  6:15 ` [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise() SeongJae Park
@ 2025-02-06  6:15 ` SeongJae Park
  2025-02-06 20:29   ` Liam R. Howlett
  2025-02-06  6:15 ` [PATCH 3/4] mm/madvise: split out madvise() behavior execution SeongJae Park
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2025-02-06  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm

Split out the madvise parameters validation logic from do_madvise(), for
easy reuse of the logic from a future change.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/madvise.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index df5a87a1846a..efab2878be7c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1594,6 +1594,27 @@ static void madvise_unlock(struct mm_struct *mm, int behavior)
 		mmap_read_unlock(mm);
 }
 
+static bool is_valid_madvise(unsigned long start, size_t len_in, int behavior)
+{
+	size_t len;
+
+	if (!madvise_behavior_valid(behavior))
+		return false;
+
+	if (!PAGE_ALIGNED(start))
+		return false;
+	len = PAGE_ALIGN(len_in);
+
+	/* Check to see whether len was rounded up from small -ve to zero */
+	if (len_in && !len)
+		return false;
+
+	if (start + len < start)
+		return false;
+
+	return true;
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1673,20 +1694,11 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	size_t len;
 	struct blk_plug plug;
 
-	if (!madvise_behavior_valid(behavior))
+	if (!is_valid_madvise(start, len_in, behavior))
 		return -EINVAL;
 
-	if (!PAGE_ALIGNED(start))
-		return -EINVAL;
 	len = PAGE_ALIGN(len_in);
-
-	/* Check to see whether len was rounded up from small -ve to zero */
-	if (len_in && !len)
-		return -EINVAL;
-
 	end = start + len;
-	if (end < start)
-		return -EINVAL;
 
 	if (end == start)
 		return 0;
-- 
2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/4] mm/madvise: split out madvise() behavior execution
  2025-02-06  6:15 [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
  2025-02-06  6:15 ` [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise() SeongJae Park
  2025-02-06  6:15 ` [PATCH 2/4] mm/madvise: split out madvise input validity check SeongJae Park
@ 2025-02-06  6:15 ` SeongJae Park
  2025-02-06 20:30   ` Liam R. Howlett
  2025-02-06  6:15 ` [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
  2025-02-11  8:48 ` [PATCH 0/4] " Vern Hao
  4 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2025-02-06  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm

Split out the madvise behavior applying logic from do_madvise() to make
it easier to reuse from the following change.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/madvise.c | 53 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index efab2878be7c..31e5df75b926 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1615,6 +1615,35 @@ static bool is_valid_madvise(unsigned long start, size_t len_in, int behavior)
 	return true;
 }
 
+static int madvise_do_behavior(struct mm_struct *mm,
+		unsigned long start, size_t len_in, size_t len, int behavior)
+{
+	struct blk_plug plug;
+	unsigned long end;
+	int error;
+
+#ifdef CONFIG_MEMORY_FAILURE
+	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
+		return madvise_inject_error(behavior, start, start + len_in);
+#endif
+	start = untagged_addr_remote(mm, start);
+	end = start + len;
+
+	blk_start_plug(&plug);
+	switch (behavior) {
+	case MADV_POPULATE_READ:
+	case MADV_POPULATE_WRITE:
+		error = madvise_populate(mm, start, end, behavior);
+		break;
+	default:
+		error = madvise_walk_vmas(mm, start, end, behavior,
+					  madvise_vma_behavior);
+		break;
+	}
+	blk_finish_plug(&plug);
+	return error;
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1692,7 +1721,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	unsigned long end;
 	int error;
 	size_t len;
-	struct blk_plug plug;
 
 	if (!is_valid_madvise(start, len_in, behavior))
 		return -EINVAL;
@@ -1706,28 +1734,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	error = madvise_lock(mm, behavior);
 	if (error)
 		return error;
-
-#ifdef CONFIG_MEMORY_FAILURE
-	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
-		return madvise_inject_error(behavior, start, start + len_in);
-#endif
-
-	start = untagged_addr_remote(mm, start);
-	end = start + len;
-
-	blk_start_plug(&plug);
-	switch (behavior) {
-	case MADV_POPULATE_READ:
-	case MADV_POPULATE_WRITE:
-		error = madvise_populate(mm, start, end, behavior);
-		break;
-	default:
-		error = madvise_walk_vmas(mm, start, end, behavior,
-					  madvise_vma_behavior);
-		break;
-	}
-	blk_finish_plug(&plug);
-
+	error = madvise_do_behavior(mm, start, len_in, len, behavior);
 	madvise_unlock(mm, behavior);
 
 	return error;
-- 
2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-06  6:15 [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
                   ` (2 preceding siblings ...)
  2025-02-06  6:15 ` [PATCH 3/4] mm/madvise: split out madvise() behavior execution SeongJae Park
@ 2025-02-06  6:15 ` SeongJae Park
  2025-02-06 13:04   ` Lorenzo Stoakes
                     ` (2 more replies)
  2025-02-11  8:48 ` [PATCH 0/4] " Vern Hao
  4 siblings, 3 replies; 17+ messages in thread
From: SeongJae Park @ 2025-02-06  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm

Optimize redundant mmap lock operations from process_madvise() by
directly doing the mmap locking first, and then the remaining works for
all ranges in the loop.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/madvise.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 31e5df75b926..5a0a1fc99d27 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1754,9 +1754,26 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 
 	total_len = iov_iter_count(iter);
 
+	ret = madvise_lock(mm, behavior);
+	if (ret)
+		return ret;
+
 	while (iov_iter_count(iter)) {
-		ret = do_madvise(mm, (unsigned long)iter_iov_addr(iter),
-				 iter_iov_len(iter), behavior);
+		unsigned long start = (unsigned long)iter_iov_addr(iter);
+		size_t len_in = iter_iov_len(iter);
+		size_t len;
+
+		if (!is_valid_madvise(start, len_in, behavior)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		len = PAGE_ALIGN(len_in);
+		if (start + len == start)
+			ret = 0;
+		else
+			ret = madvise_do_behavior(mm, start, len_in, len,
+					behavior);
 		/*
 		 * An madvise operation is attempting to restart the syscall,
 		 * but we cannot proceed as it would not be correct to repeat
@@ -1772,12 +1789,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 				ret = -EINTR;
 				break;
 			}
+
+			/* Drop and reacquire lock to unwind race. */
+			madvise_unlock(mm, behavior);
+			madvise_lock(mm, behavior);
 			continue;
 		}
 		if (ret < 0)
 			break;
 		iov_iter_advance(iter, iter_iov_len(iter));
 	}
+	madvise_unlock(mm, behavior);
 
 	ret = (total_len - iov_iter_count(iter)) ? : ret;
 
-- 
2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-06  6:15 ` [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
@ 2025-02-06 13:04   ` Lorenzo Stoakes
  2025-02-06 16:53     ` SeongJae Park
  2025-02-06 20:32   ` Liam R. Howlett
  2025-02-11  5:30   ` Lai, Yi
  2 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Stoakes @ 2025-02-06 13:04 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

On Wed, Feb 05, 2025 at 10:15:17PM -0800, SeongJae Park wrote:
> Optimize redundant mmap lock operations from process_madvise() by
> directly doing the mmap locking first, and then the remaining works for
> all ranges in the loop.
>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: SeongJae Park <sj@kernel.org>

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

But please fixup the comment as below. Maybe Andrew could do it if I am
giving enough info to go on, otherwise it could be a quick fix-patch? I
attach a lazy attempt at a fix-patch in case that's useful.

> ---
>  mm/madvise.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 31e5df75b926..5a0a1fc99d27 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1754,9 +1754,26 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>
>  	total_len = iov_iter_count(iter);
>
> +	ret = madvise_lock(mm, behavior);
> +	if (ret)
> +		return ret;
> +
>  	while (iov_iter_count(iter)) {
> -		ret = do_madvise(mm, (unsigned long)iter_iov_addr(iter),
> -				 iter_iov_len(iter), behavior);
> +		unsigned long start = (unsigned long)iter_iov_addr(iter);
> +		size_t len_in = iter_iov_len(iter);
> +		size_t len;
> +
> +		if (!is_valid_madvise(start, len_in, behavior)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		len = PAGE_ALIGN(len_in);
> +		if (start + len == start)
> +			ret = 0;
> +		else
> +			ret = madvise_do_behavior(mm, start, len_in, len,
> +					behavior);
>  		/*
>  		 * An madvise operation is attempting to restart the syscall,
>  		 * but we cannot proceed as it would not be correct to repeat

This comment is now incorrect, please fix up the 2nd paragraph so it is
something like:

		/*
		 * An madvise operation is attempting to restart the syscall,
		 * but we cannot proceed as it would not be correct to repeat
		 * the operation in aggregate, and would be surprising to the
		 * user.
		 *
		 * We drop and reacquire locks so it is safe to just loop and
		 * try again. We check for fatal signals in case we need exit
		 * early anyway.
		 */

See below fixpatch if that's easier...

> @@ -1772,12 +1789,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  				ret = -EINTR;
>  				break;
>  			}
> +
> +			/* Drop and reacquire lock to unwind race. */
> +			madvise_unlock(mm, behavior);
> +			madvise_lock(mm, behavior);

Thanks for adding!

>  			continue;
>  		}
>  		if (ret < 0)
>  			break;
>  		iov_iter_advance(iter, iter_iov_len(iter));
>  	}
> +	madvise_unlock(mm, behavior);
>
>  	ret = (total_len - iov_iter_count(iter)) ? : ret;
>
> --
> 2.39.5

For convenience I attach a lazy, untested fixpatch which may or may not work :)

----8<----
From 86e99a658a5e0195050d6bb9e19975f54bf14e7a Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Thu, 6 Feb 2025 13:02:52 +0000
Subject: [PATCH] foo

---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index c8e28d51978a..ab5f5da1571f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1725,7 +1725,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 		 * the operation in aggregate, and would be surprising to the
 		 * user.
 		 *
-		 * As we have already dropped locks, it is safe to just loop and
+		 * We drop and reacquire locks so it is safe to just loop and
 		 * try again. We check for fatal signals in case we need exit
 		 * early anyway.
 		 */
--
2.48.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-06 13:04   ` Lorenzo Stoakes
@ 2025-02-06 16:53     ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2025-02-06 16:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: SeongJae Park, Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

On Thu, 6 Feb 2025 13:04:53 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> On Wed, Feb 05, 2025 at 10:15:17PM -0800, SeongJae Park wrote:
> > Optimize redundant mmap lock operations from process_madvise() by
> > directly doing the mmap locking first, and then the remaining works for
> > all ranges in the loop.
> >
> > Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
> But please fixup the comment as below. Maybe Andrew could do it if I am
> giving enough info to go on, otherwise it could be a quick fix-patch? I
> attach a lazy attempt at a fix-patch in case that's useful.

Thank you for your reviews and this fixup!  Since Andrew already picked this
patch in mm-unstable,  I will keep my eyes on the tree and make some action if
needed, to ensure your suggestion is applied.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise()
  2025-02-06  6:15 ` [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise() SeongJae Park
@ 2025-02-06 20:27   ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2025-02-06 20:27 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, David Hildenbrand, Davidlohr Bueso,
	Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

* SeongJae Park <sj@kernel.org> [250206 01:15]:
> Split out the madvise behavior-dependent mmap_lock operations from
> do_madvise(), for easier reuse of the logic in an upcoming change.
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: SeongJae Park <sj@kernel.org>

Reviewed-by: Liam R. Howlett <howlett@gmail.com>

> ---
>  mm/madvise.c | 45 ++++++++++++++++++++++++++++++++-------------
>  1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c8e28d51978a..df5a87a1846a 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1567,6 +1567,33 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
>  				 madvise_vma_anon_name);
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
> +
> +static int madvise_lock(struct mm_struct *mm, int behavior)
> +{
> +
> +#ifdef CONFIG_MEMORY_FAILURE
> +	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
> +		return 0;
> +#endif
> +
> +	if (madvise_need_mmap_write(behavior)) {
> +		if (mmap_write_lock_killable(mm))
> +			return -EINTR;
> +	} else {
> +		mmap_read_lock(mm);
> +	}
> +	return 0;
> +
> +}
> +
> +static void madvise_unlock(struct mm_struct *mm, int behavior)
> +{
> +	if (madvise_need_mmap_write(behavior))
> +		mmap_write_unlock(mm);
> +	else
> +		mmap_read_unlock(mm);
> +}
> +
>  /*
>   * The madvise(2) system call.
>   *
> @@ -1643,7 +1670,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  {
>  	unsigned long end;
>  	int error;
> -	int write;
>  	size_t len;
>  	struct blk_plug plug;
>  
> @@ -1665,19 +1691,15 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	if (end == start)
>  		return 0;
>  
> +	error = madvise_lock(mm, behavior);
> +	if (error)
> +		return error;
> +
>  #ifdef CONFIG_MEMORY_FAILURE
>  	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
>  		return madvise_inject_error(behavior, start, start + len_in);
>  #endif
>  
> -	write = madvise_need_mmap_write(behavior);
> -	if (write) {
> -		if (mmap_write_lock_killable(mm))
> -			return -EINTR;
> -	} else {
> -		mmap_read_lock(mm);
> -	}
> -
>  	start = untagged_addr_remote(mm, start);
>  	end = start + len;
>  
> @@ -1694,10 +1716,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	}
>  	blk_finish_plug(&plug);
>  
> -	if (write)
> -		mmap_write_unlock(mm);
> -	else
> -		mmap_read_unlock(mm);
> +	madvise_unlock(mm, behavior);
>  
>  	return error;
>  }
> -- 
> 2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] mm/madvise: split out madvise input validity check
  2025-02-06  6:15 ` [PATCH 2/4] mm/madvise: split out madvise input validity check SeongJae Park
@ 2025-02-06 20:29   ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2025-02-06 20:29 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, David Hildenbrand, Davidlohr Bueso,
	Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

* SeongJae Park <sj@kernel.org> [250206 01:15]:
> Split out the madvise parameters validation logic from do_madvise(), for
> easy reuse of the logic from a future change.
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: SeongJae Park <sj@kernel.org>

Reviewed-by: Liam R. Howlett <howlett@gmail.com>

> ---
>  mm/madvise.c | 32 ++++++++++++++++++++++----------
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index df5a87a1846a..efab2878be7c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1594,6 +1594,27 @@ static void madvise_unlock(struct mm_struct *mm, int behavior)
>  		mmap_read_unlock(mm);
>  }
>  
> +static bool is_valid_madvise(unsigned long start, size_t len_in, int behavior)
> +{
> +	size_t len;
> +
> +	if (!madvise_behavior_valid(behavior))
> +		return false;
> +
> +	if (!PAGE_ALIGNED(start))
> +		return false;
> +	len = PAGE_ALIGN(len_in);
> +
> +	/* Check to see whether len was rounded up from small -ve to zero */
> +	if (len_in && !len)
> +		return false;
> +
> +	if (start + len < start)
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * The madvise(2) system call.
>   *
> @@ -1673,20 +1694,11 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	size_t len;
>  	struct blk_plug plug;
>  
> -	if (!madvise_behavior_valid(behavior))
> +	if (!is_valid_madvise(start, len_in, behavior))
>  		return -EINVAL;
>  
> -	if (!PAGE_ALIGNED(start))
> -		return -EINVAL;
>  	len = PAGE_ALIGN(len_in);
> -
> -	/* Check to see whether len was rounded up from small -ve to zero */
> -	if (len_in && !len)
> -		return -EINVAL;
> -
>  	end = start + len;
> -	if (end < start)
> -		return -EINVAL;
>  
>  	if (end == start)
>  		return 0;
> -- 
> 2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] mm/madvise: split out madvise() behavior execution
  2025-02-06  6:15 ` [PATCH 3/4] mm/madvise: split out madvise() behavior execution SeongJae Park
@ 2025-02-06 20:30   ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2025-02-06 20:30 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, David Hildenbrand, Davidlohr Bueso,
	Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

* SeongJae Park <sj@kernel.org> [250206 01:15]:
> Split out the madvise behavior applying logic from do_madvise() to make
> it easier to reuse from the following change.
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: SeongJae Park <sj@kernel.org>

Reviewed-by: Liam R. Howlett <howlett@gmail.com>

> ---
>  mm/madvise.c | 53 +++++++++++++++++++++++++++++-----------------------
>  1 file changed, 30 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index efab2878be7c..31e5df75b926 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1615,6 +1615,35 @@ static bool is_valid_madvise(unsigned long start, size_t len_in, int behavior)
>  	return true;
>  }
>  
> +static int madvise_do_behavior(struct mm_struct *mm,
> +		unsigned long start, size_t len_in, size_t len, int behavior)
> +{
> +	struct blk_plug plug;
> +	unsigned long end;
> +	int error;
> +
> +#ifdef CONFIG_MEMORY_FAILURE
> +	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
> +		return madvise_inject_error(behavior, start, start + len_in);
> +#endif
> +	start = untagged_addr_remote(mm, start);
> +	end = start + len;
> +
> +	blk_start_plug(&plug);
> +	switch (behavior) {
> +	case MADV_POPULATE_READ:
> +	case MADV_POPULATE_WRITE:
> +		error = madvise_populate(mm, start, end, behavior);
> +		break;
> +	default:
> +		error = madvise_walk_vmas(mm, start, end, behavior,
> +					  madvise_vma_behavior);
> +		break;
> +	}
> +	blk_finish_plug(&plug);
> +	return error;
> +}
> +
>  /*
>   * The madvise(2) system call.
>   *
> @@ -1692,7 +1721,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	unsigned long end;
>  	int error;
>  	size_t len;
> -	struct blk_plug plug;
>  
>  	if (!is_valid_madvise(start, len_in, behavior))
>  		return -EINVAL;
> @@ -1706,28 +1734,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	error = madvise_lock(mm, behavior);
>  	if (error)
>  		return error;
> -
> -#ifdef CONFIG_MEMORY_FAILURE
> -	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
> -		return madvise_inject_error(behavior, start, start + len_in);
> -#endif
> -
> -	start = untagged_addr_remote(mm, start);
> -	end = start + len;
> -
> -	blk_start_plug(&plug);
> -	switch (behavior) {
> -	case MADV_POPULATE_READ:
> -	case MADV_POPULATE_WRITE:
> -		error = madvise_populate(mm, start, end, behavior);
> -		break;
> -	default:
> -		error = madvise_walk_vmas(mm, start, end, behavior,
> -					  madvise_vma_behavior);
> -		break;
> -	}
> -	blk_finish_plug(&plug);
> -
> +	error = madvise_do_behavior(mm, start, len_in, len, behavior);
>  	madvise_unlock(mm, behavior);
>  
>  	return error;
> -- 
> 2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-06  6:15 ` [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
  2025-02-06 13:04   ` Lorenzo Stoakes
@ 2025-02-06 20:32   ` Liam R. Howlett
  2025-02-11  5:30   ` Lai, Yi
  2 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2025-02-06 20:32 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, David Hildenbrand, Davidlohr Bueso,
	Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

* SeongJae Park <sj@kernel.org> [250206 01:15]:
> Optimize redundant mmap lock operations from process_madvise() by
> directly doing the mmap locking first, and then the remaining works for
> all ranges in the loop.
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: SeongJae Park <sj@kernel.org>

It might be worth calling out the drop/reacquire in the change log as
well as the comment like Lorenzo said.

Reviewed-by: Liam R. Howlett <howlett@gmail.com>

> ---
>  mm/madvise.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 31e5df75b926..5a0a1fc99d27 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1754,9 +1754,26 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  
>  	total_len = iov_iter_count(iter);
>  
> +	ret = madvise_lock(mm, behavior);
> +	if (ret)
> +		return ret;
> +
>  	while (iov_iter_count(iter)) {
> -		ret = do_madvise(mm, (unsigned long)iter_iov_addr(iter),
> -				 iter_iov_len(iter), behavior);
> +		unsigned long start = (unsigned long)iter_iov_addr(iter);
> +		size_t len_in = iter_iov_len(iter);
> +		size_t len;
> +
> +		if (!is_valid_madvise(start, len_in, behavior)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		len = PAGE_ALIGN(len_in);
> +		if (start + len == start)
> +			ret = 0;
> +		else
> +			ret = madvise_do_behavior(mm, start, len_in, len,
> +					behavior);
>  		/*
>  		 * An madvise operation is attempting to restart the syscall,
>  		 * but we cannot proceed as it would not be correct to repeat
> @@ -1772,12 +1789,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  				ret = -EINTR;
>  				break;
>  			}
> +
> +			/* Drop and reacquire lock to unwind race. */
> +			madvise_unlock(mm, behavior);
> +			madvise_lock(mm, behavior);
>  			continue;
>  		}
>  		if (ret < 0)
>  			break;
>  		iov_iter_advance(iter, iter_iov_len(iter));
>  	}
> +	madvise_unlock(mm, behavior);
>  
>  	ret = (total_len - iov_iter_count(iter)) ? : ret;
>  
> -- 
> 2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-06  6:15 ` [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
  2025-02-06 13:04   ` Lorenzo Stoakes
  2025-02-06 20:32   ` Liam R. Howlett
@ 2025-02-11  5:30   ` Lai, Yi
  2025-02-11  6:37     ` SeongJae Park
  2025-02-11 10:34     ` Lorenzo Stoakes
  2 siblings, 2 replies; 17+ messages in thread
From: Lai, Yi @ 2025-02-11  5:30 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm, yi1.lai

On Wed, Feb 05, 2025 at 10:15:17PM -0800, SeongJae Park wrote:
> Optimize redundant mmap lock operations from process_madvise() by
> directly doing the mmap locking first, and then the remaining works for
> all ranges in the loop.
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: SeongJae Park <sj@kernel.org>
> ---
>  mm/madvise.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 31e5df75b926..5a0a1fc99d27 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1754,9 +1754,26 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  
>  	total_len = iov_iter_count(iter);
>  
> +	ret = madvise_lock(mm, behavior);
> +	if (ret)
> +		return ret;
> +
>  	while (iov_iter_count(iter)) {
> -		ret = do_madvise(mm, (unsigned long)iter_iov_addr(iter),
> -				 iter_iov_len(iter), behavior);
> +		unsigned long start = (unsigned long)iter_iov_addr(iter);
> +		size_t len_in = iter_iov_len(iter);
> +		size_t len;
> +
> +		if (!is_valid_madvise(start, len_in, behavior)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		len = PAGE_ALIGN(len_in);
> +		if (start + len == start)
> +			ret = 0;
> +		else
> +			ret = madvise_do_behavior(mm, start, len_in, len,
> +					behavior);
>  		/*
>  		 * An madvise operation is attempting to restart the syscall,
>  		 * but we cannot proceed as it would not be correct to repeat
> @@ -1772,12 +1789,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  				ret = -EINTR;
>  				break;
>  			}
> +
> +			/* Drop and reacquire lock to unwind race. */
> +			madvise_unlock(mm, behavior);
> +			madvise_lock(mm, behavior);
>  			continue;
>  		}
>  		if (ret < 0)
>  			break;
>  		iov_iter_advance(iter, iter_iov_len(iter));
>  	}
> +	madvise_unlock(mm, behavior);
>  
>  	ret = (total_len - iov_iter_count(iter)) ? : ret;
> 

Hi SeongJae Park,

Greetings!

I used Syzkaller and found that there is WARNING in madvise_unlock in linux-next tag - next-20250210.

After bisection and the first bad commit is:
"
ec68fbd9e99f mm/madvise: remove redundant mmap_lock operations from process_madvise()
"

All detailed into can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250210_144836_madvise_unlock/bzImage_df5d6180169ae06a2eac57e33b077ad6f6252440
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250210_144836_madvise_unlock/df5d6180169ae06a2eac57e33b077ad6f6252440_dmesg.log

"
[  135.191347] Injecting memory failure for pfn 0x8ea0 at process virtual address 0x20e7f000
[  135.194964] Memory failure: 0x8ea0: recovery action for reserved kernel page: Ignored
[  135.195584] ------------[ cut here ]------------
[  135.195863] WARNING: CPU: 1 PID: 680 at ./include/linux/rwsem.h:203 madvise_unlock+0x17e/0x1a0
[  135.196395] Modules linked in:
[  135.196612] CPU: 1 UID: 0 PID: 680 Comm: repro Not tainted 6.14.0-rc2-next-20250210-df5d6180169a #1
[  135.197135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  135.197818] RIP: 0010:madvise_unlock+0x17e/0x1a0
[  135.198108] Code: a1 9f ff 31 f6 49 8d bc 24 e0 01 00 00 e8 fa 80 d5 03 31 ff 89 c3 89 c6 e8 9f 9b 9f ff 85 db 0f 85 1c ff ff ffe
[  135.199154] RSP: 0018:ffff888022647e88 EFLAGS: 00010293
[  135.199468] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81e878f1
[  135.199876] RDX: ffff88802264a540 RSI: ffffffff81e878fe RDI: 0000000000000005
[  135.200285] RBP: ffff888022647ea0 R08: 0000000000000001 R09: ffffed10044c8f25
[  135.200692] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888021116180
[  135.201100] R13: ffff8880211162f0 R14: 0000000000004000 R15: ffff888021116180
[  135.201531] FS:  00007f0327a07600(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
[  135.201996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.202329] CR2: 00007f032773e7f0 CR3: 0000000021bd4003 CR4: 0000000000770ef0
[  135.202737] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.203155] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  135.203563] PKRU: 55555554
[  135.203731] Call Trace:
[  135.203883]  <TASK>
[  135.204020]  ? show_regs+0x6d/0x80
[  135.204240]  ? __warn+0xf3/0x390
[  135.204447]  ? report_bug+0x25e/0x4b0
[  135.204692]  ? madvise_unlock+0x17e/0x1a0
[  135.204937]  ? report_bug+0x2cb/0x4b0
[  135.205167]  ? madvise_unlock+0x17e/0x1a0
[  135.205413]  ? madvise_unlock+0x17f/0x1a0
[  135.205706]  ? handle_bug+0xf1/0x190
[  135.206188]  ? exc_invalid_op+0x3c/0x80
[  135.206423]  ? asm_exc_invalid_op+0x1f/0x30
[  135.206678]  ? madvise_unlock+0x171/0x1a0
[  135.206919]  ? madvise_unlock+0x17e/0x1a0
[  135.207156]  ? madvise_unlock+0x17e/0x1a0
[  135.207388]  ? madvise_unlock+0x17e/0x1a0
[  135.207623]  do_madvise+0x14f/0x1a0
[  135.207836]  __x64_sys_madvise+0xb2/0x120
[  135.208067]  ? syscall_trace_enter+0x14f/0x280
[  135.208328]  x64_sys_call+0x19b1/0x2150
[  135.208552]  do_syscall_64+0x6d/0x140
[  135.208766]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.209050] RIP: 0033:0x7f032763ee5d
[  135.209261] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
[  135.210261] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
[  135.210678] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
[  135.211069] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
[  135.211458] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
[  135.211851] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
[  135.212248] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
[  135.212656]  </TASK>
[  135.212794] irq event stamp: 807
[  135.212987] hardirqs last  enabled at (815): [<ffffffff81663b55>] __up_console_sem+0x95/0xb0
[  135.213477] hardirqs last disabled at (824): [<ffffffff81663b3a>] __up_console_sem+0x7a/0xb0
[  135.213951] softirqs last  enabled at (628): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.214426] softirqs last disabled at (623): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.214910] ---[ end trace 0000000000000000 ]---
[  135.215182]
[  135.215283] =====================================
[  135.215550] WARNING: bad unlock balance detected!
[  135.215817] 6.14.0-rc2-next-20250210-df5d6180169a #1 Tainted: G        W
[  135.216234] -------------------------------------
[  135.216502] repro/680 is trying to release lock (&mm->mmap_lock) at:
[  135.216863] [<ffffffff81e87854>] madvise_unlock+0xd4/0x1a0
[  135.217179] but there are no more locks to release!
[  135.217457]
[  135.217457] other info that might help us debug this:
[  135.217819] no locks held by repro/680.
[  135.218046]
[  135.218046] stack backtrace:
[  135.218296] CPU: 1 UID: 0 PID: 680 Comm: repro Tainted: G        W          6.14.0-rc2-next-20250210-df5d6180169a #1
[  135.218308] Tainted: [W]=WARN
[  135.218310] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  135.218315] Call Trace:
[  135.218317]  <TASK>
[  135.218320]  dump_stack_lvl+0xea/0x150
[  135.218330]  ? madvise_unlock+0xd4/0x1a0
[  135.218341]  dump_stack+0x19/0x20
[  135.218349]  print_unlock_imbalance_bug+0x1b5/0x200
[  135.218368]  ? madvise_unlock+0xd4/0x1a0
[  135.218379]  lock_release+0x5bc/0x870
[  135.218386]  ? madvise_unlock+0x17f/0x1a0
[  135.218397]  ? handle_bug+0xf1/0x190
[  135.218407]  ? __pfx_lock_release+0x10/0x10
[  135.218415]  ? exc_invalid_op+0x3c/0x80
[  135.218426]  ? asm_exc_invalid_op+0x1f/0x30
[  135.218441]  up_write+0x31/0x550
[  135.218449]  ? madvise_unlock+0x17e/0x1a0
[  135.218462]  madvise_unlock+0xd4/0x1a0
[  135.218474]  do_madvise+0x14f/0x1a0
[  135.218487]  __x64_sys_madvise+0xb2/0x120
[  135.218500]  ? syscall_trace_enter+0x14f/0x280
[  135.218511]  x64_sys_call+0x19b1/0x2150
[  135.218522]  do_syscall_64+0x6d/0x140
[  135.218531]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.218542] RIP: 0033:0x7f032763ee5d
[  135.218548] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
[  135.218556] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
[  135.218562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
[  135.218569] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
[  135.218573] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
[  135.218578] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
[  135.218582] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
[  135.218594]  </TASK>
[  135.228272] ------------[ cut here ]------------
[  135.228541] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magy
[  135.229541] WARNING: CPU: 1 PID: 680 at kernel/locking/rwsem.c:1367 up_write+0x451/0x550
[  135.229997] Modules linked in:
[  135.230182] CPU: 1 UID: 0 PID: 680 Comm: repro Tainted: G        W          6.14.0-rc2-next-20250210-df5d6180169a #1
[  135.230758] Tainted: [W]=WARN
[  135.230940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  135.231565] RIP: 0010:up_write+0x451/0x550
[  135.231806] Code: ea 03 80 3c 02 00 0f 85 d5 00 00 00 49 8b 14 24 53 4d 89 f9 4c 89 f1 48 c7 c6 40 bd eb 85 48 c7 c7 60 bc eb 858
[  135.232814] RSP: 0018:ffff888022647e30 EFLAGS: 00010282
[  135.233113] RAX: 0000000000000000 RBX: ffffffff85ebbba0 RCX: ffffffff8146cfd3
[  135.233528] RDX: ffff88802264a540 RSI: ffffffff8146cfe0 RDI: 0000000000000001
[  135.233929] RBP: ffff888022647e78 R08: 0000000000000001 R09: ffffed10044c8f66
[  135.234325] R10: 0000000000000001 R11: 57525f4755424544 R12: ffff8880211162f0
[  135.234722] R13: ffff8880211162f8 R14: ffff8880211162f0 R15: ffff88802264a540
[  135.235122] FS:  00007f0327a07600(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
[  135.235584] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.235912] CR2: 00007f032773e7f0 CR3: 0000000021bd4003 CR4: 0000000000770ef0
[  135.236314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.236713] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  135.237114] PKRU: 55555554
[  135.237276] Call Trace:
[  135.237445]  <TASK>
[  135.237580]  ? show_regs+0x6d/0x80
[  135.237789]  ? __warn+0xf3/0x390
[  135.237987]  ? find_bug+0x310/0x490
[  135.238199]  ? up_write+0x451/0x550
[  135.238412]  ? report_bug+0x2cb/0x4b0
[  135.238636]  ? up_write+0x451/0x550
[  135.238853]  ? up_write+0x452/0x550
[  135.239065]  ? handle_bug+0xf1/0x190
[  135.239285]  ? exc_invalid_op+0x3c/0x80
[  135.239515]  ? asm_exc_invalid_op+0x1f/0x30
[  135.239768]  ? __warn_printk+0x173/0x2e0
[  135.240001]  ? __warn_printk+0x180/0x2e0
[  135.240235]  ? up_write+0x451/0x550
[  135.240445]  ? madvise_unlock+0x17e/0x1a0
[  135.240686]  madvise_unlock+0xd4/0x1a0
[  135.240913]  do_madvise+0x14f/0x1a0
[  135.241126]  __x64_sys_madvise+0xb2/0x120
[  135.241364]  ? syscall_trace_enter+0x14f/0x280
[  135.241647]  x64_sys_call+0x19b1/0x2150
[  135.241887]  do_syscall_64+0x6d/0x140
[  135.242113]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.242407] RIP: 0033:0x7f032763ee5d
[  135.242619] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
[  135.243646] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
[  135.244075] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
[  135.244476] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
[  135.244877] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
[  135.245279] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
[  135.245696] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
[  135.246106]  </TASK>
[  135.246242] irq event stamp: 857
[  135.246434] hardirqs last  enabled at (857): [<ffffffff81663b55>] __up_console_sem+0x95/0xb0
[  135.246915] hardirqs last disabled at (856): [<ffffffff81663b3a>] __up_console_sem+0x7a/0xb0
[  135.247392] softirqs last  enabled at (628): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.247871] softirqs last disabled at (623): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.248347] ---[ end trace 0000000000000000 ]---
"

Hope this cound be insightful to you.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem or if you
already have one reproduced environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install 

> -- 
> 2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-11  5:30   ` Lai, Yi
@ 2025-02-11  6:37     ` SeongJae Park
  2025-02-11 10:34     ` Lorenzo Stoakes
  1 sibling, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2025-02-11  6:37 UTC (permalink / raw)
  To: Lai, Yi
  Cc: SeongJae Park, Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm, yi1.lai

Hello Lai,

On Tue, 11 Feb 2025 13:30:49 +0800 "Lai, Yi" <yi1.lai@linux.intel.com> wrote:

[...]
> Hi SeongJae Park,
> 
> Greetings!
> 
> I used Syzkaller and found that there is WARNING in madvise_unlock in linux-next tag - next-20250210.

Thank you so much for this nice report!  I just sent a fix:
https://lore.kernel.org/20250211063201.5106-1-sj@kernel.org

> 
> After bisection and the first bad commit is:
> "
> ec68fbd9e99f mm/madvise: remove redundant mmap_lock operations from process_madvise()
> "

Nonetheless, I think the real first bad commit is f19c9d7b57cf ("mm/madvise:
split out madvise() behavior execution").  I confirmed I can reproduce the
issue using your reproducer on the commit.  And I think the fix may better to
be squashed into an earlier commit, 948a0a9ea070 ("mm/madvise: split out mmap
locking operations for madvise()").  Please refer to the fix for details about
why I think so, and let me know if anything seems wrong.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-06  6:15 [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
                   ` (3 preceding siblings ...)
  2025-02-06  6:15 ` [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
@ 2025-02-11  8:48 ` Vern Hao
  2025-02-11 18:28   ` SeongJae Park
  4 siblings, 1 reply; 17+ messages in thread
From: Vern Hao @ 2025-02-11  8:48 UTC (permalink / raw)
  To: SeongJae Park, Andrew Morton
  Cc: Liam R. Howlett, David Hildenbrand, Davidlohr Bueso,
	Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm

[-- Attachment #1: Type: text/plain, Size: 4828 bytes --]


On 2025/2/6 14:15, SeongJae Park wrote:
> process_madvise() calls do_madvise() for each address range.  Then, each
> do_madvise() invocation holds and releases same mmap_lock.  Optimize the
> redundant lock operations by splitting do_madvise() internal logics
> including the mmap_lock operations, and calling the small logics
> directly from process_madvise() in a sequence that removes the redundant
> locking.  As a result of this change, process_madvise() becomes more
> efficient and less racy in terms of its results and latency.
>
> Note that the potential downside of this series is that other mmap_lock
> holders may take more time due to the increased length of mmap_lock
> critical section for process_madvise() calls.  But there is maximum
> limit in the kernel space (IOV_MAX), and the user space can control the
> critical section length by setting the request size.  Hence, the
> downside would be limited and controllable.
>
> Evaluation
> ==========
>
> I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
> using multiple madvise() calls, 4 KiB per each call.  I also do the same
> with process_madvise(), but with varying batch size (vlen) from 1 to
> 1024.  The source code for the measurement is available at GitHub[1].
> Because the microbenchmark result is not that stable, I ran each
> configuration five times and use the average.
>
> The measurement results are as below.  'sz_batches' column shows the
> batch size of process_madvise() calls.  '0' batch size is for madvise()
> calls case.
Hi,  i just wonder why these patches can reduce latency time on call 
madvise() DONT_NEED.
>   'before' and 'after' columns are the measured time to apply
> MADV_DONTNEED to the 256 MiB memory buffer in nanoseconds, on kernels
> that built without and with the last patch of this series, respectively.
> So lower value means better efficiency.  'after/before' column is the
> ratio of 'after' to 'before'.
>
>      sz_batches  before       after        after/before
>      0           146294215.2  121280536.2  0.829017989769427
>      1           165851018.8  136305598.2  0.821855658085351
>      2           129469321.2  103740383.6  0.801273866569094
>      4           110369232.4  87835896.2   0.795836795182785
>      8           102906232.4  77420920.2   0.752344327397609
>      16          97551017.4   74959714.4   0.768415506038587
>      32          94809848.2   71200848.4   0.750985786305689
>      64          96087575.6   72593180     0.755489765942227
>      128         96154163.8   68517055.4   0.712575022154163
>      256         92901257.6   69054216.6   0.743307662177439
>      512         93646170.8   67053296.2   0.716028168874151
>      1024        92663219.2   70168196.8   0.75723892830177
>
> In despite of the unstable nature of the tet program, the trend is
> somewhat we can expect.  The measurement shows this patch reduces the
> process_madvise() latency, proportional to the batching size.  The
> latency gain was about 20% with the batch size 2, and it has increased
> to about 28% with the batch size 512, since more number of mmap locking
> is reduced with larger batch size.
>
> Note that the standard devitation of the measurements for each
> sz_batches configuration was ranging from 1.9% to 7.2%.  That is, this
> result is still not very stable.  The average of the standard deviations
> for different batch sizes were 4.62% and 4.70% for the 'before' and
> 'after' kernel measurements.
>
> Also note that this patch has somehow decreased latencies of madvise()
> and single batch size process_madvise().  Seems this code path is small
> enough to significantly be affected by compiler optimizations including
> inlining of split-out functions.  Please focus on only the improvement
> amount that changed by the batch size.
>
> Changelog
> =========
>
> Changes from RFC v2
> (https://lore.kernel.org/20250117013058.1843-1-sj@kernel.org)
> - Release and acquire mmap lock again when a race-caused failure happens
>    (Lorenzo Stoakes)
> - Collected Reviewed-by: tags from Shakeel, Lorenzo and Davidlohr.
>
> Changes from RFC v1
> (https://lore.kernel.org/20250111004618.1566-1-sj@kernel.org)
> - Split out do_madvise() and use those from vector_madvise(), instead of
>    adding a flag to do_madvise() (Liam R. Howlett)
>
> [1]https://github.com/sjp38/eval_proc_madvise
>
> SeongJae Park (4):
>    mm/madvise: split out mmap locking operations for madvise()
>    mm/madvise: split out madvise input validity check
>    mm/madvise: split out madvise() behavior execution
>    mm/madvise: remove redundant mmap_lock operations from
>      process_madvise()
>
>   mm/madvise.c | 154 +++++++++++++++++++++++++++++++++++----------------
>   1 file changed, 107 insertions(+), 47 deletions(-)
>
>
> base-commit: f104b8534d19f31443a4fe6cb701bdb15fd931eb

[-- Attachment #2: Type: text/html, Size: 5484 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-11  5:30   ` Lai, Yi
  2025-02-11  6:37     ` SeongJae Park
@ 2025-02-11 10:34     ` Lorenzo Stoakes
  2025-02-11 18:32       ` SeongJae Park
  1 sibling, 1 reply; 17+ messages in thread
From: Lorenzo Stoakes @ 2025-02-11 10:34 UTC (permalink / raw)
  To: Lai, Yi
  Cc: SeongJae Park, Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Shakeel Butt, Vlastimil Babka, linux-kernel,
	linux-mm, yi1.lai, Naresh Kamboju, Arnd Bergmann

+cc Naresh, as [0] does not appear to be in lore which seems to be having issues
(meaning I can't reply to that thread at all), also +cc Arnd for his reply in [1].

[0]:https://lwn.net/ml/linux-mm/CA+G9fYt5QwJ4_F8fJj7jx9_0Le9kOVSeG38ox9qnKqwsrDdvHQ@mail.gmail.com/
[1]:https://lwn.net/ml/linux-mm/fa1a7a10-f892-4e7e-acb4-0b058aa53d88@app.fastmail.com/

The report here from Yi Lai (thanks for that!) appears to be the same thing.

SJ - I think the issue is we're unconditionally trying to unlock even in the
case of MADV_HWPOISON, MADV_SOFT_OFFLINE. E.g.:

static int madvise_lock(struct mm_struct *mm, int behavior)
{

#ifdef CONFIG_MEMORY_FAILURE
	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
		return 0;
#endif
	...
}

But on unlock...

static void madvise_unlock(struct mm_struct *mm, int behavior)
{
	if (madvise_need_mmap_write(behavior))
		mmap_write_unlock(mm);
	else
		mmap_read_unlock(mm);
}

So we just need to insert an equivalent line (ideally abstracted to a helper) on
unlock also.

OK having just typed that I realise SJ sent a fix already. Lore being broken
means I can't link to it. Sigh.

Will reply there.


On Tue, Feb 11, 2025 at 01:30:49PM +0800, Lai, Yi wrote:
> On Wed, Feb 05, 2025 at 10:15:17PM -0800, SeongJae Park wrote:
> > Optimize redundant mmap lock operations from process_madvise() by
> > directly doing the mmap locking first, and then the remaining works for
> > all ranges in the loop.
> >
> > Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> > ---
> >  mm/madvise.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 31e5df75b926..5a0a1fc99d27 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1754,9 +1754,26 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
> >
> >  	total_len = iov_iter_count(iter);
> >
> > +	ret = madvise_lock(mm, behavior);
> > +	if (ret)
> > +		return ret;
> > +
> >  	while (iov_iter_count(iter)) {
> > -		ret = do_madvise(mm, (unsigned long)iter_iov_addr(iter),
> > -				 iter_iov_len(iter), behavior);
> > +		unsigned long start = (unsigned long)iter_iov_addr(iter);
> > +		size_t len_in = iter_iov_len(iter);
> > +		size_t len;
> > +
> > +		if (!is_valid_madvise(start, len_in, behavior)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		len = PAGE_ALIGN(len_in);
> > +		if (start + len == start)
> > +			ret = 0;
> > +		else
> > +			ret = madvise_do_behavior(mm, start, len_in, len,
> > +					behavior);
> >  		/*
> >  		 * An madvise operation is attempting to restart the syscall,
> >  		 * but we cannot proceed as it would not be correct to repeat
> > @@ -1772,12 +1789,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
> >  				ret = -EINTR;
> >  				break;
> >  			}
> > +
> > +			/* Drop and reacquire lock to unwind race. */
> > +			madvise_unlock(mm, behavior);
> > +			madvise_lock(mm, behavior);
> >  			continue;
> >  		}
> >  		if (ret < 0)
> >  			break;
> >  		iov_iter_advance(iter, iter_iov_len(iter));
> >  	}
> > +	madvise_unlock(mm, behavior);
> >
> >  	ret = (total_len - iov_iter_count(iter)) ? : ret;
> >
>
> Hi SeongJae Park,
>
> Greetings!
>
> I used Syzkaller and found that there is WARNING in madvise_unlock in linux-next tag - next-20250210.
>
> After bisection and the first bad commit is:
> "
> ec68fbd9e99f mm/madvise: remove redundant mmap_lock operations from process_madvise()
> "
>
> All detailed into can be found at:
> https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock
> Syzkaller repro code:
> https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.c
> Syzkaller repro syscall steps:
> https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.prog
> Syzkaller report:
> https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.report
> Kconfig(make olddefconfig):
> https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/kconfig_origin
> Bisect info:
> https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/bisect_info.log
> bzImage:
> https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250210_144836_madvise_unlock/bzImage_df5d6180169ae06a2eac57e33b077ad6f6252440
> Issue dmesg:
> https://github.com/laifryiee/syzkaller_logs/blob/main/250210_144836_madvise_unlock/df5d6180169ae06a2eac57e33b077ad6f6252440_dmesg.log
>
> "
> [  135.191347] Injecting memory failure for pfn 0x8ea0 at process virtual address 0x20e7f000
> [  135.194964] Memory failure: 0x8ea0: recovery action for reserved kernel page: Ignored
> [  135.195584] ------------[ cut here ]------------
> [  135.195863] WARNING: CPU: 1 PID: 680 at ./include/linux/rwsem.h:203 madvise_unlock+0x17e/0x1a0
> [  135.196395] Modules linked in:
> [  135.196612] CPU: 1 UID: 0 PID: 680 Comm: repro Not tainted 6.14.0-rc2-next-20250210-df5d6180169a #1
> [  135.197135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [  135.197818] RIP: 0010:madvise_unlock+0x17e/0x1a0
> [  135.198108] Code: a1 9f ff 31 f6 49 8d bc 24 e0 01 00 00 e8 fa 80 d5 03 31 ff 89 c3 89 c6 e8 9f 9b 9f ff 85 db 0f 85 1c ff ff ffe
> [  135.199154] RSP: 0018:ffff888022647e88 EFLAGS: 00010293
> [  135.199468] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81e878f1
> [  135.199876] RDX: ffff88802264a540 RSI: ffffffff81e878fe RDI: 0000000000000005
> [  135.200285] RBP: ffff888022647ea0 R08: 0000000000000001 R09: ffffed10044c8f25
> [  135.200692] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888021116180
> [  135.201100] R13: ffff8880211162f0 R14: 0000000000004000 R15: ffff888021116180
> [  135.201531] FS:  00007f0327a07600(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
> [  135.201996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  135.202329] CR2: 00007f032773e7f0 CR3: 0000000021bd4003 CR4: 0000000000770ef0
> [  135.202737] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  135.203155] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [  135.203563] PKRU: 55555554
> [  135.203731] Call Trace:
> [  135.203883]  <TASK>
> [  135.204020]  ? show_regs+0x6d/0x80
> [  135.204240]  ? __warn+0xf3/0x390
> [  135.204447]  ? report_bug+0x25e/0x4b0
> [  135.204692]  ? madvise_unlock+0x17e/0x1a0
> [  135.204937]  ? report_bug+0x2cb/0x4b0
> [  135.205167]  ? madvise_unlock+0x17e/0x1a0
> [  135.205413]  ? madvise_unlock+0x17f/0x1a0
> [  135.205706]  ? handle_bug+0xf1/0x190
> [  135.206188]  ? exc_invalid_op+0x3c/0x80
> [  135.206423]  ? asm_exc_invalid_op+0x1f/0x30
> [  135.206678]  ? madvise_unlock+0x171/0x1a0
> [  135.206919]  ? madvise_unlock+0x17e/0x1a0
> [  135.207156]  ? madvise_unlock+0x17e/0x1a0
> [  135.207388]  ? madvise_unlock+0x17e/0x1a0
> [  135.207623]  do_madvise+0x14f/0x1a0
> [  135.207836]  __x64_sys_madvise+0xb2/0x120
> [  135.208067]  ? syscall_trace_enter+0x14f/0x280
> [  135.208328]  x64_sys_call+0x19b1/0x2150
> [  135.208552]  do_syscall_64+0x6d/0x140
> [  135.208766]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  135.209050] RIP: 0033:0x7f032763ee5d
> [  135.209261] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
> [  135.210261] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
> [  135.210678] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
> [  135.211069] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
> [  135.211458] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
> [  135.211851] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
> [  135.212248] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
> [  135.212656]  </TASK>
> [  135.212794] irq event stamp: 807
> [  135.212987] hardirqs last  enabled at (815): [<ffffffff81663b55>] __up_console_sem+0x95/0xb0
> [  135.213477] hardirqs last disabled at (824): [<ffffffff81663b3a>] __up_console_sem+0x7a/0xb0
> [  135.213951] softirqs last  enabled at (628): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
> [  135.214426] softirqs last disabled at (623): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
> [  135.214910] ---[ end trace 0000000000000000 ]---
> [  135.215182]
> [  135.215283] =====================================
> [  135.215550] WARNING: bad unlock balance detected!
> [  135.215817] 6.14.0-rc2-next-20250210-df5d6180169a #1 Tainted: G        W
> [  135.216234] -------------------------------------
> [  135.216502] repro/680 is trying to release lock (&mm->mmap_lock) at:
> [  135.216863] [<ffffffff81e87854>] madvise_unlock+0xd4/0x1a0
> [  135.217179] but there are no more locks to release!
> [  135.217457]
> [  135.217457] other info that might help us debug this:
> [  135.217819] no locks held by repro/680.
> [  135.218046]
> [  135.218046] stack backtrace:
> [  135.218296] CPU: 1 UID: 0 PID: 680 Comm: repro Tainted: G        W          6.14.0-rc2-next-20250210-df5d6180169a #1
> [  135.218308] Tainted: [W]=WARN
> [  135.218310] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [  135.218315] Call Trace:
> [  135.218317]  <TASK>
> [  135.218320]  dump_stack_lvl+0xea/0x150
> [  135.218330]  ? madvise_unlock+0xd4/0x1a0
> [  135.218341]  dump_stack+0x19/0x20
> [  135.218349]  print_unlock_imbalance_bug+0x1b5/0x200
> [  135.218368]  ? madvise_unlock+0xd4/0x1a0
> [  135.218379]  lock_release+0x5bc/0x870
> [  135.218386]  ? madvise_unlock+0x17f/0x1a0
> [  135.218397]  ? handle_bug+0xf1/0x190
> [  135.218407]  ? __pfx_lock_release+0x10/0x10
> [  135.218415]  ? exc_invalid_op+0x3c/0x80
> [  135.218426]  ? asm_exc_invalid_op+0x1f/0x30
> [  135.218441]  up_write+0x31/0x550
> [  135.218449]  ? madvise_unlock+0x17e/0x1a0
> [  135.218462]  madvise_unlock+0xd4/0x1a0
> [  135.218474]  do_madvise+0x14f/0x1a0
> [  135.218487]  __x64_sys_madvise+0xb2/0x120
> [  135.218500]  ? syscall_trace_enter+0x14f/0x280
> [  135.218511]  x64_sys_call+0x19b1/0x2150
> [  135.218522]  do_syscall_64+0x6d/0x140
> [  135.218531]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  135.218542] RIP: 0033:0x7f032763ee5d
> [  135.218548] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
> [  135.218556] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
> [  135.218562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
> [  135.218569] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
> [  135.218573] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
> [  135.218578] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
> [  135.218582] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
> [  135.218594]  </TASK>
> [  135.228272] ------------[ cut here ]------------
> [  135.228541] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magy
> [  135.229541] WARNING: CPU: 1 PID: 680 at kernel/locking/rwsem.c:1367 up_write+0x451/0x550
> [  135.229997] Modules linked in:
> [  135.230182] CPU: 1 UID: 0 PID: 680 Comm: repro Tainted: G        W          6.14.0-rc2-next-20250210-df5d6180169a #1
> [  135.230758] Tainted: [W]=WARN
> [  135.230940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [  135.231565] RIP: 0010:up_write+0x451/0x550
> [  135.231806] Code: ea 03 80 3c 02 00 0f 85 d5 00 00 00 49 8b 14 24 53 4d 89 f9 4c 89 f1 48 c7 c6 40 bd eb 85 48 c7 c7 60 bc eb 858
> [  135.232814] RSP: 0018:ffff888022647e30 EFLAGS: 00010282
> [  135.233113] RAX: 0000000000000000 RBX: ffffffff85ebbba0 RCX: ffffffff8146cfd3
> [  135.233528] RDX: ffff88802264a540 RSI: ffffffff8146cfe0 RDI: 0000000000000001
> [  135.233929] RBP: ffff888022647e78 R08: 0000000000000001 R09: ffffed10044c8f66
> [  135.234325] R10: 0000000000000001 R11: 57525f4755424544 R12: ffff8880211162f0
> [  135.234722] R13: ffff8880211162f8 R14: ffff8880211162f0 R15: ffff88802264a540
> [  135.235122] FS:  00007f0327a07600(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
> [  135.235584] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  135.235912] CR2: 00007f032773e7f0 CR3: 0000000021bd4003 CR4: 0000000000770ef0
> [  135.236314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  135.236713] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [  135.237114] PKRU: 55555554
> [  135.237276] Call Trace:
> [  135.237445]  <TASK>
> [  135.237580]  ? show_regs+0x6d/0x80
> [  135.237789]  ? __warn+0xf3/0x390
> [  135.237987]  ? find_bug+0x310/0x490
> [  135.238199]  ? up_write+0x451/0x550
> [  135.238412]  ? report_bug+0x2cb/0x4b0
> [  135.238636]  ? up_write+0x451/0x550
> [  135.238853]  ? up_write+0x452/0x550
> [  135.239065]  ? handle_bug+0xf1/0x190
> [  135.239285]  ? exc_invalid_op+0x3c/0x80
> [  135.239515]  ? asm_exc_invalid_op+0x1f/0x30
> [  135.239768]  ? __warn_printk+0x173/0x2e0
> [  135.240001]  ? __warn_printk+0x180/0x2e0
> [  135.240235]  ? up_write+0x451/0x550
> [  135.240445]  ? madvise_unlock+0x17e/0x1a0
> [  135.240686]  madvise_unlock+0xd4/0x1a0
> [  135.240913]  do_madvise+0x14f/0x1a0
> [  135.241126]  __x64_sys_madvise+0xb2/0x120
> [  135.241364]  ? syscall_trace_enter+0x14f/0x280
> [  135.241647]  x64_sys_call+0x19b1/0x2150
> [  135.241887]  do_syscall_64+0x6d/0x140
> [  135.242113]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  135.242407] RIP: 0033:0x7f032763ee5d
> [  135.242619] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
> [  135.243646] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
> [  135.244075] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
> [  135.244476] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
> [  135.244877] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
> [  135.245279] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
> [  135.245696] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
> [  135.246106]  </TASK>
> [  135.246242] irq event stamp: 857
> [  135.246434] hardirqs last  enabled at (857): [<ffffffff81663b55>] __up_console_sem+0x95/0xb0
> [  135.246915] hardirqs last disabled at (856): [<ffffffff81663b3a>] __up_console_sem+0x7a/0xb0
> [  135.247392] softirqs last  enabled at (628): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
> [  135.247871] softirqs last disabled at (623): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
> [  135.248347] ---[ end trace 0000000000000000 ]---
> "
>
> Hope this cound be insightful to you.
>
> Regards,
> Yi Lai
>
> ---
>
> If you don't need the following environment to reproduce the problem or if you
> already have one reproduced environment, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
>   // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
>   // You could change the bzImage_xxx as you want
>   // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage           //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> yum -y install libslirp-devel.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
> make
> make install
>
> > --
> > 2.39.5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-11  8:48 ` [PATCH 0/4] " Vern Hao
@ 2025-02-11 18:28   ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2025-02-11 18:28 UTC (permalink / raw)
  To: Vern Hao
  Cc: SeongJae Park, Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
	linux-kernel, linux-mm

Hi Vern,

On Tue, 11 Feb 2025 16:48:06 +0800 Vern Hao <haoxing990@gmail.com> wrote:

> 
> On 2025/2/6 14:15, SeongJae Park wrote:
> > process_madvise() calls do_madvise() for each address range.  Then, each
> > do_madvise() invocation holds and releases same mmap_lock.  Optimize the
> > redundant lock operations by splitting do_madvise() internal logics
> > including the mmap_lock operations, and calling the small logics
> > directly from process_madvise() in a sequence that removes the redundant
> > locking.  As a result of this change, process_madvise() becomes more
> > efficient and less racy in terms of its results and latency.
[...]
> >
> > Evaluation
> > ==========
> >
[...]
> > The measurement results are as below.  'sz_batches' column shows the
> > batch size of process_madvise() calls.  '0' batch size is for madvise()
> > calls case.
> Hi,  i just wonder why these patches can reduce latency time on call 
> madvise() DONT_NEED.

Thank you for asking this!

> >   'before' and 'after' columns are the measured time to apply
> > MADV_DONTNEED to the 256 MiB memory buffer in nanoseconds, on kernels
> > that built without and with the last patch of this series, respectively.
> > So lower value means better efficiency.  'after/before' column is the
> > ratio of 'after' to 'before'.
> >
> >      sz_batches  before       after        after/before
> >      0           146294215.2  121280536.2  0.829017989769427
> >      1           165851018.8  136305598.2  0.821855658085351
> >      2           129469321.2  103740383.6  0.801273866569094
> >      4           110369232.4  87835896.2   0.795836795182785
> >      8           102906232.4  77420920.2   0.752344327397609
> >      16          97551017.4   74959714.4   0.768415506038587
> >      32          94809848.2   71200848.4   0.750985786305689
> >      64          96087575.6   72593180     0.755489765942227
> >      128         96154163.8   68517055.4   0.712575022154163
> >      256         92901257.6   69054216.6   0.743307662177439
> >      512         93646170.8   67053296.2   0.716028168874151
> >      1024        92663219.2   70168196.8   0.75723892830177
[...]
> > Also note that this patch has somehow decreased latencies of madvise()
> > and single batch size process_madvise().  Seems this code path is small
> > enough to significantly be affected by compiler optimizations including
> > inlining of split-out functions.  Please focus on only the improvement
> > amount that changed by the batch size.

I believe the above paragraph may answer your question.  Please let me know if
not.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
  2025-02-11 10:34     ` Lorenzo Stoakes
@ 2025-02-11 18:32       ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2025-02-11 18:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: SeongJae Park, Lai, Yi, Andrew Morton, Liam R. Howlett,
	David Hildenbrand, Davidlohr Bueso, Shakeel Butt,
	Vlastimil Babka, linux-kernel, linux-mm, yi1.lai, Naresh Kamboju,
	Arnd Bergmann

On Tue, 11 Feb 2025 10:34:19 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> +cc Naresh, as [0] does not appear to be in lore which seems to be having issues
> (meaning I can't reply to that thread at all), also +cc Arnd for his reply in [1].
> 
> [0]:https://lwn.net/ml/linux-mm/CA+G9fYt5QwJ4_F8fJj7jx9_0Le9kOVSeG38ox9qnKqwsrDdvHQ@mail.gmail.com/
> [1]:https://lwn.net/ml/linux-mm/fa1a7a10-f892-4e7e-acb4-0b058aa53d88@app.fastmail.com/
> 
> The report here from Yi Lai (thanks for that!) appears to be the same thing.
[...]
> OK having just typed that I realise SJ sent a fix already. Lore being broken
> means I can't link to it. Sigh.
> 
> Will reply there.

No worry, and yes, let's continue discussion on the thread.  Seems lore is also
working again, so for others' reference, the thread is at
https://lore.kernel.org/20250211063201.5106-1-sj@kernel.org


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-02-11 18:32 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-06  6:15 [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
2025-02-06  6:15 ` [PATCH 1/4] mm/madvise: split out mmap locking operations for madvise() SeongJae Park
2025-02-06 20:27   ` Liam R. Howlett
2025-02-06  6:15 ` [PATCH 2/4] mm/madvise: split out madvise input validity check SeongJae Park
2025-02-06 20:29   ` Liam R. Howlett
2025-02-06  6:15 ` [PATCH 3/4] mm/madvise: split out madvise() behavior execution SeongJae Park
2025-02-06 20:30   ` Liam R. Howlett
2025-02-06  6:15 ` [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
2025-02-06 13:04   ` Lorenzo Stoakes
2025-02-06 16:53     ` SeongJae Park
2025-02-06 20:32   ` Liam R. Howlett
2025-02-11  5:30   ` Lai, Yi
2025-02-11  6:37     ` SeongJae Park
2025-02-11 10:34     ` Lorenzo Stoakes
2025-02-11 18:32       ` SeongJae Park
2025-02-11  8:48 ` [PATCH 0/4] " Vern Hao
2025-02-11 18:28   ` SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox