linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks
@ 2023-07-06  1:13 Suren Baghdasaryan
  2023-07-06  1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
  2023-07-06  1:14 ` [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed Suren Baghdasaryan
  0 siblings, 2 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-07-06  1:13 UTC (permalink / raw)
  To: akpm
  Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
	mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
	peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
	peterx, david, dhowells, hughd, bigeasy, kent.overstreet,
	punit.agrawal, lstoakes, peterjung1337, rientjes, chriscli,
	axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb, tatashin,
	edumazet, gthelen, linux-mm, linux-kernel, stable,
	Suren Baghdasaryan

A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking
while forking a child process.

Patch 1/2 in the series implements proper VMA locking during fork.
I tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.

Patch 2/2 disables per-VMA locks until the fix is tested and verified.

Both patches apply cleanly over Linus' ToT and stable 6.4.y branch.

Changes from v3 posted at [3]:
- Replace vma_iter_init with vma_iter_set, per Liam R. Howlett
- Update the regression number caused by additional VMA tree walk

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
[3] https://lore.kernel.org/all/20230705171213.2843068-1-surenb@google.com

Suren Baghdasaryan (2):
  fork: lock VMAs of the parent process when forking
  mm: disable CONFIG_PER_VMA_LOCK until its fixed

 kernel/fork.c | 6 ++++++
 mm/Kconfig    | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/2] fork: lock VMAs of the parent process when forking
  2023-07-06  1:13 [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks Suren Baghdasaryan
@ 2023-07-06  1:13 ` Suren Baghdasaryan
  2023-07-06 13:18   ` Liam R. Howlett
  2023-07-06 13:20   ` David Hildenbrand
  2023-07-06  1:14 ` [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed Suren Baghdasaryan
  1 sibling, 2 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-07-06  1:13 UTC (permalink / raw)
  To: akpm
  Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
	mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
	peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
	peterx, david, dhowells, hughd, bigeasy, kent.overstreet,
	punit.agrawal, lstoakes, peterjung1337, rientjes, chriscli,
	axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb, tatashin,
	edumazet, gthelen, linux-mm, linux-kernel, stable,
	Suren Baghdasaryan

When forking a child process, parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to
some stale writable TLB entries targeting the wrong (old) page.
Similar issue happened in the past with userfaultfd (see flush_tlb_page()
call inside do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.

Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 kernel/fork.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index b85814e614a5..2ba918f83bde 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		retval = -EINTR;
 		goto fail_uprobe_end;
 	}
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Disallow any page faults before calling flush_cache_dup_mm */
+	for_each_vma(old_vmi, mpnt)
+		vma_start_write(mpnt);
+	vma_iter_set(&old_vmi, 0);
+#endif
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
 	/*
-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed
  2023-07-06  1:13 [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks Suren Baghdasaryan
  2023-07-06  1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
@ 2023-07-06  1:14 ` Suren Baghdasaryan
  1 sibling, 0 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-07-06  1:14 UTC (permalink / raw)
  To: akpm
  Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
	mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
	peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
	peterx, david, dhowells, hughd, bigeasy, kent.overstreet,
	punit.agrawal, lstoakes, peterjung1337, rientjes, chriscli,
	axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb, tatashin,
	edumazet, gthelen, linux-mm, linux-kernel, stable,
	Suren Baghdasaryan

A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.
Disable per-VMA locks config to prevent this issue until the fix is
confirmed. This is expected to be a temporary measure.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com

Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..0abc6c71dd89 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1224,8 +1224,9 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
        def_bool n
 
 config PER_VMA_LOCK
-	def_bool y
+	bool "Enable per-vma locking during page fault handling."
 	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+	depends on BROKEN
 	help
 	  Allow per-vma locking during page fault handling.
 
-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] fork: lock VMAs of the parent process when forking
  2023-07-06  1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
@ 2023-07-06 13:18   ` Liam R. Howlett
  2023-07-06 13:20   ` David Hildenbrand
  1 sibling, 0 replies; 5+ messages in thread
From: Liam R. Howlett @ 2023-07-06 13:18 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
	mhocko, vbabka, hannes, mgorman, dave, willy, peterz, ldufour,
	paulmck, mingo, will, luto, songliubraving, peterx, david,
	dhowells, hughd, bigeasy, kent.overstreet, punit.agrawal,
	lstoakes, peterjung1337, rientjes, chriscli, axelrasmussen,
	joelaf, minchan, rppt, jannh, shakeelb, tatashin, edumazet,
	gthelen, linux-mm, linux-kernel, stable

* Suren Baghdasaryan <surenb@google.com> [230705 21:14]:
> When forking a child process, parent write-protects an anonymous page
> and COW-shares it with the child being forked using copy_present_pte().
> Parent's TLB is flushed right before we drop the parent's mmap_lock in
> dup_mmap(). If we get a write-fault before that TLB flush in the parent,
> and we end up replacing that anonymous page in the parent process in
> do_wp_page() (because, COW-shared with the child), this might lead to
> some stale writable TLB entries targeting the wrong (old) page.
> Similar issue happened in the past with userfaultfd (see flush_tlb_page()
> call inside do_wp_page()).
> Lock VMAs of the parent process when forking a child, which prevents
> concurrent page faults during fork operation and avoids this issue.
> This fix can potentially regress some fork-heavy workloads. Kernel build
> time did not show noticeable regression on a 56-core machine while a
> stress test mapping 10000 VMAs and forking 5000 times in a tight loop
> shows ~7% regression. If such fork time regression is unacceptable,
> disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
> optimizations are possible if this regression proves to be problematic.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Reported-by: Jiri Slaby <jirislaby@kernel.org>
> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
> Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
> Reported-by: Jacob Young <jacobly.alt@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> Cc: stable@vger.kernel.org
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  kernel/fork.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b85814e614a5..2ba918f83bde 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		retval = -EINTR;
>  		goto fail_uprobe_end;
>  	}
> +#ifdef CONFIG_PER_VMA_LOCK
> +	/* Disallow any page faults before calling flush_cache_dup_mm */
> +	for_each_vma(old_vmi, mpnt)
> +		vma_start_write(mpnt);
> +	vma_iter_set(&old_vmi, 0);
> +#endif
>  	flush_cache_dup_mm(oldmm);
>  	uprobe_dup_mmap(oldmm, mm);
>  	/*
> -- 
> 2.41.0.255.g8b1d071c50-goog
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] fork: lock VMAs of the parent process when forking
  2023-07-06  1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
  2023-07-06 13:18   ` Liam R. Howlett
@ 2023-07-06 13:20   ` David Hildenbrand
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2023-07-06 13:20 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
	mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
	peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
	peterx, dhowells, hughd, bigeasy, kent.overstreet, punit.agrawal,
	lstoakes, peterjung1337, rientjes, chriscli, axelrasmussen,
	joelaf, minchan, rppt, jannh, shakeelb, tatashin, edumazet,
	gthelen, linux-mm, linux-kernel, stable

On 06.07.23 03:13, Suren Baghdasaryan wrote:
> When forking a child process, parent write-protects an anonymous page
> and COW-shares it with the child being forked using copy_present_pte().
> Parent's TLB is flushed right before we drop the parent's mmap_lock in
> dup_mmap(). If we get a write-fault before that TLB flush in the parent,
> and we end up replacing that anonymous page in the parent process in
> do_wp_page() (because, COW-shared with the child), this might lead to
> some stale writable TLB entries targeting the wrong (old) page.
> Similar issue happened in the past with userfaultfd (see flush_tlb_page()
> call inside do_wp_page()).
> Lock VMAs of the parent process when forking a child, which prevents
> concurrent page faults during fork operation and avoids this issue.
> This fix can potentially regress some fork-heavy workloads. Kernel build
> time did not show noticeable regression on a 56-core machine while a
> stress test mapping 10000 VMAs and forking 5000 times in a tight loop
> shows ~7% regression. If such fork time regression is unacceptable,
> disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
> optimizations are possible if this regression proves to be problematic.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Reported-by: Jiri Slaby <jirislaby@kernel.org>
> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
> Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
> Reported-by: Jacob Young <jacobly.alt@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> Cc: stable@vger.kernel.org
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Acked-by: David Hildenbrand <david@redhat.com>

Feel free to keep my ACK on minor changes.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-06 13:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-06  1:13 [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks Suren Baghdasaryan
2023-07-06  1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
2023-07-06 13:18   ` Liam R. Howlett
2023-07-06 13:20   ` David Hildenbrand
2023-07-06  1:14 ` [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed Suren Baghdasaryan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox