* [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks
@ 2023-07-06 1:13 Suren Baghdasaryan
2023-07-06 1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
2023-07-06 1:14 ` [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed Suren Baghdasaryan
0 siblings, 2 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-07-06 1:13 UTC (permalink / raw)
To: akpm
Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
peterx, david, dhowells, hughd, bigeasy, kent.overstreet,
punit.agrawal, lstoakes, peterjung1337, rientjes, chriscli,
axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb, tatashin,
edumazet, gthelen, linux-mm, linux-kernel, stable,
Suren Baghdasaryan
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking
while forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork.
I tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
Both patches apply cleanly over Linus' ToT and stable 6.4.y branch.
Changes from v3 posted at [3]:
- Replace vma_iter_init with vma_iter_set, per Liam R. Howlett
- Update the regression number caused by additional VMA tree walk
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
[3] https://lore.kernel.org/all/20230705171213.2843068-1-surenb@google.com
Suren Baghdasaryan (2):
fork: lock VMAs of the parent process when forking
mm: disable CONFIG_PER_VMA_LOCK until its fixed
kernel/fork.c | 6 ++++++
mm/Kconfig | 3 ++-
2 files changed, 8 insertions(+), 1 deletion(-)
--
2.41.0.255.g8b1d071c50-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 1/2] fork: lock VMAs of the parent process when forking
2023-07-06 1:13 [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks Suren Baghdasaryan
@ 2023-07-06 1:13 ` Suren Baghdasaryan
2023-07-06 13:18 ` Liam R. Howlett
2023-07-06 13:20 ` David Hildenbrand
2023-07-06 1:14 ` [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed Suren Baghdasaryan
1 sibling, 2 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-07-06 1:13 UTC (permalink / raw)
To: akpm
Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
peterx, david, dhowells, hughd, bigeasy, kent.overstreet,
punit.agrawal, lstoakes, peterjung1337, rientjes, chriscli,
axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb, tatashin,
edumazet, gthelen, linux-mm, linux-kernel, stable,
Suren Baghdasaryan
When forking a child process, parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to
some stale writable TLB entries targeting the wrong (old) page.
Similar issue happened in the past with userfaultfd (see flush_tlb_page()
call inside do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
kernel/fork.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index b85814e614a5..2ba918f83bde 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
retval = -EINTR;
goto fail_uprobe_end;
}
+#ifdef CONFIG_PER_VMA_LOCK
+ /* Disallow any page faults before calling flush_cache_dup_mm */
+ for_each_vma(old_vmi, mpnt)
+ vma_start_write(mpnt);
+ vma_iter_set(&old_vmi, 0);
+#endif
flush_cache_dup_mm(oldmm);
uprobe_dup_mmap(oldmm, mm);
/*
--
2.41.0.255.g8b1d071c50-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed
2023-07-06 1:13 [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks Suren Baghdasaryan
2023-07-06 1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
@ 2023-07-06 1:14 ` Suren Baghdasaryan
1 sibling, 0 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-07-06 1:14 UTC (permalink / raw)
To: akpm
Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
peterx, david, dhowells, hughd, bigeasy, kent.overstreet,
punit.agrawal, lstoakes, peterjung1337, rientjes, chriscli,
axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb, tatashin,
edumazet, gthelen, linux-mm, linux-kernel, stable,
Suren Baghdasaryan
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.
Disable per-VMA locks config to prevent this issue until the fix is
confirmed. This is expected to be a temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
mm/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..0abc6c71dd89 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1224,8 +1224,9 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
def_bool n
config PER_VMA_LOCK
- def_bool y
+ bool "Enable per-vma locking during page fault handling."
depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+ depends on BROKEN
help
Allow per-vma locking during page fault handling.
--
2.41.0.255.g8b1d071c50-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/2] fork: lock VMAs of the parent process when forking
2023-07-06 1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
@ 2023-07-06 13:18 ` Liam R. Howlett
2023-07-06 13:20 ` David Hildenbrand
1 sibling, 0 replies; 5+ messages in thread
From: Liam R. Howlett @ 2023-07-06 13:18 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
mhocko, vbabka, hannes, mgorman, dave, willy, peterz, ldufour,
paulmck, mingo, will, luto, songliubraving, peterx, david,
dhowells, hughd, bigeasy, kent.overstreet, punit.agrawal,
lstoakes, peterjung1337, rientjes, chriscli, axelrasmussen,
joelaf, minchan, rppt, jannh, shakeelb, tatashin, edumazet,
gthelen, linux-mm, linux-kernel, stable
* Suren Baghdasaryan <surenb@google.com> [230705 21:14]:
> When forking a child process, parent write-protects an anonymous page
> and COW-shares it with the child being forked using copy_present_pte().
> Parent's TLB is flushed right before we drop the parent's mmap_lock in
> dup_mmap(). If we get a write-fault before that TLB flush in the parent,
> and we end up replacing that anonymous page in the parent process in
> do_wp_page() (because, COW-shared with the child), this might lead to
> some stale writable TLB entries targeting the wrong (old) page.
> Similar issue happened in the past with userfaultfd (see flush_tlb_page()
> call inside do_wp_page()).
> Lock VMAs of the parent process when forking a child, which prevents
> concurrent page faults during fork operation and avoids this issue.
> This fix can potentially regress some fork-heavy workloads. Kernel build
> time did not show noticeable regression on a 56-core machine while a
> stress test mapping 10000 VMAs and forking 5000 times in a tight loop
> shows ~7% regression. If such fork time regression is unacceptable,
> disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
> optimizations are possible if this regression proves to be problematic.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Reported-by: Jiri Slaby <jirislaby@kernel.org>
> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
> Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
> Reported-by: Jacob Young <jacobly.alt@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> Cc: stable@vger.kernel.org
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> kernel/fork.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b85814e614a5..2ba918f83bde 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> retval = -EINTR;
> goto fail_uprobe_end;
> }
> +#ifdef CONFIG_PER_VMA_LOCK
> + /* Disallow any page faults before calling flush_cache_dup_mm */
> + for_each_vma(old_vmi, mpnt)
> + vma_start_write(mpnt);
> + vma_iter_set(&old_vmi, 0);
> +#endif
> flush_cache_dup_mm(oldmm);
> uprobe_dup_mmap(oldmm, mm);
> /*
> --
> 2.41.0.255.g8b1d071c50-goog
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/2] fork: lock VMAs of the parent process when forking
2023-07-06 1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
2023-07-06 13:18 ` Liam R. Howlett
@ 2023-07-06 13:20 ` David Hildenbrand
1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2023-07-06 13:20 UTC (permalink / raw)
To: Suren Baghdasaryan, akpm
Cc: jirislaby, jacobly.alt, holger, hdegoede, michel, jglisse,
mhocko, vbabka, hannes, mgorman, dave, willy, liam.howlett,
peterz, ldufour, paulmck, mingo, will, luto, songliubraving,
peterx, dhowells, hughd, bigeasy, kent.overstreet, punit.agrawal,
lstoakes, peterjung1337, rientjes, chriscli, axelrasmussen,
joelaf, minchan, rppt, jannh, shakeelb, tatashin, edumazet,
gthelen, linux-mm, linux-kernel, stable
On 06.07.23 03:13, Suren Baghdasaryan wrote:
> When forking a child process, parent write-protects an anonymous page
> and COW-shares it with the child being forked using copy_present_pte().
> Parent's TLB is flushed right before we drop the parent's mmap_lock in
> dup_mmap(). If we get a write-fault before that TLB flush in the parent,
> and we end up replacing that anonymous page in the parent process in
> do_wp_page() (because, COW-shared with the child), this might lead to
> some stale writable TLB entries targeting the wrong (old) page.
> Similar issue happened in the past with userfaultfd (see flush_tlb_page()
> call inside do_wp_page()).
> Lock VMAs of the parent process when forking a child, which prevents
> concurrent page faults during fork operation and avoids this issue.
> This fix can potentially regress some fork-heavy workloads. Kernel build
> time did not show noticeable regression on a 56-core machine while a
> stress test mapping 10000 VMAs and forking 5000 times in a tight loop
> shows ~7% regression. If such fork time regression is unacceptable,
> disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
> optimizations are possible if this regression proves to be problematic.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Reported-by: Jiri Slaby <jirislaby@kernel.org>
> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
> Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
> Reported-by: Jacob Young <jacobly.alt@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> Cc: stable@vger.kernel.org
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Feel free to keep my ACK on minor changes.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-07-06 13:20 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-06 1:13 [PATCH v4 0/2] Avoid memory corruption caused by per-VMA locks Suren Baghdasaryan
2023-07-06 1:13 ` [PATCH v4 1/2] fork: lock VMAs of the parent process when forking Suren Baghdasaryan
2023-07-06 13:18 ` Liam R. Howlett
2023-07-06 13:20 ` David Hildenbrand
2023-07-06 1:14 ` [PATCH v4 2/2] mm: disable CONFIG_PER_VMA_LOCK until its fixed Suren Baghdasaryan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox