From: Jacob Young <jacobly.alt@gmail.com>
To: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Laurent Dufour <ldufour@linux.ibm.com>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	 Linux Regressions <regressions@lists.linux.dev>,
	Linux Memory Management <linux-mm@kvack.org>,
	 Linux PowerPC <linuxppc-dev@lists.ozlabs.org>,
	 Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: Memory corruption in multithreaded user space program while calling fork
Date: Sun, 2 Jul 2023 08:40:31 -0400	[thread overview]
Message-ID: <CALrpxLe2VagXEhsHPb9P4vJC97hkBYkLswFJB_jmhu1K+x_QhQ@mail.gmail.com> (raw)
In-Reply-To: <facbfec3-837a-51ed-85fa-31021c17d6ef@gmail.com>

> Jacob: Can you repeat the bisection, please? Why did you skip the VMA
> lock-based page fault commits in your bisection?

All skips were due to compile errors of the form:

make[3]: 'install_headers' is up to date.
In file included from ./include/linux/memcontrol.h:20,
                 from ./include/linux/swap.h:9,
                 from ./include/linux/suspend.h:5,
                 from arch/x86/kernel/asm-offsets.c:14:
./include/linux/mm.h: In function ‘vma_try_start_write’:
./include/linux/mm.h:702:37: error: ‘struct vm_area_struct’ has no member named ‘vm_lock’
  702 |         if (!down_write_trylock(&vma->vm_lock->lock))
      |                                     ^~
./include/linux/mm.h:706:22: error: ‘struct vm_area_struct’ has no member named ‘vm_lock’
  706 |         up_write(&vma->vm_lock->lock);
      |                      ^~
make[1]: *** [scripts/Makefile.build:114: arch/x86/kernel/asm-offsets.s] Error 1
make: *** [Makefile:1286: prepare0] Error 2
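
If it is useful to redo the bisection with those build failures excluded up
front, something like the following should walk the same range (the SHAs are
taken from my bisect log below; this is only a sketch, I have not re-run it):

  git bisect start
  git bisect bad 6995e2de6891c724bfeb2db33d7b87775f913ad1   # Linux 6.4
  git bisect good 457391b0380335d5e9a5babdec90ac53928b23b4  # Linux 6.3
  # pre-emptively skip the four commits that fail to build
  git bisect skip 0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a \
                  70d4cbc80c88251de0a5b3e8df3275901f1fa99a \
                  cd7f176aea5f5929a09a91c661a26912cc995d1b \
                  0bff0aaea03e2a3ed6bfa302155cca8a432a1829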

On Sun, Jul 2, 2023, 08:27 Bagas Sanjaya <bagasdotme@gmail.com> wrote:

> Hi,
>
> I noticed a regression report on Bugzilla [1]. Quoting from it:
>
> > After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent
> > but random crashes in a user space program.  After a lot of reduction, I
> > have come up with the following reproducer program:
> >
> > $ uname -a
> > Linux jacob 6.4.1-gentoo #1 SMP PREEMPT_DYNAMIC Sat Jul  1 19:02:42 EDT 2023 x86_64 AMD Ryzen 9 7950X3D 16-Core Processor AuthenticAMD GNU/Linux
> > $ cat repro.c
> > #define _GNU_SOURCE
> > #include <sched.h>
> > #include <sys/wait.h>
> > #include <unistd.h>
> >
> > void *threadSafeAlloc(size_t n) {
> >     static size_t end_index = 0;
> >     static char buffer[1 << 25];
> >     size_t start_index = __atomic_load_n(&end_index, __ATOMIC_SEQ_CST);
> >     while (1) {
> >         if (start_index + n > sizeof(buffer)) _exit(1);
> >         if (__atomic_compare_exchange_n(&end_index, &start_index, start_index + n, 1, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) return buffer + start_index;
> >     }
> > }
> >
> > int thread(void *arg) {
> >     size_t i;
> >     size_t n = 1 << 7;
> >     char *items;
> >     (void)arg;
> >     while (1) {
> >         items = threadSafeAlloc(n);
> >         for (i = 0; i != n; i += 1) items[i] = '@';
> >         for (i = 0; i != n; i += 1) if (items[i] != '@') _exit(2);
> >     }
> > }
> >
> > int main(void) {
> >     static size_t stacks[2][1 << 9];
> >     size_t i;
> >     for (i = 0; i != 2; i += 1) clone(&thread, &stacks[i] + 1, CLONE_THREAD | CLONE_VM | CLONE_SIGHAND, NULL);
> >     while (1) {
> >         if (fork() == 0) _exit(0);
> >         (void)wait(NULL);
> >     }
> > }
> > $ cc repro.c
> > $ ./a.out
> > $ echo $?
> > 2
> >
> > After tuning the various parameters for my computer, exit code 2, which
> > indicates that memory corruption was detected, occurs approximately 99% of
> > the time.  Exit code 1, which occurs approximately 1% of the time, means it
> > ran out of statically-allocated memory before reproducing the issue, and
> > increasing the memory usage any more only leads to diminishing returns.
> > There is also something like a 0.1% chance that it segfaults due to memory
> > corruption elsewhere than in the statically-allocated buffer.
> >
> > With this reproducer in hand, I was able to perform the following
> > bisection:
> >
> > git bisect start
> > # status: waiting for both good and bad commits
> > # bad: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> > git bisect bad 6995e2de6891c724bfeb2db33d7b87775f913ad1
> > # status: waiting for good commit(s), bad commit known
> > # good: [457391b0380335d5e9a5babdec90ac53928b23b4] Linux 6.3
> > git bisect good 457391b0380335d5e9a5babdec90ac53928b23b4
> > # good: [d42b1c47570eb2ed818dc3fe94b2678124af109d] Merge tag 'devicetree-for-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> > git bisect good d42b1c47570eb2ed818dc3fe94b2678124af109d
> > # bad: [58390c8ce1bddb6c623f62e7ed36383e7fa5c02f] Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
> > git bisect bad 58390c8ce1bddb6c623f62e7ed36383e7fa5c02f
> > # good: [888d3c9f7f3ae44101a3fd76528d3dd6f96e9fd0] Merge tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
> > git bisect good 888d3c9f7f3ae44101a3fd76528d3dd6f96e9fd0
> > # bad: [86e98ed15b3e34460d1b3095bd119b6fac11841c] Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
> > git bisect bad 86e98ed15b3e34460d1b3095bd119b6fac11841c
> > # bad: [7fa8a8ee9400fe8ec188426e40e481717bc5e924] Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > git bisect bad 7fa8a8ee9400fe8ec188426e40e481717bc5e924
> > # bad: [0120dd6e4e202e19a0e011e486fb2da40a5ea279] zram: make zram_bio_discard more self-contained
> > git bisect bad 0120dd6e4e202e19a0e011e486fb2da40a5ea279
> > # good: [fce0b4213edb960859dcc65ea414c8efb11948e1] mm/page_alloc: add helper for checking if check_pages_enabled
> > git bisect good fce0b4213edb960859dcc65ea414c8efb11948e1
> > # bad: [59f876fb9d68a4d8c20305d7a7a0daf4ee9478a8] mm: avoid passing 0 to __ffs()
> > git bisect bad 59f876fb9d68a4d8c20305d7a7a0daf4ee9478a8
> > # good: [0050d7f5ee532f92e8ab1efcec6547bfac527973] afs: split afs_pagecache_valid() out of afs_validate()
> > git bisect good 0050d7f5ee532f92e8ab1efcec6547bfac527973
> > # good: [2ac0af1b66e3b66307f53b1cc446514308ec466d] mm: fall back to mmap_lock if vma->anon_vma is not yet set
> > git bisect good 2ac0af1b66e3b66307f53b1cc446514308ec466d
> > # skip: [0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a] mm/mmap: free vm_area_struct without call_rcu in exit_mmap
> > git bisect skip 0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a
> > # skip: [70d4cbc80c88251de0a5b3e8df3275901f1fa99a] powerc/mm: try VMA lock-based page fault handling first
> > git bisect skip 70d4cbc80c88251de0a5b3e8df3275901f1fa99a
> > # good: [444eeb17437a0ef526c606e9141a415d3b7dfddd] mm: prevent userfaults to be handled under per-vma lock
> > git bisect good 444eeb17437a0ef526c606e9141a415d3b7dfddd
> > # bad: [e06f47a16573decc57498f2d02f9af3bb3e84cf2] s390/mm: try VMA lock-based page fault handling first
> > git bisect bad e06f47a16573decc57498f2d02f9af3bb3e84cf2
> > # skip: [0bff0aaea03e2a3ed6bfa302155cca8a432a1829] x86/mm: try VMA lock-based page fault handling first
> > git bisect skip 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> > # skip: [cd7f176aea5f5929a09a91c661a26912cc995d1b] arm64/mm: try VMA lock-based page fault handling first
> > git bisect skip cd7f176aea5f5929a09a91c661a26912cc995d1b
> > # good: [52f238653e452e0fda61e880f263a173d219acd1] mm: introduce per-VMA lock statistics
> > git bisect good 52f238653e452e0fda61e880f263a173d219acd1
> > # bad: [c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f] mm: separate vma->lock from vm_area_struct
> > git bisect bad c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f
> > # only skipped commits left to test
> > # possible first bad commit: [c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f] mm: separate vma->lock from vm_area_struct
> > # possible first bad commit: [0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a] mm/mmap: free vm_area_struct without call_rcu in exit_mmap
> > # possible first bad commit: [70d4cbc80c88251de0a5b3e8df3275901f1fa99a] powerc/mm: try VMA lock-based page fault handling first
> > # possible first bad commit: [cd7f176aea5f5929a09a91c661a26912cc995d1b] arm64/mm: try VMA lock-based page fault handling first
> > # possible first bad commit: [0bff0aaea03e2a3ed6bfa302155cca8a432a1829] x86/mm: try VMA lock-based page fault handling first
> >
> > I do not usually see any kernel log output while running the program,
> > just occasional logs about user space segfaults.
>
> See Bugzilla for the full thread.
>
> Jacob: Can you repeat the bisection, please? Why did you skip the VMA
> lock-based page fault commits in your bisection?
>
> Anyway, I'm adding it to regzbot:
>
> #regzbot introduced: 0bff0aaea03e2a..c7f8f31c00d187
> https://bugzilla.kernel.org/show_bug.cgi?id=217624
> #regzbot title: Memory corruption in multithreaded user space program while calling fork (possibly caused by trying VMA lock-based page fault)
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217624
>
> --
> An old man doll... just what I always wanted! - Clara
>
