linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Jens Axboe <axboe@kernel.dk>,
	Andrew Morton <akpm@linux-foundation.org>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Pedro Falcato <pfalcato@suse.de>
Cc: syzbot <syzbot+8be9bf36c3cf574426c8@syzkaller.appspotmail.com>,
	 asml.silence@gmail.com, io-uring@vger.kernel.org,
	 linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	 Linux-MM <linux-mm@kvack.org>,
	dennis@kernel.org
Subject: Re: [syzbot] [io-uring] KCSAN: data-race in copy_mm / percpu_counter_destroy_many
Date: Mon, 12 May 2025 19:08:49 +0200	[thread overview]
Message-ID: <CAG48ez20ot+-H1DmJ86COL4p_oif4mw9dvBRD7Ff5B-=axhtPQ@mail.gmail.com> (raw)
In-Reply-To: <81a0d2da-d863-4a6e-8d3b-b899d38ea605@kernel.dk>

+mmap maintainers

On Mon, May 12, 2025 at 3:51 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 5/12/25 12:36 AM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    3ce9925823c7 Merge tag 'mm-hotfixes-stable-2025-05-10-14-2..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14ff74d4580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=6154604431d9aaf9
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8be9bf36c3cf574426c8
> > compiler:       Debian clang version 20.1.2 (++20250402124445+58df0ef89dd6-1~exp1~20250402004600.97), Debian LLD 20.1.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/afdc6302fc05/disk-3ce99258.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/fc7f98d3c420/vmlinux-3ce99258.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/ea7ca2da2258/bzImage-3ce99258.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8be9bf36c3cf574426c8@syzkaller.appspotmail.com
> >
> > ==================================================================
> > BUG: KCSAN: data-race in copy_mm / percpu_counter_destroy_many
> >
> > write to 0xffff8881045e19d8 of 8 bytes by task 2123 on cpu 0:
> >  __list_del include/linux/list.h:195 [inline]
> >  __list_del_entry include/linux/list.h:218 [inline]
> >  list_del include/linux/list.h:229 [inline]
> >  percpu_counter_destroy_many+0xc7/0x2b0 lib/percpu_counter.c:244
> >  __mmdrop+0x22e/0x350 kernel/fork.c:947
> >  mmdrop include/linux/sched/mm.h:55 [inline]
> >  io_ring_ctx_free+0x31e/0x360 io_uring/io_uring.c:2740
> >  io_ring_exit_work+0x529/0x560 io_uring/io_uring.c:2962
> >  process_one_work kernel/workqueue.c:3238 [inline]
> >  process_scheduled_works+0x4cb/0x9d0 kernel/workqueue.c:3319
> >  worker_thread+0x582/0x770 kernel/workqueue.c:3400
> >  kthread+0x486/0x510 kernel/kthread.c:464
> >  ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:153
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >
> > read to 0xffff8881045e1600 of 1344 bytes by task 5051 on cpu 1:
> >  dup_mm kernel/fork.c:1728 [inline]
> >  copy_mm+0xfb/0x1310 kernel/fork.c:1786
> >  copy_process+0xcf1/0x1f90 kernel/fork.c:2429
> >  kernel_clone+0x16c/0x5b0 kernel/fork.c:2844
> >  __do_sys_clone kernel/fork.c:2987 [inline]
> >  __se_sys_clone kernel/fork.c:2971 [inline]
> >  __x64_sys_clone+0xe6/0x120 kernel/fork.c:2971
> >  x64_sys_call+0x2c59/0x2fb0 arch/x86/include/generated/asm/syscalls_64.h:57
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0xd0/0x1a0 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 1 UID: 0 PID: 5051 Comm: syz.1.494 Not tainted 6.15.0-rc5-syzkaller-00300-g3ce9925823c7 #0 PREEMPT(voluntary)
>
> This doesn't look like an io_uring issue, it's successive setup and teardown
> of a percpu counter. Adding some relevant folks.

This is an intentional-but-dodgy data race:

dup_mm() does memcpy() of the entire old mm_struct, while the old
mm_struct is not yet locked, in order to initialize some parts of the
new mm_struct that are expected to be stable (I guess?); and then
afterwards re-initializes all the important fields with proper
locking.

(What happens in the KCSAN report is that one mm is being forked while
an **unrelated** mm is being destroyed; and the unrelated mm's
destruction involves a linked list deletion at mm.rss_stat[].list, and
the two mm's happen to be adjacent on this linked list, so destroying
the unrelated mm updates a linked list element embedded in the first
mm.)

This design is pretty dirty, and it would be a good idea to get rid of
that memcpy() by going through every mm_struct field, checking whether
it is already initialized after the memcpy(), and figuring out proper
initialization if it isn't; another easier, less clean approach would
be to just slap a data_race() annotation around the memcpy() for now.

From a quick look, I think these are the fields that might not be reinitialized:

Fields that are safe to copy without lock because they're immutable:

 - mmap_{compat_,}{legacy_,}base
 - task_size
 - binfmt

Fields which could actually race but only if CRIU-specific APIs raced
with execve:

 - {start,end}_{code,data}
 - start_brk, start_stack
 - {arg,env}_{start,end}
 - saved_auxv

Fields which actually look like they might be data racing in practice
(but with most of these, probably not much bad stuff actually results
from that):

 - membarrier_state
 - mm_cid_next_scan
 - brk
 - numa_scan_offset
 - tlb_flush_batched
 - ksm_merging_pages, ksm_rmap_items, ksm_zero_pages

And then there's arch-specific stuff in mm_context_t, I haven't looked
at that for all architectures. But there is some pretty dodgy stuff in
there too, for example on X86 the mm->context.vdso_image pointer can
be updated through the CRIU API concurrently with the memcpy(), and it
doesn't look like that pointer is re-initialized later, so I think
that could theoretically result in a torn pointer being accessed later
on in the child. (Note that memcpy() is very much not guaranteed to
copy pointer-sized fields atomically, though it tends to often do that
in practice.)


      reply	other threads:[~2025-05-12 17:09 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <682196ed.050a0220.f2294.0053.GAE@google.com>
2025-05-12 13:51 ` Jens Axboe
2025-05-12 17:08   ` Jann Horn [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez20ot+-H1DmJ86COL4p_oif4mw9dvBRD7Ff5B-=axhtPQ@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=dennis@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=pfalcato@suse.de \
    --cc=syzbot+8be9bf36c3cf574426c8@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox