From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zhiwei Jiang <qq282012236@gmail.com>
Date: Wed, 23 Apr 2025 22:29:39 +0800
Subject: Re: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios
To: Jens Axboe
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	akpm@linux-foundation.org, peterx@redhat.com, asml.silence@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, io-uring@vger.kernel.org
In-Reply-To: <7bea9c74-7551-4312-bece-86c4ad5c982f@kernel.dk>
References: <20250422162913.1242057-1-qq282012236@gmail.com>
	<20250422162913.1242057-2-qq282012236@gmail.com>
	<14195206-47b1-4483-996d-3315aa7c33aa@kernel.dk>
	<7bea9c74-7551-4312-bece-86c4ad5c982f@kernel.dk>
Content-Type: text/plain; charset="UTF-8"

Jens Axboe wrote on Wed, 23 Apr 2025 at 21:34:
>
> On 4/22/25 8:49 PM, Zhiwei Jiang wrote:
> > On Wed, Apr 23, 2025 at 1:33 AM Jens Axboe wrote:
> >>
> >> On 4/22/25 11:04 AM, Zhiwei Jiang wrote:
> >>> On Wed, Apr 23, 2025 at 12:32 AM Jens Axboe wrote:
> >>>>
> >>>> On 4/22/25 10:29 AM, Zhiwei Jiang wrote:
> >>>>> diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
> >>>>> index d4fb2940e435..8567a9c819db 100644
> >>>>> --- a/io_uring/io-wq.h
> >>>>> +++ b/io_uring/io-wq.h
> >>>>> @@ -70,8 +70,10 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
> >>>>> 					void *data, bool cancel_all);
> >>>>>
> >>>>> #if defined(CONFIG_IO_WQ)
> >>>>> -extern void io_wq_worker_sleeping(struct task_struct *);
> >>>>> -extern void io_wq_worker_running(struct task_struct *);
> >>>>> +extern void io_wq_worker_sleeping(struct task_struct *tsk);
> >>>>> +extern void io_wq_worker_running(struct task_struct *tsk);
> >>>>> +extern void set_userfault_flag_for_ioworker(void);
> >>>>> +extern void clear_userfault_flag_for_ioworker(void);
> >>>>> #else
> >>>>> static inline void io_wq_worker_sleeping(struct task_struct *tsk)
> >>>>> {
> >>>>> @@ -79,6 +81,12 @@ static inline void io_wq_worker_sleeping(struct task_struct *tsk)
> >>>>> static inline void io_wq_worker_running(struct task_struct *tsk)
> >>>>> {
> >>>>> }
> >>>>> +static inline void set_userfault_flag_for_ioworker(void)
> >>>>> +{
> >>>>> +}
> >>>>> +static inline void clear_userfault_flag_for_ioworker(void)
> >>>>> +{
> >>>>> +}
> >>>>> #endif
> >>>>>
> >>>>> static inline bool io_wq_current_is_worker(void)
> >>>>
> >>>> This should go in include/linux/io_uring.h and then userfaultfd would
> >>>> not have to include io_uring private headers.
> >>>>
> >>>> But that's beside the point, like I said we still need to get to the
> >>>> bottom of what is going on here first, rather than try and paper around
> >>>> it. So please don't post more versions of this before we have that
> >>>> understanding.
> >>>>
> >>>> See previous emails on 6.8 and other kernel versions.
> >>>>
> >>>> --
> >>>> Jens Axboe
> >>>
> >>> The issue did not involve creating new worker processes. Instead, the
> >>> existing IOU worker kernel threads (about a dozen) associated with the VM
> >>> process were fully utilizing CPU without writing data, caused by a fault
> >>> while reading user data pages in the fault_in_iov_iter_readable function
> >>> when pulling user memory into kernel space.
> >>
> >> OK, that makes more sense. I can certainly reproduce a loop in this path:
> >>
> >> iou-wrk-726 729 36.910071: 9737 cycles:P:
> >>   ffff800080456c44 handle_userfault+0x47c
> >>   ffff800080381fc0 hugetlb_fault+0xb68
> >>   ffff80008031fee4 handle_mm_fault+0x2fc
> >>   ffff8000812ada6c do_page_fault+0x1e4
> >>   ffff8000812ae024 do_translation_fault+0x9c
> >>   ffff800080049a9c do_mem_abort+0x44
> >>   ffff80008129bd78 el1_abort+0x38
> >>   ffff80008129ceb4 el1h_64_sync_handler+0xd4
> >>   ffff8000800112b4 el1h_64_sync+0x6c
> >>   ffff80008030984c fault_in_readable+0x74
> >>   ffff800080476f3c iomap_file_buffered_write+0x14c
> >>   ffff8000809b1230 blkdev_write_iter+0x1a8
> >>   ffff800080a1f378 io_write+0x188
> >>   ffff800080a14f30 io_issue_sqe+0x68
> >>   ffff800080a155d0 io_wq_submit_work+0xa8
> >>   ffff800080a32afc io_worker_handle_work+0x1f4
> >>   ffff800080a332b8 io_wq_worker+0x110
> >>   ffff80008002dd38 ret_from_fork+0x10
> >>
> >> which seems to be expected - we'd continually try and fault in the
> >> ranges if the userfaultfd handler isn't filling them.
> >>
> >> I guess this is where I'm still confused, because I don't see how this
> >> is different from if you have a normal write(2) syscall doing the same
> >> thing - you'd get the same looping.
> >>
> >> ??
> >>
> >>> This issue occurs during VM snapshot loading (which uses
> >>> userfaultfd for on-demand memory loading), while a task in the guest is
> >>> writing data to disk.
> >>>
> >>> Normally, the VM first triggers a user fault to fill the page table.
> >>> So in the IOU worker thread the page tables are already filled, and no
> >>> fault has a chance to happen when faulting in memory pages in
> >>> fault_in_iov_iter_readable.
> >>>
> >>> I suspect that during snapshot loading, a memory access in the
> >>> VM triggers an async page fault handled by a kernel thread,
> >>> while the IOU worker's async kernel thread is also running.
> >>> Maybe the problem shows up if the IOU worker's thread is scheduled first.
> >>> I'm going to bed now.
> >>
> >> Ah ok, so what you're saying is that because we end up not sleeping
> >> (because a signal is pending, it seems), then the fault will never get
> >> filled and hence progress not made? And the signal is pending because
> >> someone tried to create a new worker, and this work is not getting
> >> processed.
> >>
> >> --
> >> Jens Axboe
> >
> > handle_userfault() {
> >   hugetlb_vma_lock_read();
> >   _raw_spin_lock_irq() {
> >     __pv_queued_spin_lock_slowpath();
> >   }
> >   vma_mmu_pagesize() {
> >     hugetlb_vm_op_pagesize();
> >   }
> >   huge_pte_offset();
> >   hugetlb_vma_unlock_read();
> >   up_read();
> >   __wake_up() {
> >     _raw_spin_lock_irqsave() {
> >       __pv_queued_spin_lock_slowpath();
> >     }
> >     __wake_up_common();
> >     _raw_spin_unlock_irqrestore();
> >   }
> >   schedule() {
> >     io_wq_worker_sleeping() {
> >       io_wq_dec_running();
> >     }
> >     rcu_note_context_switch();
> >     raw_spin_rq_lock_nested() {
> >       _raw_spin_lock();
> >     }
> >     update_rq_clock();
> >     pick_next_task() {
> >       pick_next_task_fair() {
> >         update_curr() {
> >           update_curr_se();
> >           __calc_delta.constprop.0();
> >           update_min_vruntime();
> >         }
> >         check_cfs_rq_runtime();
> >         pick_next_entity() {
> >           pick_eevdf();
> >         }
> >         update_curr() {
> >           update_curr_se();
> >           __calc_delta.constprop.0();
> >           update_min_vruntime();
> >         }
> >         check_cfs_rq_runtime();
> >         pick_next_entity() {
> >           pick_eevdf();
> >         }
> >         update_curr() {
> >           update_curr_se();
> >           update_min_vruntime();
> >           cpuacct_charge();
> >           __cgroup_account_cputime() {
> >             cgroup_rstat_updated();
> >           }
> >         }
> >         check_cfs_rq_runtime();
> >         pick_next_entity() {
> >           pick_eevdf();
> >         }
> >       }
> >     }
> >     raw_spin_rq_unlock();
> >     io_wq_worker_running();
> >   }
> >   _raw_spin_lock_irq() {
> >     __pv_queued_spin_lock_slowpath();
> >   }
> >   userfaultfd_ctx_put();
> > }
> > }
> >
> > The execution flow above is the one that kept faulting
> > repeatedly in the IOU worker during the issue. The entire fault path,
> > including the final userfault handling code you're seeing, would be
> > triggered in an infinite loop. That's why I traced it and found that the
> > io_wq_worker_running() function returns early, causing the flow to
> > differ from a normal user fault, where it should be sleeping.
>
> io_wq_worker_running() is called when the task is scheduled back in.
> There's no "returning early" here, it simply updates the accounting.
> Which is part of why your patch makes very little sense to me, we
> would've called both io_wq_worker_sleeping() and _running() from the
> userfaultfd path. The latter doesn't really do much, it simply just
> increments the running worker count, if the worker was previously marked
> as sleeping.
>
> And I strongly suspect that the latter is the issue, not the marking of
> running. The above loop is fine if we do go to sleep in schedule().
> However, if there's task_work (either TWA_SIGNAL or TWA_NOTIFY_SIGNAL
> based) pending, then schedule() will be a no-op and we're going to
> repeatedly go through that loop. This is because the expectation here is
> that the loop will be aborted if either of those is true, so that
> task_work can get run (or a signal handled, whatever), and then the
> operation retried.
>
> > However, your call stack appears to behave normally,
> > which makes me curious about what's different about the execution flow.
> > Would you be able to share your test case code so I can study it
> > and try to reproduce the behavior on my side?
>
> It behaves normally for the initial attempt - we end up sleeping in
> schedule(). However, then a new worker gets created, or the ring
> shuts down, in which case schedule() ends up being a no-op because
> TWA_NOTIFY_SIGNAL is set, and then we just sit there in a loop running
> the same code again and again to no avail. So I do think my test case
> and your issue are the same; I just reproduce it by calling
> io_uring_queue_exit(), but the exact same thing would happen if worker
> creation is attempted while an io-wq worker is blocked in
> handle_userfault().
>
> This is why I want to fully understand the issue rather than paper
> around it, as I don't think the fix is correct as-is. We really want to
> abort the loop and allow the task to handle whatever signaling is
> currently preventing proper sleeps.
>
> I'll dabble a bit more and send out the test case too, in case it'll
> help on your end.
>
> --
> Jens Axboe

I'm really looking forward to your test case. Also, I'd like to emphasize
one more point: the handle_userfault graph path I sent you, including the
schedule function, is complete and unmodified. You can see that the
schedule function is very, very short. I understand your point about
signal handling, but in this very brief function graph I haven't yet seen
any functions related to signal handling. Additionally, there is no
context switch here, nor is this the situation where the thread is being
scheduled back in. Perhaps the scenario you've reproduced is still
different from the one I've encountered in some subtle way?
void io_wq_worker_running(struct task_struct *tsk)
{
	struct io_worker *worker = tsk->worker_private;

	if (!worker)
		return;
	if (!test_bit(IO_WORKER_F_FAULT, &worker->flags)) {
		if (!test_bit(IO_WORKER_F_UP, &worker->flags))
			return;
		if (test_bit(IO_WORKER_F_RUNNING, &worker->flags))
			return;
		set_bit(IO_WORKER_F_RUNNING, &worker->flags);
		io_wq_inc_running(worker);
	}
}

However, from my observation during crash live-memory analysis, when this
happens in the IOU worker thread the IO_WORKER_F_RUNNING flag is already
set. That is what I meant by "early return": the function bails out before
io_wq_inc_running(), rather than acting as just a simple accounting
function. I look forward to your deeper analysis and any corrections you
may have.