From: Zhiwei Jiang <qq282012236@gmail.com>
To: viro@zeniv.linux.org.uk
Cc: brauner@kernel.org, jack@suse.cz, akpm@linux-foundation.org,
peterx@redhat.com, axboe@kernel.dk, asml.silence@gmail.com,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, io-uring@vger.kernel.org,
Zhiwei Jiang <qq282012236@gmail.com>
Subject: [PATCH 0/2] Fix 100% CPU usage issue in IOU worker threads
Date: Tue, 22 Apr 2025 10:45:43 +0000
Message-ID: <20250422104545.1199433-1-qq282012236@gmail.com>
In a Firecracker VM scenario, I sporadically encountered threads stuck in
the UN state with the following call stack:
[<0>] io_wq_put_and_exit+0xa1/0x210
[<0>] io_uring_clean_tctx+0x8e/0xd0
[<0>] io_uring_cancel_generic+0x19f/0x370
[<0>] __io_uring_cancel+0x14/0x20
[<0>] do_exit+0x17f/0x510
[<0>] do_group_exit+0x35/0x90
[<0>] get_signal+0x963/0x970
[<0>] arch_do_signal_or_restart+0x39/0x120
[<0>] syscall_exit_to_user_mode+0x206/0x260
[<0>] do_syscall_64+0x8d/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
The cause is a large number of IOU kernel threads saturating the CPU
and never exiting. When the issue occurs, CPU usage stays at 100% and the
problem can only be resolved by rebooting. Each thread's call chain appears
as follows:
iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork_asm
iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork
iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker
iou-wrk-44588 [kernel.kallsyms] [k] io_worker_handle_work
iou-wrk-44588 [kernel.kallsyms] [k] io_wq_submit_work
iou-wrk-44588 [kernel.kallsyms] [k] io_issue_sqe
iou-wrk-44588 [kernel.kallsyms] [k] io_write
iou-wrk-44588 [kernel.kallsyms] [k] blkdev_write_iter
iou-wrk-44588 [kernel.kallsyms] [k] iomap_file_buffered_write
iou-wrk-44588 [kernel.kallsyms] [k] iomap_write_iter
iou-wrk-44588 [kernel.kallsyms] [k] fault_in_iov_iter_readable
iou-wrk-44588 [kernel.kallsyms] [k] fault_in_readable
iou-wrk-44588 [kernel.kallsyms] [k] asm_exc_page_fault
iou-wrk-44588 [kernel.kallsyms] [k] exc_page_fault
iou-wrk-44588 [kernel.kallsyms] [k] do_user_addr_fault
iou-wrk-44588 [kernel.kallsyms] [k] handle_mm_fault
iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_fault
iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_no_page
iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_handle_userfault
iou-wrk-44588 [kernel.kallsyms] [k] handle_userfault
iou-wrk-44588 [kernel.kallsyms] [k] schedule
iou-wrk-44588 [kernel.kallsyms] [k] __schedule
iou-wrk-44588 [kernel.kallsyms] [k] __raw_spin_unlock_irq
iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker_sleeping
I tracked the address that triggered the fault and the related function
graph, as well as the wake-up side of the user fault, and found the
following: when an IOU worker faults in a user-space page that is
associated with a userfault, the worker does not sleep. This is because,
during scheduling, a check in the IOU worker context leads to an early
return. Meanwhile, the listener on the userfaultfd user side never performs
a COPY to respond, so the page table entry remains empty. Because of the
early return, the worker does not sleep and wait to be woken as in a normal
user fault. Instead it faults continuously at the same address, so the CPU
spins in a loop.
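For reference, a minimal user-space sketch of the response that the
listener never sends in this scenario, assuming "COPY" above refers to the
UFFDIO_COPY ioctl. The helper name resolve_missing_fault() and its
parameters are hypothetical and are not taken from Firecracker or from this
series. page_len must be a power of two and, for hugetlb mappings, the huge
page size.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: resolve one missing-page fault reported on uffd by
 * copying page_len bytes from src_page to the page containing fault_addr.
 * Returns 0 on success or a negative errno value on failure.
 */
static int resolve_missing_fault(int uffd, uint64_t fault_addr,
                                 const void *src_page, uint64_t page_len)
{
    struct uffdio_copy copy;

    memset(&copy, 0, sizeof(copy));
    /* The destination must be aligned to the (huge) page size. */
    copy.dst = fault_addr & ~(page_len - 1);
    copy.src = (uint64_t)(uintptr_t)src_page;
    copy.len = page_len;
    /* mode 0: populate the page and wake the waiting faulting thread. */
    copy.mode = 0;

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -errno;
    return 0;
}
```

Until a call like this succeeds, the page table entry stays empty, which is
why a faulting thread that does not sleep keeps faulting at the same
address.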
Therefore, I believe it is necessary to handle user faults specifically by
setting a new flag that allows the schedule function to continue in such
cases, making sure the thread goes to sleep.
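The difference between the two behaviours can be illustrated with a toy
user-space model. This is not kernel code and not the code from the
patches: struct toy_task, the in_user_fault flag, and both functions are
hypothetical names chosen only to mirror the description above.

```c
#include <stdbool.h>

/* Toy model of a task; all names here are hypothetical. */
struct toy_task {
    bool io_worker;      /* models an iou-wrk thread */
    bool in_user_fault;  /* models the proposed "waiting on a userfault" flag */
};

/* Returns true if the modelled schedule() call really puts the task to sleep. */
static bool toy_schedule_blocks(const struct toy_task *t)
{
    /* Models the early return described for the IOU worker context. */
    if (t->io_worker && !t->in_user_fault)
        return false;
    return true;
}

/*
 * Count the faults taken before the listener populates the page at
 * resolve_tick, giving up after max_ticks units of simulated time.
 */
static unsigned int toy_count_faults(const struct toy_task *t,
                                     unsigned int resolve_tick,
                                     unsigned int max_ticks)
{
    unsigned int faults = 0;
    unsigned int tick = 0;

    while (tick < max_ticks) {
        if (tick >= resolve_tick)
            break;                /* page is populated, the access succeeds */
        faults++;                 /* page still missing: take a fault */
        if (toy_schedule_blocks(t))
            tick = resolve_tick;  /* sleep until the listener wakes us */
        else
            tick++;               /* return at once and retry the access */
    }
    return faults;
}
```

In this model a worker without the flag takes one fault per tick until the
page is resolved, or until max_ticks if the listener never responds, while
a worker with the flag set takes a single fault and sleeps.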
Patch 1 io_uring: Add new functions to handle user fault scenarios
Patch 2 userfaultfd: Set the corresponding flag in IOU worker context
fs/userfaultfd.c | 7 ++++++
io_uring/io-wq.c | 57 +++++++++++++++---------------------------------
io_uring/io-wq.h | 45 ++++++++++++++++++++++++++++++++++++--
3 files changed, 68 insertions(+), 41 deletions(-)
--
2.34.1