Date: Tue, 22 Apr 2025 08:29:13 -0600
From: Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 0/2] Fix 100% CPU usage issue in IOU worker threads
To: 姜智伟
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 akpm@linux-foundation.org, peterx@redhat.com, asml.silence@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, io-uring@vger.kernel.org
References: <20250422104545.1199433-1-qq282012236@gmail.com>
 <028b4791-b6fc-47e3-9220-907180967d3a@kernel.dk>
On 4/22/25 8:18 AM, 姜智伟 wrote:
> On Tue, Apr 22, 2025 at 10:13 PM Jens Axboe wrote:
>>
>> On 4/22/25 8:10 AM, 姜智伟 wrote:
>>> On Tue, Apr 22, 2025 at 9:35 PM Jens Axboe wrote:
>>>>
>>>> On 4/22/25 4:45 AM, Zhiwei Jiang wrote:
>>>>> In the Firecracker VM scenario, we sporadically encountered
>>>>> threads stuck in the UN state with the following call stack:
>>>>> [<0>] io_wq_put_and_exit+0xa1/0x210
>>>>> [<0>] io_uring_clean_tctx+0x8e/0xd0
>>>>> [<0>] io_uring_cancel_generic+0x19f/0x370
>>>>> [<0>] __io_uring_cancel+0x14/0x20
>>>>> [<0>] do_exit+0x17f/0x510
>>>>> [<0>] do_group_exit+0x35/0x90
>>>>> [<0>] get_signal+0x963/0x970
>>>>> [<0>] arch_do_signal_or_restart+0x39/0x120
>>>>> [<0>] syscall_exit_to_user_mode+0x206/0x260
>>>>> [<0>] do_syscall_64+0x8d/0x170
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
>>>>> The cause is a large number of IOU kernel threads saturating the
>>>>> CPU and not exiting. When the issue occurs, CPU usage reaches 100%
>>>>> and can only be resolved by rebooting.
>>>>> Each thread's stack appears as follows:
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork_asm
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_worker_handle_work
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_submit_work
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_issue_sqe
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_write
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] blkdev_write_iter
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] iomap_file_buffered_write
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] iomap_write_iter
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] fault_in_iov_iter_readable
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] fault_in_readable
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] asm_exc_page_fault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] exc_page_fault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] do_user_addr_fault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] handle_mm_fault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_fault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_no_page
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_handle_userfault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] handle_userfault
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] schedule
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] __schedule
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] __raw_spin_unlock_irq
>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker_sleeping
>>>>>
>>>>> I tracked the address that triggered the fault and the related
>>>>> function graph, as well as the wake-up side of the userfault, and
>>>>> discovered this: when an IOU worker faults in a user space page
>>>>> that is associated with a userfault, it does not sleep, because a
>>>>> check for the IOU worker context during scheduling causes an early
>>>>> return. Meanwhile, the userfaultfd listener on the user side never
>>>>> performs a COPY to respond, so the page table entry remains empty.
>>>>> Due to the early return, the worker does not sleep and wait to be
>>>>> woken as in a normal user fault; it keeps faulting at the same
>>>>> address, and the CPU loops. I therefore believe it is necessary to
>>>>> handle user faults specially by setting a new flag that allows the
>>>>> schedule function to continue in such cases, making sure the
>>>>> thread sleeps.
>>>>>
>>>>> Patch 1 io_uring: Add new functions to handle user fault scenarios
>>>>> Patch 2 userfaultfd: Set the corresponding flag in IOU worker context
>>>>>
>>>>>  fs/userfaultfd.c |  7 ++++++
>>>>>  io_uring/io-wq.c | 57 +++++++++++++++---------------------------------
>>>>>  io_uring/io-wq.h | 45 ++++++++++++++++++++++++++++++++++++--
>>>>>  3 files changed, 68 insertions(+), 41 deletions(-)
>>>>
>>>> Do you have a test case for this? I don't think the proposed
>>>> solution is very elegant; userfaultfd should not need to know about
>>>> thread workers. I'll ponder this a bit...
>>>>
>>>> --
>>>> Jens Axboe
>>>
>>> Sorry, the issue occurs very infrequently, and I can't reproduce it
>>> manually. It's not very elegant, but for corner cases it seems
>>> necessary to make some compromises.
>>
>> I'm going to see if I can create one. Not sure I fully understand the
>> issue yet, but I'd be surprised if there isn't a more appropriate and
>> elegant solution rather than exposing the io-wq guts and having
>> userfaultfd manipulate them. That really should not be necessary.
>>
>> --
>> Jens Axboe
>
> Thanks. I'm looking forward to your good news.

Well, let's hope there is!
In any case, your patches could be considerably improved if you did:

void set_userfault_flag_for_ioworker(void)
{
	struct io_worker *worker;

	if (!(current->flags & PF_IO_WORKER))
		return;

	worker = current->worker_private;
	set_bit(IO_WORKER_F_FAULT, &worker->flags);
}

void clear_userfault_flag_for_ioworker(void)
{
	struct io_worker *worker;

	if (!(current->flags & PF_IO_WORKER))
		return;

	worker = current->worker_private;
	clear_bit(IO_WORKER_F_FAULT, &worker->flags);
}

and then userfaultfd would not need any odd checking, or any io-wq
related structures made public. That would drastically cut down on the
size of the patches, and make them a bit more palatable.

-- 
Jens Axboe
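
For reference, a minimal sketch of how the userfaultfd side could then
use these helpers, assuming the proposed IO_WORKER_F_FAULT bit and the
two functions above exist; the wrapper function name and the exact
placement around schedule() are illustrative, not taken from the posted
patches:

/*
 * Illustrative only: a userfault wait path such as handle_userfault()
 * in fs/userfaultfd.c would bracket its sleep with the proposed
 * helpers. Setting the flag before blocking tells the io-wq scheduling
 * hooks that this io worker is waiting on a userfault and must be
 * allowed to sleep rather than return early; it is cleared once the
 * task resumes and the fault can be retried.
 */
static vm_fault_t userfault_wait_sketch(void)
{
	set_userfault_flag_for_ioworker();	/* proposed helper */
	/* ... enqueue on the userfaultfd wait queue, set task state ... */
	schedule();				/* now actually sleeps */
	clear_userfault_flag_for_ioworker();	/* proposed helper */
	return VM_FAULT_RETRY;
}

The point of the pairing is that io-wq only needs to export these two
entry points, while the flag bit and struct io_worker itself stay
private to io_uring/io-wq.c.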