From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zhiwei Jiang <qq282012236@gmail.com>
Date: Wed, 23 Apr 2025 00:14:33 +0800
Subject: Re: [PATCH 0/2] Fix 100% CPU usage issue in IOU worker threads
To: Jens Axboe
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	akpm@linux-foundation.org, peterx@redhat.com, asml.silence@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, io-uring@vger.kernel.org
References: <20250422104545.1199433-1-qq282012236@gmail.com> <028b4791-b6fc-47e3-9220-907180967d3a@kernel.dk>
Content-Type: text/plain; charset="UTF-8"
On Tue, Apr 22, 2025 at 11:50 PM Jens Axboe wrote:
>
> On 4/22/25 8:29 AM, Jens Axboe wrote:
> > On 4/22/25 8:18 AM, Zhiwei Jiang wrote:
> >> On Tue, Apr 22, 2025 at 10:13 PM Jens Axboe wrote:
> >>>
> >>> On 4/22/25 8:10 AM, Zhiwei Jiang wrote:
> >>>> On Tue, Apr 22, 2025 at 9:35 PM Jens Axboe wrote:
> >>>>>
> >>>>> On 4/22/25 4:45 AM, Zhiwei Jiang wrote:
> >>>>>> In the Firecracker VM scenario, we sporadically encountered threads
> >>>>>> stuck in the UN (uninterruptible) state with the following call stack:
> >>>>>> [<0>] io_wq_put_and_exit+0xa1/0x210
> >>>>>> [<0>] io_uring_clean_tctx+0x8e/0xd0
> >>>>>> [<0>] io_uring_cancel_generic+0x19f/0x370
> >>>>>> [<0>] __io_uring_cancel+0x14/0x20
> >>>>>> [<0>] do_exit+0x17f/0x510
> >>>>>> [<0>] do_group_exit+0x35/0x90
> >>>>>> [<0>] get_signal+0x963/0x970
> >>>>>> [<0>] arch_do_signal_or_restart+0x39/0x120
> >>>>>> [<0>] syscall_exit_to_user_mode+0x206/0x260
> >>>>>> [<0>] do_syscall_64+0x8d/0x170
> >>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
> >>>>>>
> >>>>>> The cause is a large number of IOU kernel threads saturating the CPU
> >>>>>> and never exiting. When the issue occurs, CPU usage is at 100% and can
> >>>>>> only be resolved by rebooting. Each thread's perf stack appears as
> >>>>>> follows:
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork_asm
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_worker_handle_work
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_submit_work
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_issue_sqe
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_write
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] blkdev_write_iter
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] iomap_file_buffered_write
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] iomap_write_iter
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] fault_in_iov_iter_readable
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] fault_in_readable
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] asm_exc_page_fault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] exc_page_fault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] do_user_addr_fault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] handle_mm_fault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_fault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_no_page
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_handle_userfault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] handle_userfault
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] schedule
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] __schedule
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] __raw_spin_unlock_irq
> >>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker_sleeping
> >>>>>>
> >>>>>> I tracked the address that triggered the fault, the related function
> >>>>>> graph, and the wake-up side of the userfault, and discovered the
> >>>>>> following: when the IOU worker faults in a user-space page, the page is
> >>>>>> associated with a userfaultfd region, but the worker does not sleep.
> >>>>>> During scheduling, the check for the IOU worker context causes an early
> >>>>>> return. Meanwhile, the userfaultfd listener on the user side never
> >>>>>> performs a COPY to respond, so the page table entry remains empty.
> >>>>>> Because of the early return, the worker never sleeps and waits to be
> >>>>>> woken as in a normal user fault; instead, it keeps faulting at the same
> >>>>>> address, so the CPU loops. Therefore, I believe it is necessary to
> >>>>>> handle user faults specially by setting a new flag that allows the
> >>>>>> schedule function to proceed in such cases, ensuring the thread sleeps.
> >>>>>>
> >>>>>> Patch 1: io_uring: Add new functions to handle user fault scenarios
> >>>>>> Patch 2: userfaultfd: Set the corresponding flag in IOU worker context
> >>>>>>
> >>>>>>  fs/userfaultfd.c |  7 ++++++
> >>>>>>  io_uring/io-wq.c | 57 +++++++++++++++-----------------------------------
> >>>>>>  io_uring/io-wq.h | 45 ++++++++++++++++++++++++++++++++++++--
> >>>>>>  3 files changed, 68 insertions(+), 41 deletions(-)
> >>>>>
> >>>>> Do you have a test case for this?
> >>>>> I don't think the proposed solution is very elegant; userfaultfd
> >>>>> should not need to know about thread workers. I'll ponder this a
> >>>>> bit...
> >>>>>
> >>>>> --
> >>>>> Jens Axboe
> >>>>
> >>>> Sorry, the issue occurs very infrequently, and I can't reproduce it
> >>>> manually. It's not very elegant, but for corner cases it seems
> >>>> necessary to make some compromises.
> >>>
> >>> I'm going to see if I can create one. Not sure I fully understand the
> >>> issue yet, but I'd be surprised if there isn't a more appropriate and
> >>> elegant solution rather than exposing the io-wq guts and having
> >>> userfaultfd manipulate them. That really should not be necessary.
> >>>
> >>> --
> >>> Jens Axboe
> >>
> >> Thanks. I'm looking forward to your good news.
> >
> > Well, let's hope there is! In any case, your patches could be
> > considerably improved if you did:
> >
> > void set_userfault_flag_for_ioworker(void)
> > {
> >         struct io_worker *worker;
> >
> >         if (!(current->flags & PF_IO_WORKER))
> >                 return;
> >         worker = current->worker_private;
> >         set_bit(IO_WORKER_F_FAULT, &worker->flags);
> > }
> >
> > void clear_userfault_flag_for_ioworker(void)
> > {
> >         struct io_worker *worker;
> >
> >         if (!(current->flags & PF_IO_WORKER))
> >                 return;
> >         worker = current->worker_private;
> >         clear_bit(IO_WORKER_F_FAULT, &worker->flags);
> > }
> >
> > With that, userfaultfd would not need any odd checking, nor would the
> > io-wq related structures need to be made public. That'd drastically cut
> > down on the size of the patches and make them a bit more palatable.
>
> Forgot to ask, what kernel are you running on?
>
> --
> Jens Axboe

Thanks, Jens. It is linux-image-6.8.0-1026-gcp.