From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 22 Apr 2025 09:49:58 -0600
Subject: Re: [PATCH 0/2] Fix 100% CPU usage issue in IOU worker threads
From: Jens Axboe <axboe@kernel.dk>
To: Zhiwei Jiang
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 akpm@linux-foundation.org, peterx@redhat.com, asml.silence@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, io-uring@vger.kernel.org
References: <20250422104545.1199433-1-qq282012236@gmail.com>
 <028b4791-b6fc-47e3-9220-907180967d3a@kernel.dk>
Content-Type: text/plain; charset=UTF-8
On 4/22/25 8:29 AM, Jens Axboe wrote:
> On 4/22/25 8:18 AM, Zhiwei Jiang wrote:
>> On Tue, Apr 22, 2025 at 10:13 PM Jens Axboe wrote:
>>>
>>> On 4/22/25 8:10 AM, Zhiwei Jiang wrote:
>>>> On Tue, Apr 22, 2025 at 9:35 PM Jens Axboe wrote:
>>>>>
>>>>> On 4/22/25 4:45 AM, Zhiwei Jiang wrote:
>>>>>> In the Firecracker VM scenario, we sporadically encountered threads
>>>>>> stuck in the UN (uninterruptible) state with the following call stack:
>>>>>> [<0>] io_wq_put_and_exit+0xa1/0x210
>>>>>> [<0>] io_uring_clean_tctx+0x8e/0xd0
>>>>>> [<0>] io_uring_cancel_generic+0x19f/0x370
>>>>>> [<0>] __io_uring_cancel+0x14/0x20
>>>>>> [<0>] do_exit+0x17f/0x510
>>>>>> [<0>] do_group_exit+0x35/0x90
>>>>>> [<0>] get_signal+0x963/0x970
>>>>>> [<0>] arch_do_signal_or_restart+0x39/0x120
>>>>>> [<0>] syscall_exit_to_user_mode+0x206/0x260
>>>>>> [<0>] do_syscall_64+0x8d/0x170
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
>>>>>> The cause is a large number of IOU kernel threads saturating the CPU
>>>>>> and never exiting. When the issue occurs, CPU usage sits at 100% and
>>>>>> can only be resolved by rebooting.
>>>>>> Each thread's stack appears as follows:
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork_asm
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] ret_from_fork
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_worker_handle_work
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_submit_work
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_issue_sqe
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_write
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] blkdev_write_iter
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] iomap_file_buffered_write
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] iomap_write_iter
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] fault_in_iov_iter_readable
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] fault_in_readable
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] asm_exc_page_fault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] exc_page_fault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] do_user_addr_fault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] handle_mm_fault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_fault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_no_page
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] hugetlb_handle_userfault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] handle_userfault
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] schedule
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] __schedule
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] __raw_spin_unlock_irq
>>>>>> iou-wrk-44588 [kernel.kallsyms] [k] io_wq_worker_sleeping
>>>>>>
>>>>>> I tracked the address that triggered the fault, the related function
>>>>>> graph, and the wake-up side of the user fault, and discovered the
>>>>>> following: when an IOU worker faults in a user-space page that is
>>>>>> associated with a userfault, it does not sleep, because a check for
>>>>>> the IOU worker context during scheduling leads to an early return.
>>>>>> Meanwhile, the userfaultfd listener on the user side never responds
>>>>>> with a COPY, so the page table entry remains empty. Due to the early
>>>>>> return, the worker never sleeps waiting to be woken as a normal user
>>>>>> fault would; it keeps faulting on the same address, and the CPU spins
>>>>>> in a loop. I therefore believe user faults need dedicated handling:
>>>>>> setting a new flag that allows the schedule function to proceed in
>>>>>> this case, making sure the thread sleeps.
>>>>>>
>>>>>> Patch 1 io_uring: Add new functions to handle user fault scenarios
>>>>>> Patch 2 userfaultfd: Set the corresponding flag in IOU worker context
>>>>>>
>>>>>>  fs/userfaultfd.c |  7 ++++++
>>>>>>  io_uring/io-wq.c | 57 +++++++++++++++---------------------------------
>>>>>>  io_uring/io-wq.h | 45 ++++++++++++++++++++++++++++++++++++--
>>>>>>  3 files changed, 68 insertions(+), 41 deletions(-)
>>>>>
>>>>> Do you have a test case for this? I don't think the proposed solution
>>>>> is very elegant; userfaultfd should not need to know about thread
>>>>> workers. I'll ponder this a bit...
>>>>>
>>>>> --
>>>>> Jens Axboe
>>>> Sorry, the issue occurs very infrequently, and I can't manually
>>>> reproduce it. It's not very elegant, but for corner cases it seems
>>>> necessary to make some compromises.
>>>
>>> I'm going to see if I can create one. Not sure I fully understand the
>>> issue yet, but I'd be surprised if there isn't a more appropriate and
>>> elegant solution rather than exposing the io-wq guts and having
>>> userfaultfd manipulate them. That really should not be necessary.
>>>
>>> --
>>> Jens Axboe
>> Thanks. I'm looking forward to your good news.
>
> Well, let's hope there is! In any case, your patches could be
> considerably improved if you did:
>
> void set_userfault_flag_for_ioworker(void)
> {
> 	struct io_worker *worker;
>
> 	if (!(current->flags & PF_IO_WORKER))
> 		return;
> 	worker = current->worker_private;
> 	set_bit(IO_WORKER_F_FAULT, &worker->flags);
> }
>
> void clear_userfault_flag_for_ioworker(void)
> {
> 	struct io_worker *worker;
>
> 	if (!(current->flags & PF_IO_WORKER))
> 		return;
> 	worker = current->worker_private;
> 	clear_bit(IO_WORKER_F_FAULT, &worker->flags);
> }
>
> and then userfaultfd would not need any odd checking, nor would the
> io-wq related structures need to be made public. That'd drastically cut
> down on the size of the patches and make them a bit more palatable.

Forgot to ask: what kernel are you running on?

-- 
Jens Axboe
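
For illustration, here is a minimal sketch of how the helpers suggested
above might be used on the userfaultfd side. This is not the posted
patch: only the two helper names come from the reply above, while the
placement inside handle_userfault() in fs/userfaultfd.c and the heavily
condensed body are assumptions.

/*
 * Hypothetical sketch, not the actual patch: bracket the sleep in
 * handle_userfault() with the suggested helpers so an io-wq worker is
 * allowed to block until the userfaultfd reader responds, instead of
 * returning early and re-faulting the same address forever. All the
 * real validation, waitqueue, and refcounting logic is elided.
 */
vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
{
	/* ... validate the fault and queue this task on the uffd waitqueue ... */

	/* No-op unless current is a PF_IO_WORKER; otherwise mark the
	 * worker as blocked in a userfault so the io-wq scheduling hook
	 * lets it actually sleep.
	 */
	set_userfault_flag_for_ioworker();

	schedule();	/* sleep until UFFDIO_COPY / a wakeup arrives */

	clear_userfault_flag_for_ioworker();

	/* ... dequeue and drop references ... */
	return VM_FAULT_RETRY;
}

Because both helpers return immediately unless current->flags has
PF_IO_WORKER set, the calls can be made unconditionally, which is why
userfaultfd itself would not need any io-wq-specific checks or access
to io-wq internals.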