From: Zhiwei Jiang
To: viro@zeniv.linux.org.uk
Cc: brauner@kernel.org, jack@suse.cz, akpm@linux-foundation.org, peterx@redhat.com, axboe@kernel.dk, asml.silence@gmail.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Zhiwei Jiang
Subject: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios
Date: Tue, 22 Apr 2025 16:29:12 +0000
Message-Id: <20250422162913.1242057-2-qq282012236@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250422162913.1242057-1-qq282012236@gmail.com>
References: <20250422162913.1242057-1-qq282012236@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

In the Firecracker VM scenario, we sporadically encountered threads
stuck in the UN (uninterruptible) state with the following call stack:

[<0>] io_wq_put_and_exit+0xa1/0x210
[<0>] io_uring_clean_tctx+0x8e/0xd0
[<0>] io_uring_cancel_generic+0x19f/0x370
[<0>] __io_uring_cancel+0x14/0x20
[<0>] do_exit+0x17f/0x510
[<0>] do_group_exit+0x35/0x90
[<0>] get_signal+0x963/0x970
[<0>] arch_do_signal_or_restart+0x39/0x120
[<0>] syscall_exit_to_user_mode+0x206/0x260
[<0>] do_syscall_64+0x8d/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80

The cause is a large number of IOU kernel threads saturating the CPU
and not exiting. When the issue occurs, CPU usage reaches 100% and the
system can only be recovered by rebooting. Each thread's call stack
appears as follows:

iou-wrk-44588  [kernel.kallsyms]  [k] ret_from_fork_asm
iou-wrk-44588  [kernel.kallsyms]  [k] ret_from_fork
iou-wrk-44588  [kernel.kallsyms]  [k] io_wq_worker
iou-wrk-44588  [kernel.kallsyms]  [k] io_worker_handle_work
iou-wrk-44588  [kernel.kallsyms]  [k] io_wq_submit_work
iou-wrk-44588  [kernel.kallsyms]  [k] io_issue_sqe
iou-wrk-44588  [kernel.kallsyms]  [k] io_write
iou-wrk-44588  [kernel.kallsyms]  [k] blkdev_write_iter
iou-wrk-44588  [kernel.kallsyms]  [k] iomap_file_buffered_write
iou-wrk-44588  [kernel.kallsyms]  [k] iomap_write_iter
iou-wrk-44588  [kernel.kallsyms]  [k] fault_in_iov_iter_readable
iou-wrk-44588  [kernel.kallsyms]  [k] fault_in_readable
iou-wrk-44588  [kernel.kallsyms]  [k] asm_exc_page_fault
iou-wrk-44588  [kernel.kallsyms]  [k] exc_page_fault
iou-wrk-44588  [kernel.kallsyms]  [k] do_user_addr_fault
iou-wrk-44588  [kernel.kallsyms]  [k] handle_mm_fault
iou-wrk-44588  [kernel.kallsyms]  [k] hugetlb_fault
iou-wrk-44588  [kernel.kallsyms]  [k] hugetlb_no_page
iou-wrk-44588  [kernel.kallsyms]  [k] hugetlb_handle_userfault
iou-wrk-44588  [kernel.kallsyms]  [k] handle_userfault
iou-wrk-44588  [kernel.kallsyms]  [k] schedule
iou-wrk-44588  [kernel.kallsyms]  [k] __schedule
iou-wrk-44588  [kernel.kallsyms]  [k] __raw_spin_unlock_irq
iou-wrk-44588  [kernel.kallsyms]  [k] io_wq_worker_sleeping

I tracked the address that triggered the fault and the related function
graph, as well as the wake-up side of the user fault, and discovered the
following:

When an IOU worker faults in a user-space page that is associated with a
userfault, the worker does not sleep. This is because, during
scheduling, a check in the IOU worker context leads to an early return.
Meanwhile, the listener on the userfaultfd user side never performs a
COPY to respond, so the page table entry remains empty. Because of the
early return, the worker does not sleep and wait to be woken as in a
normal user fault; instead it faults continuously at the same address,
spinning the CPU.

Therefore, I believe it is necessary to handle user faults specifically
by setting a new flag that allows the schedule function to continue in
such cases, making sure the thread sleeps. Export the relevant functions
and struct for user fault.

Signed-off-by: Zhiwei Jiang
---
 io_uring/io-wq.c | 35 +++++++++++++++++++++++++++++------
 io_uring/io-wq.h | 12 ++++++++++--
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 04a75d666195..4a4f65de6699 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -30,6 +30,7 @@ enum {
 	IO_WORKER_F_UP		= 0,	/* up and active */
 	IO_WORKER_F_RUNNING	= 1,	/* account as running */
 	IO_WORKER_F_FREE	= 2,	/* worker on free list */
+	IO_WORKER_F_FAULT	= 3,	/* used for userfault */
 };

 enum {
@@ -706,6 +707,26 @@ static int io_wq_worker(void *data)
 	return 0;
 }

+void set_userfault_flag_for_ioworker(void)
+{
+	struct io_worker *worker;
+
+	if (!(current->flags & PF_IO_WORKER))
+		return;
+	worker = current->worker_private;
+	set_bit(IO_WORKER_F_FAULT, &worker->flags);
+}
+
+void clear_userfault_flag_for_ioworker(void)
+{
+	struct io_worker *worker;
+
+	if (!(current->flags & PF_IO_WORKER))
+		return;
+	worker = current->worker_private;
+	clear_bit(IO_WORKER_F_FAULT, &worker->flags);
+}
+
 /*
  * Called when a worker is scheduled in. Mark us as currently running.
  */
@@ -715,12 +736,14 @@ void io_wq_worker_running(struct task_struct *tsk)

 	if (!worker)
 		return;
-	if (!test_bit(IO_WORKER_F_UP, &worker->flags))
-		return;
-	if (test_bit(IO_WORKER_F_RUNNING, &worker->flags))
-		return;
-	set_bit(IO_WORKER_F_RUNNING, &worker->flags);
-	io_wq_inc_running(worker);
+	if (!test_bit(IO_WORKER_F_FAULT, &worker->flags)) {
+		if (!test_bit(IO_WORKER_F_UP, &worker->flags))
+			return;
+		if (test_bit(IO_WORKER_F_RUNNING, &worker->flags))
+			return;
+		set_bit(IO_WORKER_F_RUNNING, &worker->flags);
+		io_wq_inc_running(worker);
+	}
 }

 /*
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index d4fb2940e435..8567a9c819db 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -70,8 +70,10 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
 					void *data, bool cancel_all);

 #if defined(CONFIG_IO_WQ)
-extern void io_wq_worker_sleeping(struct task_struct *);
-extern void io_wq_worker_running(struct task_struct *);
+extern void io_wq_worker_sleeping(struct task_struct *tsk);
+extern void io_wq_worker_running(struct task_struct *tsk);
+extern void set_userfault_flag_for_ioworker(void);
+extern void clear_userfault_flag_for_ioworker(void);
 #else
 static inline void io_wq_worker_sleeping(struct task_struct *tsk)
 {
@@ -79,6 +81,12 @@ static inline void io_wq_worker_sleeping(struct task_struct *tsk)
 static inline void io_wq_worker_running(struct task_struct *tsk)
 {
 }
+static inline void set_userfault_flag_for_ioworker(void)
+{
+}
+static inline void clear_userfault_flag_for_ioworker(void)
+{
+}
 #endif

 static inline bool io_wq_current_is_worker(void)
-- 
2.34.1
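
For review purposes, the effect of the new IO_WORKER_F_FAULT bit on the
accounting in io_wq_worker_running() can be modelled in user space. The
sketch below is not kernel code: struct model_worker, the model_*()
helpers and model_demo() are hypothetical stand-ins invented for
illustration, the bit helpers are plain non-atomic replacements for
test_bit()/set_bit()/clear_bit(), and nr_running merely stands in for
io_wq_inc_running(). Only the bit numbers and the branch structure are
taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers as in the patched enum in io_uring/io-wq.c. */
enum {
	IO_WORKER_F_UP      = 0,
	IO_WORKER_F_RUNNING = 1,
	IO_WORKER_F_FREE    = 2,
	IO_WORKER_F_FAULT   = 3,
};

/* Hypothetical stand-in for struct io_worker (illustration only). */
struct model_worker {
	unsigned long flags;
	int nr_running;	/* stands in for io_wq_inc_running() */
};

/* Non-atomic user-space replacements for the kernel bit helpers. */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

/* Same branch structure as the patched io_wq_worker_running(). */
static void model_worker_running(struct model_worker *w)
{
	if (!w)
		return;
	if (!model_test_bit(IO_WORKER_F_FAULT, &w->flags)) {
		if (!model_test_bit(IO_WORKER_F_UP, &w->flags))
			return;
		if (model_test_bit(IO_WORKER_F_RUNNING, &w->flags))
			return;
		model_set_bit(IO_WORKER_F_RUNNING, &w->flags);
		w->nr_running++;
	}
}

/* Walks one worker through the four cases; returns the final count. */
int model_demo(void)
{
	struct model_worker w = { .flags = 0, .nr_running = 0 };

	model_set_bit(IO_WORKER_F_UP, &w.flags);

	/* Normal wake-up: the worker is accounted as running. */
	model_worker_running(&w);
	assert(w.nr_running == 1);

	/* Already running: nothing changes. */
	model_worker_running(&w);
	assert(w.nr_running == 1);

	/* Simulated userfault window: RUNNING cleared, FAULT set. */
	model_clear_bit(IO_WORKER_F_RUNNING, &w.flags);
	model_set_bit(IO_WORKER_F_FAULT, &w.flags);
	model_worker_running(&w);
	assert(!model_test_bit(IO_WORKER_F_RUNNING, &w.flags));
	assert(w.nr_running == 1);

	/* Flag cleared again: the next wake-up is accounted normally. */
	model_clear_bit(IO_WORKER_F_FAULT, &w.flags);
	model_worker_running(&w);
	assert(model_test_bit(IO_WORKER_F_RUNNING, &w.flags));

	return w.nr_running;
}
```

In this model, a wake-up that happens while IO_WORKER_F_FAULT is set
leaves both the RUNNING bit and the running count untouched, which is
the only behavioural change this patch makes to io_wq_worker_running().
How the flag is set and cleared around the userfault wait is outside
this patch and is not modelled here.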