Date: Tue, 22 Apr 2025 11:33:41 -0600
From: Jens Axboe
To: 姜智伟
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 akpm@linux-foundation.org, peterx@redhat.com, asml.silence@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, io-uring@vger.kernel.org
Subject: Re: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios
References: <20250422162913.1242057-1-qq282012236@gmail.com>
 <20250422162913.1242057-2-qq282012236@gmail.com>
 <14195206-47b1-4483-996d-3315aa7c33aa@kernel.dk>

On 4/22/25 11:04 AM, 姜智伟 wrote:
> On Wed, Apr 23, 2025 at 12:32 AM Jens Axboe wrote:
>>
>> On 4/22/25 10:29 AM, Zhiwei Jiang wrote:
>>> diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
>>> index d4fb2940e435..8567a9c819db 100644
>>> --- a/io_uring/io-wq.h
>>> +++ b/io_uring/io-wq.h
>>> @@ -70,8 +70,10 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
>>>  					void *data, bool cancel_all);
>>>
>>>  #if defined(CONFIG_IO_WQ)
>>> -extern void io_wq_worker_sleeping(struct task_struct *);
>>> -extern void io_wq_worker_running(struct task_struct *);
>>> +extern void io_wq_worker_sleeping(struct task_struct *tsk);
>>> +extern void io_wq_worker_running(struct task_struct *tsk);
>>> +extern void set_userfault_flag_for_ioworker(void);
>>> +extern void clear_userfault_flag_for_ioworker(void);
>>>  #else
>>>  static inline void io_wq_worker_sleeping(struct task_struct *tsk)
>>>  {
>>> @@ -79,6 +81,12 @@ static inline void io_wq_worker_sleeping(struct task_struct *tsk)
>>>  static inline void io_wq_worker_running(struct task_struct *tsk)
>>>  {
>>>  }
>>> +static inline void set_userfault_flag_for_ioworker(void)
>>> +{
>>> +}
>>> +static inline void clear_userfault_flag_for_ioworker(void)
>>> +{
>>> +}
>>>  #endif
>>>
>>>  static inline bool io_wq_current_is_worker(void)
>>
>> This should go in include/linux/io_uring.h and then userfaultfd would
>> not have to include io_uring private headers.
>>
>> But that's beside the point, like I said we still need to get to the
>> bottom of what is going on here first, rather than try and paper around
>> it. So please don't post more versions of this before we have that
>> understanding.
>>
>> See previous emails on 6.8 and other kernel versions.
>>
>> --
>> Jens Axboe
> The issue did not involve creating new worker processes.
> Instead, the existing IOU worker kernel threads (about a dozen) associated
> with the VM process were fully utilizing CPU without writing data, caused
> by a fault while reading user data pages in the fault_in_iov_iter_readable
> function when pulling user memory into kernel space.

OK that makes more sense, I can certainly reproduce a loop in this path:

iou-wrk-726     729    36.910071:       9737 cycles:P:
        ffff800080456c44 handle_userfault+0x47c
        ffff800080381fc0 hugetlb_fault+0xb68
        ffff80008031fee4 handle_mm_fault+0x2fc
        ffff8000812ada6c do_page_fault+0x1e4
        ffff8000812ae024 do_translation_fault+0x9c
        ffff800080049a9c do_mem_abort+0x44
        ffff80008129bd78 el1_abort+0x38
        ffff80008129ceb4 el1h_64_sync_handler+0xd4
        ffff8000800112b4 el1h_64_sync+0x6c
        ffff80008030984c fault_in_readable+0x74
        ffff800080476f3c iomap_file_buffered_write+0x14c
        ffff8000809b1230 blkdev_write_iter+0x1a8
        ffff800080a1f378 io_write+0x188
        ffff800080a14f30 io_issue_sqe+0x68
        ffff800080a155d0 io_wq_submit_work+0xa8
        ffff800080a32afc io_worker_handle_work+0x1f4
        ffff800080a332b8 io_wq_worker+0x110
        ffff80008002dd38 ret_from_fork+0x10

which seems to be expected, we'd continually try and fault in the
ranges, if the userfaultfd handler isn't filling them.

I guess this is where I'm still confused, because I don't see how this
is different from if you have a normal write(2) syscall doing the same
thing - you'd get the same looping.

??

> This issue occurs, for example, during VM snapshot loading (which uses
> userfaultfd for on-demand memory loading), while the task in the guest is
> writing data to disk.
>
> Normally, the VM first triggers a user fault to fill the page table.
> So in the IOU worker thread, the page tables are already filled, and
> a fault has no chance to happen when faulting in memory pages
> in fault_in_iov_iter_readable.
>
> I suspect that during snapshot loading, a memory access in the
> VM triggers an async page fault handled by the kernel thread,
> while the IOU worker's async kernel thread is also running.
> Maybe it happens if the IOU worker's thread is scheduled first.
> I'm going to bed now.

Ah ok, so what you're saying is that because we end up not sleeping
(because a signal is pending, it seems), then the fault will never get
filled and hence progress not made? And the signal is pending because
someone tried to create a new worker, and this work is not getting
processed.

-- 
Jens Axboe
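
To make the hypothesis above concrete, here is a toy user-space model in C.
It is not kernel code and does not reflect the actual handle_userfault()
logic: struct fault_model, model_handle_fault() and model_write_retry() are
made-up names, and the retry cap exists only so the model terminates. It
assumes, as suggested above, that the fault path returns without sleeping
while a signal is pending, so the page never gets filled and the write path
keeps retrying.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not a kernel structure. */
struct fault_model {
	bool signal_pending;	/* assumption: a pending signal prevents sleeping */
	bool page_present;	/* has the userfaultfd handler filled the page? */
};

/*
 * Models one trip through the fault path. If a signal is pending we
 * return right away and ask the caller to retry; otherwise we "sleep"
 * and the (imaginary) userfaultfd handler fills the page.
 */
static bool model_handle_fault(struct fault_model *m)
{
	if (m->signal_pending)
		return false;	/* no sleep, page still missing */
	m->page_present = true;	/* handler resolved the fault while we slept */
	return true;
}

/*
 * Models the write path retrying the fault-in. Returns the attempt on
 * which the page became available, or -1 if the cap was reached, which
 * stands in for the busy loop seen in the trace.
 */
static int model_write_retry(struct fault_model *m, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (m->page_present || model_handle_fault(m))
			return attempt;
	}
	return -1;
}
```

With signal_pending set, model_write_retry() never sees the page filled and
runs into the cap; with it clear, the first attempt succeeds. Whether the
real path behaves this way is exactly the open question in this thread.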