References: <20250422162913.1242057-1-qq282012236@gmail.com>
 <20250422162913.1242057-2-qq282012236@gmail.com>
 <14195206-47b1-4483-996d-3315aa7c33aa@kernel.dk>
 <7bea9c74-7551-4312-bece-86c4ad5c982f@kernel.dk>
 <52d55891-36e3-43e7-9726-a2cd113f5327@kernel.dk>
From: 姜智伟
Date: Thu, 24 Apr 2025 22:08:50 +0800
Subject: Re: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios
To: Jens Axboe
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 akpm@linux-foundation.org, peterx@redhat.com, asml.silence@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, io-uring@vger.kernel.org

On Thu, Apr 24, 2025 at 06:58, Jens Axboe wrote:
>
> On 4/23/25 9:55 AM, Jens Axboe wrote:
> > Something like this, perhaps -
> > it'll ensure that io-wq workers get a
> > chance to flush out pending work, which should prevent the looping. I've
> > attached a basic test case. It'll issue a write that will fault, and
> > then try and cancel that as a way to trigger the TIF_NOTIFY_SIGNAL based
> > looping.
>
> Something that may actually work - use TASK_UNINTERRUPTIBLE IFF
> signal_pending() is true AND the fault has already been tried once
> before. If that's the case, rather than just call schedule() with
> TASK_INTERRUPTIBLE, use TASK_UNINTERRUPTIBLE and schedule_timeout() with
> a suitable timeout length that prevents the annoying parts busy looping.
> I used HZ / 10.
>
> I don't see how to fix userfaultfd for this case, either using io_uring
> or normal write(2). Normal syscalls can pass back -ERESTARTSYS and get
> it retried, but there's no way to do that from inside fault handling. So
> I think we just have to be nicer about it.
>
> Andrew, as the userfaultfd maintainer, what do you think?
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index d80f94346199..1016268c7b51 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -334,15 +334,29 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
>         return ret;
>  }
>
> -static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
> +struct userfault_wait {
> +       unsigned int task_state;
> +       bool timeout;
> +};
> +
> +static struct userfault_wait userfaultfd_get_blocking_state(unsigned int flags)
>  {
> +       /*
> +        * If the fault has already been tried AND there's a signal pending
> +        * for this task, use TASK_UNINTERRUPTIBLE with a small timeout.
> +        * This prevents busy looping where schedule() otherwise does nothing
> +        * for TASK_INTERRUPTIBLE when the task has a signal pending.
> +        */
> +       if ((flags & FAULT_FLAG_TRIED) && signal_pending(current))
> +               return (struct userfault_wait) { TASK_UNINTERRUPTIBLE, true };
> +
>         if (flags & FAULT_FLAG_INTERRUPTIBLE)
> -               return TASK_INTERRUPTIBLE;
> +               return (struct userfault_wait) { TASK_INTERRUPTIBLE, false };
>
>         if (flags & FAULT_FLAG_KILLABLE)
> -               return TASK_KILLABLE;
> +               return (struct userfault_wait) { TASK_KILLABLE, false };
>
> -       return TASK_UNINTERRUPTIBLE;
> +       return (struct userfault_wait) { TASK_UNINTERRUPTIBLE, false };
>  }
>
>  /*
> @@ -368,7 +382,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>         struct userfaultfd_wait_queue uwq;
>         vm_fault_t ret = VM_FAULT_SIGBUS;
>         bool must_wait;
> -       unsigned int blocking_state;
> +       struct userfault_wait wait_mode;
>
>         /*
>          * We don't do userfault handling for the final child pid update
> @@ -466,7 +480,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>         uwq.ctx = ctx;
>         uwq.waken = false;
>
> -       blocking_state = userfaultfd_get_blocking_state(vmf->flags);
> +       wait_mode = userfaultfd_get_blocking_state(vmf->flags);
>
>         /*
>          * Take the vma lock now, in order to safely call
> @@ -488,7 +502,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>          * following the spin_unlock to happen before the list_add in
>          * __add_wait_queue.
>          */
> -       set_current_state(blocking_state);
> +       set_current_state(wait_mode.task_state);
>         spin_unlock_irq(&ctx->fault_pending_wqh.lock);
>
>         if (!is_vm_hugetlb_page(vma))
> @@ -501,7 +515,11 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>
>         if (likely(must_wait && !READ_ONCE(ctx->released))) {
>                 wake_up_poll(&ctx->fd_wqh, EPOLLIN);
> -               schedule();
> +               /* See comment in userfaultfd_get_blocking_state() */
> +               if (!wait_mode.timeout)
> +                       schedule();
> +               else
> +                       schedule_timeout(HZ / 10);
>         }
>
>         __set_current_state(TASK_RUNNING);
>
> --
> Jens Axboe

I think the earlier io_work_fault patch may already address the issue
sufficiently. The later patch that adds a timeout to userfaultfd might
not be necessary: wouldn't returning after a timeout just cause the
same fault to repeat indefinitely?

Whether the thread sleeps in the uninterruptible (UN) or interruptible
(IN) state, the expected behavior is to wait until the page is filled
or the uffd resource is released, and only then be woken up. That seems
like the correct logic to me.
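
To make the question about repeated faults concrete, here is a rough
user-space sketch, not kernel code. Every SIM_* name and value is a
hypothetical stand-in, and the model rests on two assumptions: a signal
stays pending the whole time, and every retry of the fault carries the
"tried" flag. It mirrors the selection order of the quoted
userfaultfd_get_blocking_state() and counts how often the fault would
be taken before the page is filled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; values do not match the kernel's definitions. */
enum { SIM_HZ = 1000 }; /* assumed ticks per second */
enum sim_state { SIM_INTERRUPTIBLE, SIM_KILLABLE, SIM_UNINTERRUPTIBLE };
enum {
    SIM_FLAG_INTERRUPTIBLE = 1,
    SIM_FLAG_KILLABLE = 2,
    SIM_FLAG_TRIED = 4
};

struct sim_wait {
    enum sim_state task_state;
    bool timeout;
};

/* Same decision order as the quoted userfaultfd_get_blocking_state(). */
struct sim_wait sim_get_blocking_state(unsigned int flags, bool signal_pending)
{
    if ((flags & SIM_FLAG_TRIED) && signal_pending)
        return (struct sim_wait){ SIM_UNINTERRUPTIBLE, true };
    if (flags & SIM_FLAG_INTERRUPTIBLE)
        return (struct sim_wait){ SIM_INTERRUPTIBLE, false };
    if (flags & SIM_FLAG_KILLABLE)
        return (struct sim_wait){ SIM_KILLABLE, false };
    return (struct sim_wait){ SIM_UNINTERRUPTIBLE, false };
}

/*
 * Toy model: count fault attempts until the page is filled at tick
 * fill_tick, with a signal pending throughout (assumption). The first
 * attempt sleeps interruptibly and returns at once because of the
 * pending signal; each retry sleeps for SIM_HZ / 10 ticks.
 */
unsigned int sim_count_faults(unsigned int fill_tick)
{
    unsigned int now = 0, faults = 0;
    unsigned int flags = SIM_FLAG_INTERRUPTIBLE;

    assert(SIM_HZ / 10 > 0); /* a zero-length sleep would never advance */

    while (now < fill_tick) {
        struct sim_wait w = sim_get_blocking_state(flags, true);

        faults++;
        if (w.timeout)
            now += SIM_HZ / 10; /* bounded sleep, then the fault is retried */
        flags |= SIM_FLAG_TRIED; /* assumption: retries are marked as tried */
    }
    return faults;
}
```

Under these assumptions, sim_count_faults(1000) returns 11: one
immediate return followed by ten timed sleeps. In this toy model the
timeout does not end the retry loop; it only limits it to roughly ten
attempts per second until the page is filled.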