Date: Wed, 11 Jun 2025 08:56:40 -0400
From: Peter Xu
To: Nikita Kalyazin
Cc: akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org,
    viro@zeniv.linux.org.uk, brauner@kernel.org, muchun.song@linux.dev,
    hughd@google.com, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, jack@suse.cz, lorenzo.stoakes@oracle.com,
    Liam.Howlett@oracle.com, jannh@google.com, ryan.roberts@arm.com,
    david@redhat.com, jthoughton@google.com, graf@amazon.de,
    jgowans@amazon.com, roypat@amazon.co.uk, derekmn@amazon.com,
    nsaenz@amazon.es, xmarcalx@amazon.com
Subject: Re: [PATCH v3 1/6] mm: userfaultfd: generic continue for non hugetlbfs
References: <20250404154352.23078-1-kalyazin@amazon.com>
    <20250404154352.23078-2-kalyazin@amazon.com>
    <36d96316-fd9b-4755-bb35-d1a2cea7bb7e@amazon.com>
In-Reply-To: <36d96316-fd9b-4755-bb35-d1a2cea7bb7e@amazon.com>

On Wed, Jun 11, 2025 at 01:09:32PM +0100, Nikita Kalyazin wrote:
>
> On 10/06/2025 23:22, Peter Xu wrote:
> > On Fri, Apr 04, 2025 at 03:43:47PM +0000, Nikita Kalyazin wrote:
> > > Remove shmem-specific code from UFFDIO_CONTINUE implementation for
> > > non-huge pages by calling vm_ops->fault().
> > > A new VMF flag,
> > > FAULT_FLAG_USERFAULT_CONTINUE, is introduced to avoid recursive call to
> > > handle_userfault().
> >
> > It's not clear yet on why this is needed to be generalized out of the blue.
> >
> > Some mentioning of guest_memfd use case might help for other reviewers, or
> > some mention of the need to introduce userfaultfd support in kernel
> > modules.
>
> Hi Peter,
>
> Sounds fair, thank you.
>
> >
> > >
> > > Suggested-by: James Houghton
> > > Signed-off-by: Nikita Kalyazin
> > > ---
> > >  include/linux/mm_types.h |  4 ++++
> > >  mm/hugetlb.c             |  2 +-
> > >  mm/shmem.c               |  9 ++++++---
> > >  mm/userfaultfd.c         | 37 +++++++++++++++++++++++++++----------
> > >  4 files changed, 38 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 0234f14f2aa6..2f26ee9742bf 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -1429,6 +1429,9 @@ enum tlb_flush_reason {
> > >   * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
> > >   *                        We should only access orig_pte if this flag set.
> > >   * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock.
> > > + * @FAULT_FLAG_USERFAULT_CONTINUE: The fault handler must not call userfaultfd
> > > + *                                 minor handler as it is being called by the
> > > + *                                 userfaultfd code itself.
> >
> > We probably shouldn't leak the "CONTINUE" concept to mm core if possible,
> > as it's not easy to follow when without userfault minor context.  It might
> > be better to use generic terms like NO_USERFAULT.
>
> Yes, I agree, can name it more generically.
>
> > Said that, I wonder if we'll need to add a vm_ops anyway in the latter
> > patch, whether we can also avoid reusing fault() but instead resolve the
> > page faults using the vm_ops hook too.
> > That might be helpful because then
> > we can avoid this new FAULT_FLAG_* that is totally not useful to
> > non-userfault users, meanwhile we also don't need to hand-cook the vm_fault
> > struct below just to suite the current fault() interfacing.
>
> I'm not sure I fully understand that.  Calling fault() op helps us reuse the
> FS specifics when resolving the fault.  I get that the new op can imply the
> userfault flag so the flag doesn't need to be exposed to mm, but doing so
> will bring duplication of the logic within FSes between this new op and the
> fault(), unless we attempt to factor common parts out.  For example, for
> shmem_get_folio_gfp(), we would still need to find a way to suppress the
> call to handle_userfault() when shmem_get_folio_gfp() is called from the new
> op.  Is that what you're proposing?

Yes, that is what I was proposing.  shmem_get_folio_gfp() already has that
handling: when vmf==NULL, vma==NULL too, and userfault is skipped.

So what I was thinking is one vm_ops.userfaultfd_request(req), where req can
be:

  (1) UFFD_REQ_GET_SUPPORTED: for existing RAM-FSes this should return all
      of MISSING/WP/MINOR.  Here WP means sync-wp tracking; async-wp has so
      far been supported by default almost everywhere except VM_DROPPABLE.
      For guest-memfd in the future, we can return MINOR only for now (even
      though I think it shouldn't be hard to support the other two).

  (2) UFFD_REQ_FAULT_RESOLVE: this should play the fault() role, but be
      well defined to suit userfault's needs for fault resolution.  It
      likely doesn't need vmf as a parameter, but rather (when anon isn't
      taken into account; after all, anon has vm_ops==NULL) the inode and
      offsets.  Perhaps some flag would be needed to identify MISSING or
      MINOR faults, for example.

Maybe some more.

I was even thinking whether we could merge hugetlb into the picture too, by
generalizing its fault resolutions.
Hugetlb was always special, maybe this is
a chance to make it generalized too, but it doesn't need to happen in one
shot even if it could work.  We could start with shmem.

So this does sound slightly involved, and I'm not yet 100% sure this will
work, but it likely will.  If you want, I can take a stab at this this week
or next just to see whether it'll work in general.  I also don't expect this
to depend on guest-memfd at all - it can stand alone as a refactoring that
makes userfault module-ready.

Thanks,

--
Peter Xu
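
For reference, the vm_ops.userfaultfd_request() idea above could be sketched
roughly as below.  This is a userspace mock-up only, not kernel code and not
taken from any posted patch: struct vm_ops_sketch, struct uffd_resolve_args,
the UFFD_MODE_* bits, the two *_like_ops tables and uffd_mode_supported()
are all made-up names for illustration.  Only the two request names come
from the proposal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The two request types named in the proposal. */
enum uffd_req {
	UFFD_REQ_GET_SUPPORTED,
	UFFD_REQ_FAULT_RESOLVE,
};

/* Hypothetical mode bits reported by UFFD_REQ_GET_SUPPORTED. */
#define UFFD_MODE_MISSING (1u << 0)
#define UFFD_MODE_WP      (1u << 1)	/* sync-wp tracking */
#define UFFD_MODE_MINOR   (1u << 2)

/* Hypothetical resolve arguments: inode + offset instead of a vm_fault. */
struct uffd_resolve_args {
	void *inode;			/* stand-in for struct inode * */
	unsigned long pgoff;		/* page offset within the file */
	unsigned int fault_kind;	/* UFFD_MODE_MISSING or UFFD_MODE_MINOR */
};

/* Stand-in for the one new hook in vm_operations_struct. */
struct vm_ops_sketch {
	long (*userfaultfd_request)(enum uffd_req req,
				    struct uffd_resolve_args *args);
};

/* A RAM-FS like shmem would report all three modes. */
static long shmem_like_request(enum uffd_req req,
			       struct uffd_resolve_args *args)
{
	switch (req) {
	case UFFD_REQ_GET_SUPPORTED:
		return UFFD_MODE_MISSING | UFFD_MODE_WP | UFFD_MODE_MINOR;
	case UFFD_REQ_FAULT_RESOLVE:
		/* A real hook would look up or allocate the folio here. */
		return args ? 0 : -EINVAL;
	}
	return -EINVAL;
}

/* A guest-memfd-like user could start with MINOR only. */
static long gmem_like_request(enum uffd_req req,
			      struct uffd_resolve_args *args)
{
	switch (req) {
	case UFFD_REQ_GET_SUPPORTED:
		return UFFD_MODE_MINOR;
	case UFFD_REQ_FAULT_RESOLVE:
		if (!args || args->fault_kind != UFFD_MODE_MINOR)
			return -EINVAL;
		return 0;
	}
	return -EINVAL;
}

const struct vm_ops_sketch shmem_like_ops = {
	.userfaultfd_request = shmem_like_request,
};
const struct vm_ops_sketch gmem_like_ops = {
	.userfaultfd_request = gmem_like_request,
};

/* Caller side: ask the hook whether a mode can be registered. */
int uffd_mode_supported(const struct vm_ops_sketch *ops, unsigned int mode)
{
	long supported;

	/* Anon has vm_ops == NULL, so there is no hook to ask. */
	if (!ops || !ops->userfaultfd_request)
		return 0;
	supported = ops->userfaultfd_request(UFFD_REQ_GET_SUPPORTED, NULL);
	assert(supported >= 0);
	return ((unsigned long)supported & mode) != 0;
}
```

With something of this shape, the userfaultfd core would only ever call the
single hook, so a FAULT_FLAG_* visible to non-userfault users and a
hand-built vm_fault would both be unnecessary.  Whether the real interface
ends up looking like this is open.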