From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 1 Jul 2025 10:04:28 -0700
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
To: Lorenzo Stoakes
Cc: Peter Xu <peterx@redhat.com>, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Vlastimil Babka, Muchun Song,
    Mike Rapoport, Hugh Dickins, Andrew Morton, James Houghton,
    Liam R. Howlett, Nikita Kalyazin, Michal Hocko, David Hildenbrand,
    Andrea Arcangeli, Oscar Salvador, Axel Rasmussen, Ujwal Kundur
References: <20250627154655.2085903-1-peterx@redhat.com>
    <20250627154655.2085903-2-peterx@redhat.com>

On Mon, Jun 30, 2025 at 3:16 AM Lorenzo Stoakes wrote:
>
> On Fri, Jun 27, 2025 at 11:46:52AM -0400, Peter Xu wrote:
> > Introduce a generic userfaultfd API for vm_operations_struct, so that one
> > vma, especially when as a module, can support userfaults without modifying
> > the core files.  More importantly, when the module can be compiled out of
> > the kernel.
>
> I'm not sure I understand this. One VMA 'especially when as a module'?
>
> This whole sentence is really unclear.
>
> So it seems to me that you want to be able to allow more than just:
>
> - anonymous memory
> - shmem
> - hugetlb
>
> To support userfaultfd, correct?
>
> >
> > So, instead of having core mm referencing modules that may not ever exist,
> > we need to have modules opt-in on core mm hooks instead.
>
> OK this sounds reasonable.
>
> >
> > After this API applied, if a module wants to support userfaultfd, the
> > module should only need to touch its own file and properly define
> > vm_uffd_ops, instead of changing anything in core mm.
>
> Yes this also sounds reasonable, _as long as_ :) the module authors are
> aware of any conditions that might exist in the VMA that might be odd or
> confusing.
>
> For instance, if locks need to be held etc.
>
> I am generally slightly wary of us giving VMAs in hooks because people do
> crazy things with them.
>
> >
> > Note that such API will not work for anonymous.  Core mm will process
> > anonymous memory separately for userfault operations like before.
>
> Right.
>
> >
> > This patch only introduces the API alone so that we can start to move
> > existing users over but without breaking them.
>
> Sounds sensible.
>
> >
> > Currently the uffd_copy() API is almost designed to be the simplistic with
> > minimum mm changes to move over to the API.
>
> A good approach.
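(For my own understanding, the module-side opt-in would then look roughly
like the sketch below -- the field names and handler signatures are taken
from this patch, everything "foo_*" is made up by me:

        static int foo_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
                                      struct folio **folio);
        static int foo_uffd_copy(pmd_t *dst_pmd, struct vm_area_struct *dst_vma,
                                 unsigned long dst_addr, unsigned long src_addr,
                                 uffd_flags_t flags, struct folio **foliop);

        /* Hypothetical driver opting in to userfaultfd via vm_uffd_ops. */
        static const struct vm_uffd_ops foo_uffd_ops = {
                /* Must be a subset of VM_UFFD_MISSING|WP|MINOR. */
                .uffd_features  = VM_UFFD_MISSING | VM_UFFD_MINOR,
                /* Ioctls beyond the always-supported defaults. */
                .uffd_ioctls    = BIT(_UFFDIO_COPY) | BIT(_UFFDIO_CONTINUE),
                .uffd_get_folio = foo_uffd_get_folio,
                .uffd_copy      = foo_uffd_copy,
        };

        static const struct vm_operations_struct foo_vm_ops = {
        #ifdef CONFIG_USERFAULTFD
                .userfaultfd_ops = &foo_uffd_ops,
        #endif
        };

If that reading is wrong, some of my comments below may be off too.)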
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/linux/mm.h            |  9 ++++++
> >  include/linux/userfaultfd_k.h | 52 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 61 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ef40f68c1183..6a5447bd43fd 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -576,6 +576,8 @@ struct vm_fault {
> >        */
> >  };
> >
> > +struct vm_uffd_ops;
> > +
> >  /*
> >   * These are the virtual MM functions - opening of an area, closing and
> >   * unmapping it (needed to keep files on disk up-to-date etc), pointer
> > @@ -653,6 +655,13 @@ struct vm_operations_struct {
> >        */
> >       struct page *(*find_special_page)(struct vm_area_struct *vma,
> >                                         unsigned long addr);
> > +#ifdef CONFIG_USERFAULTFD
> > +     /*
> > +      * Userfaultfd related ops.  Modules need to define this to support
> > +      * userfaultfd.
> > +      */
> > +     const struct vm_uffd_ops *userfaultfd_ops;
> > +#endif
> >  };
>
> This shouldn't go in vm_ops like this. You're basically changing a
> fundamental convention here - that _ops structs define handlers, now you
> can somehow have nested ones?
>
> It seems more appropriate to have something like:
>
>   struct vm_uffd_ops *(*get_uffd_ops)(struct vm_area_struct *vma);
>
> This then matches e.g. mempolicy's get_policy handler.
>
> >
> >  #ifdef CONFIG_NUMA_BALANCING
> > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > index df85330bcfa6..c9a093c4502b 100644
> > --- a/include/linux/userfaultfd_k.h
> > +++ b/include/linux/userfaultfd_k.h
> > @@ -92,6 +92,58 @@ enum mfill_atomic_mode {
> >       NR_MFILL_ATOMIC_MODES,
> >  };
> >
> > +/* VMA userfaultfd operations */
>
> Are we sure this should be operating at the VMA level?
>
> I mean, are the operations going to be operating on VMAs or VM fault
> structs? If not, then this surely belongs to the file operations, no?
>
> > +struct vm_uffd_ops {
>
> I'd comment on the naming, but 'vm_operations_struct' is so bad that it
> seems we have no sensible convention anyway, so this is fine :P
>
> > +     /**
> > +      * @uffd_features: features supported in bitmask.
> > +      *
> > +      * When the ops is defined, the driver must set non-zero features
>
> I don't think the 'when the ops are defined' bit is necessary here, you're
> commenting on the ops here, you can safely assume they're defined.
>
> So I'd just say 'must be a subset (or all) of: VM_UFFD_MISSING|WP|MINOR.'
>
> > +      * to be a subset (or all) of: VM_UFFD_MISSING|WP|MINOR.
> > +      */
> > +     unsigned long uffd_features;
> > +     /**
> > +      * @uffd_ioctls: ioctls supported in bitmask.
> > +      *
> > +      * Userfaultfd ioctls supported by the module.  Below will always
> > +      * be supported by default whenever a module provides vm_uffd_ops:
> > +      *
> > +      *   _UFFDIO_API, _UFFDIO_REGISTER, _UFFDIO_UNREGISTER, _UFFDIO_WAKE
> > +      *
> > +      * The module needs to provide all the rest optionally supported
> > +      * ioctls.  For example, when VM_UFFD_MISSING was supported,
> > +      * _UFFDIO_COPY must be supported as ioctl, while _UFFDIO_ZEROPAGE
> > +      * is optional.
>
> I'm not sure why we need this field? Isn't this implied by whether handlers
> are set or NULL?
>
> > +      */
> > +     unsigned long uffd_ioctls;
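FWIW I agree with the question above -- it seems like most of the mask could
be derived in core mm from which handlers the module actually set, something
along these lines (just a sketch, the helper name is made up):

        /* Hypothetical helper: derive the ioctl mask from the ops. */
        static unsigned long uffd_ops_supported_ioctls(const struct vm_uffd_ops *ops)
        {
                /* Always-available ioctls per the comment above. */
                unsigned long ioctls = BIT(_UFFDIO_API) | BIT(_UFFDIO_REGISTER) |
                                       BIT(_UFFDIO_UNREGISTER) | BIT(_UFFDIO_WAKE);

                if (ops->uffd_copy)
                        ioctls |= BIT(_UFFDIO_COPY);
                if (ops->uffd_get_folio)
                        ioctls |= BIT(_UFFDIO_CONTINUE);

                return ioctls;
        }

That would leave only the genuinely optional ioctls (e.g. _UFFDIO_ZEROPAGE,
which uffd_copy may or may not support) needing an explicit flag, if anything.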
> > +     /**
> > +      * uffd_get_folio: Handler to resolve UFFDIO_CONTINUE request.
> > +      *
> > +      * @inode: the inode for folio lookup
> > +      * @pgoff: the pgoff of the folio
> > +      * @folio: returned folio pointer
> > +      *
> > +      * Return: zero if succeeded, negative for errors.
> > +      */
> > +     int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
> > +                           struct folio **folio);
>
> Not sure the uffd_ prefix is needed in a struct that's obviously about uffd
> already? I'd strip that.
>
> Also this name is really confusing, I think it should contain continue in
> some form, such as 'handle_continue()'?
>
> This really feels like it shouldn't be a VMA function, as you're dealing
> with inode, pgoff, and folio and not a VMA at all?
>
> > +     /**
> > +      * uffd_copy: Handler to resolve UFFDIO_COPY|ZEROPAGE request.
> > +      *
> > +      * @dst_pmd: target pmd to resolve page fault
> > +      * @dst_vma: target vma
> > +      * @dst_addr: target virtual address
> > +      * @src_addr: source address to copy from
> > +      * @flags: userfaultfd request flags
> > +      * @foliop: previously allocated folio
> > +      *
> > +      * Return: zero if succeeded, negative for errors.
>
> Can you please ensure you put details as to the VMA lock state here. Uffd
> has some very tricky handling around stuff like this.
>
> > +      */
> > +     int (*uffd_copy)(pmd_t *dst_pmd, struct vm_area_struct *dst_vma,
> > +                      unsigned long dst_addr, unsigned long src_addr,
> > +                      uffd_flags_t flags, struct folio **foliop);
>
> Do we not need a uffd_ctx parameter here?
>
> It seems like we're assuming a _lot_ of mm understanding in the underlying
> driver here.
>
> I'm not sure it's really normal to be handing around page table state and
> folios etc. to a driver like this, this is really... worrying to me.
>
> This feels like you're trying to put mm functionality outside of mm?

To second that, two things stick out for me here:

1. uffd_copy and uffd_get_folio seem to be at different abstraction
levels. uffd_copy is almost the entire copy operation for VM_SHARED
VMAs, while uffd_get_folio is a small part of the continue operation.

2. shmem_mfill_atomic_pte, which becomes uffd_copy for shmem in the
last patch, is quite a complex function which itself calls some IMO
pretty internal functions like mfill_atomic_install_pte(). Expecting
modules to implement such functionality seems like a stretch to me,
but maybe this is for some specialized modules which are written by
mm experts only?

> What if we change how we handle page tables in future, or the locking
> changes? We might not know to go and update x, y, z driver.
>
> This is concerning.
>
> > +};
> > +typedef struct vm_uffd_ops vm_uffd_ops;
> > +
> >  #define MFILL_ATOMIC_MODE_BITS (const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
> >  #define MFILL_ATOMIC_BIT(nr) BIT(MFILL_ATOMIC_MODE_BITS + (nr))
> >  #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
> > --
> > 2.49.0
> >
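On the vm_operations_struct point earlier in the thread, the get_uffd_ops()
indirection Lorenzo suggested would presumably end up looking something like
this on the provider side (rough sketch only, the "foo" names are
hypothetical and the vm_ops field does not exist in this series):

        static struct vm_uffd_ops foo_uffd_ops = {
                .uffd_features = VM_UFFD_MISSING,
                /* ... ioctls and handlers as defined by this series ... */
        };

        /* Mirrors how vm_ops->get_policy hands back a mempolicy. */
        static struct vm_uffd_ops *foo_get_uffd_ops(struct vm_area_struct *vma)
        {
                return &foo_uffd_ops;
        }

        static const struct vm_operations_struct foo_vm_ops = {
        #ifdef CONFIG_USERFAULTFD
                .get_uffd_ops = foo_get_uffd_ops,
        #endif
        };

That keeps vm_operations_struct a table of function pointers while still
letting core mm discover the uffd capabilities per VMA.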