Date: Sun, 29 Jun 2025 11:50:11 +0300
From: Mike Rapoport
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka,
 Suren Baghdasaryan, Muchun Song, Lorenzo Stoakes, Hugh Dickins,
 Andrew Morton, James Houghton, "Liam R . Howlett", Nikita Kalyazin,
 Michal Hocko, David Hildenbrand, Andrea Arcangeli, Oscar Salvador,
 Axel Rasmussen, Ujwal Kundur
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
References: <20250627154655.2085903-1-peterx@redhat.com>
 <20250627154655.2085903-2-peterx@redhat.com>
In-Reply-To: <20250627154655.2085903-2-peterx@redhat.com>

Hi Peter,

On Fri, Jun 27, 2025 at 11:46:52AM -0400, Peter Xu wrote:
> Introduce a generic userfaultfd API for vm_operations_struct, so that one
> vma, especially when as a module, can support userfaults without modifying
> the core files.  More importantly, when the module can be compiled out of
> the kernel.
>
> So, instead of having core mm referencing modules that may not ever exist,
> we need to have modules opt-in on core mm hooks instead.
>
> After this API applied, if a module wants to support userfaultfd, the
> module should only need to touch its own file and properly define
> vm_uffd_ops, instead of changing anything in core mm.

I liked the changelog update you proposed in the v1 thread. I took the
liberty of slightly updating it, and here's what I've got:

Currently, most of the userfaultfd features are implemented directly in the
core mm, which invokes VMA-specific functions whenever necessary. So far
this has been fine because userfaultfd interacts almost exclusively with
shmem and hugetlbfs.

Introduce a generic userfaultfd API extension for vm_operations_struct,
so that any code that implements vm_operations_struct (including kernel
modules that can be compiled separately from the kernel core) can support
userfaults without modifying the core files.

With this API applied, if a module wants to support userfaultfd, the
module should only need to properly define vm_uffd_ops and hook it to
vm_operations_struct, instead of changing anything in core mm.

> Note that such API will not work for anonymous. Core mm will process
> anonymous memory separately for userfault operations like before.

Maybe:

This API will not work for anonymous memory. Handling of userfault
operations for anonymous memory remains unchanged in core mm.

> This patch only introduces the API alone so that we can start to move
> existing users over but without breaking them.

Please use imperative mood, e.g.

Only introduce the new API so that ...

> Currently the uffd_copy() API is almost designed to be the simplistic with
> minimum mm changes to move over to the API.
>
> Signed-off-by: Peter Xu
> ---
>  include/linux/mm.h            |  9 ++++++
>  include/linux/userfaultfd_k.h | 52 +++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef40f68c1183..6a5447bd43fd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -576,6 +576,8 @@ struct vm_fault {
>  					 */
>  };
>
> +struct vm_uffd_ops;
> +
>  /*
>   * These are the virtual MM functions - opening of an area, closing and
>   * unmapping it (needed to keep files on disk up-to-date etc), pointer
> @@ -653,6 +655,13 @@ struct vm_operations_struct {
>  	 */
>  	struct page *(*find_special_page)(struct vm_area_struct *vma,
>  					  unsigned long addr);
> +#ifdef CONFIG_USERFAULTFD
> +	/*
> +	 * Userfaultfd related ops.  Modules need to define this to support
> +	 * userfaultfd.
> +	 */
> +	const struct vm_uffd_ops *userfaultfd_ops;
> +#endif
>  };
>
>  #ifdef CONFIG_NUMA_BALANCING
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index df85330bcfa6..c9a093c4502b 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -92,6 +92,58 @@ enum mfill_atomic_mode {
>  	NR_MFILL_ATOMIC_MODES,
>  };
>
> +/* VMA userfaultfd operations */
> +struct vm_uffd_ops {
> +	/**
> +	 * @uffd_features: features supported in bitmask.
> +	 *
> +	 * When the ops is defined, the driver must set non-zero features
> +	 * to be a subset (or all) of: VM_UFFD_MISSING|WP|MINOR.
> +	 */
> +	unsigned long uffd_features;
> +	/**
> +	 * @uffd_ioctls: ioctls supported in bitmask.
> +	 *
> +	 * Userfaultfd ioctls supported by the module.  Below will always
> +	 * be supported by default whenever a module provides vm_uffd_ops:
> +	 *
> +	 *   _UFFDIO_API, _UFFDIO_REGISTER, _UFFDIO_UNREGISTER, _UFFDIO_WAKE
> +	 *
> +	 * The module needs to provide all the rest optionally supported
> +	 * ioctls.  For example, when VM_UFFD_MISSING was supported,
> +	 * _UFFDIO_COPY must be supported as ioctl, while _UFFDIO_ZEROPAGE
> +	 * is optional.
> +	 */
> +	unsigned long uffd_ioctls;
> +	/**
> +	 * uffd_get_folio: Handler to resolve UFFDIO_CONTINUE request.
> +	 *
> +	 * @inode: the inode for folio lookup
> +	 * @pgoff: the pgoff of the folio
> +	 * @folio: returned folio pointer
> +	 *
> +	 * Return: zero if succeeded, negative for errors.
> +	 */
> +	int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
> +			      struct folio **folio);
> +	/**
> +	 * uffd_copy: Handler to resolve UFFDIO_COPY|ZEROPAGE request.
> +	 *
> +	 * @dst_pmd: target pmd to resolve page fault
> +	 * @dst_vma: target vma
> +	 * @dst_addr: target virtual address
> +	 * @src_addr: source address to copy from
> +	 * @flags: userfaultfd request flags
> +	 * @foliop: previously allocated folio
> +	 *
> +	 * Return: zero if succeeded, negative for errors.
> +	 */
> +	int (*uffd_copy)(pmd_t *dst_pmd, struct vm_area_struct *dst_vma,
> +			 unsigned long dst_addr, unsigned long src_addr,
> +			 uffd_flags_t flags, struct folio **foliop);
> +};
> +typedef struct vm_uffd_ops vm_uffd_ops;

Either use vm_uffd_ops_t for the typedef or drop the typedef entirely. My
preference is for the second option.

> +
>  #define MFILL_ATOMIC_MODE_BITS	(const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
>  #define MFILL_ATOMIC_BIT(nr) BIT(MFILL_ATOMIC_MODE_BITS + (nr))
>  #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
> --
> 2.49.0
>

-- 
Sincerely yours,
Mike.
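
For illustration only, the opt-in pattern described in the changelog can be
sketched in plain userspace C. This is not kernel code and not part of the
patch: the types are reduced stand-ins, the flag values are made up, and
demo_uffd_get_folio(), demo_uffd_ops, demo_vm_ops, plain_vm_ops and
vma_supports_uffd() are hypothetical names. It also uses "struct vm_uffd_ops"
directly, i.e. the variant without the typedef.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel defines the real ones. */
#define VM_UFFD_MISSING	0x1UL
#define VM_UFFD_WP	0x2UL
#define VM_UFFD_MINOR	0x4UL

/* Userspace stand-ins for kernel types. */
struct inode { int unused; };
struct folio { int unused; };
typedef unsigned long pgoff_t;

/* Reduced model of the proposed ops table: only two members are kept. */
struct vm_uffd_ops {
	unsigned long uffd_features;
	int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
			      struct folio **folio);
};

/* Reduced model of vm_operations_struct with the new hook only. */
struct vm_operations_struct {
	const struct vm_uffd_ops *userfaultfd_ops;
};

/* Hypothetical module-side handler: always hands back one static folio. */
static struct folio demo_folio;

static int demo_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
			       struct folio **folio)
{
	(void)inode;
	(void)pgoff;
	assert(folio != NULL);
	*folio = &demo_folio;
	return 0;
}

/* The module opts in by defining the table in its own file ... */
static const struct vm_uffd_ops demo_uffd_ops = {
	.uffd_features	= VM_UFFD_MINOR,
	.uffd_get_folio	= demo_uffd_get_folio,
};

/* ... and hooking it to its vm_operations_struct. */
static const struct vm_operations_struct demo_vm_ops = {
	.userfaultfd_ops = &demo_uffd_ops,
};

/* A mapping type that never opted in leaves the pointer NULL. */
static const struct vm_operations_struct plain_vm_ops = {
	.userfaultfd_ops = NULL,
};

/* Hypothetical core-side check: returns 1 only if the ops opted in. */
static int vma_supports_uffd(const struct vm_operations_struct *ops,
			     unsigned long feature)
{
	return ops && ops->userfaultfd_ops &&
	       (ops->userfaultfd_ops->uffd_features & feature) != 0;
}
```

In this model the core-side check needs no knowledge of any particular
module; a mapping type that leaves userfaultfd_ops NULL simply reports no
userfaultfd support.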