Date: Wed, 2 Jul 2025 15:30:14 -0400
From: Peter Xu
To: Mike Rapoport
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka,
 Suren Baghdasaryan, Muchun Song, Lorenzo Stoakes, Hugh Dickins,
 Andrew Morton, James Houghton, "Liam R . Howlett", Nikita Kalyazin,
 Michal Hocko, David Hildenbrand, Andrea Arcangeli, Oscar Salvador,
 Axel Rasmussen, Ujwal Kundur
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
References: <20250627154655.2085903-1-peterx@redhat.com>
 <20250627154655.2085903-2-peterx@redhat.com>

On Sun, Jun 29, 2025 at 11:50:11AM +0300, Mike Rapoport wrote:
> Hi Peter,

Hi, Mike,

>
> On Fri, Jun 27, 2025 at 11:46:52AM -0400, Peter Xu wrote:
> > Introduce a generic userfaultfd API for vm_operations_struct, so that one
> > vma, especially when as a module, can support userfaults without modifying
> > the core files. More importantly, when the module can be compiled out of
> > the kernel.
> >
> > So, instead of having core mm referencing modules that may not ever exist,
> > we need to have modules opt-in on core mm hooks instead.
> >
> > After this API applied, if a module wants to support userfaultfd, the
> > module should only need to touch its own file and properly define
> > vm_uffd_ops, instead of changing anything in core mm.
>
> I liked the changelog update you proposed in v1 thread. I took liberty to

It's definitely hard to satisfy all reviewers with one version of a commit
message.

> slightly update it and here's what I've got:
>
>   Currently, most of the userfaultfd features are implemented directly in the
>   core mm. It will invoke VMA specific functions whenever necessary. So far
>   it is fine because it almost only interacts with shmem and hugetlbfs.
>
>   Introduce a generic userfaultfd API extension for vm_operations_struct,
>   so that any code that implements vm_operations_struct (including kernel
>   modules that can be compiled separately from the kernel core) can support
>   userfaults without modifying the core files.
>
>   With this API applied, if a module wants to support userfaultfd, the
>   module should only need to properly define vm_uffd_ops and hook it to
>   vm_operations_struct, instead of changing anything in core mm.

Thanks, I very much appreciate explicit suggestions on the wording.
Personally I like it and the rest of the suggestions. I'll use them when I
repost, but I'll also wait in case anyone else has something to say.

>
> > Note that such API will not work for anonymous. Core mm will process
> > anonymous memory separately for userfault operations like before.
>
> Maybe:
>
>   This API will not work for anonymous memory. Handling of userfault
>   operations for anonymous memory remains unchanged in core mm.
>
> > This patch only introduces the API alone so that we can start to move
> > existing users over but without breaking them.
>
> Please use imperative mood, e.g.
>
>   Only introduce the new API so that ...
>
> > Currently the uffd_copy() API is almost designed to be the simplistic with
> > minimum mm changes to move over to the API.
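
To make the opt-in described above concrete, here is a rough userspace
sketch, not kernel code. It only mirrors the field names of the struct
vm_uffd_ops from the patch quoted below; the DEMO_* bit values, the
demo_* names and the vma_supports_uffd() helper are hypothetical and exist
only for illustration. uffd_copy() is left out because it needs kernel-only
types (pmd_t, uffd_flags_t).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types so the sketch builds in userspace (assumptions). */
struct inode;
struct folio;
typedef unsigned long pgoff_t;

/* Hypothetical feature/ioctl bits; the real values live in the kernel. */
#define DEMO_VM_UFFD_MISSING	(1UL << 0)
#define DEMO_VM_UFFD_MINOR	(1UL << 1)
#define DEMO_UFFDIO_CONTINUE	(1UL << 0)

/* Mirrors the fields proposed in the patch, minus uffd_copy(). */
struct vm_uffd_ops {
	unsigned long uffd_features;
	unsigned long uffd_ioctls;
	int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
			      struct folio **folio);
};

/* Reduced stand-in: only the new hook is modelled here. */
struct vm_operations_struct {
	const struct vm_uffd_ops *userfaultfd_ops;
};

/* Module side: a dummy lookup that always "succeeds" with no folio. */
static int demo_get_folio(struct inode *inode, pgoff_t pgoff,
			  struct folio **folio)
{
	(void)inode;
	(void)pgoff;
	*folio = NULL;
	return 0;
}

/* Module side: declare minor-fault support and hook it into vm_ops. */
static const struct vm_uffd_ops demo_uffd_ops = {
	.uffd_features	= DEMO_VM_UFFD_MINOR,
	.uffd_ioctls	= DEMO_UFFDIO_CONTINUE,
	.uffd_get_folio	= demo_get_folio,
};

static const struct vm_operations_struct demo_vm_ops = {
	.userfaultfd_ops = &demo_uffd_ops,
};

/* A vm_ops that does not opt in at all. */
static const struct vm_operations_struct plain_vm_ops = {
	.userfaultfd_ops = NULL,
};

/* Hypothetical core-side check: does this vm_ops advertise a feature? */
int vma_supports_uffd(const struct vm_operations_struct *ops,
		      unsigned long feature)
{
	return ops && ops->userfaultfd_ops &&
	       (ops->userfaultfd_ops->uffd_features & feature) != 0;
}

/* Small self-check exercising the module-side handler. */
void demo_selfcheck(void)
{
	struct folio *folio = NULL;

	assert(demo_uffd_ops.uffd_get_folio(NULL, 0, &folio) == 0);
	assert(folio == NULL);
}
```

The point of the sketch is only that the module fills in one const struct
and points vm_operations_struct at it; a vm_ops that leaves the pointer NULL
simply does not support userfaultfd in this model.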
> >
> > Signed-off-by: Peter Xu
> > ---
> >  include/linux/mm.h            |  9 ++++++
> >  include/linux/userfaultfd_k.h | 52 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 61 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ef40f68c1183..6a5447bd43fd 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -576,6 +576,8 @@ struct vm_fault {
> >  					 */
> >  };
> >
> > +struct vm_uffd_ops;
> > +
> >  /*
> >   * These are the virtual MM functions - opening of an area, closing and
> >   * unmapping it (needed to keep files on disk up-to-date etc), pointer
> > @@ -653,6 +655,13 @@ struct vm_operations_struct {
> >  	 */
> >  	struct page *(*find_special_page)(struct vm_area_struct *vma,
> >  					  unsigned long addr);
> > +#ifdef CONFIG_USERFAULTFD
> > +	/*
> > +	 * Userfaultfd related ops. Modules need to define this to support
> > +	 * userfaultfd.
> > +	 */
> > +	const struct vm_uffd_ops *userfaultfd_ops;
> > +#endif
> >  };
> >
> >  #ifdef CONFIG_NUMA_BALANCING
> > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > index df85330bcfa6..c9a093c4502b 100644
> > --- a/include/linux/userfaultfd_k.h
> > +++ b/include/linux/userfaultfd_k.h
> > @@ -92,6 +92,58 @@ enum mfill_atomic_mode {
> >  	NR_MFILL_ATOMIC_MODES,
> >  };
> >
> > +/* VMA userfaultfd operations */
> > +struct vm_uffd_ops {
> > +	/**
> > +	 * @uffd_features: features supported in bitmask.
> > +	 *
> > +	 * When the ops is defined, the driver must set non-zero features
> > +	 * to be a subset (or all) of: VM_UFFD_MISSING|WP|MINOR.
> > +	 */
> > +	unsigned long uffd_features;
> > +	/**
> > +	 * @uffd_ioctls: ioctls supported in bitmask.
> > +	 *
> > +	 * Userfaultfd ioctls supported by the module. Below will always
> > +	 * be supported by default whenever a module provides vm_uffd_ops:
> > +	 *
> > +	 * _UFFDIO_API, _UFFDIO_REGISTER, _UFFDIO_UNREGISTER, _UFFDIO_WAKE
> > +	 *
> > +	 * The module needs to provide all the rest optionally supported
> > +	 * ioctls. For example, when VM_UFFD_MISSING was supported,
> > +	 * _UFFDIO_COPY must be supported as ioctl, while _UFFDIO_ZEROPAGE
> > +	 * is optional.
> > +	 */
> > +	unsigned long uffd_ioctls;
> > +	/**
> > +	 * uffd_get_folio: Handler to resolve UFFDIO_CONTINUE request.
> > +	 *
> > +	 * @inode: the inode for folio lookup
> > +	 * @pgoff: the pgoff of the folio
> > +	 * @folio: returned folio pointer
> > +	 *
> > +	 * Return: zero if succeeded, negative for errors.
> > +	 */
> > +	int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
> > +			      struct folio **folio);
> > +	/**
> > +	 * uffd_copy: Handler to resolve UFFDIO_COPY|ZEROPAGE request.
> > +	 *
> > +	 * @dst_pmd: target pmd to resolve page fault
> > +	 * @dst_vma: target vma
> > +	 * @dst_addr: target virtual address
> > +	 * @src_addr: source address to copy from
> > +	 * @flags: userfaultfd request flags
> > +	 * @foliop: previously allocated folio
> > +	 *
> > +	 * Return: zero if succeeded, negative for errors.
> > +	 */
> > +	int (*uffd_copy)(pmd_t *dst_pmd, struct vm_area_struct *dst_vma,
> > +			 unsigned long dst_addr, unsigned long src_addr,
> > +			 uffd_flags_t flags, struct folio **foliop);
> > +};
> > +typedef struct vm_uffd_ops vm_uffd_ops;
>
> Either use vm_uffd_ops_t for the typedef or drop the typedef entirely. My
> preference is for the second option.

Andrew helped me fix some hidden spaces, which I appreciated. Then I found
that checkpatch also warns on this one, besides the spaces fixed in mm-new.
I do not know why checkpatch doesn't like typedefs even though typedefs are
widely used in Linux.

I think I'll simply stick with not using typedefs.

Thanks,

>
> > +
> >  #define MFILL_ATOMIC_MODE_BITS (const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
> >  #define MFILL_ATOMIC_BIT(nr) BIT(MFILL_ATOMIC_MODE_BITS + (nr))
> >  #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
> > --
> > 2.49.0
> >
>
> --
> Sincerely yours,
> Mike.
>

--
Peter Xu