From: Nikita Kalyazin
Date: Wed, 25 Jun 2025 17:56:23 +0100
Subject: Re: [PATCH 0/4] mm/userfaultfd: modulize memory types
To: Peter Xu
CC: Hugh Dickins, Oscar Salvador, Michal Hocko, David Hildenbrand,
    Muchun Song, Andrea Arcangeli, Ujwal Kundur, Suren Baghdasaryan,
    Andrew Morton, Vlastimil Babka, Liam R. Howlett, James Houghton,
    Mike Rapoport, Lorenzo Stoakes, Axel Rasmussen
Message-ID: <114133f5-0282-463d-9d65-3143aa658806@amazon.com>
In-Reply-To: <20250620190342.1780170-1-peterx@redhat.com>
References: <20250620190342.1780170-1-peterx@redhat.com>

On 20/06/2025 20:03, Peter Xu wrote:
> [based on akpm/mm-new]
>
> This series is an alternative proposal of what Nikita proposed here on the
> initial three patches:
>
>   https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
>
> This is not yet relevant to any guest-memfd support, but paving way for it.

Hi Peter,

Thanks for posting this.  I confirmed that minor fault handling was
working for guest_memfd based on this series, and it looked simple (a
draft based on mmap support in guest_memfd v7 [1]):

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5abb6d52a375..6ddc73419724 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -5,6 +5,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_USERFAULTFD
+#include
+#endif

 #include "kvm_mm.h"

@@ -396,6 +399,14 @@ static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
 		kvm_gmem_mark_prepared(folio);
 	}

+#ifdef CONFIG_USERFAULTFD
+	if (userfaultfd_minor(vmf->vma)) {
+		folio_unlock(folio);
+		filemap_invalidate_unlock_shared(inode->i_mapping);
+		return handle_userfault(vmf, VM_UFFD_MINOR);
+	}
+#endif
+
 	vmf->page = folio_file_page(folio, vmf->pgoff);

 out_folio:
@@ -410,8 +421,39 @@ static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
 	return ret;
 }

+#ifdef CONFIG_USERFAULTFD
+static int kvm_gmem_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
+				   struct folio **foliop)
+{
+	struct folio *folio;
+	folio = kvm_gmem_get_folio(inode, pgoff);
+
+	if (IS_ERR(folio)) {
+		*foliop = NULL;
+		return PTR_ERR(folio);
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	*foliop = folio;
+	return 0;
+}
+
+static const vm_uffd_ops kvm_gmem_uffd_ops = {
+	.uffd_features	= VM_UFFD_MINOR,
+	.uffd_ioctls	= BIT(_UFFDIO_CONTINUE),
+	.uffd_get_folio	= kvm_gmem_uffd_get_folio,
+};
+#endif
+
 static const struct vm_operations_struct kvm_gmem_vm_ops = {
 	.fault = kvm_gmem_fault,
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &kvm_gmem_uffd_ops,
+#endif
 };

 static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)

[1]: https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/

> Here, the major goal is to make kernel modules be able to opt-in with any
> form of userfaultfd support, like guest-memfd.  This alternative option
> should hopefully be cleaner, and avoid leaking userfault details into
> vm_ops.fault().
>
> It also means this series does not depend on anything.  It's a pure
> refactoring of userfaultfd internals to provide a generic API, so that
> other types of files, especially RAM based, can support userfaultfd without
> touching mm/ at all.
>
> To achieve it, this series introduced a file operation called vm_uffd_ops.
> The ops needs to be provided when a file type supports any of userfaultfd.
>
> With that, I moved both hugetlbfs and shmem over.
>
> Hugetlbfs is still very special that it will only use part of the
> vm_uffd_ops API, due to similar reason why hugetlb_vm_op_fault() has a
> BUG() and so far hard-coded into core mm.  But this should still be better,
> because at least hugetlbfs is still always involved in feature probing
> (e.g. where it used to not support ZEROPAGE and we have a hard-coded line
> to fail that, and some more).
> Meanwhile after this series, shmem will be
> completely converted to the new vm_uffd_ops API; the final vm_uffd_ops for
> shmem looks like this:
>
>   static const vm_uffd_ops shmem_uffd_ops = {
>           .uffd_features  = __VM_UFFD_FLAGS,
>           .uffd_ioctls    = BIT(_UFFDIO_COPY) |
>                             BIT(_UFFDIO_ZEROPAGE) |
>                             BIT(_UFFDIO_WRITEPROTECT) |
>                             BIT(_UFFDIO_CONTINUE) |
>                             BIT(_UFFDIO_POISON),
>           .uffd_get_folio = shmem_uffd_get_folio,
>           .uffd_copy      = shmem_mfill_atomic_pte,
>   };
>
> As I mentioned in one of my replies to Nikita, I don't like the current
> interface of uffd_copy(), but this will be the minimum change version of
> such API to support complete external-module-ready userfaultfd.  Here, very
> minimal change will be needed from shmem side to support that.
>
> Meanwhile, the vm_uffd_ops is also not the only place one will need to
> provide to support userfaultfd.  Normally vm_ops.fault() will also need to
> be updated, but that's a generic function and it'll play together with the
> new vm_uffd_ops to make everything fly.
>
> No functional change expected at all after the whole series applied.  There
> might be some slightly stricter check on uffd ops here and there in the
> last patch, but that really shouldn't stand out anywhere to anyone.
>
> For testing: besides the cross-compilation tests, I did also try with
> uffd-stress in a VM to measure any perf difference before/after the change;
> The static call becomes a pointer now.  I really cannot measure anything
> different, which is more or less expected.
>
> Comments welcomed, thanks.
>
> Peter Xu (4):
>   mm: Introduce vm_uffd_ops API
>   mm/shmem: Support vm_uffd_ops API
>   mm/hugetlb: Support vm_uffd_ops API
>   mm: Apply vm_uffd_ops API to core mm
>
>  include/linux/mm.h            |  71 +++++++++++++++++++++
>  include/linux/shmem_fs.h      |  14 -----
>  include/linux/userfaultfd_k.h |  58 ++++-------------
>  mm/hugetlb.c                  |  19 ++++++
>  mm/shmem.c                    |  28 ++++++++-
>  mm/userfaultfd.c              | 115 +++++++++++++++++++++++++---------
>  6 files changed, 217 insertions(+), 88 deletions(-)
>
> --
> 2.49.0
>
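
For readers following the opt-in model in the cover letter, here is a rough
userspace sketch of how a per-memory-type ops table with a feature mask and
an ioctl mask could gate registration and ioctls.  It is not kernel code and
not taken from the series: every sk_-prefixed name, the bit values, and the
two helper functions are assumptions made up for illustration.  Only the two
field names (uffd_features, uffd_ioctls) and the "MINOR + CONTINUE only"
shape of the guest_memfd table mirror the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are NOT the kernel's definitions. */
#define SK_BIT(nr)	(1UL << (nr))

enum sk_uffd_mode {		/* registration modes a memory type may allow */
	SK_UFFD_MISSING	= 1 << 0,
	SK_UFFD_WP	= 1 << 1,
	SK_UFFD_MINOR	= 1 << 2,
};

enum sk_uffd_ioctl {		/* ioctl numbers, used as bit positions */
	SK_UFFDIO_COPY,
	SK_UFFDIO_ZEROPAGE,
	SK_UFFDIO_WRITEPROTECT,
	SK_UFFDIO_CONTINUE,
	SK_UFFDIO_POISON,
};

/* Modelled loosely on the two mask fields shown in the cover letter. */
struct sk_uffd_ops {
	unsigned long uffd_features;	/* OR of enum sk_uffd_mode values */
	unsigned long uffd_ioctls;	/* OR of SK_BIT(enum sk_uffd_ioctl) */
};

/* A mapping either points at an ops table or does not opt in at all. */
struct sk_vma {
	const struct sk_uffd_ops *uffd_ops;	/* NULL: no userfaultfd */
};

/* Registration succeeds only if every requested mode is advertised. */
static inline bool sk_can_register(const struct sk_vma *vma,
				   unsigned long modes)
{
	assert(vma != NULL);
	const struct sk_uffd_ops *ops = vma->uffd_ops;

	return ops && modes && (modes & ~ops->uffd_features) == 0;
}

/* An ioctl is allowed only if its bit is set in the ops table. */
static inline bool sk_ioctl_supported(const struct sk_vma *vma,
				      enum sk_uffd_ioctl nr)
{
	assert(vma != NULL);
	const struct sk_uffd_ops *ops = vma->uffd_ops;

	return ops && (ops->uffd_ioctls & SK_BIT(nr)) != 0;
}

/* Shaped like the guest_memfd draft: minor faults and CONTINUE only. */
static const struct sk_uffd_ops sk_gmem_uffd_ops = {
	.uffd_features	= SK_UFFD_MINOR,
	.uffd_ioctls	= SK_BIT(SK_UFFDIO_CONTINUE),
};

/* Shaped like the shmem table: all modes and all five ioctls. */
static const struct sk_uffd_ops sk_shmem_uffd_ops = {
	.uffd_features	= SK_UFFD_MISSING | SK_UFFD_WP | SK_UFFD_MINOR,
	.uffd_ioctls	= SK_BIT(SK_UFFDIO_COPY) |
			  SK_BIT(SK_UFFDIO_ZEROPAGE) |
			  SK_BIT(SK_UFFDIO_WRITEPROTECT) |
			  SK_BIT(SK_UFFDIO_CONTINUE) |
			  SK_BIT(SK_UFFDIO_POISON),
};
```

With these tables, a mapping that points at sk_gmem_uffd_ops accepts a
MINOR-only registration and the CONTINUE ioctl, and refuses MISSING mode and
the COPY ioctl; a mapping with a NULL ops pointer refuses everything.  The
point of the sketch is only that feature probing can be answered from the
table itself rather than from per-type checks in the core code.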