Date: Wed, 2 Jul 2025 16:36:31 -0400
From: Peter Xu
To: Lorenzo Stoakes
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka,
	Suren Baghdasaryan, Muchun Song, Mike Rapoport, Hugh Dickins,
	Andrew Morton, James Houghton, "Liam R . Howlett", Nikita Kalyazin,
	Michal Hocko, David Hildenbrand, Andrea Arcangeli, Oscar Salvador,
	Axel Rasmussen, Ujwal Kundur
Subject: Re: [PATCH v2 0/4] mm/userfaultfd: modulize memory types
References: <20250627154655.2085903-1-peterx@redhat.com>
	<92265a41-7e32-430c-8ab2-4e7680609624@lucifer.local>
In-Reply-To: <92265a41-7e32-430c-8ab2-4e7680609624@lucifer.local>

On Mon, Jun 30, 2025 at 11:29:30AM +0100, Lorenzo Stoakes wrote:
> On Fri, Jun 27, 2025 at 11:46:51AM -0400, Peter Xu wrote:
> > [based on latest akpm/mm-new of June 27th, commit 9be7387ae43f]
> >
> > v2 changelog:
> > - Patch 1
> >   - update English in commit log [David]
> >   - move vm_uffd_ops definition to userfaultfd_k.h [Mike]
> > - Patch 4
> >   - fix sparse warning on bitwise type conversions [syzbot]
> >   - Commit message updates on explanation of vma_can_userfault check [James]
> >
> > v1: https://lore.kernel.org/r/20250620190342.1780170-1-peterx@redhat.com
> >
> > This series is an alternative proposal of what Nikita proposed here on the
> > initial three patches:
> >
> >   https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
> >
> > This is not yet relevant to any guest-memfd support, but paving way for it.
> > Here, the major goal is to make kernel modules be able to opt-in with any
> > form of userfaultfd supports, like guest-memfd.  This alternative option
> > should hopefully be cleaner, and avoid leaking userfault details into
> > vm_ops.fault().
> >
> > It also means this series does not depend on anything.
> > It's a pure refactoring of userfaultfd internals to provide a generic API,
> > so that other types of files, especially RAM based, can support
> > userfaultfd without touching mm/ at all.
>
> I'm very concerned that this change will simply move core mm functionality out
> of mm and into drivers where it can bitrot and cause subtle bugs?
>
> You're proposing providing stuff like page table state and asking for a folio
> back from a driver etc.
>
> I absolutely am not in favour of us providing core mm internals like this to
> drivers, and I don't want to see us having to EXPORT() mm internals just to make
> module-ised uffd code work (I mean I just will flat out refuse to do that).
>
> I think we need to think _very_ carefully about how we do this.
>
> I also feel like this series is at a really basic level and you've not fully
> determined what API calls you need.

See:

https://lore.kernel.org/all/aGWVIjmmsmskA4bp@x1.local/#t

>
> I agree that it's sensible to be incremental, but I feel like you sort of need
> to somewhat prove the case that you can jump from 'incremental version where we
> only support code in mm/' to supporting arbitrary file system code that might be
> modules.
>
> Because otherwise you're basically _guessing_ that you can do this, possibly, in
> the future and maybe it's just not the right approach but that's not clear yet?

Did you follow up with the discussions in v1?  I copied you too.

https://lore.kernel.org/r/114133f5-0282-463d-9d65-3143aa658806@amazon.com

Would Nikita's work help here?  Could you explain what you are asking for to
prove that this works for us?

> >
> > To achieve it, this series introduced a file operation called vm_uffd_ops.
> > The ops needs to be provided when a file type supports any of userfaultfd.
> >
> > With that, I moved both hugetlbfs and shmem over.
>
> Well as you say below hugetlbfs is sort of a stub implementation, I wonder
> whether we'd need quite a bit more to make that work.
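
For illustration only, here is a rough userspace sketch of the opt-in idea:
a file type declares which operations it supports, and the core only probes
that declaration.  Every name in it (demo_uffd_ops, demo_ioctl_supported,
the DEMO_IOCTL_* values) is hypothetical; none of this is code from the
series or from the kernel, and the real ioctl numbers differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl numbers; NOT the kernel's _UFFDIO_* values. */
enum demo_ioctl {
	DEMO_IOCTL_COPY,
	DEMO_IOCTL_ZEROPAGE,
	DEMO_IOCTL_CONTINUE,
	DEMO_IOCTL_MAX,
};

#define DEMO_BIT(nr)	(UINT64_C(1) << (nr))

/* Hypothetical stand-in for a per-file-type ops table. */
struct demo_uffd_ops {
	uint64_t ioctls;			/* bitmask of supported ioctls */
	int (*get_folio)(unsigned long pgoff);	/* optional hook, may be NULL */
};

/* Dummy hook so the "full" table has something to point at. */
static int demo_get_folio(unsigned long pgoff)
{
	(void)pgoff;
	return 0;
}

/* A type that opts in to everything in this toy model. */
static const struct demo_uffd_ops demo_full_ops = {
	.ioctls		= DEMO_BIT(DEMO_IOCTL_COPY) |
			  DEMO_BIT(DEMO_IOCTL_ZEROPAGE) |
			  DEMO_BIT(DEMO_IOCTL_CONTINUE),
	.get_folio	= demo_get_folio,
};

/* A type that opts in to only part of it: no ZEROPAGE, no get_folio hook. */
static const struct demo_uffd_ops demo_partial_ops = {
	.ioctls		= DEMO_BIT(DEMO_IOCTL_COPY) |
			  DEMO_BIT(DEMO_IOCTL_CONTINUE),
};

/* Feature probe: a type without an ops table supports nothing. */
static bool demo_ioctl_supported(const struct demo_uffd_ops *ops,
				 enum demo_ioctl nr)
{
	assert(nr < DEMO_IOCTL_MAX);
	if (!ops)
		return false;
	return (ops->ioctls & DEMO_BIT(nr)) != 0;
}
```

With this shape, a type that leaves out ZEROPAGE simply fails the probe,
rather than needing a hard-coded check for that type in the core code.
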
>
> One thing I'd _really_ like to avoid is us having to add a bunch of hook points
> into core mm code just for uffd that then call out to some driver.
>
> We've encountered such a total nightmare with .mmap() for instance in the past
> (including stuff that resulted in security issues) because we - simply cannot
> assume anything - about what the hook implementor might do with the passed
> parameters.
>
> This is really really problematic.
>
> I also absolutely hate the:
>
> if (uffd)
> 	do_something_weird();
>
> Pattern, so hopefully this won't proliferate that.
>
> >
> > Hugetlbfs is still very special in that it will only use part of the
> > vm_uffd_ops API, for a reason similar to why hugetlb_vm_op_fault() has a
> > BUG() and is so far hard-coded into core mm.  But this should still be better,
> > because at least hugetlbfs is still always involved in feature probing
> > (e.g. where it used to not support ZEROPAGE and we have a hard-coded line
> > to fail that, and some more).  Meanwhile after this series, shmem will be
> > completely converted to the new vm_uffd_ops API; the final vm_uffd_ops for
> > shmem looks like this:
> >
> > static const vm_uffd_ops shmem_uffd_ops = {
> >         .uffd_features  = __VM_UFFD_FLAGS,
> >         .uffd_ioctls    = BIT(_UFFDIO_COPY) |
> >                           BIT(_UFFDIO_ZEROPAGE) |
> >                           BIT(_UFFDIO_WRITEPROTECT) |
> >                           BIT(_UFFDIO_CONTINUE) |
> >                           BIT(_UFFDIO_POISON),
> >         .uffd_get_folio = shmem_uffd_get_folio,
> >         .uffd_copy      = shmem_mfill_atomic_pte,
> > };
> >
> > As I mentioned in one of my replies to Nikita, I don't like the current
> > interface of uffd_copy(), but this will be the minimum change version of
> > such API to support complete external-module-ready userfaultfd.  Here, very
> > minimal change will be needed from shmem side to support that.
>
> Right, maybe a better version of this interface might address some of my
> concerns... :)
>
> >
> > Meanwhile, the vm_uffd_ops is also not the only place one will need to
> > provide to support userfaultfd.
> > Normally vm_ops.fault() will also need to be updated, but that's a generic
> > function and it'll play together with the new vm_uffd_ops to make
> > everything fly.
> >
> > No functional change expected at all after the whole series applied.  There
> > might be some slightly stricter check on uffd ops here and there in the
> > last patch, but that really shouldn't stand out anywhere to anyone.
> >
> > For testing: besides the cross-compilation tests, I did also try with
> > uffd-stress in a VM to measure any perf difference before/after the change;
> > The static call becomes a pointer now.  I really cannot measure anything
> > different, which is more or less expected.
> >
> > Comments welcomed, thanks.
> >
> > Peter Xu (4):
> >   mm: Introduce vm_uffd_ops API
> >   mm/shmem: Support vm_uffd_ops API
> >   mm/hugetlb: Support vm_uffd_ops API
> >   mm: Apply vm_uffd_ops API to core mm
> >
> >  include/linux/mm.h            |   9 +++
> >  include/linux/shmem_fs.h      |  14 -----
> >  include/linux/userfaultfd_k.h |  98 +++++++++++++++++++----------
> >  mm/hugetlb.c                  |  19 ++++++
> >  mm/shmem.c                    |  28 ++++++++-
> >  mm/userfaultfd.c              | 115 +++++++++++++++++++++++++---------
> >  6 files changed, 207 insertions(+), 76 deletions(-)
> >
> > --
> > 2.49.0
> >
>
> Sorry to be critical, I just want to make sure we're not setting ourselves up
> for trouble here.
>
> I _very much_ support efforts to make uffd more generalised, and ideally to find
> a way to separate out shmem and hugetlbfs implementation bits, so I support the
> intent _fully_.
>
> I just want to make sure we do it in a safe way :)

Any explicit suggestions (besides objections)?

Thanks,

--
Peter Xu