Date: Thu, 3 Jul 2025 11:24:21 -0400
From: Peter Xu
To: "Liam R. Howlett", Nikita Kalyazin, Lorenzo Stoakes, Suren Baghdasaryan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka, Muchun Song, Mike Rapoport, Hugh Dickins, Andrew Morton, James Houghton, Michal Hocko, David Hildenbrand, Andrea Arcangeli, Oscar Salvador, Axel Rasmussen, Ujwal Kundur
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
References: <20250627154655.2085903-1-peterx@redhat.com> <20250627154655.2085903-2-peterx@redhat.com> <982f4f94-f0bf-45dd-9003-081b76e57027@lucifer.local>

On Wed, Jul 02, 2025 at 10:00:51PM -0400, Liam R. Howlett wrote:
> * Peter Xu [250702 17:36]:
> > On Wed, Jul 02, 2025 at 05:24:02PM -0400, Liam R. Howlett wrote:
> > > That's because the entry point is from a function pointer, so [3] won't
> > > help at all.
> > >
> > > It is recreating the situation that existed for the vma through the
> > > vm_ops in mmap, but for uffd. And at a lower level (page tables). I do not
> > > want to relive that experience.
> > >
> > > We are not doing this. It is for the benefit of everyone that we are
> > > not doing this.
> >
> > Is the vma issue about "allowing vma->vm_flags to be modified anywhere"
> > issue? Or is there a pointer to the issue being discussed if not?
>
> The issue is passing pointers of structs that are protected by locks or
> ref counters into modules to do what they please.
>
> vma->vm_flags was an example of where we learned how wrong this can go.
>
> There is also the concern of the state of the folio on return from the
> callback. The error handling gets messy quick.
>
> Now, imagine we have something that gets a folio, but then we find a
> solution for contention of a lock or ref count (whatever is next), but
> it doesn't work because the mm code has been bleeding into random
> modules and we have no clue what that module is supposed to be doing, or
> we can't make the necessary change because this module will break
> userspace, or cause a performance decrease, or any other random thing
> that we cannot work around without rewriting (probably suboptimally)
> something we don't maintain.
>
> Again, these are examples of how this can go bad but not an exhaustive
> list by any means.
>
> So the issue is with allowing modules to play with the folio and page
> tables on their own.

I understand the concern. However, IMHO that's exactly why mm can be hard
and important at the same time.

We definitely have driver code manipulating pgtables. We also have folios
or pages that are directly accessible from drivers. After all, mm is the
core provider of those functions, and there needs to be some API to
access them from outside.

I agree some protection would be nice, like what Suren did for vm_flags
using __private, even though it's unfortunate that it only works with
sparse rather than giving a warning/error at compile time, as vm_flags is
not a pointer. OTOH, forbidding the exposure of anything at all might be
overkill, IMHO. It stops mm from growing in healthy ways.

> If this is outside the mm, we probably won't even be Cc'ed on modules
> that use it.
>
> And do we want to be Cc'ed on modules that want to use it?

For this specific case, I'm happy to be copied if guest-memfd starts to
support userfaultfd, because obviously I also work with the kvm community.
It would be the same even if I didn't, as I'm listed as a userfaultfd
reviewer.

But when the code lives in a module, it should really be the module's
job. It's also ok if, because it's an API, mm people do not get copied;
that looks fine to me.
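To make the __private point concrete, here is a minimal userspace sketch
modeled on (not copied from) the way the kernel wraps vma->vm_flags. The
macro definitions are simplified assumptions, and struct vma_sketch and the
vma_sketch_flags_* helpers are hypothetical names. A plain compiler already
rejects writes through the const vm_flags view, but a direct write to
__vm_flags compiles silently; only a sparse run (which defines __CHECKER__)
would flag it, which is the limitation mentioned above.

```c
#include <assert.h>

/*
 * Simplified mimic of the kernel's sparse annotations (assumption).
 * Under sparse, __private marks a member that must not be accessed
 * directly; for gcc/clang both macros expand to nothing, so nothing is
 * enforced at build time.
 */
#ifdef __CHECKER__
# define __private __attribute__((noderef))
# define __force   __attribute__((force))
#else
# define __private
# define __force
#endif

/* The one sanctioned way to reach a __private member. */
#define ACCESS_PRIVATE(p, member) \
	(*((__typeof__((p)->member) __force *)&(p)->member))

typedef unsigned long vm_flags_t;

/* Hypothetical cut-down stand-in for struct vm_area_struct. */
struct vma_sketch {
	union {
		const vm_flags_t vm_flags;       /* read-only view for everyone */
		vm_flags_t __private __vm_flags; /* writable, core code only */
	};
};

/* Core-mm style helpers: the only intended writers of the flags. */
static inline void vma_sketch_flags_set(struct vma_sketch *vma,
					vm_flags_t flags)
{
	ACCESS_PRIVATE(vma, __vm_flags) |= flags;
}

static inline void vma_sketch_flags_clear(struct vma_sketch *vma,
					  vm_flags_t flags)
{
	ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
}
```

Writing `v.vm_flags = 1;` fails to compile because the member is const,
while `v.__vm_flags = 1;` builds fine with gcc/clang and would only be
reported by sparse.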

> We will most likely be Cc'ed or emailed directly on the resulting memory
> leak/security issue that results in what should be mm code. It'll be a
> Saturday because it always is.. :)

True, it's just unavoidable IMHO. After triage, if the bug only happens
with the module, then the module owner needs to figure out how to fix it,
not an mm developer.

It's the same when a module allocates a folio/page and randomly updates
its flags: that may also crash core mm later. We can add more protections
all over the place, but I don't see an easy way to completely separate
core mm from modules.

> Even the example use code had a potential ref leak that you found [1].

That's totally ok. I fully appreciate Nikita's help and never thought of
it as an issue. IMHO the leak is not a big deal when it comes to verifying
the API.

> > > We need to find another way.
> >
> > Could you suggest something? The minimum goal is to allow guest-memfd
> > support minor faults.
>
> Mike brought up another idea, that seems worth looking into.

I already replied to Mike before we extended this thread. Feel free to
chime in with any suggestions on top. So far, this series is still pretty
much the best I can think of.

Thanks,

-- 
Peter Xu