From: Peter Xu
To: Mike Rapoport
Cc: Suren Baghdasaryan, Lorenzo Stoakes, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Vlastimil Babka, Muchun Song,
    Hugh Dickins, Andrew Morton, James Houghton, Liam R. Howlett,
    Nikita Kalyazin, Michal Hocko, David Hildenbrand, Andrea Arcangeli,
    Oscar Salvador, Axel Rasmussen, Ujwal Kundur
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
Date: Wed, 2 Jul 2025 16:22:58 -0400
References: <20250627154655.2085903-1-peterx@redhat.com> <20250627154655.2085903-2-peterx@redhat.com>

On Wed, Jul 02, 2025 at 09:16:31PM +0300, Mike Rapoport wrote:
> On Tue, Jul 01, 2025 at 10:04:28AM -0700, Suren Baghdasaryan wrote:
> > On Mon, Jun 30, 2025 at 3:16 AM Lorenzo Stoakes wrote:
> > >
> > > It seems like we're assuming a _lot_ of mm understanding in the underlying
> > > driver here.
> > >
> > > I'm not sure it's really normal to be handing around page table state and
> > > folios etc. to a driver like this, this is really... worrying to me.
> > >
> > > This feels like you're trying to put mm functionality outside of mm?
> >
> > To second that, two things stick out for me here:
> > 1. uffd_copy and uffd_get_folio seem to be at different abstraction
> > levels. uffd_copy is almost the entire copy operation for VM_SHARED
> > VMAs while uffd_get_folio is a small part of the continue operation.
> > 2. shmem_mfill_atomic_pte which becomes uffd_copy for shmem in the
> > last patch is quite a complex function which itself calls some IMO
> > pretty internal functions like mfill_atomic_install_pte().
> > Expecting modules to implement such functionality seems like a
> > stretch to me but maybe this is for some specialized modules which
> > are written by mm experts only?
>
> Largely shmem_mfill_atomic_pte() differs from anonymous memory version
> (mfill_atomic_pte_copy()) by the way the allocated folio is accounted and
> whether it's added to the page cache. So instead of uffd_copy(...) we might
> add
>
> 	int (*folio_alloc)(struct vm_area_struct *vma, unsigned long dst_addr);
> 	void (*folio_release)(struct vm_area_struct *vma, struct folio *folio);

Thanks for digging into this, Mike.  It's just that IMHO it may not be
enough..

I actually tried to think about a more complicated API, but the more I
thought about it, the more it looked like overkill.  Let me list a few
things here to show why I chose the simplest solution, uffd_copy(), for
now.

Firstly, see shmem_inode_acct_blocks() at the entrance: that's the shmem
accounting we need to do before allocation, and it has to happen with or
without an allocation.  That accounting can't be put into folio_alloc()
even if we had one, because the folio may already have been allocated by
the time we get here (that is, when *foliop != NULL).  Doing the shmem
accounting separately was a very delicate decision we made back in 2016:

https://lore.kernel.org/all/20161216144821.5183-37-aarcange@redhat.com/

Then there's also the complexity of how the page cache should be managed
for any mem type.  For shmem, the folio is only injected right before the
pgtable installation.  We can't inject it in folio_alloc(), because then
others could fault it in before the data is populated.  That means we need
at least one more API to do the page cache injection for the folio that
was just allocated by folio_alloc().

We may also want different treatment of how the folio flags are set up.
It may not always happen in folio_alloc().  E.g.
for shmem right now we do this right before the page cache injection:

	VM_BUG_ON(folio_test_locked(folio));
	VM_BUG_ON(folio_test_swapbacked(folio));
	__folio_set_locked(folio);
	__folio_set_swapbacked(folio);
	__folio_mark_uptodate(folio);

We may not want to do exactly the same for all the other mem types.  E.g.
we likely don't want to set swapbacked for guest-memfd folios.  We may
need one more API for that.

Then, thinking about the failure path: given the shmem accounting issue
above, we may want yet another post_failure hook to do
shmem_inode_unacct_blocks() properly for shmem..  maybe we don't need that
for guest-memfd, but we still need it for shmem to properly unaccount when
the folio allocation succeeded but copy_from_user failed somehow.

Then the question is, do we really need all this fuss for almost nothing?

Note that if we want, IIUC we can still change this in the future.  The
initial isolation in this series would still be good to land earlier; we
don't even plan to support MISSING for guest-memfd in the near future,
only MINOR mode for now.  We don't necessarily need to work out MISSING
immediately to keep useful features landing in Linux.  Even if we do add
it for guest-memfd later, it shouldn't be a challenge to maintain both.
This is just not a contract we need to sign forever yet.

Hope the above clarifies a bit why I chose the simplest solution for now.
I also don't like this API, as I mentioned in the cover letter.  It's
really a trade-off, and for now it's the best I can come up with.

That said, if any of us has a better solution, please shoot.  I'm always
open to better alternatives.

Thanks,

-- 
Peter Xu
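
Illustration only: the finer-grained alternative weighed in the mail above
can be modelled in plain userspace C to show how many hooks it would take.
Every name in this sketch (uffd_ops_sketch, mem_sketch, folio_sketch,
pre_account, folio_prepare, cache_insert, mfill_copy_sketch, and the
shmem_like_* helpers) is hypothetical; none of them comes from the posted
series or from any kernel API.  Only folio_alloc, folio_release and
post_failure echo names used in the thread.  The model is also simplified:
it does not cover the retry case where the folio was already allocated
(*foliop != NULL), and a NULL source pointer stands in for a failed
copy_from_user().

```c
/* Userspace model only; nothing here is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

struct folio_sketch {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool locked, swapbacked, uptodate;
};

struct mem_sketch {
	long accounted_blocks;           /* stands in for shmem accounting */
	struct folio_sketch *cache_slot; /* one-entry "page cache" */
};

/* Hypothetical hooks, one per step listed in the mail. */
struct uffd_ops_sketch {
	int (*pre_account)(struct mem_sketch *m);
	struct folio_sketch *(*folio_alloc)(struct mem_sketch *m);
	void (*folio_prepare)(struct folio_sketch *f);
	int (*cache_insert)(struct mem_sketch *m, struct folio_sketch *f);
	void (*folio_release)(struct mem_sketch *m, struct folio_sketch *f);
	void (*post_failure)(struct mem_sketch *m);
};

static int shmem_like_account(struct mem_sketch *m)
{
	m->accounted_blocks++;
	return 0;
}

static struct folio_sketch *shmem_like_alloc(struct mem_sketch *m)
{
	(void)m;
	return calloc(1, sizeof(struct folio_sketch));
}

/* Mirrors the flag setup quoted in the mail; guest-memfd would differ. */
static void shmem_like_prepare(struct folio_sketch *f)
{
	assert(!f->locked);
	assert(!f->swapbacked);
	f->locked = true;
	f->swapbacked = true;
	f->uptodate = true;
}

static int shmem_like_insert(struct mem_sketch *m, struct folio_sketch *f)
{
	if (m->cache_slot)
		return -1; /* slot already populated */
	m->cache_slot = f;
	return 0;
}

static void shmem_like_release(struct mem_sketch *m, struct folio_sketch *f)
{
	(void)m;
	free(f);
}

static void shmem_like_unaccount(struct mem_sketch *m)
{
	m->accounted_blocks--;
}

const struct uffd_ops_sketch shmem_like_ops = {
	.pre_account   = shmem_like_account,
	.folio_alloc   = shmem_like_alloc,
	.folio_prepare = shmem_like_prepare,
	.cache_insert  = shmem_like_insert,
	.folio_release = shmem_like_release,
	.post_failure  = shmem_like_unaccount,
};

/* Generic driver: account, allocate, copy, set flags, then publish. */
int mfill_copy_sketch(const struct uffd_ops_sketch *ops,
		      struct mem_sketch *m, const void *src)
{
	struct folio_sketch *f;

	if (ops->pre_account(m))
		return -1;
	f = ops->folio_alloc(m);
	if (!f)
		goto unaccount;
	if (!src) /* simulated copy_from_user() failure */
		goto release;
	memcpy(f->data, src, SKETCH_PAGE_SIZE);
	ops->folio_prepare(f);
	/* Publish only after the data is populated. */
	if (ops->cache_insert(m, f))
		goto release;
	return 0;
release:
	ops->folio_release(m, f);
unaccount:
	ops->post_failure(m);
	return -1;
}
```

Even this toy version needs six hooks where the series has a single
uffd_copy(), and the generic driver still has to know the order in which
accounting, flag setup and page cache insertion must happen.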