Date: Wed, 2 Jul 2025 16:58:46 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: "Liam R. Howlett", Lorenzo Stoakes, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, kvm@vger.kernel.org, Andrew Morton, Alex Williamson,
 Zi Yan, Alex Mastro, David Hildenbrand, Nico Pache
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
References: <20250619184041.GA10191@nvidia.com>
 <20250624234032.GC167785@nvidia.com>
 <20250625130711.GH167785@nvidia.com>
 <20250625184154.GI167785@nvidia.com>
 <20250630140537.GW167785@nvidia.com>
In-Reply-To: <20250630140537.GW167785@nvidia.com>

On Mon, Jun 30, 2025 at 11:05:37AM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 25, 2025 at 03:26:44PM -0400, Peter Xu wrote:
> > On Wed, Jun 25, 2025 at 03:41:54PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 25, 2025 at 01:12:11PM -0400, Peter Xu wrote:
> > >
> > > > After I read the two use cases, I mostly agree. Just one trivial thing to
> > > > mention, it may not be direct map but vmap() (see io_region_init_ptr()).
> > >
> > > If it is vmapped then this is all silly, you should vmap and mmap
> > > using the same cache colouring and, AFAIK, pgoff is how this works for
> > > purely userspace.
> > >
> > > Once vmap'd it should determine the cache colour and set the pgoff
> > > properly, then everything should already work no?
> >
> > I don't yet see how to set the pgoff. Here pgoff is passed from the
> > userspace, which follows io_uring's definition (per io_uring_mmap).
>
> That's too bad
>
> So you have to do it the other way and pass the pgoff to the vmap so
> the vmap ends up with the same colouring as a user VMA holding the
> same pages..

I'm not sure I get that point, but at the very least it would be hard to
achieve.

The vmap() happens when the io_uring instance is created (as the
submission/completion queues are initialized).
The mmap() happens later, and it can happen multiple times, so every VA
that gets mmap()ed needs to share the same colouring as the vmap(). In
this case it sounds reasonable to me to do the alignment at mmap() time,
against the vmap() result.

>
> > So if we want the new API to be proposed here, and make VFIO use it first
> > (while consider it to be applicable to all existing MMU users at least,
> > which I checked all of them so far now), I'd think this proper:
> >
> >     int (*mmap_va_hint)(struct file *file, unsigned long *pgoff, size_t len);
> >
> > The changes comparing to previous:
> >
> >     (1) merged pgoff and *phys_pgoff parameters into one unsigned long, so
> >     the hook can adjust the pgoff for the va allocator to be used. The
> >     adjustment will not be visible to future mmap() when VMA is created.
>
> It seems functional, but the above is better, IMHO.

Do you mean we can start with no modification allowed on *pgoff? I'd
prefer making *pgoff modifiable from the start. Not only would it work for
the io_uring / parisc case above from day one (so we wouldn't need to add
it on top later and modify existing users), it would also be cleaner for
the current VFIO use case.

>
> >     (2) I renamed it to mmap_va_hint(), because *pgoff will be able to be
> >     updated, so it's not only about ordering, but "order" and "pgoff
> >     adjustment" hints that the core mm will use when calculating the VA.
>
> Where does order come back though? Returns order?

Yes.

>
> It seems viable

Once I've double-checked the API above, I'll prepare a new version.

Thanks a lot, Jason.

--
Peter Xu
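
For illustration only, the semantics discussed above (the hook returns an
order and may adjust *pgoff, and the core mm uses both when picking the VA)
can be modelled in plain userspace C. This is a rough sketch under stated
assumptions, not the proposed kernel implementation: toy_mmap_va_hint(),
toy_get_unmapped_area(), pick_va() and the TOY_* constants are all
hypothetical names and values, the struct file argument is dropped, and
4 KiB pages, 2 MiB huge mappings and a 64-bit unsigned long are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL  /* assumption: 4 KiB pages */
#define PMD_ORDER  9     /* assumption: 2 MiB huge mappings */

/* Hypothetical toy region, not taken from any real device. */
#define TOY_REGION_PGOFF 0x10000000UL /* start of the region in file pgoff space */
#define TOY_BAR_PFN      0xe0000UL    /* first pfn of the backing BAR */

/* Userspace stand-in for the proposed hook (struct file omitted). */
typedef int (*mmap_va_hint_fn)(unsigned long *pgoff, size_t len);

/*
 * Toy hook: rewrite the file pgoff into the pfn that would be mapped, so
 * the allocator aligns the VA against the physical layout, and return the
 * mapping order the driver would like.
 */
static int toy_mmap_va_hint(unsigned long *pgoff, size_t len)
{
    *pgoff = TOY_BAR_PFN + (*pgoff - TOY_REGION_PGOFF);
    return len >= (1UL << (PMD_ORDER + PAGE_SHIFT)) ? PMD_ORDER : 0;
}

/*
 * Lowest va >= base whose offset within a (PAGE_SIZE << order) block
 * matches the offset of pgoff within the same block size.
 */
static unsigned long pick_va(unsigned long base, unsigned long pgoff, int order)
{
    assert(order >= 0 && order < 32);

    unsigned long align = 1UL << (order + PAGE_SHIFT);
    unsigned long mask = align - 1;
    unsigned long off = (pgoff << PAGE_SHIFT) & mask;
    unsigned long va = (base & ~mask) + off;

    if (va < base)
        va += align;
    return va;
}

/* Toy "core mm" side: consult the hook, then choose the VA. */
static unsigned long toy_get_unmapped_area(mmap_va_hint_fn hint,
                                           unsigned long base,
                                           unsigned long pgoff, size_t len)
{
    unsigned long hint_pgoff = pgoff; /* local copy; caller's pgoff is untouched */
    int order = hint(&hint_pgoff, len);

    if (order < 0)
        order = 0; /* treat "no hint" as plain page alignment */
    return pick_va(base, hint_pgoff, order);
}
```

Because the hook only modifies a local copy, the adjusted pgoff steers the
VA choice while the pgoff that would be stored in the VMA stays whatever
userspace passed in, which is the behaviour described in (1).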