Date: Tue, 24 Jun 2025 20:48:45 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: "Liam R. Howlett", Lorenzo Stoakes, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, kvm@vger.kernel.org, Andrew Morton, Alex Williamson,
    Zi Yan, Alex Mastro, David Hildenbrand, Nico Pache
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
References: <20250617231807.GD1575786@nvidia.com>
    <20250618174641.GB1629589@nvidia.com>
    <20250619135852.GC1643312@nvidia.com>
    <20250619184041.GA10191@nvidia.com>
    <20250624234032.GC167785@nvidia.com>
In-Reply-To: <20250624234032.GC167785@nvidia.com>

On Tue, Jun 24, 2025 at 08:40:32PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 24, 2025 at 04:37:26PM -0400, Peter Xu wrote:
> > On Thu, Jun 19, 2025 at 03:40:41PM -0300, Jason Gunthorpe wrote:
> > > Even with this new version you have to decide to return PUD_SIZE or
> > > bar_size in pci and your same reasoning that PUD_SIZE makes sense
> > > applies (though I would probably return bar_size and just let the core
> > > code cap it to PUD_SIZE)
> >
> > Yes.
> >
> > Today I went back to look at this, I was trying to introduce this for
> > file_operations:
> >
> >   int (*get_mapping_order)(struct file *, unsigned long, size_t);
> >
> > It looks almost good, except that it so far has no way to return the
> > physical address for further calculation on the alignment.
> >
> > For THP, VA is always calculated against pgoff not physical address on the
> > alignment.  I think it's OK for THP, because every 2M THP folio will be
> > naturally 2M aligned on the physical address, so it fits when e.g. pgoff=0
> > in the calculation of thp_get_unmapped_area_vmflags().
> >
> > Logically it should even also work for vfio-pci, as long as VFIO keeps
> > using the lower 40 bits of the device_fd to represent the bar offset,
> > meanwhile it'll also require PCIe spec asking the PCI bars to be mapped
> > aligned with bar sizes.
> >
> > But from an API POV, get_mapping_order() logically should return something
> > for further calculation of the alignment to get the VA.  pgoff here may not
> > always be the right thing to use to align to the VA: after all, pgtable
> > mapping is about VA -> PA, the only reasonable and reliable way is to align
> > VA to the PA to be mapped, and as an API we shouldn't assume pgoff is
> > always aligned to PA address space.
>
> My feeling, and the reason I used the phrase "pgoff aligned address",
> is that the owner of the file should already ensure that for the large
> PTEs/folios:
>   pgoff % 2**order == 0
>   physical % 2**order == 0

IMHO there shouldn't really be any hard requirement in mm that pgoff and
physical address space need to be aligned.. but I confess I don't have an
example driver in the Linux tree that doesn't do that.

>
> So, things like VFIO do need to hand out high alignment pgoffs to make
> this work - which it already does.
>
> To me this just keeps things simpler. I guess if someone comes up with
> a case where they really can't get a pgoff alignment and really need a
> high order mapping then maybe we can add a new return field of some
> kind (pgoff adjustment?) but that is so weird I'd leave it to the
> future person to come and justify it.

Looking further, I also found some special-cased get_unmapped_area()
implementations that may not be trivially converted to the new API even for
CONFIG_MMU, namely:

  - io_uring_get_unmapped_area
  - arena_get_unmapped_area (from bpf_map->ops->map_get_unmapped_area)

I'll need to take a closer look tomorrow.  If any of them cannot be 100%
safely converted to the new API, I'd also think we should not introduce the
new API, but reuse get_unmapped_area() until we know a way out.

-- 
Peter Xu
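
For readers following the alignment argument in this thread, here is a
minimal user-space sketch of the two conditions being discussed.  It is an
illustration only, not kernel code and not part of the posted series.  The
helper names order_aligned() and align_va_to_pa() are hypothetical, and the
sketch assumes that "order" is the log2 of the mapping size in bytes (21 for
2M) and that pgoff is given as a byte offset into the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): the condition quoted above,
 * i.e. both the file offset and the physical address are multiples of
 * 2^order bytes.
 */
bool order_aligned(uint64_t pgoff_bytes, uint64_t phys, unsigned int order)
{
	assert(order < 64);
	uint64_t mask = (UINT64_C(1) << order) - 1;

	return (pgoff_bytes & mask) == 0 && (phys & mask) == 0;
}

/*
 * Hypothetical helper (not a kernel API): pick the lowest VA >= hint that
 * has the same offset as phys within a 2^order block, so that VA and PA
 * are congruent modulo 2^order.  This aligns VA to PA directly rather
 * than to pgoff.  Overflow near the top of the address space is ignored.
 */
uint64_t align_va_to_pa(uint64_t hint, uint64_t phys, unsigned int order)
{
	assert(order < 64);
	uint64_t mask = (UINT64_C(1) << order) - 1;
	uint64_t va = (hint & ~mask) | (phys & mask);

	if (va < hint)
		va += mask + 1;	/* step to the next 2^order block */
	return va;
}
```

When order_aligned() holds, aligning the VA against pgoff and aligning it
against the PA give the same result, which is why the pgoff-based
calculation works for the drivers mentioned above.  align_va_to_pa() shows
the more general form that would be needed if a driver could not guarantee
that pgoff and the physical address share the same alignment.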