Date: Tue, 24 Jun 2025 16:51:12 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: "Liam R. Howlett", Lorenzo Stoakes, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kvm@vger.kernel.org, Andrew Morton,
	Alex Williamson, Zi Yan, Alex Mastro, David Hildenbrand, Nico Pache
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
References: <20250617231807.GD1575786@nvidia.com>
	<20250618174641.GB1629589@nvidia.com>
	<20250619135852.GC1643312@nvidia.com>
	<20250619184041.GA10191@nvidia.com>

On Tue, Jun 24, 2025 at 04:37:26PM -0400, Peter Xu wrote:
> On Thu, Jun 19, 2025 at 03:40:41PM -0300, Jason Gunthorpe wrote:
> > Even with this new version you have to decide to return PUD_SIZE or
> > bar_size in pci and your same reasoning that PUD_SIZE make sense
> > applies (though I would probably return bar_size and just let the core
> > code cap it to PUD_SIZE)
>
> Yes.
>
> Today I went back to look at this, I was trying to introduce this for
> file_operations:
>
>   int (*get_mapping_order)(struct file *, unsigned long, size_t);
>
> It looks almost good, except that it so far has no way to return the
> physical address for further calculation on the alignment.
>
> For THP, VA is always calculated against pgoff not physical address on the
> alignment.  I think it's OK for THP, because every 2M THP folio will be
> naturally 2M aligned on the physical address, so it fits when e.g. pgoff=0
> in the calculation of thp_get_unmapped_area_vmflags().
>
> Logically it should even also work for vfio-pci, as long as VFIO keeps
> using the lower 40 bits of the device_fd to represent the bar offset,
> meanwhile it'll also require PCIe spec asking the PCI bars to be mapped
> aligned with bar sizes.
>
> But from an API POV, get_mapping_order() logically should return something
> for further calculation of the alignment to get the VA.  pgoff here may not
> always be the right thing to use to align to the VA: after all, pgtable
> mapping is about VA -> PA, the only reasonable and reliable way is to align
> VA to the PA to be mappped, and as an API we shouldn't assume pgoff is
> always aligned to PA address space.
>
> Any thoughts?

I should have listed the currently viable next steps.  We have at least
these options:

(a) Ignore this issue and keep the get_mapping_order() interface as above,
    as long as it works for vfio-pci.

    I don't like this option.  I'd prefer the API (if we're going to
    introduce one) to be applicable no matter how pgoff is mapped to PAs.
    I don't want the API to rely on a specific driver or a specific spec
    (in this case, PCI).

(b) I can make the new API like this instead:

      int (*get_mapping_order)(struct file *, unsigned long, unsigned long *, size_t);

    where I can return a *phys_pgoff as well, alongside the order to map,
    which the call returns in retval.  But that's not pretty, if not
    outright ugly.

(c) Go back to what I did in the current v1, address the comments, and keep
    using get_unmapped_area() until we know a better way.

I'll vote for (c), but I'm open to suggestions.

Thanks,

-- 
Peter Xu