Date: Wed, 18 Jun 2025 15:15:50 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: "Liam R. Howlett", Lorenzo Stoakes, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, kvm@vger.kernel.org, Andrew Morton,
    Alex Williamson, Zi Yan, Alex Mastro, David Hildenbrand, Nico Pache
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
References: <20250613160956.GN1174925@nvidia.com>
    <20250613231657.GO1174925@nvidia.com>
    <20250616230011.GS1174925@nvidia.com>
    <20250617231807.GD1575786@nvidia.com>
    <20250618174641.GB1629589@nvidia.com>
In-Reply-To: <20250618174641.GB1629589@nvidia.com>

On Wed, Jun 18, 2025 at 02:46:41PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 18, 2025 at 12:56:01PM -0400, Peter Xu wrote:
> > So I changed my mind, slightly. I can still have the "order" parameter to
> > make the API cleaner (even if it'll be a pure overhead.. because all
> > existing callers will pass in PUD_SIZE as of now),
>
> That doesn't seem right, the callers should report the real value not
> artificially cap it.. Like ARM does have page sizes greater than PUD
> that might be interesting to enable someday for PFN users.

It needs to pass in PUD_SIZE to match what vfio-pci currently supports in
its huge_fault().

>
> > but I think I'll still
> > stick with the ifdef in patch 4, as I mentioned here:
> >
> > https://lore.kernel.org/all/aFGMG3763eSv9l8b@x1.local/
> >
> > The problem is I just noticed yet again that exporting
> > huge_mapping_get_va_aligned() for all configs doesn't make sense. At least
> > it'll need something like this to make !MMU compile for VFIO, while this is
> > definitely some ugliness I also want to avoid..
>
> IMHO this ugliness should certainly be contained to the mm code and not
> leak into drivers.
>
> > There's just no way to provide a sane default value for !MMU.
>
> So all this mess seems to say that get_unmapped_area() is just the
> wrong fop to have here. It can't be implemented sanely for !MMU and
> has these weird conditions, like can't fail.
>
> I again suggest to just simplify and add a new fop
>
> size_t get_best_mapping_order(struct file *filp, pgoff_t pgoff,
>                               size_t length);
>
> Which will return the largest pgoff aligned order within pgoff/length
> that the FD could try to install. Very simple for the driver
> side.
> vfio pci will just return ilog2(bar_size).
>
> PAGE_SHIFT can be a safe default.

I agree this is a better way. We can make PAGE_SHIFT the default, or just
0, because it doesn't seem necessary to me to support anything smaller than
PAGE_SIZE.. maybe an "int" retval would suffice to also cover errors.

So this will introduce a new file operation that, so far, will only be used
by VFIO, playing a similar role until we start converting the many
get_unmapped_area() users to this one.

>
> Then put all this maze of conditionals in the mm side replacing the
> call to fops->get_unmapped_area() and don't export anything new. The
> mm will automatically cap the alignment based on what the architecture
> can do and what
>
> !MMU would simply entirely ignore this new stuff.

For the long term, we should move all get_unmapped_area() users to the new
API.

For old !MMU users, we should rename get_unmapped_area() to something
better, like get_mmap_addr(). For those cases it's really not about
looking for something not mapped, but normally about returning exactly
what was requested.

>
> > So going one step back: huge_mapping_get_va_aligned() (or whatever name we
> > prefer) doesn't make sense to be exported always, but only when CONFIG_MMU.
> > It should follow the same way we treat mm_get_unmapped_area().
>
> We just deleted !SMP, I really wonder if it is time for !MMU to go
> away too..

Yes, if this comes earlier, we can completely drop get_unmapped_area()
once all existing MMU users are converted to the new one.

Any early objections / concerns / comments from anyone else, before I go
and introduce it?

-- 
Peter Xu
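
For illustration only, here is a minimal user-space sketch of one possible
reading of the get_best_mapping_order() semantics discussed above: return
the largest order for which a naturally aligned block fits inside the
requested file range. Everything in it is an assumption of the sketch and
not part of the thread or of any kernel API: the helper name
best_mapping_order(), the fixed SKETCH_PAGE_SHIFT of 12, the max_order
parameter standing in for whatever cap the mm side would apply, and the
"int" return type (the quoted signature uses size_t).

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper, not a kernel function.
 *
 * pgoff     - starting file offset, in pages
 * length    - length of the range, in bytes
 * max_order - largest order (log2 of bytes) the caller is willing to use
 *
 * Returns the largest order in [SKETCH_PAGE_SHIFT, max_order] such that a
 * naturally aligned block of (1 << order) bytes lies entirely inside
 * [pgoff << SKETCH_PAGE_SHIFT, that offset + length). Falls back to
 * SKETCH_PAGE_SHIFT when nothing larger fits.
 */
int best_mapping_order(uint64_t pgoff, uint64_t length, int max_order)
{
	uint64_t start = pgoff << SKETCH_PAGE_SHIFT;
	uint64_t end = start + length;
	int order;

	/* Keep the shifts below well defined and reject wrapped ranges. */
	if (max_order > 62)
		max_order = 62;
	if (end < start)
		return SKETCH_PAGE_SHIFT;

	for (order = max_order; order > SKETCH_PAGE_SHIFT; order--) {
		uint64_t size = UINT64_C(1) << order;
		/* Round start up to the next multiple of size. */
		uint64_t first = (start + size - 1) & ~(size - 1);

		/* Skip candidates where the arithmetic wrapped around. */
		if (first < start || first + size < first)
			continue;
		if (first + size <= end)
			return order;
	}
	return SKETCH_PAGE_SHIFT;
}
```

Under these assumptions, a 4 MiB range starting at page offset 1 yields
order 21: a 2 MiB aligned block fits inside the range, but a 4 MiB aligned
one does not. A driver like vfio-pci, as suggested in the quoted text,
could skip the search entirely and just report ilog2(bar_size).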