From: Axel Rasmussen
Date: Fri, 24 May 2024 13:54:20 -0700
Subject: [RFC] Huge remap_pfn_range for vfio-pci
To: Jason Gunthorpe, Peter Xu, David Hildenbrand, Sean Christopherson
Cc: Linux MM

Hi,

I'm interested in extending remap_pfn_range to allow it to map the range
hugely (using PUDs or PMDs). The initial user I have in mind is vfio-pci;
I'm thinking when we're mapping large ranges for GPUs, we can get both a
performance and host overhead win by doing this hugely.

Another thing I have in the back of my mind is adding something KVM can
re-use to simplify its whole host_pfn_mapping_level /
hva_to_pfn_remapped / get_user_page_fast_only thing.

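
To illustrate what mapping a range "hugely" could mean, here is a small
userspace model (not kernel code) of picking the largest leaf size for
each step of a range. It assumes x86-64-style sizes (4 KiB pages, 2 MiB
PMDs, 1 GiB PUDs); every sketch_* name is hypothetical and nothing here
is taken from the proof of concept.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64-style shifts with 4 KiB base pages; other configs differ. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PUD_SHIFT  30

enum sketch_level { SKETCH_PTE, SKETCH_PMD, SKETCH_PUD };

/* Number of leaf entries of each size needed to cover a range. */
struct sketch_counts { unsigned long pud, pmd, pte; };

/*
 * Hypothetical helper: largest leaf usable at this point in the range.
 * A huge leaf needs the virtual address and the physical address aligned
 * to the leaf size, and at least one full leaf of range remaining.
 */
static enum sketch_level sketch_pick_level(uint64_t addr, uint64_t phys,
                                           uint64_t len)
{
    const uint64_t pud_size = 1ULL << SKETCH_PUD_SHIFT;
    const uint64_t pmd_size = 1ULL << SKETCH_PMD_SHIFT;

    if (((addr | phys) & (pud_size - 1)) == 0 && len >= pud_size)
        return SKETCH_PUD;
    if (((addr | phys) & (pmd_size - 1)) == 0 && len >= pmd_size)
        return SKETCH_PMD;
    return SKETCH_PTE;
}

/* Walk the range, taking the largest possible leaf at every step. */
static struct sketch_counts sketch_count_leaves(uint64_t addr, uint64_t phys,
                                                uint64_t len)
{
    struct sketch_counts c = { 0, 0, 0 };
    uint64_t step;

    /* Everything must be page aligned, as with a normal pfn remap. */
    assert(((addr | phys | len) & ((1ULL << SKETCH_PAGE_SHIFT) - 1)) == 0);

    while (len > 0) {
        switch (sketch_pick_level(addr, phys, len)) {
        case SKETCH_PUD:
            step = 1ULL << SKETCH_PUD_SHIFT;
            c.pud++;
            break;
        case SKETCH_PMD:
            step = 1ULL << SKETCH_PMD_SHIFT;
            c.pmd++;
            break;
        default:
            step = 1ULL << SKETCH_PAGE_SHIFT;
            c.pte++;
            break;
        }
        addr += step;
        phys += step;
        len -= step;
    }
    return c;
}
```

In this model a range that starts 2 MiB below a 1 GiB boundary is covered
by PMD leaves up to the boundary and PUD leaves after it, while a range
whose physical address is only 4 KiB aligned falls back to PTEs
throughout.
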
I know Peter and David are working on some related things (hugetlbfs
unification and follow_pte et al improvements, respectively). Although I
have a hacky proof of concept that works, I thought it best to get some
consensus on the design before I post something, so I don't conflict
with this existing / upcoming work.

Changing remap_pfn_range to install PUDs or PMDs is straightforward. The
hairy part is the fault / follow side of things:

1. follow_pte clearly doesn't work for this, since the leaf might be a
   PUD or PMD instead. Most callers don't care about the PTE itself,
   they care about the pgprot or flags it has set, so my idea was to add
   a new interface which just yields those bits, instead of the actual
   PTE.

   Peter, I think hugetlbfs unification may run into similar issues, do
   you have some plan already to deal with PUD/PMD/PTE being different
   types?

2. vfio-pci relies on vm_ops->fault. This is a problem because the
   normal fault handler path doesn't call this until after it has walked
   down to the PTE level, installing PUDs/PMDs along the way. I have
   only gross ideas for how to deal with this:

   - Add a VM_HUGEPFNMAP VMA flag indicating vm_ops->fault should be
     called earlier in __handle_mm_fault
   - Add a vm_ops->hugepfn_fault (name not important) which should be
     called earlier in __handle_mm_fault
   - Go ahead and let remap_pfn_range overwrite existing PUDs/PMDs

   I wonder which of these folks find least offensive? Or is there a
   better way I haven't thought of?

3. That's also an issue for CoW faults, but I don't know of any real use
   case for CoW huge pfn mappings, so I thought we can just keep the
   existing small mapping behavior for CoW VMAs. Any objections?

Thanks!
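
For discussion, the interface idea in point 1 could be modeled in
userspace roughly like this: the caller gets a pfn plus protection bits
and never sees whether the leaf was a PTE, PMD or PUD. The entry layout,
the flag bits and all sk_* / SK_* names are made up for illustration;
they match neither a real architecture nor any existing kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

enum sk_leaf_level { SK_LEAF_PTE, SK_LEAF_PMD, SK_LEAF_PUD };

/* Toy entry format: pfn in bits 12 and up, flags in the low 12 bits. */
#define SK_PRESENT    (1ULL << 0)
#define SK_WRITE      (1ULL << 1)
#define SK_NOCACHE    (1ULL << 4)
#define SK_FLAGS_MASK 0xfffULL
#define SK_PAGE_SHIFT 12

/* A leaf entry as found by some page table walk, tagged with its level. */
struct sk_leaf {
    enum sk_leaf_level level;
    uint64_t raw;
};

/* What most callers actually want: no PTE/PMD/PUD type leaks out. */
struct sk_follow_result {
    uint64_t pfn;   /* pfn backing the queried address */
    uint64_t prot;  /* protection / flag bits of the leaf */
    bool writable;
};

/* Assumed x86-64-style leaf sizes: 4 KiB, 2 MiB, 1 GiB. */
static unsigned int sk_leaf_shift(enum sk_leaf_level level)
{
    switch (level) {
    case SK_LEAF_PUD: return 30;
    case SK_LEAF_PMD: return 21;
    default:          return SK_PAGE_SHIFT;
    }
}

/*
 * Hypothetical level-agnostic lookup. Returns false if the leaf is not
 * present; otherwise fills *out with the pfn for addr and the leaf's bits.
 */
static bool sk_follow(const struct sk_leaf *leaf, uint64_t addr,
                      struct sk_follow_result *out)
{
    uint64_t leaf_size = 1ULL << sk_leaf_shift(leaf->level);
    uint64_t offset = addr & (leaf_size - 1);

    if (!(leaf->raw & SK_PRESENT))
        return false;

    /* Base pfn of the leaf plus the page offset of addr within it. */
    out->pfn = (leaf->raw >> SK_PAGE_SHIFT) + (offset >> SK_PAGE_SHIFT);
    out->prot = leaf->raw & SK_FLAGS_MASK;
    out->writable = (leaf->raw & SK_WRITE) != 0;
    return true;
}
```

A caller that only checks writability or cacheability looks at the
result struct and works unchanged whichever level the mapping was
installed at.
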