Date: Fri, 16 Aug 2024 12:21:18 -0700
From: Sean Christopherson
To: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
    iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
    linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts, Tina Zhang
Subject: Re: [PATCH 13/16] iommupt: Add the x86 PAE page table format
In-Reply-To: <13-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
References: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
 <13-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>

On Thu, Aug 15, 2024, Jason Gunthorpe wrote:
> This is used by x86 CPUs and can be used in both x86 IOMMUs. When the x86
> IOMMU is running SVA it is using this page table format.
>
> This implementation follows the AMD v2 io-pgtable version.
>
> There is nothing remarkable here, the format has a variable top and
> limited support for different page sizes and no contiguous pages support.
>
> In principle this can support the 32 bit configuration with fewer table
> levels.

What's "the 32 bit configuration"?

> FIXME: Compare the bits against the VT-D version too.
>
> Signed-off-by: Jason Gunthorpe
> ---
>  drivers/iommu/generic_pt/Kconfig            |   6 +
>  drivers/iommu/generic_pt/fmt/Makefile       |   2 +
>  drivers/iommu/generic_pt/fmt/defs_x86pae.h  |  21 ++
>  drivers/iommu/generic_pt/fmt/iommu_x86pae.c |   8 +
>  drivers/iommu/generic_pt/fmt/x86pae.h       | 283 ++++++++++++++++++++
>  include/linux/generic_pt/common.h           |   4 +
>  include/linux/generic_pt/iommu.h            |  12 +
>  7 files changed, 336 insertions(+)
>  create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86pae.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86pae.c
>  create mode 100644 drivers/iommu/generic_pt/fmt/x86pae.h
>
> diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
> index e34be10cf8bac2..a7c006234fc218 100644
> --- a/drivers/iommu/generic_pt/Kconfig
> +++ b/drivers/iommu/generic_pt/Kconfig
> @@ -70,6 +70,11 @@ config IOMMU_PT_ARMV8_64K
>
>  	  If unsure, say N here.
>
> +config IOMMU_PT_X86PAE
> +	tristate "IOMMU page table for x86 PAE"
> +#include "iommu_template.h"
> diff --git a/drivers/iommu/generic_pt/fmt/x86pae.h b/drivers/iommu/generic_pt/fmt/x86pae.h
> new file mode 100644
> index 00000000000000..9e0ee74275fcb3
> --- /dev/null
> +++ b/drivers/iommu/generic_pt/fmt/x86pae.h
> @@ -0,0 +1,283 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> + *
> + * x86 PAE page table
> + *
> + * This is described in
> + *   Section "4.4 PAE Paging" of the Intel Software Developer's Manual Volume 3

I highly doubt what's implemented here is actually PAE paging, as the SDM (that
is referenced above) and most x86 folks describe PAE paging.  PAE paging is
specifically used when the CPU is in 32-bit mode (NOT including compatibility
mode!).  PAE paging translates 32-bit linear addresses to 52-bit physical
addresses.

Presumably what's implemented here is what Intel calls 4-level and 5-level
paging.  Those are _really_ similar to PAE paging, e.g. have the same encodings
for bits 11:0, and even require CR4.PAE=1, but they aren't 100% identical.  E.g.
true PAE paging doesn't have software-available bits in 62:MAXPHYADDR.

Unfortunately, I have no idea what name to use for this flavor.  x86pae is
actually kinda good, but I think it'll be confusing to people that are familiar
with the more canonical version of PAE paging.

> + *   Section "2.2.6 I/O Page Tables for Guest Translations" of the "AMD I/O
> + *   Virtualization Technology (IOMMU) Specification"
> + *
> + * It is used by x86 CPUs and The AMD and VT-D IOMMU HW.
> + *
> + * The named levels in the spec map to the pts->level as:
> + *   Table/PTE - 0
> + *   Directory/PDE - 1
> + *   Directory Ptr/PDPTE - 2
> + *   PML4/PML4E - 3
> + *   PML5/PML5E - 4

Any particular reason not to use x86's (and KVM's) effective 1-based system?
(level '0' is essentially the 4KiB leaf entries in a page table)

Starting at '1' is kinda odd, but it aligns with things like PML4/5, allows using
the pg_level enums from x86, and diverging from both x86 MM and KVM is likely
going to confuse people.

> + * FIXME: __sme_set
> + */
> +#ifndef __GENERIC_PT_FMT_X86PAE_H
> +#define __GENERIC_PT_FMT_X86PAE_H
> +
> +#include "defs_x86pae.h"
> +#include "../pt_defs.h"
> +
> +#include
> +#include
> +#include
> +
> +enum {
> +	PT_MAX_OUTPUT_ADDRESS_LG2 = 52,
> +	PT_MAX_VA_ADDRESS_LG2 = 57,
> +	PT_ENTRY_WORD_SIZE = sizeof(u64),
> +	PT_MAX_TOP_LEVEL = 4,
> +	PT_GRANUAL_LG2SZ = 12,
> +	PT_TABLEMEM_LG2SZ = 12,
> +};
> +
> +/* Shared descriptor bits */
> +enum {
> +	X86PAE_FMT_P = BIT(0),
> +	X86PAE_FMT_RW = BIT(1),
> +	X86PAE_FMT_U = BIT(2),
> +	X86PAE_FMT_A = BIT(5),
> +	X86PAE_FMT_D = BIT(6),
> +	X86PAE_FMT_OA = GENMASK_ULL(51, 12),
> +	X86PAE_FMT_XD = BIT_ULL(63),

Any reason not to use the #defines in arch/x86/include/asm/pgtable_types.h?

> +static inline bool x86pae_pt_install_table(struct pt_state *pts,
> +					   pt_oaddr_t table_pa,
> +					   const struct pt_write_attrs *attrs)
> +{
> +	u64 *tablep = pt_cur_table(pts, u64);
> +	u64 entry;
> +
> +	/*
> +	 * FIXME according to the SDM D is ignored by HW on table pointers?

Correct, only leaf entries have dirty bits.

> +	 * io_pgtable_v2 sets it
> +	 */
> +	entry = X86PAE_FMT_P | X86PAE_FMT_RW | X86PAE_FMT_U | X86PAE_FMT_A |

What happens with the USER bit for I/O page tables?  Ignored, I assume?

> +		X86PAE_FMT_D |
> +		FIELD_PREP(X86PAE_FMT_OA, log2_div(table_pa, PT_GRANUAL_LG2SZ));
> +	return pt_table_install64(&tablep[pts->index], entry, pts->entry);
> +}
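
To make the level-numbering and PAE-vs-4/5-level points above concrete, here is
a rough standalone sketch in userspace C.  It is not kernel code and not part of
the patch.  The enum x86_level and all helper names are hypothetical; the enum
only mirrors the idea of x86's 1-based pg_level and does not include the real
header.  It assumes a 4KiB granule with 9 index bits per level for 4-/5-level
paging, and a 2-bit top-level index (bits 31:30) for true PAE paging.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for x86's 1-based level numbering (0 means "none"). */
enum x86_level {
	X86_LEVEL_NONE = 0,
	X86_LEVEL_4K,	/* Table/PTE */
	X86_LEVEL_2M,	/* Directory/PDE */
	X86_LEVEL_1G,	/* Directory Ptr/PDPTE */
	X86_LEVEL_512G,	/* PML4/PML4E */
	X86_LEVEL_256T,	/* PML5/PML5E */
};

enum {
	GRANULE_LG2 = 12,	/* 4KiB pages */
	BITS_PER_LEVEL = 9,	/* 512 entries per table */
	MAX_PT_LEVEL = 4,	/* 0-based top level for 5-level paging */
};

/* Convert the patch's 0-based level to the 1-based numbering. */
static inline enum x86_level pt_level_to_x86(unsigned int pt_level)
{
	assert(pt_level <= MAX_PT_LEVEL);
	return (enum x86_level)(pt_level + 1);
}

/* Lowest address bit covered by one entry at the given 0-based level. */
static inline unsigned int level_shift(unsigned int pt_level)
{
	return GRANULE_LG2 + BITS_PER_LEVEL * pt_level;
}

/* Table index for va at a 0-based level under 4-/5-level paging. */
static inline unsigned int table_index(uint64_t va, unsigned int pt_level)
{
	return (unsigned int)((va >> level_shift(pt_level)) &
			      ((1u << BITS_PER_LEVEL) - 1));
}

/* True PAE paging: 32-bit linear address, top table has only 4 entries. */
static inline unsigned int pae_pdpt_index(uint32_t la)
{
	return la >> 30;
}
```

For instance, table_index(1ULL << 39, 3) selects entry 1 of the PML4 in the
0-based scheme, and pt_level_to_x86(3) gives the same table's number in the
1-based scheme (X86_LEVEL_512G).  The separate pae_pdpt_index() helper is there
only to show why the top level of true PAE paging does not fit the 9-bit
pattern.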