From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92777C3ABA5 for ; Tue, 29 Apr 2025 13:44:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C9C2D6B000E; Tue, 29 Apr 2025 09:44:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C4D656B0010; Tue, 29 Apr 2025 09:44:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AC6966B0011; Tue, 29 Apr 2025 09:44:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 900E16B000E for ; Tue, 29 Apr 2025 09:44:24 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 96C4EBBD83 for ; Tue, 29 Apr 2025 13:44:24 +0000 (UTC) X-FDA: 83387200848.03.9107933 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf25.hostedemail.com (Postfix) with ESMTP id 63BAAA0009 for ; Tue, 29 Apr 2025 13:44:22 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="MK/IWAht"; dmarc=pass (policy=quarantine) header.from=redhat.com; spf=pass (imf25.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745934262; a=rsa-sha256; cv=none; b=UhQvqjrtjmGovK5+JodG8fVewaEYnzt3mtr4halWC6ZRclSPNhrvv3pDn20w4EsulopkdR xaXo6izlPM744tqjINsJ/TaI6umd65Kyea/ChcYpRjVhC8b0uHgTE2X2ZPj2KkjXQR80Qv w7yU/Fb+WtN6PIcaetVr6jzsaR1MI4w= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745934262; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8El/MNSWeMGWGhHkl6gHFO/ffF6zovBaM0ZG9eZHFK4=; b=POTAtou35wYCtO27q5+lGCWVQgVasv/DQjU6x0oFpgKwc93K6P+pETJmr8ESRRF+1tkAe7 zvmP0P+2slb51huQcQlo+MCydm8Y+U/k9edphGPWrHaZ2OYlVT/T7aBoJngqADlEGXgiKC Qe4/KINzVGNyWyEtsBa2aU/D6cszvsQ= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="MK/IWAht"; dmarc=pass (policy=quarantine) header.from=redhat.com; spf=pass (imf25.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745934261; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=8El/MNSWeMGWGhHkl6gHFO/ffF6zovBaM0ZG9eZHFK4=; b=MK/IWAhtUj+KdZWX0C/hjsy6FN7MXjuBXU05zL9K/dC4ydFkCxUFh1GpHuiCqm1T/78I3G FkK+VVrcTQnG2rBBoXghsHC12G7Sh9eM8PkF3i3jBqTt+FGXdAtILmWQPj/7mHlbAgrM7E XafVqqvXFSf9jJEsvLAy1xShcto6HlI= Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-688-065UnQSlP4ioNEVwCYCRvw-1; Tue, 29 Apr 2025 09:44:19 -0400 X-MC-Unique: 065UnQSlP4ioNEVwCYCRvw-1 X-Mimecast-MFC-AGG-ID: 065UnQSlP4ioNEVwCYCRvw_1745934259 Received: by mail-qt1-f199.google.com with SMTP id d75a77b69052e-476753132a8so49369531cf.3 for ; Tue, 29 Apr 2025 06:44:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745934259; x=1746539059; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=8El/MNSWeMGWGhHkl6gHFO/ffF6zovBaM0ZG9eZHFK4=; b=FESwRbinMV4YXsx3An98pYcUUU+jtvrtKXDjRUO3b385UsEsZK4LtbE4+6okB87ds7 XU3qy0FMmHG0lVtL+DDuNZTxwykyc14l8H3choi5RxDaJbxFLfPRYpUxbtCiXEPT970X I73JQv8vgiLaxb2Manpx5u7XwSRILp9A3KRvqIu2MSyV1ICZEUbQeCSfq9mHvkwaBj+t VPn3m/hr/JHysHSXKqOfFJHmEGT4desRhwsTBrSP2WtubAwGqODugpUgiqhUwdXlOgXT w5A0AWOsYBRRiX3ieyr80aF8cP/ErJ7wlT3RzgL5uRCGFUT8gyTJRjJ6Ci7bLMuOKp3Y 0hhA== X-Forwarded-Encrypted: i=1; AJvYcCU0AgGlxluI43dmOx82TpaoC+7GVF9D/tKXJb7gZqdlSnj1vk0WJlvTOIVfmDXcDUZTCC5g78HBZQ==@kvack.org X-Gm-Message-State: AOJu0YxSsvHjdiik6F1ZNsQnd+3rm4/N6mje8X3OJ0BMVLZ1LMNLSDrN ss0WGbb9trTBK7nD3hiCVWbtsOPEB9gUQ7Ga5o9IKpcM6dD0tDI9qzYIq5rzruF3BXBgced9m8h R/J9uIERfMyqSgAGpxnXxyu07U92civ7Vdyi0J7WjAUw/eifM X-Gm-Gg: ASbGncsN3CqrzXYsiyIVeageCjSokiuAqVOdQmIikAGjgjcQQZRS40/dfU6cZ+T5IYv PCFApQIoco2sENgHs3bsyA9E6uYkscpCWg+GiL3dmAjx3YqX+YOepuPM8FPoK7HVf6WSdreW3HT zgIACY4zK2OstZeZOyuUqdlXBNEb/cQc+geYATwT6Npx9BPURDFXB19wVVnt0LtXn9ThoVzVEQc Ax1U9pRIC6It2Lrm54GaHH/OHS5enD+l/RKNrj4NBz3TEhYIGGRU4petX1i3F+QhTNgL/o3Nl+5 La4= X-Received: by 2002:a05:622a:192a:b0:476:7d74:dd10 with SMTP id d75a77b69052e-488131609e1mr63610901cf.19.1745934259236; Tue, 29 Apr 2025 06:44:19 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEnvb9YE4nJ15XVZdziBUCLpPu73z51k76ZUd4aW/VF9ZsxE0SxmntEXtj/bIpgYaiPj9O/Pg== X-Received: by 2002:a05:622a:192a:b0:476:7d74:dd10 with SMTP id d75a77b69052e-488131609e1mr63610481cf.19.1745934258824; Tue, 29 Apr 2025 06:44:18 -0700 (PDT) Received: from x1.local ([85.131.185.92]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47ea1693378sm80410351cf.64.2025.04.29.06.44.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 06:44:18 -0700 (PDT) Date: Tue, 29 Apr 2025 09:44:15 -0400 From: Peter Xu To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, Dave Hansen , Andy Lutomirski , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Jani Nikula , Joonas Lahtinen , Rodrigo Vivi , Tvrtko Ursulin , David Airlie , Simona Vetter , Andrew Morton , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pedro Falcato Subject: Re: [PATCH v1 02/11] mm: convert track_pfn_insert() to pfnmap_sanitize_pgprot() Message-ID: References: <20250425081715.1341199-1-david@redhat.com> <20250425081715.1341199-3-david@redhat.com> <78f88303-6b00-42cf-8977-bf7541fa45a9@redhat.com> <75998f7c-93d2-4b98-bb53-8d858b2c108e@redhat.com> <57f9480c-2f8c-4be8-864c-406fec917eb1@redhat.com> MIME-Version: 1.0 In-Reply-To: <57f9480c-2f8c-4be8-864c-406fec917eb1@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: gFkUnXnFs9cSwZ5Iz-UXpwN84iRyDAKf5aFOLxOcOGI_1745934259 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 63BAAA0009 X-Stat-Signature: fczmn6oheyox9ugdqz19pc8rit7grr6r X-Rspam-User: X-HE-Tag: 1745934262-618264 X-HE-Meta: U2FsdGVkX1/7bB8r0KmV44pwF+43mxgG0v+999Ttf4yIOgqfViFzdvjfAzkPUvcvK8lbLF1pS3oeCs7ap7IRObI0MLbX9a9BH6UmRsmY9vNUY+V3QfHxpmD21NjT6Q1VDPZ4lWfHBFvNRDCrbKk6rkHamN90zBYG7ZDpA0J82RpRLo5KDmgNIRtrHY+uRb1gJlZboeqoKTroFQMgqOMfEPSYs88GB55l0L6KP7bVjtpM/o8Z0NTfPc4U8zABqVS9lIKxsPNAyfFHMLiU/5uLRgfGPAFWSqcd2KXmx9TED65rn9YW/MqmWCqaHjJmT5zPTKVph0UoO1PNG606aP6eG5QlCOtMNs6sw56QwJqGgl5l0MrScC2Xk5hAQBzg/gbtrPZPI1TMK4qHCtQLeVg7/l2qwfDz02a68nQa4B3fFA+29Dc+ntuFub8lz074ORL1gQPHKk1r0Drw2R9DgA75v5xAmzZFV+PgRPR4CruUhaBIMcLPKvpcrUxHMtB1vH1foyM70sH6DqiZUW6PhLMRH5ePf5noybmMhgW4GrX6PseahqplhZf71SkJk3TVi42bsrrhytfPBE5dOVGg8EcQjbrnqGNA1uV+PVZwUeEWD/4CVzefm0NywAeSP12x/tta3EkwCDX8xTih/nOqa3CqFHqR61z78oySrhBswvFs10K5DTuE6LgiXrUJo027dR+vJPYKbQJGNFbcdMm5SkaWzaKhEspfrIQBnPWu6JOuUPCSN/K8DZaNS9wRrETAw8hCiRgjliQNdk35C+qKLCMbvbGHMilUHjQgvfCFjGtR7a0q+ZkWXOS1ZA3/w5skU6huExPJlYv4ai2ThxJJ6IpYV8v5M7k8EoUz/wW1KULFvDHjW5/JQ09wa7D4x+h5WYOr+s615q0ak8tzPV24yZQd7VAdIUyekSzGugNe99RUggVotWSwEhhhmFoBvWIlttxN1d52EgkYn61nETite8E DYbICV7/ EQEO3rZ51T6YsraCxhdeh+vq8yiwNLg8g3tv0s+G4l9dz82KHEEoSuCui/fKy+jREfHrUJNR6eywwHWC+B+ssZtcpfN3RlLjws7w77oKaPlFyhMD9MTq3cPE10mpJpvgTLuRrwlVj/4g7oTXDuKTe+KvvEdh0+fcB/+BygIOvfGZW81HBuaEtcNc6Ne5RdjrGgGlIIk/DJiRttmJI3nWcr1vldE3Yja1vtSVebMRKuxvsUqD+/ArKlXskEPTsb7u1MJT9TwRtKSNPB8b6sr0Qxh5MBOCGNtFJgsNuigCdZ/9nDf/6SHBa830MCC5Qqv1czVlFd4YSryatR/dEislNDnjbsNFD74OWT9jQjubh5iVFr3LQ7Lrp7BUpQLzk1+ht4qNCj74UH1ZNgZnzbWpgNtnbt9X+qRWoYFL9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Apr 28, 2025 at 10:37:49PM +0200, David Hildenbrand wrote: > On 28.04.25 18:21, Peter Xu wrote: > > On Mon, Apr 28, 2025 at 04:58:46PM +0200, David Hildenbrand wrote: > > > > > > > > What it does on PAT (only implementation so far ...) is looking up the > > > > > memory type to select the caching mode that can be use. > > > > > > > > > > "sanitize" was IMHO a good fit, because we must make sure that we don't use > > > > > the wrong caching mode. > > > > > > > > > > update/setup/... don't make that quite clear. Any other suggestions? > > > > > > > > I'm very poor on naming.. :( So far anything seems slightly better than > > > > sanitize to me, as the word "sanitize" is actually also used in memtype.c > > > > for other purpose.. see sanitize_phys(). > > > > > > Sure, one can sanitize a lot of things. Here it's the cachemode/pgrpot, in > > > the other functions it's an address. > > > > > > Likely we should just call it pfnmap_X_cachemode()/ > > > > > > Set/update don't really fit for X in case pfnmap_X_cachemode() is a NOP. > > > > > > pfnmap_setup_cachemode() ? Hm. > > > > Sounds good here. > > Okay, I'll use that one. If ever something else besides PAT would require > different semantics, they can bother with finding a better name :) > > > > > > > > > > > > > > > > > > > > > > > > > > > > + * @pfn: the start of the pfn range > > > > > > > + * @size: the size of the pfn range > > > > > > > + * @prot: the pgprot to sanitize > > > > > > > + * > > > > > > > + * Sanitize the given pgprot for a pfn range, for example, adjusting the > > > > > > > + * cachemode. > > > > > > > + * > > > > > > > + * This function cannot fail for a single page, but can fail for multiple > > > > > > > + * pages. > > > > > > > + * > > > > > > > + * Returns 0 on success and -EINVAL on error. > > > > > > > + */ > > > > > > > +int pfnmap_sanitize_pgprot(unsigned long pfn, unsigned long size, > > > > > > > + pgprot_t *prot); > > > > > > > extern int track_pfn_copy(struct vm_area_struct *dst_vma, > > > > > > > struct vm_area_struct *src_vma, unsigned long *pfn); > > > > > > > extern void untrack_pfn_copy(struct vm_area_struct *dst_vma, > > > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > > > > > index fdcf0a6049b9f..b8ae5e1493315 100644 > > > > > > > --- a/mm/huge_memory.c > > > > > > > +++ b/mm/huge_memory.c > > > > > > > @@ -1455,7 +1455,9 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write) > > > > > > > return VM_FAULT_OOM; > > > > > > > } > > > > > > > - track_pfn_insert(vma, &pgprot, pfn); > > > > > > > + if (pfnmap_sanitize_pgprot(pfn_t_to_pfn(pfn), PAGE_SIZE, &pgprot)) > > > > > > > + return VM_FAULT_FALLBACK; > > > > > > > > > > > > Would "pgtable" leak if it fails? If it's PAGE_SIZE, IIUC it won't ever > > > > > > trigger, though. > > > > > > > > > > > > Maybe we could have a "void pfnmap_sanitize_pgprot_pfn(&pgprot, pfn)" to > > > > > > replace track_pfn_insert() and never fail? Dropping vma ref is definitely > > > > > > a win already in all cases. > > > > > > > > > > It could be a simple wrapper around pfnmap_sanitize_pgprot(), yes. That's > > > > > certainly helpful for the single-page case. > > > > > > > > > > Regarding never failing here: we should check the whole range. We have to > > > > > make sure that none of the pages has a memory type / caching mode that is > > > > > incompatible with what we setup. > > > > > > > > Would it happen in real world? > > > > > IIUC per-vma registration needs to happen first, which checks for > > > memtype > > > > conflicts in the first place, or reserve_pfn_range() could already have > > > > failed. > > > > > Here it's the fault path looking up the memtype, so I would expect it is > > > > guaranteed all pfns under the same vma is following the verified (and same) > > > > memtype? > > > > > > The whole point of track_pfn_insert() is that it is used when we *don't* use > > > reserve_pfn_range()->track_pfn_remap(), no? > > > > > > track_pfn_remap() would check the whole range that gets mapped, so > > > track_pfn_insert() user must similarly check the whole range that gets > > > mapped. > > > > > > Note that even track_pfn_insert() is already pretty clear on the intended > > > usage: "called when a _new_ single pfn is established" > > > > We need to define "new" then.. But I agree it's not crystal clear at > > least. I think I just wasn't the first to assume it was reserved, see this > > (especially, the "Expectation" part..): > > > > commit 5180da410db6369d1f95c9014da1c9bc33fb043e > > Author: Suresh Siddha > > Date: Mon Oct 8 16:28:29 2012 -0700 > > > > x86, pat: separate the pfn attribute tracking for remap_pfn_range and vm_insert_pfn > > With PAT enabled, vm_insert_pfn() looks up the existing pfn memory > > attribute and uses it. Expectation is that the driver reserves the > > memory attributes for the pfn before calling vm_insert_pfn(). > > It's all confusing. > > We do have the following functions relevant in pat code: > > (1) memtype_reserve(): used by ioremap and set_memory_XX > > (2) memtype_reserve_io(): used by iomap > > (3) reserve_pfn_range(): only remap_pfn_range() calls it > > (4) arch_io_reserve_memtype_wc() > > > Which one would perform the reservation for, say, vfio? My understanding is it was done via barmap. See this stack: vfio_pci_core_mmap pci_iomap pci_iomap_range ... __ioremap_caller memtype_reserve > > > I agree that if there would be a guarantee/expectation that all PFNs have > the same memtype (from previous reservation), it would be sufficient to > check a single PFN, and we could document that. I just don't easily see > where that reservation is happening. > > So a pointer to that would be appreciated! I am not aware of any pointer.. maybe others could chime in. IMHO, if there's anything uncertain, for this one we could always decouple this issue from the core issue you're working on, so at least it keeps the old behavior (which is pure lookup on pfn injections) until a solid issue occurs? It avoids the case where we could introduce unnecessary code but then it's much harder to justify a removal. What do you think? Thanks, -- Peter Xu