Date: Wed, 28 Aug 2024 10:41:40 -0400
From: Peter Xu
To: Jiaqi Yan
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Gavin Shan,
 Catalin Marinas, x86@kernel.org, Ingo Molnar, Andrew Morton,
 Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 Sean Christopherson, Oscar Salvador, Jason Gunthorpe, Borislav Petkov,
 Zi Yan, Axel Rasmussen, David Hildenbrand, Yan Zhao, Will Deacon,
 Kefeng Wang, Alex Williamson
Subject: Re: [PATCH v2 00/19] mm: Support huge pfnmaps
References: <20240826204353.2228736-1-peterx@redhat.com>

On Tue, Aug 27, 2024 at 05:42:21PM -0700, Jiaqi Yan wrote:
> On Tue, Aug 27, 2024 at 3:57 PM Peter Xu wrote:
> >
> > On Tue, Aug 27, 2024 at 03:36:07PM -0700, Jiaqi Yan wrote:
> > > Hi Peter,
> >
> > Hi, Jiaqi,
> >
> > > I am curious if there is any work needed for
> > > unmap_mapping_range? If a
> > > driver hugely remap_pfn_range()ed at 1G granularity, can the driver
> > > unmap at PAGE_SIZE granularity? For example, when handling a PFN is
> >
> > Yes it can, but it'll invoke the split_huge_pud() which default routes to
> > removal of the whole pud right now (currently only covers either DAX
> > mappings or huge pfnmaps; it won't for anonymous if it comes, for example).
> >
> > In that case it'll rely on the driver providing proper fault() /
> > huge_fault() to refault things back with smaller sizes later when accessed
> > again.
>
> I see, so the driver needs to drive the recovery process, and code
> needs to be in the driver.
>
> But it seems to me the recovery process will be more or less the same
> to different drivers? In that case does it make sense that
> memory_failure do the common things for all drivers?
>
> Instead of removing the whole pud, can driver or memory_failure do
> something similar to non-struct-page-version of split_huge_page? So
> driver doesn't need to re-fault good pages back?

I think we can, it's just that we don't yet have a valid use case.

DAX is definitely faultable.

As for the new huge pfnmap, vfio is currently the only user, and vfio only
needs to either zap all or map all.  In that case there's no real need yet
for what you described.  Meanwhile it's also faultable, so if / when
needed it should hopefully still do the work properly.

I believe it's not a usual requirement for most of the other drivers
either, as most of them don't even support fault() afaiu.
remap_pfn_range() can start to use huge mappings; however, I'd expect
those drivers are mostly not ready for random tearing down of any MMIO
mappings.

It sounds doable to me when there's a need for what you're describing, but
I don't think I know the use case well enough yet.

>
>
> >
> > > poisoned in the 1G mapping, it would be great if the mapping can be
> > > splitted to 2M mappings + 4k mappings, so only the single poisoned PFN
> > > is lost.
> > > (Pretty much like the past proposal* to use HGM** to improve
> > > hugetlb's memory failure handling).
> >
> > Note that we're only talking about MMIO mappings here, in which case the
> > PFN doesn't even have a struct page, so the whole poison idea shouldn't
> > apply, afaiu.
>
> Yes, there won't be any struct page. Ankit proposed this patchset* for
> handling poisoning. I wonder if someday the vfio-nvgrace-gpu-pci
> driver adopts your change via new remap_pfn_range (install PMD/PUD
> instead of PTE), and memory_failure_pfn still
> unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT, PAGE_SIZE,
> 0), can it somehow just work and no re-fault needed?
>
> * https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@nvidia.com/#t

I see now, interesting..  Thanks for the link.

In the nvgpu case, one way is to do what you said; we can enhance the
pmd/pud split for pfnmap, but maybe that's overkill.

I saw that the nvgpu will need a fault() anyway to detect poisoned PFNs,
so it's also feasible that the new nvgrace_gpu_vfio_pci_fault(), once it
supports huge pfnmaps, checks whether the whole faulting range contains
any poisoned PFN and returns VM_FAULT_FALLBACK if so (rather than
VM_FAULT_HWPOISON).

E.g., when 4K of 2M is poisoned, we'll erase the 2M completely.  When an
access happens, as long as the accessed 4K is not the poisoned 4K,
huge_fault() should still detect that a 4K range is poisoned; it will then
not install a pmd but return VM_FAULT_FALLBACK, and fault() will see that
the accessed 4K range is not poisoned and install a pte.

Thanks,

--
Peter Xu
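
For illustration, the huge_fault() / fault() fallback flow described above
can be modelled in plain userspace C.  This is only a sketch under stated
assumptions, not kernel code and not the nvgrace-gpu driver's
implementation: it assumes 4K base pages (512 PFNs per 2M pmd), and
poisoned[], model_huge_fault(), model_fault(), handle_access() and the
enum values are all hypothetical names that merely mirror the roles of
VM_FAULT_FALLBACK and VM_FAULT_HWPOISON.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K base pages, so one 2M pmd covers 512 PFNs. */
#define PFNS_PER_PMD 512UL
#define NR_PFNS      (4 * PFNS_PER_PMD)	/* small hypothetical device region */

/*
 * Hypothetical outcomes; they only mirror the roles of the kernel's
 * VM_FAULT_FALLBACK / VM_FAULT_HWPOISON, not their values.
 */
enum fault_result { MAPPED_PMD, MAPPED_PTE, FALLBACK, HWPOISON };

/* Hypothetical per-PFN poison record kept by the driver. */
static bool poisoned[NR_PFNS];

/*
 * Models huge_fault(): refuse the 2M mapping if any PFN in the
 * pmd-aligned range around pfn is poisoned.
 */
static enum fault_result model_huge_fault(unsigned long pfn)
{
	unsigned long start = pfn & ~(PFNS_PER_PMD - 1);
	unsigned long i;

	assert(pfn < NR_PFNS);
	for (i = 0; i < PFNS_PER_PMD; i++)
		if (poisoned[start + i])
			return FALLBACK;
	return MAPPED_PMD;
}

/* Models fault(): only the accessed 4K matters here. */
static enum fault_result model_fault(unsigned long pfn)
{
	assert(pfn < NR_PFNS);
	return poisoned[pfn] ? HWPOISON : MAPPED_PTE;
}

/* Models the fault path trying the huge mapping first, then the pte. */
static enum fault_result handle_access(unsigned long pfn)
{
	enum fault_result r = model_huge_fault(pfn);

	return r == FALLBACK ? model_fault(pfn) : r;
}
```

With poisoned[700] set, handle_access(600) yields MAPPED_PTE (same 2M
range, different 4K), handle_access(700) yields HWPOISON, and
handle_access(10) yields MAPPED_PMD because its 2M range is clean.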