From: Dan Williams
Date: Thu, 7 Nov 2019 21:09:46 -0800
Subject: Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes
In-Reply-To: <0eb001e0-bb26-59bb-c514-d2f8a86a7eab@redhat.com>
To: David Hildenbrand
Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM, Michal Hocko,
 Andrew Morton, kvm-ppc@vger.kernel.org, linuxppc-dev, KVM list,
 linux-hyperv@vger.kernel.org, devel@driverdev.osuosl.org, xen-devel,
 X86 ML, Alexander Duyck, Alexander Duyck, Alex Williamson,
 Allison Randal, Andy Lutomirski, "Aneesh Kumar K.V", Anshuman Khandual,
 Anthony Yznaga, Benjamin Herrenschmidt, Borislav Petkov,
 Boris Ostrovsky, Christophe Leroy, Cornelia Huck, Dave Hansen,
 Haiyang Zhang, "H. Peter Anvin", Ingo Molnar, "Isaac J. Manjarres",
 Jim Mattson, Joerg Roedel, Johannes Weiner, Juergen Gross,
 KarimAllah Ahmed, Kees Cook, "K. Y. Srinivasan",
 "Matthew Wilcox (Oracle)", Matt Sickler, Mel Gorman, Michael Ellerman,
 Michal Hocko, Mike Rapoport, Mike Rapoport, Nicholas Piggin,
 Oscar Salvador, Paolo Bonzini, Paul Mackerras, Paul Mackerras,
 Pavel Tatashin, Pavel Tatashin, Peter Zijlstra, Qian Cai,
 Radim Krčmář, Sasha Levin, Sean Christopherson, Stefano Stabellini,
 Stephen Hemminger, Thomas Gleixner, Vitaly Kuznetsov, Vlastimil Babka,
 Wanpeng Li, YueHaibing

On Thu, Nov 7, 2019 at 2:07 PM David Hildenbrand wrote:
>
> On 07.11.19 19:22, David Hildenbrand wrote:
> >
> >
> >> On 07.11.2019 at 16:40, Dan Williams wrote:
> >>
> >> On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand wrote:
> >>>
> >>> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
> >>> change that.
> >>>
> >>> KVM has this weird use case that you can map anything from /dev/mem
> >>> into the guest. pfn_valid() is not a reliable check whether the memmap
> >>> was initialized and can be touched. pfn_to_online_page() makes sure
> >>> that we have an initialized memmap (and don't have ZONE_DEVICE memory).
> >>>
> >>> Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
> >>> sure the function produces the same result once we stop setting ZONE_DEVICE
> >>> pages PG_reserved.
> >>>
> >>> Cc: Alex Williamson
> >>> Cc: Cornelia Huck
> >>> Signed-off-by: David Hildenbrand
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 10 ++++++++--
> >>>  1 file changed, 8 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index 2ada8e6cdb88..f8ce8c408ba8 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>   */
> >>>  static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>>  {
> >>> -	if (pfn_valid(pfn))
> >>> -		return PageReserved(pfn_to_page(pfn));
> >>> +	struct page *page = pfn_to_online_page(pfn);
> >>
> >> Ugh, I just realized this is not a safe conversion until
> >> pfn_to_online_page() is moved over to subsection granularity. As it
> >> stands it will return true for any ZONE_DEVICE pages that share a
> >> section with boot memory.
> >
> > That should not happen right now and I commented back when you
> > introduced subsection support that I don’t want to have ZONE_DEVICE
> > mixed with online pages in a section. Having memory block devices that
> > partially span ZONE_DEVICE would be ... really weird. With something
> > like pfn_active() - as discussed - we could at least make this check
> > work - but I am not sure if we really want to go down that path. In the
> > worst case, some MB of RAM are lost ...
> > I guess this needs more thought.
> >
>
> I just realized the "boot memory" part. Is that a real thing? IOW, can
> we have ZONE_DEVICE falling into a memory block (with holes)? I somewhat
> have doubts that this would work ...

One of the real world failure cases that started the subsection effort
is that Persistent Memory collides with System RAM on a 64MB boundary
on shipping platforms. System RAM ends on a 64MB boundary and, due to a
lack of memory controller resources, PMEM is mapped contiguously at the
end of that boundary. Some more details are in the subsection cover
letter / changelogs [1] [2].

It's not sufficient to just lose some memory. That is the broken
implementation that led to the subsection work, because the lost memory
may change from one boot to the next and software can't reliably inject
a padding that conforms to the x86 128MB section constraint.

Suffice to say I think we need your pfn_active() to get subsection
granularity pfn_to_online_page() before PageReserved() can be removed.

[1]: https://lore.kernel.org/linux-mm/156092349300.979959.17603710711957735135.stgit@dwillia2-desk3.amr.corp.intel.com/
[2]: https://lore.kernel.org/linux-mm/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com/
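
To make the 64MB collision concrete, here is a rough userspace sketch of
the arithmetic, not kernel code. It assumes the usual x86-64 values
(4KB pages, 128MB sections, 2MB subsections); the macro and helper names
below are made up for illustration and are not the kernel's own
identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 geometry; names are illustrative, not kernel macros. */
#define EX_PAGE_SHIFT		12	/* 4KB pages */
#define EX_SECTION_SHIFT	27	/* 128MB memory sections */
#define EX_SUBSECTION_SHIFT	21	/* 2MB subsections */

/* Index of the section (or subsection) that contains @pfn. */
static uint64_t ex_pfn_to_unit(uint64_t pfn, unsigned int unit_shift)
{
	return pfn >> (unit_shift - EX_PAGE_SHIFT);
}

/*
 * True if the last pfn of System RAM and the first pfn of PMEM land in
 * the same unit, i.e. a per-unit "online" bit cannot tell them apart.
 * @ram_end_pfn is exclusive (first pfn past the end of RAM).
 */
static bool ex_ranges_share_unit(uint64_t ram_end_pfn,
				 uint64_t pmem_start_pfn,
				 unsigned int unit_shift)
{
	return ex_pfn_to_unit(ram_end_pfn - 1, unit_shift) ==
	       ex_pfn_to_unit(pmem_start_pfn, unit_shift);
}
```

With RAM ending at 4GB + 64MB (pfn 0x104000) and PMEM starting right
there, ex_ranges_share_unit() reports a shared unit at section
granularity but not at subsection granularity, which is why a
section-granular pfn_to_online_page() cannot distinguish the two.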