Date: Tue, 5 Nov 2019 15:42:08 -0800
From: Sean Christopherson
To: Dan Williams
Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM, Michal Hocko,
    Andrew Morton, kvm-ppc@vger.kernel.org, linuxppc-dev, KVM list,
    linux-hyperv@vger.kernel.org, devel@driverdev.osuosl.org, xen-devel,
    X86 ML, Alexander Duyck, Alex Williamson, Allison Randal,
    Andy Lutomirski, "Aneesh Kumar K.V", Anshuman Khandual, Anthony Yznaga,
    Benjamin Herrenschmidt, Borislav Petkov, Boris Ostrovsky,
    Christophe Leroy, Cornelia Huck, Dave Hansen, Haiyang Zhang,
    "H. Peter Anvin", Ingo Molnar, "Isaac J. Manjarres", Jim Mattson,
    Joerg Roedel, Johannes Weiner, Juergen Gross, KarimAllah Ahmed,
    Kees Cook, "K. Y. Srinivasan", "Matthew Wilcox (Oracle)", Matt Sickler,
    Mel Gorman, Michael Ellerman, Mike Rapoport, Nicholas Piggin,
    Oscar Salvador, Paolo Bonzini, Paul Mackerras, Pavel Tatashin,
    Peter Zijlstra, Qian Cai, Radim Krčmář, Sasha Levin,
    Stefano Stabellini, Stephen Hemminger, Thomas Gleixner,
    Vitaly Kuznetsov, Vlastimil Babka, Wanpeng Li, YueHaibing, Adam Borowski
Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
Message-ID: <20191105234208.GH23297@linux.intel.com>
References: <20191024120938.11237-4-david@redhat.com>
    <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com>
    <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com>
    <20191105160000.GC8128@linux.intel.com>
    <20191105231316.GE23297@linux.intel.com>

On Tue, Nov 05, 2019 at 03:30:00PM -0800, Dan Williams wrote:
> On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson wrote:
> >
> > On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote:
> > > On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote:
> > > > > The scarier code (for me) is transparent_hugepage_adjust() and
> > > > > kvm_mmu_zap_collapsible_spte(), as I don't at all understand the
> > > > > interaction between THP and _PAGE_DEVMAP.
> > > >
> > > > The x86 KVM MMU code is one of the ugliest code I know (sorry, but it
> > > > had to be said :/ ). Luckily, this should be independent of the
> > > > PG_reserved thingy AFAIKs.
> > >
> > > Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte()
> > > are honoring kvm_is_reserved_pfn(), so again I'm missing where the
> > > page count gets mismanaged and leads to the reported hang.
> >
> > When mapping pages into the guest, KVM gets the page via gup(), which
> > increments the page count for ZONE_DEVICE pages. But KVM puts the page
> > using kvm_release_pfn_clean(), which skips put_page() if PageReserved()
> > and so never puts its reference to ZONE_DEVICE pages.
>
> Oh, yeah, that's busted.
>
> > My transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte()
> > comments were for a post-patch/series scenario where PageReserved() is
> > no longer true for ZONE_DEVICE pages.
>
> Ah, ok, for that David is preserving kvm_is_reserved_pfn() returning
> true for ZONE_DEVICE because pfn_to_online_page() will fail for
> ZONE_DEVICE.

But David's proposed fix for the above refcount bug is to omit the patch
so that KVM no longer treats ZONE_DEVICE pages as reserved. That seems
like the right thing to do, including for thp_adjust(), e.g. it would
naturally let KVM use 2MB pages for the guest when a ZONE_DEVICE page is
mapped with a huge page (2MB or above) in the host. The only hiccup is
figuring out how to correctly transfer the reference.
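
To make the refcount imbalance concrete, here is a rough user-space sketch
of the get/put asymmetry described above. It is a toy model only, not
kernel code: struct toy_page and every toy_*() helper are made-up names
for illustration, the starting refcount of 1 is an assumption, and the
real gup() and kvm_release_pfn_clean() have different signatures and do
much more. It also does not model the open question of how to transfer
the reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: none of these types or helpers exist in the kernel. */
struct toy_page {
	int refcount;      /* stands in for the page reference count */
	bool reserved;     /* stands in for PageReserved() on ordinary pages */
	bool zone_device;  /* page lives in ZONE_DEVICE */
};

/* Models gup(): per the thread, it takes a reference on ZONE_DEVICE pages. */
static void toy_gup(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Models the "is this pfn reserved?" decision. The flag selects between
 * today's behavior (ZONE_DEVICE counts as reserved) and the proposed one
 * (ZONE_DEVICE is not treated as reserved).
 */
static bool toy_is_reserved(const struct toy_page *p,
			    bool zone_device_is_reserved)
{
	if (p->zone_device)
		return zone_device_is_reserved;
	return p->reserved;
}

/* Models the release path: the put is skipped for "reserved" pages. */
static void toy_release_clean(struct toy_page *p,
			      bool zone_device_is_reserved)
{
	if (!toy_is_reserved(p, zone_device_is_reserved))
		toy_put_page(p);
}

/* Map and unmap one ZONE_DEVICE page; returns the references leaked. */
int toy_map_unmap_leak(bool zone_device_is_reserved)
{
	struct toy_page page = {
		.refcount = 1, .reserved = false, .zone_device = true,
	};
	int before = page.refcount;

	toy_gup(&page);
	toy_release_clean(&page, zone_device_is_reserved);
	return page.refcount - before;
}
```

In this model, toy_map_unmap_leak(true) leaves one reference behind per
map/unmap cycle, which is the leak discussed above, while
toy_map_unmap_leak(false) comes out balanced.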