From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A,SIGNED_OFF_BY,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C67F4C433E2 for ; Mon, 14 Sep 2020 23:53:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 54DCD206B6 for ; Mon, 14 Sep 2020 23:53:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="dbMe0ECu" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 54DCD206B6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8A800900002; Mon, 14 Sep 2020 19:53:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 832206B008A; Mon, 14 Sep 2020 19:53:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6F92C900002; Mon, 14 Sep 2020 19:53:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0134.hostedemail.com [216.40.44.134]) by kanga.kvack.org (Postfix) with ESMTP id 53FAA6B0089 for ; Mon, 14 Sep 2020 19:53:31 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 10ECF3630 for ; Mon, 14 Sep 2020 23:53:31 +0000 (UTC) X-FDA: 77263321422.22.rain61_5106f022710c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id CC90018038E67 for ; Mon, 14 Sep 2020 23:53:30 +0000 (UTC) X-HE-Tag: rain61_5106f022710c X-Filterd-Recvd-Size: 8403 Received: from hqnvemgate24.nvidia.com (hqnvemgate24.nvidia.com [216.228.121.143]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Mon, 14 Sep 2020 23:53:29 +0000 (UTC) Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Mon, 14 Sep 2020 16:51:08 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Mon, 14 Sep 2020 16:53:28 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Mon, 14 Sep 2020 16:53:28 -0700 Received: from rcampbell-dev.nvidia.com (172.20.13.39) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Mon, 14 Sep 2020 23:53:26 +0000 Subject: Re: [PATCH] mm: remove extra ZONE_DEVICE struct page refcount To: Dan Williams CC: Linux MM , , , Linux Kernel Mailing List , Ira Weiny , "Matthew Wilcox" , Jerome Glisse , "John Hubbard" , Alistair Popple , Christoph Hellwig , Jason Gunthorpe , "Bharata B Rao" , Zi Yan , "Kirill A . Shutemov" , Yang Shi , Paul Mackerras , Ben Skeggs , "Andrew Morton" References: <20200914224509.17699-1-rcampbell@nvidia.com> X-Nvconfidentiality: public From: Ralph Campbell Message-ID: <10b4b85c-f1e9-b6b5-74cd-6190ee0aca5d@nvidia.com> Date: Mon, 14 Sep 2020 16:53:25 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: X-Originating-IP: [172.20.13.39] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1600127468; bh=GSb48WQPQrkgi3bBuccaFuKU67HsPRSlHk+Uq/1GA1o=; h=X-PGP-Universal:Subject:To:CC:References:X-Nvconfidentiality:From: Message-ID:Date:User-Agent:MIME-Version:In-Reply-To: X-Originating-IP:X-ClientProxiedBy:Content-Type:Content-Language: Content-Transfer-Encoding; b=dbMe0ECus2gTUFHe8/EBdTG4bM7a+Szx8krahhX/k29r99HHbmD68ATPlhtwnTF9n C7TC982i8Jc/5aPiL2zAOpmuksKWq48vM3OHgCxO7MnVYDNko4Iq++kuBOWAO9LzZE Dqg5Q+fKWQ96YvBB6v9Oghw9EljXDcrdFJBKfe1tGWveRagxMvjZr/nq3WA2le6AMr h82lKgJNzBp+2CsP6rGlL6SZSf457Gi4YF79V3sMfANnPLN9Hbe7qsSqwbxPIU4EpV 0zuB+3QAuNzfS4e/E7mmwdT5zpkubVabPUTY83gjy8ryR1p3huULeumACtmkH4AsM0 LXVytDljms0fA== X-Rspamd-Queue-Id: CC90018038E67 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 9/14/20 4:10 PM, Dan Williams wrote: > On Mon, Sep 14, 2020 at 3:45 PM Ralph Campbell wrote: >> >> ZONE_DEVICE struct pages have an extra reference count that complicates the >> code for put_page() and several places in the kernel that need to check the >> reference count to see that a page is not being used (gup, compaction, >> migration, etc.). Clean up the code so the reference count doesn't need to >> be treated specially for ZONE_DEVICE. >> >> Signed-off-by: Ralph Campbell >> --- >> >> Matthew Wilcox, Ira Weiny, and others have complained that ZONE_DEVICE >> struct page reference counting is ugly/broken. This is my attempt to >> fix it and it works for the HMM migration self tests. > > Can you link to a technical description of what's broken? Or better > yet, summarize that argument in the changelog? > >> I'm only sending this out as a RFC since I'm not that familiar with the >> DAX, PMEM, XEN, and other uses of ZONE_DEVICE struct pages allocated >> with devm_memremap_pages() or memremap_pages() but my best reading of >> the code looks like it might be OK. I could use help testing these >> configurations. > > Back in the 4.15 days I could not convince myself that some code paths > blindly assumed that pages with refcount==0 were on an lru list. Since > then, struct page has been reorganized to not collide the ->pgmap back > pointer with the ->lru list and there have been other cleanups for > page pinning that might make this incremental cleanup viable. > > You also need to fix up ext4_break_layouts() and > xfs_break_dax_layouts() to expect ->_refcount is 0 instead of 1. This > also needs some fstests exposure. Got it. Thanks! >> I have a modified THP migration patch series that applies on top of >> this one and is cleaner since I don't have to add code to handle the >> +1 reference count. The link below is for the earlier v2: >> ("mm/hmm/nouveau: add THP migration to migrate_vma_*") >> https://lore.kernel.org/linux-mm/20200902165830.5367-1-rcampbell@nvidia.com >> >> >> arch/powerpc/kvm/book3s_hv_uvmem.c | 1 - >> drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 - >> include/linux/memremap.h | 6 +-- >> include/linux/mm.h | 39 --------------- >> lib/test_hmm.c | 1 - >> mm/gup.c | 44 ----------------- >> mm/memremap.c | 20 ++++---- >> mm/migrate.c | 5 -- >> mm/swap.c | 66 +++++++++++--------------- >> 9 files changed, 41 insertions(+), 142 deletions(-) > > This diffstat is indeed appealing. > >> >> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c >> index 84e5a2dc8be5..00d97050d7ff 100644 >> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >> @@ -711,7 +711,6 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) >> >> dpage = pfn_to_page(uvmem_pfn); >> dpage->zone_device_data = pvt; >> - get_page(dpage); >> lock_page(dpage); >> return dpage; >> out_clear: >> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c >> index a13c6215bba8..2a4bbe01a455 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c >> @@ -324,7 +324,6 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) >> return NULL; >> } >> >> - get_page(page); >> lock_page(page); >> return page; >> } >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >> index 4e9c738f4b31..7dd9802d2612 100644 >> --- a/include/linux/memremap.h >> +++ b/include/linux/memremap.h >> @@ -67,9 +67,9 @@ enum memory_type { >> >> struct dev_pagemap_ops { >> /* >> - * Called once the page refcount reaches 1. (ZONE_DEVICE pages never >> - * reach 0 refcount unless there is a refcount bug. This allows the >> - * device driver to implement its own memory management.) >> + * Called once the page refcount reaches 0. The reference count is >> + * reset to 1 before calling page_free(). This allows the >> + * device driver to implement its own memory management. > > I'd clarify the order events / responsibility of the common core > page_free() and the device specific page_free(). At the same time, why > not update drivers to expect that the page is already refcount==0 on > entry? Seems odd to go through all this trouble to make the reference > count appear to be zero to the wider kernel but expect that drivers > get a fake reference on entry to their ->page_free() callbacks. Good point. Since set_page_refcounted() is defined in mm_interal.h I would have to move the definition to someplace like page_ref.h or have the drivers cal init_page_count() or set_page_count() since get_page() calls VM_BUG_ON_PAGE() if refcount == 0. I'll move set_page_refcounted() since that is what the page allocator uses and seems named for the purpose.