Date: Fri, 9 Oct 2020 09:53:50 -0700
From: Ira Weiny
To: Ralph Campbell
Cc: linux-mm@kvack.org, kvm-ppc@vger.kernel.org,
	nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Dan Williams, Matthew Wilcox, Jerome Glisse, John Hubbard,
	Alistair Popple, Christoph Hellwig, Jason Gunthorpe,
	Bharata B Rao, Zi Yan, "Kirill A . Shutemov", Yang Shi,
	Paul Mackerras, Ben Skeggs, Andrew Morton
Subject: Re: [PATCH] mm: make device private reference counts zero based
Message-ID: <20201009165350.GV2046448@iweiny-DESK2.sc.intel.com>
References: <20201008172544.29905-1-rcampbell@nvidia.com>
In-Reply-To: <20201008172544.29905-1-rcampbell@nvidia.com>
User-Agent: Mutt/1.11.1 (2018-12-01)

On Thu, Oct 08, 2020 at 10:25:44AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for device private pages, leaving DAX as still being
> a special case.

What about the check in mc_handle_swap_pte()?

mm/memcontrol.c:

		/*
		 * MEMORY_DEVICE_PRIVATE means ZONE_DEVICE page and which have
		 * a refcount of 1 when free (unlike normal page)
		 */
		if (!page_ref_add_unless(page, 1, 1))
			return NULL;

... does that need to change?  Perhaps just the comment?

>
> Signed-off-by: Ralph Campbell
> ---
> [snip]
>
>  void put_devmap_managed_page(struct page *page);
> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> index e151a7f10519..bf92a261fa6f 100644
> --- a/lib/test_hmm.c
> +++ b/lib/test_hmm.c
> @@ -509,10 +509,15 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
>  		mdevice->devmem_count * (DEVMEM_CHUNK_SIZE / (1024 * 1024)),
>  		pfn_first, pfn_last);
>
> +	/*
> +	 * Pages are created with an initial reference count of one but should
> +	 * have a reference count of zero while in the free state.
> +	 */
>  	spin_lock(&mdevice->lock);
>  	for (pfn = pfn_first; pfn < pfn_last; pfn++) {
>  		struct page *page = pfn_to_page(pfn);
>
> +		set_page_count(page, 0);

This confuses me.  How does this and init_page_count() not confuse the buddy
allocator?  Don't you have to reset the refcount somewhere after the test?

>  		page->zone_device_data = mdevice->free_pages;
>  		mdevice->free_pages = page;
>  	}
> @@ -561,7 +566,7 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
>  	}
>
>  	dpage->zone_device_data = rpage;
> -	get_page(dpage);
> +	init_page_count(dpage);
>  	lock_page(dpage);
>  	return dpage;
>
> diff --git a/mm/internal.h b/mm/internal.h
> index c43ccdddb0f6..e1443b73aa9b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> [snip]
> diff --git a/mm/swap.c b/mm/swap.c
> index 0eb057141a04..93d880c6f73c 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -116,12 +116,11 @@ static void __put_compound_page(struct page *page)
>  void __put_page(struct page *page)
>  {
>  	if (is_zone_device_page(page)) {
> -		put_dev_pagemap(page->pgmap);
> -
>  		/*
>  		 * The page belongs to the device that created pgmap.  Do
>  		 * not return it to page allocator.
>  		 */
> +		free_zone_device_page(page);

I really like this.

Ira