From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 14 Jan 2022 14:04:22 -0800
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, corbet@lwn.net, dan.j.williams@intel.com,
 dave.jiang@intel.com, hch@lst.de, jane.chu@oracle.com, jgg@nvidia.com,
 jgg@ziepe.ca, jhubbard@nvidia.com, joao.m.martins@oracle.com,
 linux-mm@kvack.org, mike.kravetz@oracle.com, mm-commits@vger.kernel.org,
 naoya.horiguchi@nec.com, songmuchun@bytedance.com,
 torvalds@linux-foundation.org, vishal.l.verma@intel.com, willy@infradead.org
Subject: [patch 028/146] mm/memremap: add ZONE_DEVICE support for compound pages
Message-ID: <20220114220422.NTYjq91hd%akpm@linux-foundation.org>
In-Reply-To: <20220114140222.6b14f0061194d3200000c52d@linux-foundation.org>

From: Joao Martins <joao.m.martins@oracle.com>
Subject: mm/memremap: add ZONE_DEVICE support for compound pages

Add a new @vmemmap_shift property for struct dev_pagemap which specifies
that a devmap is composed of a set of compound pages of order
@vmemmap_shift, instead of base pages.  When a compound page devmap is
requested, all but the first page are initialised as tail pages instead
of order-0 pages.

For certain ZONE_DEVICE users like device-dax which have a fixed page
size, this creates an opportunity to optimize GUP and GUP-fast walkers,
treating them the same way as THP or hugetlb pages.

Additionally, commit 7118fc2906e2 ("hugetlb: address ref count racing in
prep_compound_gigantic_page") removed set_page_count() because setting
the page ref count to zero was redundant.  devmap pages don't come from
the page allocator, though, and only the head page refcount is used for
compound pages, hence initialize the tail page count to zero.

Link: https://lkml.kernel.org/r/20211202204422.26777-5-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
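
For illustration only (not part of this patch): a ZONE_DEVICE user with a
fixed mapping granularity, such as device-dax, could request a compound
devmap along the lines of the sketch below.  The helper name and the
"align" parameter are made up for the example; only the @vmemmap_shift
field and memremap_pages() come from the kernel.

#include <linux/log2.h>
#include <linux/memremap.h>
#include <linux/mm.h>

/*
 * Hypothetical caller sketch: ask for compound pages of the mapping
 * alignment's order by setting @vmemmap_shift before memremap_pages().
 * Leaving vmemmap_shift at zero keeps today's order-0 behaviour.
 */
static void *example_request_compound_devmap(struct dev_pagemap *pgmap,
					     unsigned long align, int nid)
{
	if (align > PAGE_SIZE)
		pgmap->vmemmap_shift = order_base_2(align >> PAGE_SHIFT);

	/* e.g. align = 2M with 4K pages  ->  vmemmap_shift = 9 (512 pfns) */
	return memremap_pages(pgmap, nid);
}
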
 include/linux/memremap.h |   11 +++++++++++
 mm/memremap.c            |   18 ++++++++++++------
 mm/page_alloc.c          |   38 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 7 deletions(-)

--- a/include/linux/memremap.h~mm-memremap-add-zone_device-support-for-compound-pages
+++ a/include/linux/memremap.h
@@ -99,6 +99,11 @@ struct dev_pagemap_ops {
  * @done: completion for @internal_ref
  * @type: memory type: see MEMORY_* in memory_hotplug.h
  * @flags: PGMAP_* flags to specify defailed behavior
+ * @vmemmap_shift: structural definition of how the vmemmap page metadata
+ *	is populated, specifically the metadata page order.
+ *	A zero value (default) uses base pages as the vmemmap metadata
+ *	representation. A bigger value will set up compound struct pages
+ *	of the requested order value.
  * @ops: method table
  * @owner: an opaque pointer identifying the entity that manages this
  *	instance.  Used by various helpers to make sure that no
@@ -114,6 +119,7 @@ struct dev_pagemap {
 	struct completion done;
 	enum memory_type type;
 	unsigned int flags;
+	unsigned long vmemmap_shift;
 	const struct dev_pagemap_ops *ops;
 	void *owner;
 	int nr_range;
@@ -130,6 +136,11 @@ static inline struct vmem_altmap *pgmap_
 	return NULL;
 }
 
+static inline unsigned long pgmap_vmemmap_nr(struct dev_pagemap *pgmap)
+{
+	return 1 << pgmap->vmemmap_shift;
+}
+
 #ifdef CONFIG_ZONE_DEVICE
 void *memremap_pages(struct dev_pagemap *pgmap, int nid);
 void memunmap_pages(struct dev_pagemap *pgmap);
--- a/mm/memremap.c~mm-memremap-add-zone_device-support-for-compound-pages
+++ a/mm/memremap.c
@@ -102,15 +102,22 @@ static unsigned long pfn_end(struct dev_
 	return (range->start + range_len(range)) >> PAGE_SHIFT;
 }
 
-static unsigned long pfn_next(unsigned long pfn)
+static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn)
 {
-	if (pfn % 1024 == 0)
+	if (pfn % (1024 << pgmap->vmemmap_shift))
 		cond_resched();
-	return pfn + 1;
+	return pfn + pgmap_vmemmap_nr(pgmap);
+}
+
+static unsigned long pfn_len(struct dev_pagemap *pgmap, unsigned long range_id)
+{
+	return (pfn_end(pgmap, range_id) -
+		pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift;
 }
 
 #define for_each_device_pfn(pfn, map, i) \
-	for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
+	for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); \
+	     pfn = pfn_next(map, pfn))
 
 static void dev_pagemap_kill(struct dev_pagemap *pgmap)
 {
@@ -295,8 +302,7 @@ static int pagemap_range(struct dev_page
 	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 				PHYS_PFN(range->start),
 				PHYS_PFN(range_len(range)), pgmap);
-	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
-			- pfn_first(pgmap, range_id));
+	percpu_ref_get_many(pgmap->ref, pfn_len(pgmap, range_id));
 	return 0;
 
 err_add_memory:
--- a/mm/page_alloc.c~mm-memremap-add-zone_device-support-for-compound-pages
+++ a/mm/page_alloc.c
@@ -6612,6 +6612,35 @@ static void __ref __init_zone_device_pag
 	}
 }
 
+static void __ref memmap_init_compound(struct page *head,
+				       unsigned long head_pfn,
+				       unsigned long zone_idx, int nid,
+				       struct dev_pagemap *pgmap,
+				       unsigned long nr_pages)
+{
+	unsigned long pfn, end_pfn = head_pfn + nr_pages;
+	unsigned int order = pgmap->vmemmap_shift;
+
+	__SetPageHead(head);
+	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+		prep_compound_tail(head, pfn - head_pfn);
+		set_page_count(page, 0);
+
+		/*
+		 * The first tail page stores compound_mapcount_ptr() and
+		 * compound_order() and the second tail page stores
+		 * compound_pincount_ptr(). Call prep_compound_head() after
+		 * the first and second tail pages have been initialized to
+		 * not have the data overwritten.
+		 */
+		if (pfn == head_pfn + 2)
+			prep_compound_head(head, order);
+	}
+}
+
 void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long start_pfn,
 				   unsigned long nr_pages,
@@ -6620,6 +6649,7 @@ void __ref memmap_init_zone_device(struc
 	unsigned long pfn, end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
+	unsigned int pfns_per_compound = pgmap_vmemmap_nr(pgmap);
 	unsigned long zone_idx = zone_idx(zone);
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
@@ -6637,10 +6667,16 @@ void __ref memmap_init_zone_device(struc
 		nr_pages = end_pfn - start_pfn;
 	}
 
-	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
 
 		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+
+		if (pfns_per_compound == 1)
+			continue;
+
+		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
+				     pfns_per_compound);
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,
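
Also for illustration (not from this series): with compound devmaps in
place, a walker can step through a range one head page at a time rather
than one base page at a time, mirroring the memmap_init_zone_device()
loop above.  The function below is a made-up sketch that uses only the
pgmap_vmemmap_nr() helper added by this patch.

#include <linux/memremap.h>
#include <linux/mm.h>

/*
 * Made-up example: visit each compound page of a devmap range.  With
 * vmemmap_shift = 9 on a 4K-page system, pgmap_vmemmap_nr() is 512, so
 * the loop touches one head page per 2M compound page; with
 * vmemmap_shift = 0 it degenerates to a per-base-page walk.
 */
static void example_walk_devmap(struct dev_pagemap *pgmap,
				unsigned long first_pfn,
				unsigned long nr_pfns)
{
	unsigned long step = pgmap_vmemmap_nr(pgmap);
	unsigned long pfn;

	for (pfn = first_pfn; pfn < first_pfn + nr_pfns; pfn += step) {
		struct page *page = pfn_to_page(pfn);

		/* Every page visited is a head page when step > 1. */
		if (step > 1)
			VM_WARN_ON_ONCE(!PageHead(page));
	}
}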