Subject: Re: [PATCH RFC 4/9] mm/page_alloc: Reuse tail struct pages for compound pagemaps
From: Dan Williams
Date: Fri, 19 Feb 2021 22:17:24 -0800
To: Joao Martins
Cc: Linux MM, Ira Weiny, linux-nvdimm, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton
In-Reply-To: <20201208172901.17384-6-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com> <20201208172901.17384-6-joao.m.martins@oracle.com>

On Tue, Dec 8, 2020 at 9:31 AM Joao Martins wrote:
>
> When PGMAP_COMPOUND is set, all pages are onlined at a given huge page
> alignment and compound pages are used to describe them, as opposed to
> a struct page per 4K page.

Same s/online/mapped/ comment as other changelogs.

> To minimize struct page overhead, and given that with compound pages
> most tail pages look the same, we online the subsection while pointing
> to the same pages. Thus request VMEMMAP_REUSE in add_pages.
>
> With VMEMMAP_REUSE, provided we reuse most tail pages, the number of
> struct pages we need to initialize is much smaller than the total
> number of structs we would normally online. Thus allow an @init_order
> to be passed to specify how many pages we want to prep upon creating a
> compound page.
>
> Finally, when onlining all struct pages in memmap_init_zone_device,
> make sure that we only initialize the unique struct pages, i.e. the
> first two 4K pages from @align, which means 128 struct pages out of
> 32768 for 2M @align or 262144 for a 1G @align.
>
> Signed-off-by: Joao Martins
> ---
>  mm/memremap.c   |  4 +++-
>  mm/page_alloc.c | 23 ++++++++++++++++++++---
>  2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memremap.c b/mm/memremap.c
> index ecfa74848ac6..3eca07916b9d 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -253,8 +253,10 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
>  		goto err_kasan;
>  	}
>
> -	if (pgmap->flags & PGMAP_COMPOUND)
> +	if (pgmap->flags & PGMAP_COMPOUND) {
>  		params->align = pgmap->align;
> +		params->flags = MEMHP_REUSE_VMEMMAP;

The "reuse" naming is not my favorite. Yes, page reuse is happening,
but what is more relevant is that the vmemmap is in a given minimum
page size mode. So it's less of a flag and more of an enum that selects
between PAGE_SIZE, HPAGE_SIZE, and PUD_PAGE_SIZE (GPAGE_SIZE?).
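Purely as an illustration of that suggestion, here is a minimal sketch of
what an enum-based parameter could look like; the vmemmap_geometry name,
its members, and the mhp_params field are hypothetical, not existing
kernel API:

	/*
	 * Hypothetical sketch: describe the vmemmap page-size mode as an
	 * enum rather than a "reuse" flag.
	 */
	enum vmemmap_geometry {
		VMEMMAP_GEOMETRY_PTE,	/* one struct page per PAGE_SIZE (default) */
		VMEMMAP_GEOMETRY_PMD,	/* compound metadata at HPAGE_SIZE granularity */
		VMEMMAP_GEOMETRY_PUD,	/* compound metadata at PUD-size (1G) granularity */
	};

	struct mhp_params {
		/* ... existing members (altmap, pgprot, ...) ... */
		enum vmemmap_geometry geometry;	/* hypothetical replacement for the flag */
	};

A caller would then state the geometry explicitly, e.g.
params->geometry = VMEMMAP_GEOMETRY_PMD for a 2M @align, instead of
setting MEMHP_REUSE_VMEMMAP.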
> +	}
>
>  	error = arch_add_memory(nid, range->start, range_len(range),
>  			params);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9716ecd58e29..180a7d4e9285 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -691,10 +691,11 @@ void free_compound_page(struct page *page)
>  	__free_pages_ok(page, compound_order(page), FPI_NONE);
>  }
>
> -void prep_compound_page(struct page *page, unsigned int order)
> +static void __prep_compound_page(struct page *page, unsigned int order,
> +				 unsigned int init_order)
>  {
>  	int i;
> -	int nr_pages = 1 << order;
> +	int nr_pages = 1 << init_order;
>
>  	__SetPageHead(page);
>  	for (i = 1; i < nr_pages; i++) {
> @@ -711,6 +712,11 @@ void prep_compound_page(struct page *page, unsigned int order)
>  	atomic_set(compound_pincount_ptr(page), 0);
>  }
>
> +void prep_compound_page(struct page *page, unsigned int order)
> +{
> +	__prep_compound_page(page, order, order);
> +}
> +
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  unsigned int _debug_guardpage_minorder;
>
> @@ -6108,6 +6114,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  }
>
>  #ifdef CONFIG_ZONE_DEVICE
> +
> +#define MEMMAP_COMPOUND_SIZE (2 * (PAGE_SIZE/sizeof(struct page)))
> +
>  void __ref memmap_init_zone_device(struct zone *zone,
> 				    unsigned long start_pfn,
> 				    unsigned long nr_pages,
> @@ -6138,6 +6147,12 @@ void __ref memmap_init_zone_device(struct zone *zone,
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		struct page *page = pfn_to_page(pfn);
>
> +		/* Skip already initialized pages. */
> +		if (compound && (pfn % align >= MEMMAP_COMPOUND_SIZE)) {
> +			pfn = ALIGN(pfn, align) - 1;
> +			continue;
> +		}
> +
>  		__init_single_page(page, pfn, zone_idx, nid);
>
>  		/*
> @@ -6175,7 +6190,9 @@ void __ref memmap_init_zone_device(struct zone *zone,
>
>  	if (compound) {
>  		for (pfn = start_pfn; pfn < end_pfn; pfn += align)
> -			prep_compound_page(pfn_to_page(pfn), order_base_2(align));
> +			__prep_compound_page(pfn_to_page(pfn),
> +					order_base_2(align),
> +					order_base_2(MEMMAP_COMPOUND_SIZE));
>  	}

Alex did quite a bit of work to optimize this path, and this
organization appears to undo it. I'd prefer to keep it all in one loop
so a 'struct page' is only initialized once. Otherwise, by the time the
above loop finishes and this one starts, the 'struct page's are
probably cache cold again.

So I'd break prep_compound_page into separate head and tail init and
call them at the right time in one loop.
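For illustration only, a rough sketch of that single-loop shape;
prep_compound_head() and prep_compound_tail() are hypothetical helpers
standing in for the suggested split of prep_compound_page(), and the
rest of the device-page initialization done in memmap_init_zone_device()
is elided:

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_page(pfn);

		/* Skip tail struct pages whose vmemmap is reused. */
		if (compound && (pfn % align >= MEMMAP_COMPOUND_SIZE)) {
			pfn = ALIGN(pfn, align) - 1;
			continue;
		}

		__init_single_page(page, pfn, zone_idx, nid);

		/* ... existing dev_pagemap / ZONE_DEVICE setup elided ... */

		if (compound) {
			struct page *head = pfn_to_page(ALIGN_DOWN(pfn, align));

			if (page == head)
				prep_compound_head(head, order_base_2(align));
			else
				prep_compound_tail(page, head);
		}
	}

That way each struct page is written while its cacheline is still hot,
instead of being revisited by a second prep_compound_page() pass over
the memmap.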