From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20230913173000.4016218-1-souravpanda@google.com>
 <20230913173000.4016218-2-souravpanda@google.com>
 <20230913205125.GA3303@kernel.org>
In-Reply-To: <20230913205125.GA3303@kernel.org>
From: Sourav Panda
Date: Thu, 14 Sep 2023 15:41:00 -0700
Subject: Re: [PATCH v1 1/1] mm: report per-page metadata information
To: Mike Rapoport
Cc: corbet@lwn.net, gregkh@linuxfoundation.org, rafael@kernel.org,
 akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
 david@redhat.com, rdunlap@infradead.org, chenlinxuan@uniontech.com,
 yang.yang29@zte.com.cn, tomas.mudrunka@gmail.com, bhelgaas@google.com,
 ivan@cloudflare.com, pasha.tatashin@soleen.com, yosryahmed@google.com,
 hannes@cmpxchg.org, shakeelb@google.com, kirill.shutemov@linux.intel.com,
 wangkefeng.wang@huawei.com, adobriyan@gmail.com, vbabka@suse.cz,
 Liam.Howlett@oracle.com, surenb@google.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org
Content-Type: multipart/alternative; boundary="0000000000007333030605595f03"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

--0000000000007333030605595f03
Content-Type: text/plain; charset="UTF-8"

Thank you Mike Rapoport for reviewing this patch series. Please find my responses below.
>
> >  #endif /* _LINUX_VMSTAT_H */
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ba6d39b71cb1..ca36751be50e 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1758,6 +1758,10 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> >  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
> >  		free_gigantic_folio(folio, huge_page_order(h));
> >  	} else {
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > +		__mod_node_page_state(NODE_DATA(page_to_nid(&folio->page)),
> > +				      NR_PAGE_METADATA, -huge_page_order(h));
>
> I don't think memory map will change here with classic SPARSEMEM
>

Thank you. Yes, I agree with your comment.

>
> > +#endif
> >  		__free_pages(&folio->page, huge_page_order(h));
> >  	}
> >  }
> > @@ -2143,7 +2147,9 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> >  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> >  		return NULL;
> >  	}
> > -
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, huge_page_order(h));
> > +#endif
> >  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> >  	return page_folio(page);
> >  }
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 4b9734777f69..7f920bfa8e79 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -214,6 +214,8 @@ static inline void free_vmemmap_page(struct page *page)
> >  		free_bootmem_page(page);
> >  	else
> >  		__free_page(page);
> > +	__mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > +			      NR_PAGE_METADATA, -1);
> >  }
> >
> >  /* Free a list of the vmemmap pages */
> > @@ -336,6 +338,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> >  				  (void *)walk.reuse_addr);
> >  		list_add(&walk.reuse_page->lru, &vmemmap_pages);
> >  	}
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, 1);
> >
> >  	/*
> >  	 * In order to make remapping routine most efficient for the huge pages,
> > @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >
> >  	while (nr_pages--) {
> >  		page = alloc_pages_node(nid, gfp_mask, 0);
> > -		if (!page)
> > +		if (!page) {
> >  			goto out;
> > +		} else {
> > +			__mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > +					      NR_PAGE_METADATA, 1);
>
> We can update this once for nr_pages outside the loop, cannot we?
>

Thank you for the comment. I agree with you and shall incorporate it.

>
> > +		}
> >  		list_add_tail(&page->lru, list);
> >  	}
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 50f2f34745af..e02dce7e2e9a 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -26,6 +26,7 @@
> >  #include <linux/pgtable.h>
> >  #include <linux/swap.h>
> >  #include <linux/cma.h>
> > +#include <linux/vmstat.h>
> >  #include "internal.h"
> >  #include "slab.h"
> >  #include "shuffle.h"
> > @@ -1656,6 +1657,8 @@ static void __init alloc_node_mem_map(struct pglist_data *pgdat)
> >  			panic("Failed to allocate %ld bytes for node %d memory map\n",
> >  			      size, pgdat->node_id);
> >  		pgdat->node_mem_map = map + offset;
> > +		mod_node_early_perpage_metadata(pgdat->node_id,
> > +						PAGE_ALIGN(size) >> PAGE_SHIFT);
> >  	}
> >  	pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n",
> >  		 __func__, pgdat->node_id, (unsigned long)pgdat,
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0c5be12f9336..4e295d5087f4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5443,6 +5443,7 @@ void __init setup_per_cpu_pageset(void)
> >  	for_each_online_pgdat(pgdat)
> >  		pgdat->per_cpu_nodestats =
> >  			alloc_percpu(struct per_cpu_nodestat);
> > +	writeout_early_perpage_metadata();
>
> Why it's called here?
> You can copy early stats to actual node stats as soon as the nodes and page
> allocator are initialized.
>

Thank you for mentioning this. I agree with you and shall move it there.
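
For reference, a rough userspace sketch (not kernel code, untested against the tree) of the early-counter idea discussed above: counts are accumulated per node in a plain array while the node stats are not yet usable, then copied into the node stats once. The names MAX_NODES, node_stats, early_metadata, mod_early_metadata() and fold_early_metadata() are hypothetical stand-ins for illustration only; the bodies of mod_node_early_perpage_metadata() and writeout_early_perpage_metadata() are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical stand-in for the per-node NR_PAGE_METADATA counter. */
long node_stats[MAX_NODES];

/* Pages accounted at boot, before the node stats can be updated. */
long early_metadata[MAX_NODES];

/* Record early metadata pages for one node. */
void mod_early_metadata(int nid, long delta)
{
	assert(nid >= 0 && nid < MAX_NODES);
	early_metadata[nid] += delta;
}

/*
 * Copy the early counts into the node stats. The early slots are
 * cleared so that a second call cannot account the same pages twice.
 */
void fold_early_metadata(void)
{
	for (size_t nid = 0; nid < MAX_NODES; nid++) {
		node_stats[nid] += early_metadata[nid];
		early_metadata[nid] = 0;
	}
}
```

In this sketch the fold step can run at any point after the node stats exist, which is the property the review comment relies on.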
>
> >  }
> >
> >  __meminit void zone_pcp_init(struct zone *zone)
> > diff --git a/mm/page_ext.c b/mm/page_ext.c
> > index 4548fcc66d74..b5b9d3079e20 100644
> > --- a/mm/page_ext.c
> > +++ b/mm/page_ext.c
> > @@ -201,6 +201,8 @@ static int __init alloc_node_page_ext(int nid)
> >  		return -ENOMEM;
> >  	NODE_DATA(nid)->node_page_ext = base;
> >  	total_usage += table_size;
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +			      PAGE_ALIGN(table_size) >> PAGE_SHIFT);
> >  	return 0;
> >  }
> >
> > @@ -255,12 +257,15 @@ static void *__meminit alloc_page_ext(size_t size, int nid)
> >  	void *addr = NULL;
> >
> >  	addr = alloc_pages_exact_nid(nid, size, flags);
> > -	if (addr) {
> > +	if (addr)
> >  		kmemleak_alloc(addr, size, 1, flags);
> > -		return addr;
> > -	}
> > +	else
> > +		addr = vzalloc_node(size, nid);
> >
> > -	addr = vzalloc_node(size, nid);
> > +	if (addr) {
> > +		__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +				      PAGE_ALIGN(size) >> PAGE_SHIFT);
> > +	}
> >
> >  	return addr;
> >  }
> > @@ -314,6 +319,10 @@ static void free_page_ext(void *addr)
> >  		BUG_ON(PageReserved(page));
> >  		kmemleak_free(addr);
> >  		free_pages_exact(addr, table_size);
> > +
> > +		__mod_node_page_state(NODE_DATA(page_to_nid(page)), NR_PAGE_METADATA,
> > +				      (long)-1 * (PAGE_ALIGN(table_size) >> PAGE_SHIFT));
> > +
>
> what happens with vmalloc()ed page_ext?
>

Thank you for pointing this out. I shall also make this change for vmalloc()ed page_ext.
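
To show why the free path needs the same accounting on both branches, here is a small userspace model (a sketch only, not the planned kernel change). The struct ext_table, nr_page_metadata, alloc_page_ext_sketch() and free_page_ext_sketch() names are hypothetical; the comments mark where the real code would call vfree() or free_pages_exact().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node NR_PAGE_METADATA counter. */
long nr_page_metadata;

/* Hypothetical bookkeeping for one page_ext table. */
struct ext_table {
	long nr_pages;		/* table size in pages */
	bool vmalloced;		/* true if the vmalloc fallback was used */
	bool live;		/* true between alloc and free */
};

/* Allocation accounts the pages whichever allocator succeeded. */
void alloc_page_ext_sketch(struct ext_table *t, long nr_pages, bool contiguous_ok)
{
	t->nr_pages = nr_pages;
	t->vmalloced = !contiguous_ok;	/* fall back when contiguous pages fail */
	t->live = true;
	nr_page_metadata += nr_pages;
}

/* Freeing must mirror that: subtract on the vmalloc path as well. */
void free_page_ext_sketch(struct ext_table *t)
{
	assert(t->live);
	/* kernel: vfree() if vmalloced, otherwise free_pages_exact() */
	t->live = false;
	nr_page_metadata -= t->nr_pages;
}
```

If the decrement lived only in the free_pages_exact() branch, every vmalloc()ed table freed later would leave the counter permanently too high.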
>
> >  	}
> >  }
> >
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index a2cbe44c48e1..e33f302db7c6 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -469,5 +469,8 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
> >  	if (r < 0)
> >  		return NULL;
> >
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +			      PAGE_ALIGN(end - start) >> PAGE_SHIFT);
> > +
> >  	return pfn_to_page(pfn);
> >  }
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 77d91e565045..db78233a85ef 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -14,7 +14,7 @@
> >  #include <linux/swap.h>
> >  #include <linux/swapops.h>
> >  #include <linux/bootmem_info.h>
> > -
> > +#include <linux/vmstat.h>
> >  #include "internal.h"
> >  #include <asm/dma.h>
> >
> > @@ -465,6 +465,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> >  	 */
> >  	sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
> >  	sparsemap_buf_end = sparsemap_buf + size;
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > +	mod_node_early_perpage_metadata(nid, PAGE_ALIGN(size) >> PAGE_SHIFT);
>
> All early struct pages are allocated in memmap_alloc(). It'd make sense to update
> the counter there.
>

Thanks for the comment. We did not do it in memmap_alloc() because the struct pages can decrease as well.
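
As an illustration of the grow and shrink accounting, the sketch below models the PAGE_ALIGN(size) >> PAGE_SHIFT conversion used throughout the patch and the signed delta applied on populate versus depopulate. It is a userspace approximation that assumes 4 KiB pages; bytes_to_pages() and memmap_delta() are hypothetical helper names, not functions from the patch.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Round a byte count up to whole pages. */
unsigned long bytes_to_pages(unsigned long bytes)
{
	return PAGE_ALIGN(bytes) >> PAGE_SHIFT;
}

/*
 * Signed counter delta for a memmap range: positive when the range is
 * populated, negative when it is depopulated. The page count is cast
 * to long before negation so the arithmetic stays signed.
 */
long memmap_delta(unsigned long start, unsigned long end, int populate)
{
	long pages;

	assert(end >= start);
	pages = (long)bytes_to_pages(end - start);
	return populate ? pages : -pages;
}
```

Negating after the cast, as in -(long)pages, avoids relying on the unsigned wrap-around that (long)-1 * (unsigned long value) goes through before it is converted back to long.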
>
> > +#endif
> >  }
> >
> >  static void __init sparse_buffer_fini(void)
> > @@ -641,6 +644,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> >  	unsigned long start = (unsigned long) pfn_to_page(pfn);
> >  	unsigned long end = start + nr_pages * sizeof(struct page);
> >
> > +	__mod_node_page_state(NODE_DATA(page_to_nid(pfn_to_page(pfn))), NR_PAGE_METADATA,
> > +			      (long)-1 * (PAGE_ALIGN(end - start) >> PAGE_SHIFT));
> >  	vmemmap_free(start, end, altmap);
> >  }
> >  static void free_map_bootmem(struct page *memmap)
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 00e81e99c6ee..731eb5264b49 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1245,6 +1245,7 @@ const char * const vmstat_text[] = {
> >  	"pgpromote_success",
> >  	"pgpromote_candidate",
> >  #endif
> > +	"nr_page_metadata",
> >
> >  	/* enum writeback_stat_item counters */
> >  	"nr_dirty_threshold",
> > @@ -2274,4 +2275,24 @@ static int __init extfrag_debug_init(void)
> >  }
> >
> >  module_init(extfrag_debug_init);
> > +
> > +// Page metadata size (struct page and page_ext) in pages
> > +unsigned long early_perpage_metadata[MAX_NUMNODES] __initdata;
>
> static?
>

Thanks for pointing this out. I shall make early_perpage_metadata static in the next version of the patch.

--0000000000007333030605595f03
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000007333030605595f03--