From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20231031223846.827173-1-souravpanda@google.com>
 <20231031223846.827173-2-souravpanda@google.com>
 <4a1de79e-a3e8-2544-e975-e17cad0d2f8a@linux.dev>
In-Reply-To: <4a1de79e-a3e8-2544-e975-e17cad0d2f8a@linux.dev>
From: Sourav Panda
Date: Wed, 1 Nov 2023 15:58:18 -0700
Subject: Re: [PATCH v4 1/1] mm: report per-page metadata information
To: Muchun Song
Cc: corbet@lwn.net, gregkh@linuxfoundation.org, rafael@kernel.org,
 akpm@linux-foundation.org, mike.kravetz@oracle.com, rppt@kernel.org,
 david@redhat.com, rdunlap@infradead.org, chenlinxuan@uniontech.com,
 yang.yang29@zte.com.cn, tomas.mudrunka@gmail.com, bhelgaas@google.com,
 ivan@cloudflare.com, pasha.tatashin@soleen.com, yosryahmed@google.com,
 hannes@cmpxchg.org, shakeelb@google.com, kirill.shutemov@linux.intel.com,
 wangkefeng.wang@huawei.com, adobriyan@gmail.com, vbabka@suse.cz,
 Liam.Howlett@oracle.com, surenb@google.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, willy@infradead.org, weixugc@google.com
Content-Type: multipart/alternative; boundary="000000000000b8138406091f35c6"

--000000000000b8138406091f35c6
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Tue, Oct 31, 2023 at 8:38 PM Muchun Song <muchun.song@linux.dev> wrote:
>
>
> On 2023/11/1 06:38, Sourav Panda wrote:
> > Adds a new per-node PageMetadata field to
> > /sys/devices/system/node/nodeN/meminfo
> > and a global PageMetadata field to /proc/meminfo. This information can
> > be used by users to see how much memory is being used by per-page
> > metadata, which can vary depending on build configuration, machine
> > architecture, and system use.
> >
> > Per-page metadata is the amount of memory that Linux needs in order to
> > manage memory at the page granularity. The majority of such memory is
> > used by "struct page" and "page_ext" data structures. In contrast to
> > most other memory consumption statistics, per-page metadata might not
> > be included in MemTotal. For example, MemTotal does not include memblock
> > allocations but includes buddy allocations. While on the other hand,
> > per-page metadata would include both memblock and buddy allocations.
> >
> > This memory depends on build configurations, machine architectures, and
> > the way system is used:
> >
> > Build configuration may include extra fields into "struct page",
> > and enable / disable "page_ext"
> > Machine architecture defines base page sizes. For example 4K x86,
> > 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> > overhead is smaller on machines with larger page sizes.
> > System use can change per-page overhead by using vmemmap
> > optimizations with hugetlb pages, and emulated pmem devdax pages.
> > Also, boot parameters can determine whether page_ext is needed
> > to be allocated. This memory can be part of MemTotal or be outside
> > MemTotal depending on whether the memory was hot-plugged, booted with,
> > or hugetlb memory was returned back to the system.
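
To put rough numbers on the page-size point in the commit message above, here is a minimal sketch. It assumes a 64-byte struct page (a common size on 64-bit builds, but configuration-dependent) and ignores page_ext entirely. ASSUMED_STRUCT_PAGE_BYTES and the helper names are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for illustration only: sizeof(struct page) varies by config. */
#define ASSUMED_STRUCT_PAGE_BYTES 64UL

/* Bytes of struct page metadata needed to describe mem_bytes of memory. */
unsigned long metadata_bytes(unsigned long mem_bytes, unsigned long page_size)
{
	assert(page_size != 0);
	return (mem_bytes / page_size) * ASSUMED_STRUCT_PAGE_BYTES;
}

/* Same quantity in kB, the unit used by /proc/meminfo. */
unsigned long metadata_kb(unsigned long mem_bytes, unsigned long page_size)
{
	return metadata_bytes(mem_bytes, page_size) / 1024;
}

/* Print the estimate for the base page sizes named in the commit message. */
void print_overhead_table(unsigned long mem_bytes)
{
	static const unsigned long page_sizes[] = { 4096, 8192, 65536 };
	size_t i;

	for (i = 0; i < sizeof(page_sizes) / sizeof(page_sizes[0]); i++)
		printf("%6lu-byte pages: %lu kB\n", page_sizes[i],
		       metadata_kb(mem_bytes, page_sizes[i]));
}
```

Under these assumptions, describing 1 GiB of memory takes about 16 MiB of struct page with 4K pages, 8 MiB with 8K pages and 1 MiB with 64K pages. The value reported in PageMetadata would differ whenever page_ext is enabled or vmemmap optimizations are in use.
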
> >
> > Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Signed-off-by: Sourav Panda <souravpanda@google.com>
> > ---
> >   Documentation/filesystems/proc.rst |  3 +++
> >   drivers/base/node.c                |  2 ++
> >   fs/proc/meminfo.c                  |  7 +++++++
> >   include/linux/mmzone.h             |  3 +++
> >   include/linux/vmstat.h             |  4 ++++
> >   mm/hugetlb.c                       | 11 ++++++++--
> >   mm/hugetlb_vmemmap.c               |  8 ++++++--
> >   mm/mm_init.c                       |  3 +++
> >   mm/page_alloc.c                    |  1 +
> >   mm/page_ext.c                      | 32 +++++++++++++++++++++---------
> >   mm/sparse-vmemmap.c                |  3 +++
> >   mm/sparse.c                        |  7 ++++++-
> >   mm/vmstat.c                        | 24 ++++++++++++++++++++++
> >   13 files changed, 94 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 2b59cff8be17..c121f2ef9432 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -987,6 +987,7 @@ Example output. You may not have all of these fields.
> >      AnonPages:       4654780 kB
> >      Mapped:           266244 kB
> >      Shmem:              9976 kB
> > +    PageMetadata:     513419 kB
> >      KReclaimable:     517708 kB
> >      Slab:             660044 kB
> >      SReclaimable:     517708 kB
> > @@ -1089,6 +1090,8 @@ Mapped
> >                files which have been mmapped, such as libraries
> >  Shmem
> >                Total memory used by shared memory (shmem) and tmpfs
> > +PageMetadata
> > +              Memory used for per-page metadata
> >  KReclaimable
> >                Kernel allocations that the kernel will attempt to reclaim
> >                under memory pressure. Includes SReclaimable (below), and other
> > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > index 493d533f8375..da728542265f 100644
> > --- a/drivers/base/node.c
> > +++ b/drivers/base/node.c
> > @@ -428,6 +428,7 @@ static ssize_t node_read_meminfo(struct device *dev,
> >  			     "Node %d Mapped:         %8lu kB\n"
> >  			     "Node %d AnonPages:      %8lu kB\n"
> >  			     "Node %d Shmem:          %8lu kB\n"
> > +			     "Node %d PageMetadata:   %8lu kB\n"
> >  			     "Node %d KernelStack:    %8lu kB\n"
> >  #ifdef CONFIG_SHADOW_CALL_STACK
> >  			     "Node %d ShadowCallStack:%8lu kB\n"
> > @@ -458,6 +459,7 @@ static ssize_t node_read_meminfo(struct device *dev,
> >  			     nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
> >  			     nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
> >  			     nid, K(i.sharedram),
> > +			     nid, K(node_page_state(pgdat, NR_PAGE_METADATA)),
> >  			     nid, node_page_state(pgdat, NR_KERNEL_STACK_KB),
> >  #ifdef CONFIG_SHADOW_CALL_STACK
> >  			     nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
> > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > index 45af9a989d40..f141bb2a550d 100644
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -39,7 +39,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> >  	long available;
> >  	unsigned long pages[NR_LRU_LISTS];
> >  	unsigned long sreclaimable, sunreclaim;
> > +	unsigned long nr_page_metadata;
> >  	int lru;
> > +	int nid;
> >
> >  	si_meminfo(&i);
> >  	si_swapinfo(&i);
> > @@ -57,6 +59,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> >  	sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
> >  	sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);
> >
> > +	nr_page_metadata = 0;
> > +	for_each_online_node(nid)
> > +		nr_page_metadata += node_page_state(NODE_DATA(nid), NR_PAGE_METADATA);
> > +
> >  	show_val_kb(m, "MemTotal:       ", i.totalram);
> >  	show_val_kb(m, "MemFree:        ", i.freeram);
> >  	show_val_kb(m, "MemAvailable:   ", available);
> > @@ -104,6 +110,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> >  	show_val_kb(m, "Mapped:         ",
> >  		    global_node_page_state(NR_FILE_MAPPED));
> >  	show_val_kb(m, "Shmem:          ", i.sharedram);
> > +	show_val_kb(m, "PageMetadata:   ", nr_page_metadata);
> >  	show_val_kb(m, "KReclaimable:   ", sreclaimable +
> >  		    global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE));
> >  	show_val_kb(m, "Slab:           ", sreclaimable + sunreclaim);
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 4106fbc5b4b3..dda1ad522324 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -207,6 +207,9 @@ enum node_stat_item {
> >  	PGPROMOTE_SUCCESS,	/* promote successfully */
> >  	PGPROMOTE_CANDIDATE,	/* candidate pages to promote */
> >  #endif
> > +	NR_PAGE_METADATA,	/* Page metadata size (struct page and page_ext)
> > +				 * in pages
> > +				 */
> >  	NR_VM_NODE_STAT_ITEMS
> >  };
> >
> > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> > index fed855bae6d8..af096a881f03 100644
> > --- a/include/linux/vmstat.h
> > +++ b/include/linux/vmstat.h
> > @@ -656,4 +656,8 @@ static inline void lruvec_stat_sub_folio(struct folio *folio,
> >  {
> >  	lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
> >  }
> > +
> > +void __init mod_node_early_perpage_metadata(int nid, long delta);
> > +void __init store_early_perpage_metadata(void);
> > +
> >  #endif /* _LINUX_VMSTAT_H */
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 1301ba7b2c9a..cd3158a9c7f3 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1790,6 +1790,9 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> >  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
> >  		free_gigantic_folio(folio, huge_page_order(h));
> >  	} else {
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > +		__node_stat_sub_folio(folio, NR_PAGE_METADATA);
> > +#endif
> >  		__free_pages(&folio->page, huge_page_order(h));
> >  	}
> >  }
> > @@ -2125,6 +2128,7 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> >  	struct page *page;
> >  	bool alloc_try_hard = true;
> >  	bool retry = true;
> > +	struct folio *folio;
> >
> >  	/*
> >  	 * By default we always try hard to allocate the page with
> > @@ -2175,9 +2179,12 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> >  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> >  		return NULL;
> >  	}
> > -
> > +	folio = page_folio(page);
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > +	__node_stat_add_folio(folio, NR_PAGE_METADATA)
>
> Seems you have not tested this patch with CONFIG_SPARSEMEM_VMEMMAP disabled.
> You missed ";" in the end.
>

Thanks for reviewing this patch. I will submit v5 after testing against
the FLATMEM and SPARSEMEM (VMEMMAP disabled) memory models on ARM32.
This error was introduced in v4.

>
> > +#endif
>
> I am curious why we should account HugeTLB pages as metadata.
>

When HugeTLB pages are reserved, the memory backing the redundant
`struct page`s is returned to the buddy allocator for other uses. This
essentially reflects the change in the number of `struct page`s when
HugeTLB pages are reserved and freed.

>
> >  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> > -	return page_folio(page);
> > +	return folio;
> >  }
> >
> >  /*
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 4b9734777f69..804a93d18cab 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -214,6 +214,7 @@ static inline void free_vmemmap_page(struct page *page)
> >  		free_bootmem_page(page);
> >  	else
> >  		__free_page(page);
> > +	__mod_node_page_state(page_pgdat(page), NR_PAGE_METADATA, -1);
> >  }
> >
> >  /* Free a list of the vmemmap pages */
> > @@ -336,6 +337,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> >  				  (void *)walk.reuse_addr);
> >  		list_add(&walk.reuse_page->lru, &vmemmap_pages);
> >  	}
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, 1);
>
> What if allocation of walk.reuse_page fails?
>

Thank you for pointing this out. I will move the NR_PAGE_METADATA update
inside the if (walk.reuse_page) clause to cover the case where the
allocation of walk.reuse_page fails.

>
> >
> >  	/*
> >  	 * In order to make remapping routine most efficient for the huge pages,
> > @@ -381,14 +383,16 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >  				   struct list_head *list)
> >  {
> >  	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> > -	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> > +	unsigned long nr_pages = DIV_ROUND_UP(end - start, PAGE_SIZE);
>
> "end - start" is always multiple of PAGE_SIZE, why we need DIV_ROUND_UP
> here?
>

Thank you. I agree with this and will revert this change in v5.

>
> >  	int nid = page_to_nid((struct page *)start);
> >  	struct page *page, *next;
> > +	int i;
> >
> > -	while (nr_pages--) {
> > +	for (i = 0; i < nr_pages; i++) {
> >  		page = alloc_pages_node(nid, gfp_mask, 0);
> >  		if (!page)
> >  			goto out;
> > +		__mod_node_page_state(page_pgdat(page), NR_PAGE_METADATA, 1);
> >  		list_add_tail(&page->lru, list);
> >  	}
>
> Count one by ine is really inefficient. Can't we count *nr_pages* at
> one time?
>

Thanks for suggesting this optimization. I will modify the implementation
to update the counter once for the whole batch rather than on every
iteration.
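
For reference, the batched accounting could look roughly like the sketch below. This is a userspace mock under stated assumptions, not the v5 code: struct mock_node, mock_mod_node_page_state() and mock_alloc_page() are hypothetical stand-ins for the per-node counter, __mod_node_page_state() and alloc_pages_node(), and the budget parameter exists only to simulate allocation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node's NR_PAGE_METADATA counter. */
struct mock_node {
	long nr_page_metadata;	/* in pages */
	long stat_updates;	/* how many times the counter was touched */
};

/* Hypothetical stand-in for __mod_node_page_state(). */
void mock_mod_node_page_state(struct mock_node *node, long delta)
{
	node->nr_page_metadata += delta;
	node->stat_updates++;
}

/* Hypothetical allocator: succeeds until the budget is exhausted. */
bool mock_alloc_page(unsigned long *budget)
{
	if (*budget == 0)
		return false;
	(*budget)--;
	return true;
}

/*
 * Allocate nr_pages pages and account them with a single counter update.
 * On failure nothing is accounted, mirroring an error path that frees
 * everything it allocated. Returns 0 on success, -1 on failure.
 */
int alloc_page_list_batched(struct mock_node *node, unsigned long nr_pages,
			    unsigned long *budget)
{
	unsigned long i;

	assert(node != NULL && budget != NULL);

	for (i = 0; i < nr_pages; i++) {
		if (!mock_alloc_page(budget)) {
			*budget += i;	/* "free" the pages taken so far */
			return -1;
		}
	}

	/* One update for the whole batch instead of nr_pages updates. */
	mock_mod_node_page_state(node, (long)nr_pages);
	return 0;
}
```

The point of the sketch is only that the counter is touched once per call, after every allocation has succeeded, so the failure path has no accounting to undo.
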
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 50f2f34745af..6997bf00945b 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -26,6 +26,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include "internal.h"
> >  #include "slab.h"
> >  #include "shuffle.h"
> > @@ -1656,6 +1657,8 @@ static void __init alloc_node_mem_map(struct pglist_data *pgdat)
> >  			panic("Failed to allocate %ld bytes for node %d memory map\n",
> >  			      size, pgdat->node_id);
> >  		pgdat->node_mem_map = map + offset;
> > +		mod_node_early_perpage_metadata(pgdat->node_id,
> > +						DIV_ROUND_UP(size, PAGE_SIZE));
> >  	}
> >  	pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n",
> >  		 __func__, pgdat->node_id, (unsigned long)pgdat,
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 85741403948f..522dc0c52610 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5443,6 +5443,7 @@ void __init setup_per_cpu_pageset(void)
> >  	for_each_online_pgdat(pgdat)
> >  		pgdat->per_cpu_nodestats =
> >  			alloc_percpu(struct per_cpu_nodestat);
> > +	store_early_perpage_metadata();
> >  }
> >
> >  __meminit void zone_pcp_init(struct zone *zone)
> > diff --git a/mm/page_ext.c b/mm/page_ext.c
> > index 4548fcc66d74..d8d6db9c3d75 100644
> > --- a/mm/page_ext.c
> > +++ b/mm/page_ext.c
> > @@ -201,6 +201,8 @@ static int __init alloc_node_page_ext(int nid)
> >  		return -ENOMEM;
> >  	NODE_DATA(nid)->node_page_ext = base;
> >  	total_usage += table_size;
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +			      DIV_ROUND_UP(table_size, PAGE_SIZE));
> >  	return 0;
> >  }
> >
> > @@ -255,12 +257,15 @@ static void *__meminit alloc_page_ext(size_t size, int nid)
> >  	void *addr = NULL;
> >
> >  	addr = alloc_pages_exact_nid(nid, size, flags);
> > -	if (addr) {
> > +	if (addr)
> >  		kmemleak_alloc(addr, size, 1, flags);
> > -		return addr;
> > -	}
> > +	else
> > +		addr = vzalloc_node(size, nid);
> >
> > -	addr = vzalloc_node(size, nid);
> > +	if (addr) {
> > +		mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +				    DIV_ROUND_UP(size, PAGE_SIZE));
> > +	}
> >
> >  	return addr;
> >  }
> > @@ -303,18 +308,27 @@ static int __meminit init_section_page_ext(unsigned long pfn, int nid)
> >
> >  static void free_page_ext(void *addr)
> >  {
> > +	size_t table_size;
> > +	struct page *page;
> > +	struct pglist_data *pgdat;
> > +
> > +	table_size = page_ext_size * PAGES_PER_SECTION;
> > +
> >  	if (is_vmalloc_addr(addr)) {
> > +		page = vmalloc_to_page(addr);
> > +		pgdat = page_pgdat(page);
> >  		vfree(addr);
> >  	} else {
> > -		struct page *page = virt_to_page(addr);
> > -		size_t table_size;
> > -
> > -		table_size = page_ext_size * PAGES_PER_SECTION;
> > -
> > +		page = virt_to_page(addr);
> > +		pgdat = page_pgdat(page);
> >  		BUG_ON(PageReserved(page));
> >  		kmemleak_free(addr);
> >  		free_pages_exact(addr, table_size);
> >  	}
> > +
> > +	__mod_node_page_state(pgdat, NR_PAGE_METADATA,
> > +			      -1L * (DIV_ROUND_UP(table_size, PAGE_SIZE)));
> > +
> >  }
> >
> >  static void __free_page_ext(unsigned long pfn)
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index a2cbe44c48e1..2bc67b2c2aa2 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -469,5 +469,8 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
> >  	if (r < 0)
> >  		return NULL;
> >
> > +	__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +			      DIV_ROUND_UP(end - start, PAGE_SIZE));
> > +
> >  	return pfn_to_page(pfn);
> >  }
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 77d91e565045..7f67b5486cd1 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -14,7 +14,7 @@
> >  #include
> >  #include
> >  #include
> > -
> > +#include
> >  #include "internal.h"
> >  #include
> >
> > @@ -465,6 +465,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> >  	 */
> >  	sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
> >  	sparsemap_buf_end = sparsemap_buf + size;
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > +	mod_node_early_perpage_metadata(nid, DIV_ROUND_UP(size, PAGE_SIZE));
> > +#endif
> >  }
> >
> >  static void __init sparse_buffer_fini(void)
> > @@ -641,6 +644,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> >  	unsigned long start = (unsigned long) pfn_to_page(pfn);
> >  	unsigned long end = start + nr_pages * sizeof(struct page);
> >
> > +	__mod_node_page_state(page_pgdat(pfn_to_page(pfn)), NR_PAGE_METADATA,
> > +			      -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
> >  	vmemmap_free(start, end, altmap);
> >  }
> >  static void free_map_bootmem(struct page *memmap)
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 00e81e99c6ee..070d2b3d2bcc 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1245,6 +1245,7 @@ const char * const vmstat_text[] = {
> >  	"pgpromote_success",
> >  	"pgpromote_candidate",
> >  #endif
> > +	"nr_page_metadata",
> >
> >  	/* enum writeback_stat_item counters */
> >  	"nr_dirty_threshold",
> > @@ -2274,4 +2275,27 @@ static int __init extfrag_debug_init(void)
> >  }
> >
> >  module_init(extfrag_debug_init);
> > +
> >  #endif
> > +
> > +/*
> > + * Page metadata size (struct page and page_ext) in pages
> > + */
> > +static unsigned long early_perpage_metadata[MAX_NUMNODES] __initdata;
> > +
> > +void __init mod_node_early_perpage_metadata(int nid, long delta)
> > +{
> > +	early_perpage_metadata[nid] += delta;
> > +}
> > +
> > +void __init store_early_perpage_metadata(void)
> > +{
> > +	int nid;
> > +	struct pglist_data *pgdat;
> > +
> > +	for_each_online_pgdat(pgdat) {
> > +		nid = pgdat->node_id;
> > +		__mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > +				      early_perpage_metadata[nid]);
> > +	}
> > +}
> >

--000000000000b8138406091f35c6
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


=
On Tue, Oct 31, 2023 at 8:38=E2=80=AF= PM Muchun Song <muchun.song@lin= ux.dev> wrote:


On 2023/11/1 06:38, Sourav Panda wrote:
> Adds a new per-node PageMetadata field to
> /sys/devices/system/node/nodeN/meminfo
> and a global PageMetadata field to /proc/meminfo. This information can=
> be used by users to see how much memory is being used by per-page
> metadata, which can vary depending on build configuration, machine
> architecture, and system use.
>
> Per-page metadata is the amount of memory that Linux needs in order to=
> manage memory at the page granularity. The majority of such memory is<= br> > used by "struct page" and "page_ext" data structur= es. In contrast to
> most other memory consumption statistics, per-page metadata might not<= br> > be included in MemTotal. For example, MemTotal does not include memblo= ck
> allocations but includes buddy allocations. While on the other hand, > per-page metadata would include both memblock and buddy allocations. >
> This memory depends on build configurations, machine architectures, an= d
> the way system is used:
>
> Build configuration may include extra fields into "struct page&qu= ot;,
> and enable / disable "page_ext"
> Machine architecture defines base page sizes. For example 4K x86,
> 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> overhead is smaller on machines with larger page sizes.
> System use can change per-page overhead by using vmemmap
> optimizations with hugetlb pages, and emulated pmem devdax pages.
> Also, boot parameters can determine whether page_ext is needed
> to be allocated. This memory can be part of MemTotal or be outside
> MemTotal depending on whether the memory was hot-plugged, booted with,=
> or hugetlb memory was returned back to the system.
>
> Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Sourav Panda <souravpanda@google.com>
> ---
>=C2=A0 =C2=A0Documentation/filesystems/proc.rst |=C2=A0 3 +++
>=C2=A0 =C2=A0drivers/base/node.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 |=C2=A0 2 ++
>=C2=A0 =C2=A0fs/proc/meminfo.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 |=C2=A0 7 +++++++
>=C2=A0 =C2=A0include/linux/mmzone.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0|=C2=A0 3 +++
>=C2=A0 =C2=A0include/linux/vmstat.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0|=C2=A0 4 ++++
>=C2=A0 =C2=A0mm/hugetlb.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 11 ++++++++--
>=C2=A0 =C2=A0mm/hugetlb_vmemmap.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0|=C2=A0 8 ++++++--
>=C2=A0 =C2=A0mm/mm_init.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 3 +++
>=C2=A0 =C2=A0mm/page_alloc.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 1 +
>=C2=A0 =C2=A0mm/page_ext.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 32 +++++++++++++++++++++---------
>=C2=A0 =C2=A0mm/sparse-vmemmap.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 |=C2=A0 3 +++
>=C2=A0 =C2=A0mm/sparse.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 7 ++++++-
>=C2=A0 =C2=A0mm/vmstat.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 24 ++++++++++++++++++++++
>=C2=A0 =C2=A013 files changed, 94 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesy= stems/proc.rst
> index 2b59cff8be17..c121f2ef9432 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -987,6 +987,7 @@ Example output. You may not have all of these fiel= ds.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0AnonPages:=C2=A0 =C2=A0 =C2=A0 =C2=A04654780= kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0Mapped:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0266244 kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0Shmem:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 9976 kB
> +=C2=A0 =C2=A0 PageMetadata:=C2=A0 =C2=A0 =C2=A0513419 kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0KReclaimable:=C2=A0 =C2=A0 =C2=A0517708 kB >=C2=A0 =C2=A0 =C2=A0 =C2=A0Slab:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0660044 kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0SReclaimable:=C2=A0 =C2=A0 =C2=A0517708 kB > @@ -1089,6 +1090,8 @@ Mapped
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0files whi= ch have been mmapped, such as libraries
>=C2=A0 =C2=A0Shmem
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Total mem= ory used by shared memory (shmem) and tmpfs
> +PageMetadata
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Memory used for per-= page metadata
>=C2=A0 =C2=A0KReclaimable
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Kernel al= locations that the kernel will attempt to reclaim
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0under mem= ory pressure. Includes SReclaimable (below), and other
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 493d533f8375..da728542265f 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -428,6 +428,7 @@ static ssize_t node_read_meminfo(struct device *de= v,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Node %d Mapped:=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0%8lu kB\n"
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Node %d AnonPages:=C2=A0 =C2=A0 =C2=A0 %8= lu kB\n"
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Node %d Shmem:=C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 %8lu kB\n"
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 "Node %d PageMetadata:=C2=A0 =C2=A0%8lu kB\n&quo= t;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Node %d KernelStack:=C2=A0 =C2=A0 %8lu kB= \n"
>=C2=A0 =C2=A0#ifdef CONFIG_SHADOW_CALL_STACK
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Node %d ShadowCallStack:%8lu kB\n" > @@ -458,6 +459,7 @@ static ssize_t node_read_meminfo(struct device *de= v,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),=
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),=
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 nid, K(i.sharedram),
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 nid, K(node_page_state(pgdat, NR_PAGE_METADATA)),
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 nid, node_page_state(pgdat, NR_KERNEL_STACK_KB)= ,
>=C2=A0 =C2=A0#ifdef CONFIG_SHADOW_CALL_STACK
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),<= br> > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 45af9a989d40..f141bb2a550d 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -39,7 +39,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>       long available;
>       unsigned long pages[NR_LRU_LISTS];
>       unsigned long sreclaimable, sunreclaim;
> +     unsigned long nr_page_metadata;
>       int lru;
> +     int nid;
>
>       si_meminfo(&i);
>       si_swapinfo(&i);
> @@ -57,6 +59,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>       sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
>       sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);
>
> +     nr_page_metadata = 0;
> +     for_each_online_node(nid)
> +             nr_page_metadata += node_page_state(NODE_DATA(nid), NR_PAGE_METADATA);
> +
>       show_val_kb(m, "MemTotal:       ", i.totalram);
>       show_val_kb(m, "MemFree:        ", i.freeram);
>       show_val_kb(m, "MemAvailable:   ", available);
> @@ -104,6 +110,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>       show_val_kb(m, "Mapped:         ",
>                   global_node_page_state(NR_FILE_MAPPED));
>       show_val_kb(m, "Shmem:          ", i.sharedram);
> +     show_val_kb(m, "PageMetadata:   ", nr_page_metadata);
>       show_val_kb(m, "KReclaimable:   ", sreclaimable +
>                   global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE));
>       show_val_kb(m, "Slab:           ", sreclaimable + sunreclaim);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc5b4b3..dda1ad522324 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -207,6 +207,9 @@ enum node_stat_item {
>       PGPROMOTE_SUCCESS,      /* promote successfully */
>       PGPROMOTE_CANDIDATE,    /* candidate pages to promote */
>   #endif
> +     NR_PAGE_METADATA,       /* Page metadata size (struct page and page_ext)
> +                              * in pages
> +                              */
>       NR_VM_NODE_STAT_ITEMS
>   };
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index fed855bae6d8..af096a881f03 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -656,4 +656,8 @@ static inline void lruvec_stat_sub_folio(struct folio *folio,
>   {
>       lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
>   }
> +
> +void __init mod_node_early_perpage_metadata(int nid, long delta);
> +void __init store_early_perpage_metadata(void);
> +
>   #endif /* _LINUX_VMSTAT_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1301ba7b2c9a..cd3158a9c7f3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1790,6 +1790,9 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>               destroy_compound_gigantic_folio(folio, huge_page_order(h));
>               free_gigantic_folio(folio, huge_page_order(h));
>       } else {
> +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> +             __node_stat_sub_folio(folio, NR_PAGE_METADATA);
> +#endif
>               __free_pages(&folio->page, huge_page_order(h));
>       }
>   }
> @@ -2125,6 +2128,7 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>       struct page *page;
>       bool alloc_try_hard = true;
>       bool retry = true;
> +     struct folio *folio;
>
>       /*
>        * By default we always try hard to allocate the page with
> @@ -2175,9 +2179,12 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>               __count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
>               return NULL;
>       }
> -
> +     folio = page_folio(page);
> +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> +     __node_stat_add_folio(folio, NR_PAGE_METADATA)
It seems you have not tested this patch with CONFIG_SPARSEMEM_VMEMMAP disabled.
You missed the ";" at the end of this line.

Thanks for reviewing this patch. I will submit v5 after testing against the
FLATMEM and SPARSEMEM (VMEMMAP disabled) memory models on ARM32. This error
was introduced in v4.
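
For readers following along, a minimal user-space sketch of why this slipped through: code under an `#ifndef` guard is only seen by the compiler when the macro is undefined. `DEMO_SPARSEMEM_VMEMMAP`, `demo_guard_counter` and `demo_account_folio()` are hypothetical names made up for illustration; this is not kernel code.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for CONFIG_SPARSEMEM_VMEMMAP. It is left undefined
 * here, which mirrors a FLATMEM or non-VMEMMAP SPARSEMEM build.
 */
/* #define DEMO_SPARSEMEM_VMEMMAP 1 */

static long demo_guard_counter; /* toy counter, not a kernel vmstat item */

/* Returns 1 if the guarded accounting statement was compiled in, else 0. */
static int demo_account_folio(long nr_pages)
{
	assert(nr_pages > 0);
#ifndef DEMO_SPARSEMEM_VMEMMAP
	/*
	 * The compiler only parses this branch when the macro is undefined,
	 * so a missing ';' here goes unnoticed in builds that define it.
	 */
	demo_guard_counter += nr_pages;
	return 1;
#else
	(void)nr_pages;
	return 0;
#endif
}
```

Building once with the macro defined and once without is the cheapest way to make sure both branches at least compile.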

> +#endif

I am curious why we should account HugeTLB pages as metadata.

When HugeTLB pages are reserved, the memory backing the redundant `struct page`
entries is returned to the buddy allocator for other uses. This accounting
reflects the change in the amount of `struct page` memory when HugeTLB pages
are reserved and freed.
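
As a rough worked example of the amounts involved, the sketch below computes how many base pages hold the `struct page` array of one huge page. The 4 KiB page size and 64-byte `struct page` are assumptions (typical of x86-64), the number of pages kept is a parameter rather than a claim about any kernel version, and all `demo_*` names are hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define DEMO_PAGE_SIZE        4096UL
#define DEMO_STRUCT_PAGE_SIZE 64UL

/* Base pages needed to hold the struct page array of one huge page. */
static unsigned long demo_vmemmap_pages(unsigned long huge_page_size)
{
	unsigned long nr_struct_pages = huge_page_size / DEMO_PAGE_SIZE;
	unsigned long bytes = nr_struct_pages * DEMO_STRUCT_PAGE_SIZE;

	assert(huge_page_size % DEMO_PAGE_SIZE == 0);
	return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Pages that could be returned if `reserved` vmemmap pages are kept. */
static unsigned long demo_vmemmap_pages_freed(unsigned long huge_page_size,
					      unsigned long reserved)
{
	unsigned long total = demo_vmemmap_pages(huge_page_size);

	return total > reserved ? total - reserved : 0;
}
```

Under these assumptions a 2 MiB huge page has 512 `struct page` entries, which occupy 8 base pages; keeping one of them would leave 7 to hand back.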

>       __count_vm_event(HTLB_BUDDY_PGALLOC);
> -     return page_folio(page);
> +     return folio;
>   }
>
>   /*
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4b9734777f69..804a93d18cab 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -214,6 +214,7 @@ static inline void free_vmemmap_page(struct page *page)
>               free_bootmem_page(page);
>       else
>               __free_page(page);
> +     __mod_node_page_state(page_pgdat(page), NR_PAGE_METADATA, -1);
>   }
>
>   /* Free a list of the vmemmap pages */
> @@ -336,6 +337,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>                         (void *)walk.reuse_addr);
>               list_add(&walk.reuse_page->lru, &vmemmap_pages);
>       }
> +     __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, 1);

What if allocation of walk.reuse_page fails?

Thank you for pointing this out. I will move the NR_PAGE_METADATA update inside
the if (walk.reuse_page) clause, so that the counter is not incremented when
the allocation of walk.reuse_page fails.
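
A minimal user-space model of that change, assuming the counter should move only when the reuse page actually exists. `demo_reuse_counter`, `struct demo_walk` and `demo_remap_free()` are hypothetical stand-ins, not the v5 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static long demo_reuse_counter; /* toy stand-in for NR_PAGE_METADATA */

struct demo_walk {
	void *reuse_page; /* NULL when the optional allocation failed */
};

/* `alloc_ok` simulates whether the reuse page could be allocated. */
static void demo_remap_free(bool alloc_ok)
{
	static char backing_page[1];
	struct demo_walk walk = {
		.reuse_page = alloc_ok ? backing_page : NULL,
	};

	if (walk.reuse_page) {
		/* The real code would copy the page and list_add() it here. */
		demo_reuse_counter += 1; /* account only for a page that exists */
	}
}
```

With the update inside the branch, a failed allocation leaves the counter untouched instead of adding a page that was never allocated.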

>
>       /*
>        * In order to make remapping routine most efficient for the huge pages,
> @@ -381,14 +383,16 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>                                  struct list_head *list)
>   {
>       gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> -     unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> +     unsigned long nr_pages = DIV_ROUND_UP(end - start, PAGE_SIZE);

"end - start" is always a multiple of PAGE_SIZE, so why do we need
DIV_ROUND_UP here?

Thank you. I agree, and I will revert this change in v5.
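
To make the equivalence concrete, here is a small user-space check, assuming a 4 KiB page size. The `DEMO_*` macros and helper names are hypothetical, with `DEMO_DIV_ROUND_UP` written in the usual round-up-division form.

```c
#include <assert.h>

/* Assumed 4 KiB pages; names are illustrative, not kernel macros. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Page count as computed before the patch: truncating shift. */
static unsigned long demo_pages_by_shift(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return (end - start) >> DEMO_PAGE_SHIFT;
}

/* Page count as computed in v4: rounding division. */
static unsigned long demo_pages_round_up(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return DEMO_DIV_ROUND_UP(end - start, DEMO_PAGE_SIZE);
}
```

The two helpers agree whenever the span is a whole number of pages and differ only for unaligned spans, which is why the rounding form adds nothing when `end - start` is page aligned.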

>       int nid = page_to_nid((struct page *)start);
>       struct page *page, *next;
> +     int i;
>
> -     while (nr_pages--) {
> +     for (i = 0; i < nr_pages; i++) {
>               page = alloc_pages_node(nid, gfp_mask, 0);
>               if (!page)
>                       goto out;
> +             __mod_node_page_state(page_pgdat(page), NR_PAGE_METADATA, 1);
>               list_add_tail(&page->lru, list);
>       }

Counting one page at a time is really inefficient. Can't we account all
*nr_pages* at once?

Thanks for suggesting this optimization. I will modify the implementation to
update the metadata counter once, as opposed to on every iteration.
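
One possible shape of the batched update, modelled in user space: the loop only allocates, and the counter moves once after every allocation has succeeded. `demo_batch_counter`, `demo_alloc_page_list()` and its `fail_at` parameter are hypothetical and exist only to simulate an allocation failure.

```c
#include <assert.h>

static long demo_batch_counter; /* toy stand-in for NR_PAGE_METADATA */

/*
 * fail_at is the index of the allocation that fails, or -1 for none.
 * Returns 0 on success and -1 on failure.
 */
static int demo_alloc_page_list(unsigned long nr_pages, long fail_at)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		if (fail_at >= 0 && i == (unsigned long)fail_at)
			return -1; /* real code would free the partial list */
	}

	/* Single update covering the whole batch. */
	demo_batch_counter += (long)nr_pages;
	return 0;
}
```

In this model a failed batch leaves the counter as it was, so no matching decrement is needed on the error path.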

>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 50f2f34745af..6997bf00945b 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -26,6 +26,7 @@
>   #include <linux/pgtable.h>
>   #include <linux/swap.h>
>   #include <linux/cma.h>
> +#include <linux/vmstat.h>
>   #include "internal.h"
>   #include "slab.h"
>   #include "shuffle.h"
> @@ -1656,6 +1657,8 @@ static void __init alloc_node_mem_map(struct pglist_data *pgdat)
>                       panic("Failed to allocate %ld bytes for node %d memory map\n",
>                             size, pgdat->node_id);
>               pgdat->node_mem_map = map + offset;
> +             mod_node_early_perpage_metadata(pgdat->node_id,
> +                                             DIV_ROUND_UP(size, PAGE_SIZE));
>       }
>       pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n",
>                               __func__, pgdat->node_id, (unsigned long)pgdat,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 85741403948f..522dc0c52610 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5443,6 +5443,7 @@ void __init setup_per_cpu_pageset(void)
>       for_each_online_pgdat(pgdat)
>               pgdat->per_cpu_nodestats =
>                       alloc_percpu(struct per_cpu_nodestat);
> +     store_early_perpage_metadata();
>   }
>
>   __meminit void zone_pcp_init(struct zone *zone)
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 4548fcc66d74..d8d6db9c3d75 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -201,6 +201,8 @@ static int __init alloc_node_page_ext(int nid)
>               return -ENOMEM;
>       NODE_DATA(nid)->node_page_ext = base;
>       total_usage += table_size;
> +     __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> +                           DIV_ROUND_UP(table_size, PAGE_SIZE));
>       return 0;
>   }
>
> @@ -255,12 +257,15 @@ static void *__meminit alloc_page_ext(size_t size, int nid)
>       void *addr = NULL;
>
>       addr = alloc_pages_exact_nid(nid, size, flags);
> -     if (addr) {
> +     if (addr)
>               kmemleak_alloc(addr, size, 1, flags);
> -             return addr;
> -     }
> +     else
> +             addr = vzalloc_node(size, nid);
>
> -     addr = vzalloc_node(size, nid);
> +     if (addr) {
> +             mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> +                                 DIV_ROUND_UP(size, PAGE_SIZE));
> +     }
>
>       return addr;
>   }
> @@ -303,18 +308,27 @@ static int __meminit init_section_page_ext(unsigned long pfn, int nid)
>
>   static void free_page_ext(void *addr)
>   {
> +     size_t table_size;
> +     struct page *page;
> +     struct pglist_data *pgdat;
> +
> +     table_size = page_ext_size * PAGES_PER_SECTION;
> +
>       if (is_vmalloc_addr(addr)) {
> +             page = vmalloc_to_page(addr);
> +             pgdat = page_pgdat(page);
>               vfree(addr);
>       } else {
> -             struct page *page = virt_to_page(addr);
> -             size_t table_size;
> -
> -             table_size = page_ext_size * PAGES_PER_SECTION;
> -
> +             page = virt_to_page(addr);
> +             pgdat = page_pgdat(page);
>               BUG_ON(PageReserved(page));
>               kmemleak_free(addr);
>               free_pages_exact(addr, table_size);
>       }
> +
> +     __mod_node_page_state(pgdat, NR_PAGE_METADATA,
> +                           -1L * (DIV_ROUND_UP(table_size, PAGE_SIZE)));
> +
>   }
>
>   static void __free_page_ext(unsigned long pfn)
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index a2cbe44c48e1..2bc67b2c2aa2 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -469,5 +469,8 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
>       if (r < 0)
>               return NULL;
>
> +     __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> +                           DIV_ROUND_UP(end - start, PAGE_SIZE));
> +
>       return pfn_to_page(pfn);
>   }
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 77d91e565045..7f67b5486cd1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -14,7 +14,7 @@
>   #include <linux/swap.h>
>   #include <linux/swapops.h>
>   #include <linux/bootmem_info.h>
> -
> +#include <linux/vmstat.h>
>   #include "internal.h"
>   #include <asm/dma.h>
>
> @@ -465,6 +465,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>        */
>       sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
>       sparsemap_buf_end = sparsemap_buf + size;
> +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> +     mod_node_early_perpage_metadata(nid, DIV_ROUND_UP(size, PAGE_SIZE));
> +#endif
>   }
>
>   static void __init sparse_buffer_fini(void)
> @@ -641,6 +644,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
>       unsigned long start = (unsigned long) pfn_to_page(pfn);
>       unsigned long end = start + nr_pages * sizeof(struct page);
>
> +     __mod_node_page_state(page_pgdat(pfn_to_page(pfn)), NR_PAGE_METADATA,
> +                           -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
>       vmemmap_free(start, end, altmap);
>   }
>   static void free_map_bootmem(struct page *memmap)
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 00e81e99c6ee..070d2b3d2bcc 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1245,6 +1245,7 @@ const char * const vmstat_text[] = {
>       "pgpromote_success",
>       "pgpromote_candidate",
>   #endif
> +     "nr_page_metadata",
>
>       /* enum writeback_stat_item counters */
>       "nr_dirty_threshold",
> @@ -2274,4 +2275,27 @@ static int __init extfrag_debug_init(void)
>   }
>
>   module_init(extfrag_debug_init);
> +
>   #endif
> +
> +/*
> + * Page metadata size (struct page and page_ext) in pages
> + */
> +static unsigned long early_perpage_metadata[MAX_NUMNODES] __initdata;
> +
> +void __init mod_node_early_perpage_metadata(int nid, long delta)
> +{
> +     early_perpage_metadata[nid] += delta;
> +}
> +
> +void __init store_early_perpage_metadata(void)
> +{
> +     int nid;
> +     struct pglist_data *pgdat;
> +
> +     for_each_online_pgdat(pgdat) {
> +             nid = pgdat->node_id;
> +             __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> +                                   early_perpage_metadata[nid]);
> +     }
> +}

--000000000000b8138406091f35c6--