* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:30 ` [PATCH v1 1/1] mm: report per-page " Sourav Panda
@ 2023-09-13 17:56 ` Matthew Wilcox
2023-09-14 21:04 ` Sourav Panda
2023-09-13 19:34 ` kernel test robot
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-09-13 17:56 UTC (permalink / raw)
To: Sourav Panda
Cc: corbet, gregkh, rafael, akpm, mike.kravetz, muchun.song, rppt,
david, rdunlap, chenlinxuan, yang.yang29, tomas.mudrunka,
bhelgaas, ivan, pasha.tatashin, yosryahmed, hannes, shakeelb,
kirill.shutemov, wangkefeng.wang, adobriyan, vbabka,
Liam.Howlett, surenb, linux-kernel, linux-fsdevel, linux-doc,
linux-mm
On Wed, Sep 13, 2023 at 10:30:00AM -0700, Sourav Panda wrote:
> @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>
> while (nr_pages--) {
> page = alloc_pages_node(nid, gfp_mask, 0);
> - if (!page)
> + if (!page) {
> goto out;
> + } else {
> + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> + NR_PAGE_METADATA, 1);
> + }
> list_add_tail(&page->lru, list);
What a strange way of writing this. Why not simply:
if (!page)
goto out;
+ __mod_node_page_state(NODE_DATA(page_to_nid(page)),
+ NR_PAGE_METADATA, 1);
list_add_tail(&page->lru, list);
> @@ -314,6 +319,10 @@ static void free_page_ext(void *addr)
> BUG_ON(PageReserved(page));
> kmemleak_free(addr);
> free_pages_exact(addr, table_size);
> +
> + __mod_node_page_state(NODE_DATA(page_to_nid(page)), NR_PAGE_METADATA,
> + (long)-1 * (PAGE_ALIGN(table_size) >> PAGE_SHIFT));
Why not spell that as "-1L"?
And while I'm asking questions, why NODE_DATA(page_to_nid(page)) instead
of page_pgdat(page)?
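For reference, here is a small user-space sketch of the delta computation questioned above; it is not the kernel code. DEMO_PAGE_SHIFT, DEMO_PAGE_ALIGN and the demo_* helpers are made-up stand-ins (4 KiB pages assumed) for PAGE_SHIFT, PAGE_ALIGN and the real call site. It suggests that the "(long)-1 *" and "-1L *" spellings produce the same value, at least with GCC and Clang, so the suggestion is about readability rather than behaviour.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed values for illustration; the kernel defines these per architecture. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_ALIGN(x) (((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))

/* Number of whole pages needed to hold `size` bytes. */
static unsigned long demo_bytes_to_pages(size_t size)
{
	/* Guard against overflow in the round-up. */
	assert(size <= ULONG_MAX - DEMO_PAGE_SIZE);
	return DEMO_PAGE_ALIGN((unsigned long)size) >> DEMO_PAGE_SHIFT;
}

/* Spelling used in the patch: cast the literal, then multiply. */
static long demo_delta_cast(size_t size)
{
	return (long)-1 * demo_bytes_to_pages(size);
}

/* Spelling suggested in review: a long literal. */
static long demo_delta_literal(size_t size)
{
	return -1L * demo_bytes_to_pages(size);
}

/* Another option: convert the page count to long, then negate. */
static long demo_delta_negate(size_t size)
{
	return -(long)demo_bytes_to_pages(size);
}
```

In the first two helpers the multiplication is carried out in unsigned long and only becomes negative when converted back to long, which is implementation-defined in C; the third helper avoids that by converting before negating.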
> @@ -2274,4 +2275,24 @@ static int __init extfrag_debug_init(void)
> }
>
> module_init(extfrag_debug_init);
> +
> +// Page metadata size (struct page and page_ext) in pages
Don't use // comments.
> +void __init writeout_early_perpage_metadata(void)
"writeout" is something swap does. I'm sure this has a better name,
though I can't think what it might be.
> +{
> + int nid;
> + struct pglist_data *pgdat;
> +
> + for_each_online_pgdat(pgdat) {
> + nid = pgdat->node_id;
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> + early_perpage_metadata[nid]);
> + }
> +}
> #endif
> --
> 2.42.0.283.g2d96d420d3-goog
>
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:56 ` Matthew Wilcox
@ 2023-09-14 21:04 ` Sourav Panda
0 siblings, 0 replies; 13+ messages in thread
From: Sourav Panda @ 2023-09-14 21:04 UTC (permalink / raw)
To: Matthew Wilcox
Cc: corbet, gregkh, rafael, akpm, mike.kravetz, muchun.song, rppt,
david, rdunlap, chenlinxuan, yang.yang29, tomas.mudrunka,
bhelgaas, ivan, pasha.tatashin, yosryahmed, hannes, shakeelb,
kirill.shutemov, wangkefeng.wang, adobriyan, vbabka,
Liam.Howlett, surenb, linux-kernel, linux-fsdevel, linux-doc,
linux-mm
Hi Matthew Wilcox,
Thank you very much for reviewing my patch. Please find my responses below.
On Wed, Sep 13, 2023 at 10:56 AM Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Sep 13, 2023 at 10:30:00AM -0700, Sourav Panda wrote:
> > @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >
> > while (nr_pages--) {
> > page = alloc_pages_node(nid, gfp_mask, 0);
> > - if (!page)
> > + if (!page) {
> > goto out;
> > + } else {
> > + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > + NR_PAGE_METADATA, 1);
> > + }
> > list_add_tail(&page->lru, list);
>
> What a strange way of writing this. Why not simply:
>
> if (!page)
> goto out;
> + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> + NR_PAGE_METADATA, 1);
> list_add_tail(&page->lru, list);
>
Thank you Matthew Wilcox for your comment. I agree with you and will make
the corresponding change.
>
> > @@ -314,6 +319,10 @@ static void free_page_ext(void *addr)
> > BUG_ON(PageReserved(page));
> > kmemleak_free(addr);
> > free_pages_exact(addr, table_size);
> > +
> > + __mod_node_page_state(NODE_DATA(page_to_nid(page)), NR_PAGE_METADATA,
> > + (long)-1 * (PAGE_ALIGN(table_size) >> PAGE_SHIFT));
>
> Why not spell that as "-1L"?
>
> And while I'm asking questions, why NODE_DATA(page_to_nid(page)) instead
> of page_pgdat(page)?
>
Yes, thank you! I shall make both the suggested changes.
>
> > @@ -2274,4 +2275,24 @@ static int __init extfrag_debug_init(void)
> > }
> >
> > module_init(extfrag_debug_init);
> > +
> > +// Page metadata size (struct page and page_ext) in pages
>
> Don't use // comments.
>
Thank you. I shall replace them with /* text */ comments to be consistent with
the rest of the file.
>
> > +void __init writeout_early_perpage_metadata(void)
>
> "writeout" is something swap does. I'm sure this has a better name,
> though I can't think what it might be.
>
> > +{
> > + int nid;
> > + struct pglist_data *pgdat;
> > +
> > + for_each_online_pgdat(pgdat) {
> > + nid = pgdat->node_id;
> > + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > + early_perpage_metadata[nid]);
> > + }
> > +}
> > #endif
> > --
> > 2.42.0.283.g2d96d420d3-goog
> >
Yep, thank you! Does store_early_perpage_metadata seem better to you?
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:30 ` [PATCH v1 1/1] mm: report per-page " Sourav Panda
2023-09-13 17:56 ` Matthew Wilcox
@ 2023-09-13 19:34 ` kernel test robot
2023-09-13 20:51 ` Mike Rapoport
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2023-09-13 19:34 UTC (permalink / raw)
To: Sourav Panda, corbet, gregkh, rafael, akpm, mike.kravetz,
muchun.song, rppt, david, rdunlap, chenlinxuan, yang.yang29,
tomas.mudrunka, bhelgaas, ivan, pasha.tatashin, yosryahmed,
hannes, shakeelb, kirill.shutemov, wangkefeng.wang, adobriyan,
vbabka, Liam.Howlett, surenb, linux-kernel, linux-fsdevel,
linux-doc, linux-mm
Cc: oe-kbuild-all
Hi Sourav,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus linus/master v6.6-rc1 next-20230913]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Sourav-Panda/mm-report-per-page-metadata-information/20230914-013201
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230913173000.4016218-2-souravpanda%40google.com
patch subject: [PATCH v1 1/1] mm: report per-page metadata information
config: i386-tinyconfig (https://download.01.org/0day-ci/archive/20230914/202309140322.cF62Kywb-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230914/202309140322.cF62Kywb-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309140322.cF62Kywb-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: mm/mm_init.o: in function `free_area_init':
>> mm_init.c:(.init.text+0x842): undefined reference to `mod_node_early_perpage_metadata'
ld: mm/page_alloc.o: in function `setup_per_cpu_pageset':
>> page_alloc.c:(.init.text+0x60): undefined reference to `writeout_early_perpage_metadata'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:30 ` [PATCH v1 1/1] mm: report per-page " Sourav Panda
2023-09-13 17:56 ` Matthew Wilcox
2023-09-13 19:34 ` kernel test robot
@ 2023-09-13 20:51 ` Mike Rapoport
2023-09-14 12:47 ` Matthew Wilcox
2023-09-14 22:41 ` Sourav Panda
2023-09-13 21:53 ` kernel test robot
` (2 subsequent siblings)
5 siblings, 2 replies; 13+ messages in thread
From: Mike Rapoport @ 2023-09-13 20:51 UTC (permalink / raw)
To: Sourav Panda
Cc: corbet, gregkh, rafael, akpm, mike.kravetz, muchun.song, david,
rdunlap, chenlinxuan, yang.yang29, tomas.mudrunka, bhelgaas,
ivan, pasha.tatashin, yosryahmed, hannes, shakeelb,
kirill.shutemov, wangkefeng.wang, adobriyan, vbabka,
Liam.Howlett, surenb, linux-kernel, linux-fsdevel, linux-doc,
linux-mm
On Wed, Sep 13, 2023 at 10:30:00AM -0700, Sourav Panda wrote:
> Adds a new per-node PageMetadata field to
> /sys/devices/system/node/nodeN/meminfo
> and a global PageMetadata field to /proc/meminfo. This information can
> be used by users to see how much memory is being used by per-page
> metadata, which can vary depending on build configuration, machine
> architecture, and system use.
>
> Per-page metadata is the amount of memory that Linux needs in order to
> manage memory at the page granularity. The majority of such memory is
> used by "struct page" and "page_ext" data structures.
>
> This memory depends on build configurations, machine architectures, and
> the way system is used:
>
> Build configuration may include extra fields into "struct page",
> and enable / disable "page_ext"
> Machine architecture defines base page sizes. For example 4K x86,
> 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> overhead is smaller on machines with larger page sizes.
> System use can change per-page overhead by using vmemmap
> optimizations with hugetlb pages, and emulated pmem devdax pages.
> Also, boot parameters can determine whether page_ext is needed
> to be allocated. This memory can be part of MemTotal or be outside
> MemTotal depending on whether the memory was hot-plugged, booted with,
> or hugetlb memory was returned back to the system.
>
> Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Sourav Panda <souravpanda@google.com>
> ---
> Documentation/filesystems/proc.rst | 3 +++
> drivers/base/node.c | 2 ++
> fs/proc/meminfo.c | 7 +++++++
> include/linux/mmzone.h | 3 +++
> include/linux/vmstat.h | 4 ++++
> mm/hugetlb.c | 8 +++++++-
> mm/hugetlb_vmemmap.c | 9 ++++++++-
> mm/mm_init.c | 3 +++
> mm/page_alloc.c | 1 +
> mm/page_ext.c | 17 +++++++++++++----
> mm/sparse-vmemmap.c | 3 +++
> mm/sparse.c | 7 ++++++-
> mm/vmstat.c | 21 +++++++++++++++++++++
> 13 files changed, 81 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 2b59cff8be17..c121f2ef9432 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -987,6 +987,7 @@ Example output. You may not have all of these fields.
> AnonPages: 4654780 kB
> Mapped: 266244 kB
> Shmem: 9976 kB
> + PageMetadata: 513419 kB
> KReclaimable: 517708 kB
> Slab: 660044 kB
> SReclaimable: 517708 kB
> @@ -1089,6 +1090,8 @@ Mapped
> files which have been mmapped, such as libraries
> Shmem
> Total memory used by shared memory (shmem) and tmpfs
> +PageMetadata
> + Memory used for per-page metadata
> KReclaimable
> Kernel allocations that the kernel will attempt to reclaim
> under memory pressure. Includes SReclaimable (below), and other
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 493d533f8375..da728542265f 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -428,6 +428,7 @@ static ssize_t node_read_meminfo(struct device *dev,
> "Node %d Mapped: %8lu kB\n"
> "Node %d AnonPages: %8lu kB\n"
> "Node %d Shmem: %8lu kB\n"
> + "Node %d PageMetadata: %8lu kB\n"
> "Node %d KernelStack: %8lu kB\n"
> #ifdef CONFIG_SHADOW_CALL_STACK
> "Node %d ShadowCallStack:%8lu kB\n"
> @@ -458,6 +459,7 @@ static ssize_t node_read_meminfo(struct device *dev,
> nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
> nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
> nid, K(i.sharedram),
> + nid, K(node_page_state(pgdat, NR_PAGE_METADATA)),
> nid, node_page_state(pgdat, NR_KERNEL_STACK_KB),
> #ifdef CONFIG_SHADOW_CALL_STACK
> nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 45af9a989d40..f141bb2a550d 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -39,7 +39,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> long available;
> unsigned long pages[NR_LRU_LISTS];
> unsigned long sreclaimable, sunreclaim;
> + unsigned long nr_page_metadata;
> int lru;
> + int nid;
>
> si_meminfo(&i);
> si_swapinfo(&i);
> @@ -57,6 +59,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
> sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);
>
> + nr_page_metadata = 0;
> + for_each_online_node(nid)
> + nr_page_metadata += node_page_state(NODE_DATA(nid), NR_PAGE_METADATA);
> +
> show_val_kb(m, "MemTotal: ", i.totalram);
> show_val_kb(m, "MemFree: ", i.freeram);
> show_val_kb(m, "MemAvailable: ", available);
> @@ -104,6 +110,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> show_val_kb(m, "Mapped: ",
> global_node_page_state(NR_FILE_MAPPED));
> show_val_kb(m, "Shmem: ", i.sharedram);
> + show_val_kb(m, "PageMetadata: ", nr_page_metadata);
> show_val_kb(m, "KReclaimable: ", sreclaimable +
> global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE));
> show_val_kb(m, "Slab: ", sreclaimable + sunreclaim);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc5b4b3..dda1ad522324 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -207,6 +207,9 @@ enum node_stat_item {
> PGPROMOTE_SUCCESS, /* promote successfully */
> PGPROMOTE_CANDIDATE, /* candidate pages to promote */
> #endif
> + NR_PAGE_METADATA, /* Page metadata size (struct page and page_ext)
> + * in pages
> + */
> NR_VM_NODE_STAT_ITEMS
> };
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index fed855bae6d8..b5c292560f37 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -656,4 +656,8 @@ static inline void lruvec_stat_sub_folio(struct folio *folio,
> {
> lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
> }
> +
> +void __init mod_node_early_perpage_metadata(int nid, long delta);
> +void __init writeout_early_perpage_metadata(void);
> +
> #endif /* _LINUX_VMSTAT_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ba6d39b71cb1..ca36751be50e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1758,6 +1758,10 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> destroy_compound_gigantic_folio(folio, huge_page_order(h));
> free_gigantic_folio(folio, huge_page_order(h));
> } else {
> +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> + __mod_node_page_state(NODE_DATA(page_to_nid(&folio->page)),
> + NR_PAGE_METADATA, -huge_page_order(h));
I don't think memory map will change here with classic SPARSEMEM
> +#endif
> __free_pages(&folio->page, huge_page_order(h));
> }
> }
> @@ -2143,7 +2147,9 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> __count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> return NULL;
> }
> -
> +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, huge_page_order(h));
> +#endif
> __count_vm_event(HTLB_BUDDY_PGALLOC);
> return page_folio(page);
> }
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4b9734777f69..7f920bfa8e79 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -214,6 +214,8 @@ static inline void free_vmemmap_page(struct page *page)
> free_bootmem_page(page);
> else
> __free_page(page);
> + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> + NR_PAGE_METADATA, -1);
> }
>
> /* Free a list of the vmemmap pages */
> @@ -336,6 +338,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> (void *)walk.reuse_addr);
> list_add(&walk.reuse_page->lru, &vmemmap_pages);
> }
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, 1);
>
> /*
> * In order to make remapping routine most efficient for the huge pages,
> @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>
> while (nr_pages--) {
> page = alloc_pages_node(nid, gfp_mask, 0);
> - if (!page)
> + if (!page) {
> goto out;
> + } else {
> + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> + NR_PAGE_METADATA, 1);
We can update this once for nr_pages outside the loop, cannot we?
> + }
> list_add_tail(&page->lru, list);
> }
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 50f2f34745af..e02dce7e2e9a 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -26,6 +26,7 @@
> #include <linux/pgtable.h>
> #include <linux/swap.h>
> #include <linux/cma.h>
> +#include <linux/vmstat.h>
> #include "internal.h"
> #include "slab.h"
> #include "shuffle.h"
> @@ -1656,6 +1657,8 @@ static void __init alloc_node_mem_map(struct pglist_data *pgdat)
> panic("Failed to allocate %ld bytes for node %d memory map\n",
> size, pgdat->node_id);
> pgdat->node_mem_map = map + offset;
> + mod_node_early_perpage_metadata(pgdat->node_id,
> + PAGE_ALIGN(size) >> PAGE_SHIFT);
> }
> pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n",
> __func__, pgdat->node_id, (unsigned long)pgdat,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0c5be12f9336..4e295d5087f4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5443,6 +5443,7 @@ void __init setup_per_cpu_pageset(void)
> for_each_online_pgdat(pgdat)
> pgdat->per_cpu_nodestats =
> alloc_percpu(struct per_cpu_nodestat);
> + writeout_early_perpage_metadata();
Why is it called here?
You can copy early stats to actual node stats as soon as the nodes and page
allocator are initialized.
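To illustrate the idea of buffering early counts and copying them into the node stats later, here is a user-space sketch. It only models the pattern: demo_early_pages[] stands in for the patch's early_perpage_metadata[], demo_node_stat[] for the real per-node counter, and DEMO_MAX_NODES is an arbitrary assumption. Clearing the early slot after the copy is an addition made here so that a second flush is harmless; the patch does not do that.

```c
#include <assert.h>

#define DEMO_MAX_NODES 4 /* arbitrary, for illustration only */

/* Stand-in for the boot-time array (early_perpage_metadata[] in the patch). */
static unsigned long demo_early_pages[DEMO_MAX_NODES];

/* Stand-in for the per-node counter that is only usable after init. */
static long demo_node_stat[DEMO_MAX_NODES];

/* Record a delta while the real counters are not available yet. */
static void demo_mod_early(int nid, long delta)
{
	assert(nid >= 0 && nid < DEMO_MAX_NODES);
	demo_early_pages[nid] += delta;
}

/* Copy the buffered counts into the real counters once they exist. */
static void demo_flush_early(void)
{
	for (int nid = 0; nid < DEMO_MAX_NODES; nid++) {
		demo_node_stat[nid] += (long)demo_early_pages[nid];
		/* Not in the patch: clear the slot so a repeated flush adds nothing. */
		demo_early_pages[nid] = 0;
	}
}
```

Where exactly the flush belongs in the kernel's boot sequence is the open question in this review; the sketch takes no position on it.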
> }
>
> __meminit void zone_pcp_init(struct zone *zone)
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 4548fcc66d74..b5b9d3079e20 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -201,6 +201,8 @@ static int __init alloc_node_page_ext(int nid)
> return -ENOMEM;
> NODE_DATA(nid)->node_page_ext = base;
> total_usage += table_size;
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> + PAGE_ALIGN(table_size) >> PAGE_SHIFT);
> return 0;
> }
>
> @@ -255,12 +257,15 @@ static void *__meminit alloc_page_ext(size_t size, int nid)
> void *addr = NULL;
>
> addr = alloc_pages_exact_nid(nid, size, flags);
> - if (addr) {
> + if (addr)
> kmemleak_alloc(addr, size, 1, flags);
> - return addr;
> - }
> + else
> + addr = vzalloc_node(size, nid);
>
> - addr = vzalloc_node(size, nid);
> + if (addr) {
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> + PAGE_ALIGN(size) >> PAGE_SHIFT);
> + }
>
> return addr;
> }
> @@ -314,6 +319,10 @@ static void free_page_ext(void *addr)
> BUG_ON(PageReserved(page));
> kmemleak_free(addr);
> free_pages_exact(addr, table_size);
> +
> + __mod_node_page_state(NODE_DATA(page_to_nid(page)), NR_PAGE_METADATA,
> + (long)-1 * (PAGE_ALIGN(table_size) >> PAGE_SHIFT));
> +
what happens with vmalloc()ed page_ext?
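A user-space sketch of the concern, assuming (as the question implies) that the free path has a separate branch for vmalloc()ed tables that the hunk above does not touch. Everything here is hypothetical: struct demo_table, its from_vmalloc flag and the demo_* functions only model the accounting, with malloc()/free() standing in for the kernel allocators. The point is that the counter is charged and uncharged outside the allocator-specific branches, so both paths stay balanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the NR_PAGE_METADATA counter of one node. */
static long demo_metadata_pages;

/* Hypothetical descriptor; the kernel would test the address instead. */
struct demo_table {
	void *mem;
	unsigned long pages;
	bool from_vmalloc;
};

static bool demo_alloc_table(struct demo_table *t, unsigned long pages,
			     bool from_vmalloc)
{
	assert(pages > 0);
	t->mem = malloc(pages * 4096UL);
	if (!t->mem)
		return false;
	t->pages = pages;
	t->from_vmalloc = from_vmalloc;
	/* Charged once, whichever allocator provided the memory. */
	demo_metadata_pages += (long)pages;
	return true;
}

static void demo_free_table(struct demo_table *t)
{
	/* Uncharge before branching so neither free path is missed. */
	demo_metadata_pages -= (long)t->pages;

	if (t->from_vmalloc)
		free(t->mem); /* models vfree() */
	else
		free(t->mem); /* models free_pages_exact() */
	t->mem = NULL;
}
```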
> }
> }
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index a2cbe44c48e1..e33f302db7c6 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -469,5 +469,8 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
> if (r < 0)
> return NULL;
>
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> + PAGE_ALIGN(end - start) >> PAGE_SHIFT);
> +
> return pfn_to_page(pfn);
> }
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 77d91e565045..db78233a85ef 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -14,7 +14,7 @@
> #include <linux/swap.h>
> #include <linux/swapops.h>
> #include <linux/bootmem_info.h>
> -
> +#include <linux/vmstat.h>
> #include "internal.h"
> #include <asm/dma.h>
>
> @@ -465,6 +465,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> */
> sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
> sparsemap_buf_end = sparsemap_buf + size;
> +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> + mod_node_early_perpage_metadata(nid, PAGE_ALIGN(size) >> PAGE_SHIFT);
All early struct pages are allocated in memmap_alloc(). It'd make sense to update
the counter there.
> +#endif
> }
>
> static void __init sparse_buffer_fini(void)
> @@ -641,6 +644,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> unsigned long start = (unsigned long) pfn_to_page(pfn);
> unsigned long end = start + nr_pages * sizeof(struct page);
>
> + __mod_node_page_state(NODE_DATA(page_to_nid(pfn_to_page(pfn))), NR_PAGE_METADATA,
> + (long)-1 * (PAGE_ALIGN(end - start) >> PAGE_SHIFT));
> vmemmap_free(start, end, altmap);
> }
> static void free_map_bootmem(struct page *memmap)
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 00e81e99c6ee..731eb5264b49 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1245,6 +1245,7 @@ const char * const vmstat_text[] = {
> "pgpromote_success",
> "pgpromote_candidate",
> #endif
> + "nr_page_metadata",
>
> /* enum writeback_stat_item counters */
> "nr_dirty_threshold",
> @@ -2274,4 +2275,24 @@ static int __init extfrag_debug_init(void)
> }
>
> module_init(extfrag_debug_init);
> +
> +// Page metadata size (struct page and page_ext) in pages
> +unsigned long early_perpage_metadata[MAX_NUMNODES] __initdata;
static?
> +
> +void __init mod_node_early_perpage_metadata(int nid, long delta)
> +{
> + early_perpage_metadata[nid] += delta;
> +}
> +
> +void __init writeout_early_perpage_metadata(void)
> +{
> + int nid;
> + struct pglist_data *pgdat;
> +
> + for_each_online_pgdat(pgdat) {
> + nid = pgdat->node_id;
> + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> + early_perpage_metadata[nid]);
> + }
> +}
> #endif
> --
> 2.42.0.283.g2d96d420d3-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 20:51 ` Mike Rapoport
@ 2023-09-14 12:47 ` Matthew Wilcox
2023-09-14 22:45 ` Sourav Panda
2023-09-14 22:41 ` Sourav Panda
1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-09-14 12:47 UTC (permalink / raw)
To: Mike Rapoport
Cc: Sourav Panda, corbet, gregkh, rafael, akpm, mike.kravetz,
muchun.song, david, rdunlap, chenlinxuan, yang.yang29,
tomas.mudrunka, bhelgaas, ivan, pasha.tatashin, yosryahmed,
hannes, shakeelb, kirill.shutemov, wangkefeng.wang, adobriyan,
vbabka, Liam.Howlett, surenb, linux-kernel, linux-fsdevel,
linux-doc, linux-mm
On Wed, Sep 13, 2023 at 11:51:25PM +0300, Mike Rapoport wrote:
> > @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >
> > while (nr_pages--) {
> > page = alloc_pages_node(nid, gfp_mask, 0);
> > - if (!page)
> > + if (!page) {
> > goto out;
> > + } else {
> > + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > + NR_PAGE_METADATA, 1);
>
> We can update this once for nr_pages outside the loop, cannot we?
Except that nr_pages is being used as the loop counter.
Probably best to turn this into a normal (i = 0; i < nr_pages; i++)
loop, and then we can do as you say. But this isn't a particularly
interesting high-performance loop.
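The loop shape described above could look roughly like this in a user-space model. This is a sketch under stated assumptions, not the kernel function: demo_alloc_page() is a hypothetical allocator that fails once demo_budget is used up, demo_metadata_pages stands in for the node counter, and an array replaces the kernel's page list. Because nr_pages is no longer consumed by the loop, the counter can be updated once after all allocations succeed.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the NR_PAGE_METADATA counter of one node. */
static long demo_metadata_pages;

/* Hypothetical allocator: succeeds until demo_budget reaches zero. */
static int demo_budget;

static void *demo_alloc_page(void)
{
	if (demo_budget <= 0)
		return NULL;
	demo_budget--;
	return malloc(64);
}

/*
 * Fill pages[0..nr_pages-1]. Returns 0 on success, -1 on failure.
 * On failure everything allocated so far is released and the
 * counter is left unchanged.
 */
static int demo_alloc_list(void **pages, unsigned long nr_pages)
{
	unsigned long i;

	assert(pages != NULL);
	for (i = 0; i < nr_pages; i++) {
		pages[i] = demo_alloc_page();
		if (!pages[i])
			goto fail;
	}
	/* Single update for the whole batch, outside the loop. */
	demo_metadata_pages += (long)nr_pages;
	return 0;
fail:
	while (i--)
		free(pages[i]);
	return -1;
}
```

Charging only after the loop also means a partial failure never touches the counter, which is a property of this sketch and not a claim about the patch.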
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-14 12:47 ` Matthew Wilcox
@ 2023-09-14 22:45 ` Sourav Panda
0 siblings, 0 replies; 13+ messages in thread
From: Sourav Panda @ 2023-09-14 22:45 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Mike Rapoport, corbet, gregkh, rafael, akpm, mike.kravetz,
muchun.song, david, rdunlap, chenlinxuan, yang.yang29,
tomas.mudrunka, bhelgaas, ivan, pasha.tatashin, yosryahmed,
hannes, shakeelb, kirill.shutemov, wangkefeng.wang, adobriyan,
vbabka, Liam.Howlett, surenb, linux-kernel, linux-fsdevel,
linux-doc, linux-mm
Thank you Matthew Wilcox.
On Thu, Sep 14, 2023 at 5:48 AM Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Sep 13, 2023 at 11:51:25PM +0300, Mike Rapoport wrote:
> > > @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> > >
> > > while (nr_pages--) {
> > > page = alloc_pages_node(nid, gfp_mask, 0);
> > > - if (!page)
> > > + if (!page) {
> > > goto out;
> > > + } else {
> > > + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > > + NR_PAGE_METADATA, 1);
> >
> > We can update this once for nr_pages outside the loop, cannot we?
>
> Except that nr_pages is being used as the loop counter.
> Probably best to turn this into a normal (i = 0; i < nr_pages; i++)
> loop, and then we can do as you say. But this isn't a particularly
> interesting high-performance loop.
>
I agree. I shall turn this into a normal (i = 0; i < nr_pages; i++) loop
and then make the relevant changes.
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 20:51 ` Mike Rapoport
2023-09-14 12:47 ` Matthew Wilcox
@ 2023-09-14 22:41 ` Sourav Panda
1 sibling, 0 replies; 13+ messages in thread
From: Sourav Panda @ 2023-09-14 22:41 UTC (permalink / raw)
To: Mike Rapoport
Cc: corbet, gregkh, rafael, akpm, mike.kravetz, muchun.song, david,
rdunlap, chenlinxuan, yang.yang29, tomas.mudrunka, bhelgaas,
ivan, pasha.tatashin, yosryahmed, hannes, shakeelb,
kirill.shutemov, wangkefeng.wang, adobriyan, vbabka,
Liam.Howlett, surenb, linux-kernel, linux-fsdevel, linux-doc,
linux-mm
Thank you Mike Rapoport for reviewing this patch series. Please find my
responses below.
>
> > #endif /* _LINUX_VMSTAT_H */
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ba6d39b71cb1..ca36751be50e 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1758,6 +1758,10 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> > destroy_compound_gigantic_folio(folio, huge_page_order(h));
> > free_gigantic_folio(folio, huge_page_order(h));
> > } else {
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > + __mod_node_page_state(NODE_DATA(page_to_nid(&folio->page)),
> > + NR_PAGE_METADATA, -huge_page_order(h));
>
> I don't think memory map will change here with classic SPARSEMEM
>
Thank you. Yes, I agree with your comment.
>
> > +#endif
> > __free_pages(&folio->page, huge_page_order(h));
> > }
> > }
> > @@ -2143,7 +2147,9 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> > __count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> > return NULL;
> > }
> > -
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, huge_page_order(h));
> > +#endif
> > __count_vm_event(HTLB_BUDDY_PGALLOC);
> > return page_folio(page);
> > }
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 4b9734777f69..7f920bfa8e79 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -214,6 +214,8 @@ static inline void free_vmemmap_page(struct page *page)
> > free_bootmem_page(page);
> > else
> > __free_page(page);
> > + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > + NR_PAGE_METADATA, -1);
> > }
> >
> > /* Free a list of the vmemmap pages */
> > @@ -336,6 +338,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> > (void *)walk.reuse_addr);
> > list_add(&walk.reuse_page->lru, &vmemmap_pages);
> > }
> > + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA, 1);
> >
> > /*
> > * In order to make remapping routine most efficient for the huge pages,
> > @@ -387,8 +390,12 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >
> > while (nr_pages--) {
> > page = alloc_pages_node(nid, gfp_mask, 0);
> > - if (!page)
> > + if (!page) {
> > goto out;
> > + } else {
> > + __mod_node_page_state(NODE_DATA(page_to_nid(page)),
> > + NR_PAGE_METADATA, 1);
>
> We can update this once for nr_pages outside the loop, cannot we?
>
Thank you for the comment. I agree with you and shall incorporate it.
>
> > + }
> > list_add_tail(&page->lru, list);
> > }
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 50f2f34745af..e02dce7e2e9a 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -26,6 +26,7 @@
> > #include <linux/pgtable.h>
> > #include <linux/swap.h>
> > #include <linux/cma.h>
> > +#include <linux/vmstat.h>
> > #include "internal.h"
> > #include "slab.h"
> > #include "shuffle.h"
> > @@ -1656,6 +1657,8 @@ static void __init alloc_node_mem_map(struct pglist_data *pgdat)
> > panic("Failed to allocate %ld bytes for node %d memory map\n",
> > size, pgdat->node_id);
> > pgdat->node_mem_map = map + offset;
> > + mod_node_early_perpage_metadata(pgdat->node_id,
> > + PAGE_ALIGN(size) >> PAGE_SHIFT);
> > }
> > pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n",
> > __func__, pgdat->node_id, (unsigned long)pgdat,
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0c5be12f9336..4e295d5087f4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5443,6 +5443,7 @@ void __init setup_per_cpu_pageset(void)
> > for_each_online_pgdat(pgdat)
> > pgdat->per_cpu_nodestats =
> > alloc_percpu(struct per_cpu_nodestat);
> > + writeout_early_perpage_metadata();
>
> Why is it called here?
> You can copy early stats to actual node stats as soon as the nodes and page
> allocator are initialized.
>
Thank you for mentioning this. I agree with you and shall move it there.
>
> > }
> >
> > __meminit void zone_pcp_init(struct zone *zone)
> > diff --git a/mm/page_ext.c b/mm/page_ext.c
> > index 4548fcc66d74..b5b9d3079e20 100644
> > --- a/mm/page_ext.c
> > +++ b/mm/page_ext.c
> > @@ -201,6 +201,8 @@ static int __init alloc_node_page_ext(int nid)
> > return -ENOMEM;
> > NODE_DATA(nid)->node_page_ext = base;
> > total_usage += table_size;
> > + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > + PAGE_ALIGN(table_size) >> PAGE_SHIFT);
> > return 0;
> > }
> >
> > @@ -255,12 +257,15 @@ static void *__meminit alloc_page_ext(size_t size, int nid)
> > void *addr = NULL;
> >
> > addr = alloc_pages_exact_nid(nid, size, flags);
> > - if (addr) {
> > + if (addr)
> > kmemleak_alloc(addr, size, 1, flags);
> > - return addr;
> > - }
> > + else
> > + addr = vzalloc_node(size, nid);
> >
> > - addr = vzalloc_node(size, nid);
> > + if (addr) {
> > + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > + PAGE_ALIGN(size) >> PAGE_SHIFT);
> > + }
> >
> > return addr;
> > }
> > @@ -314,6 +319,10 @@ static void free_page_ext(void *addr)
> > BUG_ON(PageReserved(page));
> > kmemleak_free(addr);
> > free_pages_exact(addr, table_size);
> > +
> > + __mod_node_page_state(NODE_DATA(page_to_nid(page)), NR_PAGE_METADATA,
> > + (long)-1 * (PAGE_ALIGN(table_size) >> PAGE_SHIFT));
> > +
>
> what happens with vmalloc()ed page_ext?
>
Thank you for pointing this out. I shall also make this change for
vmalloc()ed page_ext.
>
> > }
> > }
> >
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index a2cbe44c48e1..e33f302db7c6 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -469,5 +469,8 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
> > if (r < 0)
> > return NULL;
> >
> > + __mod_node_page_state(NODE_DATA(nid), NR_PAGE_METADATA,
> > + PAGE_ALIGN(end - start) >> PAGE_SHIFT);
> > +
> > return pfn_to_page(pfn);
> > }
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 77d91e565045..db78233a85ef 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -14,7 +14,7 @@
> > #include <linux/swap.h>
> > #include <linux/swapops.h>
> > #include <linux/bootmem_info.h>
> > -
> > +#include <linux/vmstat.h>
> > #include "internal.h"
> > #include <asm/dma.h>
> >
> > @@ -465,6 +465,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > */
> > sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
> > sparsemap_buf_end = sparsemap_buf + size;
> > +#ifndef CONFIG_SPARSEMEM_VMEMMAP
> > + mod_node_early_perpage_metadata(nid, PAGE_ALIGN(size) >>
> PAGE_SHIFT);
>
> All early struct pages are allocated in memmap_alloc(). It'd make sense to
> update
> the counter there.
>
Thanks for the comment. We did not do it in memmap_alloc() because the
number of struct pages can also decrease.
>
> > +#endif
> > }
> >
> > static void __init sparse_buffer_fini(void)
> > @@ -641,6 +644,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> > unsigned long start = (unsigned long) pfn_to_page(pfn);
> > unsigned long end = start + nr_pages * sizeof(struct page);
> >
> > + __mod_node_page_state(NODE_DATA(page_to_nid(pfn_to_page(pfn))), NR_PAGE_METADATA,
> > + (long)-1 * (PAGE_ALIGN(end - start) >> PAGE_SHIFT));
> > vmemmap_free(start, end, altmap);
> > }
> > static void free_map_bootmem(struct page *memmap)
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 00e81e99c6ee..731eb5264b49 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1245,6 +1245,7 @@ const char * const vmstat_text[] = {
> > "pgpromote_success",
> > "pgpromote_candidate",
> > #endif
> > + "nr_page_metadata",
> >
> > /* enum writeback_stat_item counters */
> > "nr_dirty_threshold",
> > @@ -2274,4 +2275,24 @@ static int __init extfrag_debug_init(void)
> > }
> >
> > module_init(extfrag_debug_init);
> > +
> > +// Page metadata size (struct page and page_ext) in pages
> > +unsigned long early_perpage_metadata[MAX_NUMNODES] __initdata;
>
> static?
>
Thanks for pointing this out. I shall make the __initdata array static in
the next version of the patch.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:30 ` [PATCH v1 1/1] mm: report per-page " Sourav Panda
` (2 preceding siblings ...)
2023-09-13 20:51 ` Mike Rapoport
@ 2023-09-13 21:53 ` kernel test robot
2023-09-14 13:00 ` David Hildenbrand
2023-09-18 8:14 ` kernel test robot
5 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2023-09-13 21:53 UTC (permalink / raw)
To: Sourav Panda, corbet, gregkh, rafael, akpm, mike.kravetz,
muchun.song, rppt, david, rdunlap, chenlinxuan, yang.yang29,
tomas.mudrunka, bhelgaas, ivan, pasha.tatashin, yosryahmed,
hannes, shakeelb, kirill.shutemov, wangkefeng.wang, adobriyan,
vbabka, Liam.Howlett, surenb, linux-kernel, linux-fsdevel,
linux-doc, linux-mm
Cc: oe-kbuild-all
Hi Sourav,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus linus/master v6.6-rc1 next-20230913]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Sourav-Panda/mm-report-per-page-metadata-information/20230914-013201
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230913173000.4016218-2-souravpanda%40google.com
patch subject: [PATCH v1 1/1] mm: report per-page metadata information
config: um-defconfig (https://download.01.org/0day-ci/archive/20230914/202309140522.z5SLip5C-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230914/202309140522.z5SLip5C-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309140522.z5SLip5C-lkp@intel.com/
All errors (new ones prefixed by >>):
/usr/bin/ld: mm/mm_init.o: in function `alloc_node_mem_map':
>> mm/mm_init.c:1660: undefined reference to `mod_node_early_perpage_metadata'
/usr/bin/ld: mm/page_alloc.o: in function `setup_per_cpu_pageset':
>> mm/page_alloc.c:5500: undefined reference to `writeout_early_perpage_metadata'
collect2: error: ld returned 1 exit status
vim +1660 mm/mm_init.c
1628
1629 #ifdef CONFIG_FLATMEM
1630 static void __init alloc_node_mem_map(struct pglist_data *pgdat)
1631 {
1632 unsigned long __maybe_unused start = 0;
1633 unsigned long __maybe_unused offset = 0;
1634
1635 /* Skip empty nodes */
1636 if (!pgdat->node_spanned_pages)
1637 return;
1638
1639 start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
1640 offset = pgdat->node_start_pfn - start;
1641 /* ia64 gets its own node_mem_map, before this, without bootmem */
1642 if (!pgdat->node_mem_map) {
1643 unsigned long size, end;
1644 struct page *map;
1645
1646 /*
1647 * The zone's endpoints aren't required to be MAX_ORDER
1648 * aligned but the node_mem_map endpoints must be in order
1649 * for the buddy allocator to function correctly.
1650 */
1651 end = pgdat_end_pfn(pgdat);
1652 end = ALIGN(end, MAX_ORDER_NR_PAGES);
1653 size = (end - start) * sizeof(struct page);
1654 map = memmap_alloc(size, SMP_CACHE_BYTES, MEMBLOCK_LOW_LIMIT,
1655 pgdat->node_id, false);
1656 if (!map)
1657 panic("Failed to allocate %ld bytes for node %d memory map\n",
1658 size, pgdat->node_id);
1659 pgdat->node_mem_map = map + offset;
> 1660 mod_node_early_perpage_metadata(pgdat->node_id,
1661 PAGE_ALIGN(size) >> PAGE_SHIFT);
1662 }
1663 pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n",
1664 __func__, pgdat->node_id, (unsigned long)pgdat,
1665 (unsigned long)pgdat->node_mem_map);
1666 #ifndef CONFIG_NUMA
1667 /*
1668 * With no DISCONTIG, the global mem_map is just set as node 0's
1669 */
1670 if (pgdat == NODE_DATA(0)) {
1671 mem_map = NODE_DATA(0)->node_mem_map;
1672 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
1673 mem_map -= offset;
1674 }
1675 #endif
1676 }
1677 #else
1678 static inline void alloc_node_mem_map(struct pglist_data *pgdat) { }
1679 #endif /* CONFIG_FLATMEM */
1680
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:30 ` [PATCH v1 1/1] mm: report per-page " Sourav Panda
` (3 preceding siblings ...)
2023-09-13 21:53 ` kernel test robot
@ 2023-09-14 13:00 ` David Hildenbrand
2023-09-14 22:47 ` Sourav Panda
2023-09-18 8:14 ` kernel test robot
5 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2023-09-14 13:00 UTC (permalink / raw)
To: Sourav Panda, corbet, gregkh, rafael, akpm, mike.kravetz,
muchun.song, rppt, rdunlap, chenlinxuan, yang.yang29,
tomas.mudrunka, bhelgaas, ivan, pasha.tatashin, yosryahmed,
hannes, shakeelb, kirill.shutemov, wangkefeng.wang, adobriyan,
vbabka, Liam.Howlett, surenb, linux-kernel, linux-fsdevel,
linux-doc, linux-mm
On 13.09.23 19:30, Sourav Panda wrote:
> Adds a new per-node PageMetadata field to
> /sys/devices/system/node/nodeN/meminfo
> and a global PageMetadata field to /proc/meminfo. This information can
> be used by users to see how much memory is being used by per-page
> metadata, which can vary depending on build configuration, machine
> architecture, and system use.
>
> Per-page metadata is the amount of memory that Linux needs in order to
> manage memory at the page granularity. The majority of such memory is
> used by "struct page" and "page_ext" data structures.
It's probably worth mentioning that, in contrast to most other "memory
consumption" statistics, this metadata might not be included in "MemTotal":
when the memmap is allocated using the memblock allocator, it's not
included; when it's dynamically allocated using the buddy allocator (e.g.,
memory hotplug), it is.
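
A small userspace sketch of how a tool might read the two fields side by side follows. It is illustrative only: sketch_meminfo_kb() is a hypothetical helper, the "PageMetadata" key exists only on a kernel carrying this patch, and the sample values used to exercise it are made up. Because boot-time memmap from memblock is outside MemTotal, dividing PageMetadata by MemTotal would not give a clean "fraction of MemTotal" figure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of a "Key:   1234 kB" line from meminfo-style
 * text, or -1 if the key is absent or its value cannot be parsed.
 */
static long sketch_meminfo_kb(const char *text, const char *key)
{
	const char *line = text;
	size_t klen;

	assert(text && key);
	klen = strlen(key);

	while (line && *line) {
		/* Match the key only at the start of a line, followed by ':' */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long kb;

			if (sscanf(line + klen + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the character after the next newline, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would read /proc/meminfo (or a per-node meminfo file) into a NUL-terminated buffer and query "MemTotal" and "PageMetadata" separately, treating -1 for the latter as "kernel does not export this field".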
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-14 13:00 ` David Hildenbrand
@ 2023-09-14 22:47 ` Sourav Panda
0 siblings, 0 replies; 13+ messages in thread
From: Sourav Panda @ 2023-09-14 22:47 UTC (permalink / raw)
To: David Hildenbrand
Cc: corbet, gregkh, rafael, akpm, mike.kravetz, muchun.song, rppt,
rdunlap, chenlinxuan, yang.yang29, tomas.mudrunka, bhelgaas,
ivan, pasha.tatashin, yosryahmed, hannes, shakeelb,
kirill.shutemov, wangkefeng.wang, adobriyan, vbabka,
Liam.Howlett, surenb, linux-kernel, linux-fsdevel, linux-doc,
linux-mm
Thank you David Hildenbrand for reviewing this patch.
On Thu, Sep 14, 2023 at 6:00 AM David Hildenbrand <david@redhat.com> wrote:
> On 13.09.23 19:30, Sourav Panda wrote:
> > Adds a new per-node PageMetadata field to
> > /sys/devices/system/node/nodeN/meminfo
> > and a global PageMetadata field to /proc/meminfo. This information can
> > be used by users to see how much memory is being used by per-page
> > metadata, which can vary depending on build configuration, machine
> > architecture, and system use.
> >
> > Per-page metadata is the amount of memory that Linux needs in order to
> > manage memory at the page granularity. The majority of such memory is
> > used by "struct page" and "page_ext" data structures.
>
> It's probably worth mentioning that, in contrast to most other "memory
> consumption" statistics, this metadata might not be included in "MemTotal":
> when the memmap is allocated using the memblock allocator, it's not
> included; when it's dynamically allocated using the buddy allocator (e.g.,
> memory hotplug), it is.
>
>
Thank you for your comment. I completely agree and shall make this change.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/1] mm: report per-page metadata information
2023-09-13 17:30 ` [PATCH v1 1/1] mm: report per-page " Sourav Panda
` (4 preceding siblings ...)
2023-09-14 13:00 ` David Hildenbrand
@ 2023-09-18 8:14 ` kernel test robot
5 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2023-09-18 8:14 UTC (permalink / raw)
To: Sourav Panda
Cc: oe-lkp, lkp, Pasha Tatashin, linux-kernel, linux-fsdevel,
linux-mm, corbet, gregkh, rafael, akpm, mike.kravetz,
muchun.song, rppt, david, rdunlap, chenlinxuan, yang.yang29,
souravpanda, tomas.mudrunka, bhelgaas, ivan, yosryahmed, hannes,
shakeelb, kirill.shutemov, wangkefeng.wang, adobriyan, vbabka,
Liam.Howlett, surenb, linux-doc, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_mm/vmstat.c:#__mod_node_page_state" on:
commit: af92fce0e99952613b7dac06b40a35decef4cad9 ("[PATCH v1 1/1] mm: report per-page metadata information")
url: https://github.com/intel-lab-lkp/linux/commits/Sourav-Panda/mm-report-per-page-metadata-information/20230914-013201
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20230913173000.4016218-2-souravpanda@google.com/
patch subject: [PATCH v1 1/1] mm: report per-page metadata information
in testcase: boot
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202309181546.fc42f414-oliver.sang@intel.com
[ 4.596915][ T1] ------------[ cut here ]------------
[ 4.597618][ T1] WARNING: CPU: 0 PID: 1 at mm/vmstat.c:393 __mod_node_page_state (kbuild/src/rand-x86_64-2/mm/vmstat.c:393)
[ 4.598717][ T1] Modules linked in:
[ 4.598835][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc1-00154-gaf92fce0e999 #4
[ 4.599915][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 4.601238][ T1] RIP: 0010:__mod_node_page_state (kbuild/src/rand-x86_64-2/mm/vmstat.c:393)
[ 4.602031][ T1] Code: 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 65 8b 0d bc 7e e0 7e 81 e1 ff ff ff 7f 75 b5 65 8b 0d 11 7c c3 7e 85 c9 74 aa 90 <0f> 0b 90 eb a4 49 83 fe 29 77 26 4e 8d 3c f5 88 33 00 00 f0 4b 01
All code
========
0: 5e pop %rsi
1: 41 5f pop %r15
3: 5d pop %rbp
4: 31 c0 xor %eax,%eax
6: 31 d2 xor %edx,%edx
8: 31 c9 xor %ecx,%ecx
a: 31 f6 xor %esi,%esi
c: 31 ff xor %edi,%edi
e: c3 ret
f: 65 8b 0d bc 7e e0 7e mov %gs:0x7ee07ebc(%rip),%ecx # 0x7ee07ed2
16: 81 e1 ff ff ff 7f and $0x7fffffff,%ecx
1c: 75 b5 jne 0xffffffffffffffd3
1e: 65 8b 0d 11 7c c3 7e mov %gs:0x7ec37c11(%rip),%ecx # 0x7ec37c36
25: 85 c9 test %ecx,%ecx
27: 74 aa je 0xffffffffffffffd3
29: 90 nop
2a:* 0f 0b ud2 <-- trapping instruction
2c: 90 nop
2d: eb a4 jmp 0xffffffffffffffd3
2f: 49 83 fe 29 cmp $0x29,%r14
33: 77 26 ja 0x5b
35: 4e 8d 3c f5 88 33 00 lea 0x3388(,%r14,8),%r15
3c: 00
3d: f0 lock
3e: 4b rex.WXB
3f: 01 .byte 0x1
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 90 nop
3: eb a4 jmp 0xffffffffffffffa9
5: 49 83 fe 29 cmp $0x29,%r14
9: 77 26 ja 0x31
b: 4e 8d 3c f5 88 33 00 lea 0x3388(,%r14,8),%r15
12: 00
13: f0 lock
14: 4b rex.WXB
15: 01 .byte 0x1
[ 4.602165][ T1] RSP: 0000:ffff888100307e20 EFLAGS: 00010202
[ 4.602954][ T1] RAX: 00000000001f2cc0 RBX: 0000000000000000 RCX: 0000000000000001
[ 4.604020][ T1] RDX: 0000000000000240 RSI: 0000000000000023 RDI: ffffffff83c6ef00
[ 4.605021][ T1] RBP: ffff888100307e48 R08: 0000000000000000 R09: 0000000000000000
[ 4.605499][ T1] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff83c6ef00
[ 4.606507][ T1] R13: 00000000001f2ce9 R14: 0000000000000028 R15: 0000000000240000
[ 4.607520][ T1] FS: 0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[ 4.608632][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.608832][ T1] CR2: ffff88843ffff000 CR3: 0000000003644000 CR4: 00000000000406b0
[ 4.609842][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.612168][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4.613186][ T1] Call Trace:
[ 4.613606][ T1] <TASK>
[ 4.613982][ T1] ? show_regs (kbuild/src/rand-x86_64-2/arch/x86/kernel/dumpstack.c:479)
[ 4.614528][ T1] ? __warn (kbuild/src/rand-x86_64-2/kernel/panic.c:677)
[ 4.615040][ T1] ? __mod_node_page_state (kbuild/src/rand-x86_64-2/mm/vmstat.c:393)
[ 4.615501][ T1] ? report_bug (kbuild/src/rand-x86_64-2/lib/bug.c:180 kbuild/src/rand-x86_64-2/lib/bug.c:219)
[ 4.616083][ T1] ? handle_bug (kbuild/src/rand-x86_64-2/arch/x86/kernel/traps.c:237)
[ 4.616642][ T1] ? exc_invalid_op (kbuild/src/rand-x86_64-2/arch/x86/kernel/traps.c:258 (discriminator 1))
[ 4.617240][ T1] ? asm_exc_invalid_op (kbuild/src/rand-x86_64-2/arch/x86/include/asm/idtentry.h:568)
[ 4.617895][ T1] ? __mod_node_page_state (kbuild/src/rand-x86_64-2/mm/vmstat.c:393)
[ 4.618575][ T1] init_section_page_ext (kbuild/src/rand-x86_64-2/mm/page_ext.c:292)
[ 4.618836][ T1] page_ext_init (kbuild/src/rand-x86_64-2/mm/page_ext.c:482)
[ 4.619433][ T1] page_alloc_init_late (kbuild/src/rand-x86_64-2/mm/mm_init.c:2417)
[ 4.620098][ T1] kernel_init_freeable (kbuild/src/rand-x86_64-2/init/main.c:1325 kbuild/src/rand-x86_64-2/init/main.c:1547)
[ 4.620765][ T1] ? rest_init (kbuild/src/rand-x86_64-2/init/main.c:1429)
[ 4.621325][ T1] kernel_init (kbuild/src/rand-x86_64-2/init/main.c:1439)
[ 4.621880][ T1] ? schedule_tail (kbuild/src/rand-x86_64-2/kernel/sched/core.c:5318)
[ 4.622167][ T1] ret_from_fork (kbuild/src/rand-x86_64-2/arch/x86/kernel/process.c:153)
[ 4.622753][ T1] ? rest_init (kbuild/src/rand-x86_64-2/init/main.c:1429)
[ 4.623314][ T1] ret_from_fork_asm (kbuild/src/rand-x86_64-2/arch/x86/entry/entry_64.S:312)
[ 4.623942][ T1] </TASK>
[ 4.624331][ T1] irq event stamp: 11857
[ 4.624861][ T1] hardirqs last enabled at (11865): __up_console_sem (kbuild/src/rand-x86_64-2/kernel/printk/printk.c:347 (discriminator 1))
[ 4.625498][ T1] hardirqs last disabled at (11874): __up_console_sem (kbuild/src/rand-x86_64-2/kernel/printk/printk.c:345 (discriminator 1))
[ 4.626689][ T1] softirqs last enabled at (11546): __do_softirq (kbuild/src/rand-x86_64-2/arch/x86/include/asm/preempt.h:27 kbuild/src/rand-x86_64-2/kernel/softirq.c:400 kbuild/src/rand-x86_64-2/kernel/softirq.c:582)
[ 4.627866][ T1] softirqs last disabled at (11541): __irq_exit_rcu (kbuild/src/rand-x86_64-2/kernel/softirq.c:427 kbuild/src/rand-x86_64-2/kernel/softirq.c:632)
[ 4.628832][ T1] ---[ end trace 0000000000000000 ]---
[ 4.694353][ T1] allocated 301989888 bytes of page_ext
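
As a worked check of the figures in this log (a sketch under stated assumptions, not part of the report): the patch charges the counter in pages using PAGE_ALIGN(size) >> PAGE_SHIFT. Assuming 4 KiB pages, the 301989888 bytes of page_ext reported in the last line above correspond to 73728 pages. The sketch_* names below are userspace stand-ins for the kernel macros, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as is usual on x86_64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Userspace stand-in for PAGE_ALIGN(): round up to a page multiple. */
static uint64_t sketch_page_align(uint64_t bytes)
{
	/* Guard against overflow in the round-up addition */
	assert(bytes <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Bytes to whole pages, mirroring PAGE_ALIGN(size) >> PAGE_SHIFT. */
static uint64_t sketch_bytes_to_pages(uint64_t bytes)
{
	return sketch_page_align(bytes) >> SKETCH_PAGE_SHIFT;
}
```

With these helpers, sketch_bytes_to_pages(301989888) is 73728. The register values RDX = 0x240 and R15 = 0x240000 would also be consistent with a single call charging 576 pages for a 0x240000-byte table, though the report itself does not identify the registers this way.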
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230918/202309181546.fc42f414-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 13+ messages in thread