linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: akpm@linux-foundation.org
Cc: Sourav Panda <souravpanda@google.com>,
	corbet@lwn.net, gregkh@linuxfoundation.org,  rafael@kernel.org,
	mike.kravetz@oracle.com, muchun.song@linux.dev,  rppt@kernel.org,
	david@redhat.com, rdunlap@infradead.org,
	 chenlinxuan@uniontech.com, yang.yang29@zte.com.cn,
	tomas.mudrunka@gmail.com,  bhelgaas@google.com,
	ivan@cloudflare.com, yosryahmed@google.com,  hannes@cmpxchg.org,
	shakeelb@google.com, kirill.shutemov@linux.intel.com,
	 wangkefeng.wang@huawei.com, adobriyan@gmail.com, vbabka@suse.cz,
	 Liam.Howlett@oracle.com, surenb@google.com,
	linux-kernel@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	 willy@infradead.org, weixugc@google.com
Subject: Re: [PATCH v9 1/1] mm: report per-page metadata information
Date: Tue, 19 Mar 2024 10:25:30 -0400	[thread overview]
Message-ID: <CA+CK2bCwi0yU_jX8qKCBMUnTeqoDYc65z7GKd5uEKcpkPAn4MA@mail.gmail.com> (raw)
In-Reply-To: <CA+CK2bAM4Xe7BT3TFZT-+3qQTFGgkYBiYY=oVkdqMN8gyJg_0g@mail.gmail.com>

On Wed, Mar 13, 2024 at 6:40 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Tue, Feb 20, 2024 at 4:46 PM Sourav Panda <souravpanda@google.com> wrote:
> >
> > Adds two new per-node fields, namely nr_memmap and nr_memmap_boot,
> > to /sys/devices/system/node/nodeN/vmstat and a global Memmap field
> > to /proc/meminfo. This information can be used by users to see how
> > much memory is being used by per-page metadata, which can vary
> > depending on build configuration, machine architecture, and system
> > use.
> >
> > Per-page metadata is the amount of memory that Linux needs in order to
> > manage memory at the page granularity. The majority of such memory is
> > used by "struct page" and "page_ext" data structures. In contrast to
> > most other memory consumption statistics, per-page metadata might not
> > be included in MemTotal. For example, MemTotal does not include memblock
> > allocations but includes buddy allocations. In this patch, exported
> > field nr_memmap in /sys/devices/system/node/nodeN/vmstat would
> > exclusively track buddy allocations while nr_memmap_boot would
> > exclusively track memblock allocations. Furthermore, Memmap in
> > /proc/meminfo would exclusively track buddy allocations allowing it to
> > be compared against MemTotal.
> >
> > This memory depends on build configurations, machine architectures, and
> > the way system is used:
> >
> > Build configuration may include extra fields into "struct page",
> > and enable / disable "page_ext"
> > Machine architecture defines base page sizes. For example 4K x86,
> > 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> > overhead is smaller on machines with larger page sizes.
> > System use can change per-page overhead by using vmemmap
> > optimizations with hugetlb pages, and emulated pmem devdax pages.
> > Also, boot parameters can determine whether page_ext is needed
> > to be allocated. This memory can be part of MemTotal or be outside
> > MemTotal depending on whether the memory was hot-plugged, booted with,
> > or hugetlb memory was returned back to the system.
> >
> > Utility for userspace:
> >
> > Application Optimization: Depending on the kernel version and command
> > line options, the kernel would relinquish a different number of pages
> > (that contain struct pages) when a hugetlb page is reserved (e.g., 0, 6
> > or 7 for a 2MB hugepage). The userspace application would want to know
> > the exact savings achieved through page metadata deallocation without
> > dealing with the intricacies of the kernel.
> >
> > Observability: Struct page overhead can only be calculated on-paper at
> > boot time (e.g., 1.5% machine capacity). Beyond boot once hugepages are
> > reserved or memory is hotplugged, the computation becomes complex.
> > Per-page metrics will help explain part of the system memory overhead,
> > which shall help guide memory optimizations and memory cgroup sizing.
> >
> > Debugging: Tracking the changes or absolute value in struct pages can
> > help detect anomalies as they can be correlated with other metrics in
> > the machine (e.g., memtotal, number of huge pages, etc).
> >
> > page_ext overheads: Some kernel features such as page_owner
> > page_table_check that use page_ext can be optionally enabled via kernel
> > parameters. Having the total per-page metadata information helps users
> > precisely measure impact.

Hi Andrew,

Can you please give this patch another look, does it require more
reviews before you can take it in?

Thank you,
Pasha


  reply	other threads:[~2024-03-19 14:26 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-20 21:45 [PATCH v9 0/1] " Sourav Panda
2024-02-20 21:45 ` [PATCH v9 1/1] " Sourav Panda
2024-02-26 18:12   ` Wei Xu
2024-03-13 22:37     ` Pasha Tatashin
2024-03-13 22:40   ` Pasha Tatashin
2024-03-19 14:25     ` Pasha Tatashin [this message]
2024-03-19 21:33   ` Andrew Morton
2024-04-10 23:58     ` David Rientjes
2024-04-11  5:42       ` Sourav Panda
2024-04-12 17:04     ` Pasha Tatashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+CK2bCwi0yU_jX8qKCBMUnTeqoDYc65z7GKd5uEKcpkPAn4MA@mail.gmail.com \
    --to=pasha.tatashin@soleen.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhelgaas@google.com \
    --cc=chenlinxuan@uniontech.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=ivan@cloudflare.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rppt@kernel.org \
    --cc=shakeelb@google.com \
    --cc=souravpanda@google.com \
    --cc=surenb@google.com \
    --cc=tomas.mudrunka@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=yang.yang29@zte.com.cn \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox