From: Matthew Wilcox <willy@infradead.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm <linux-mm@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] Single Owner Memory
Date: Tue, 21 Feb 2023 13:46:20 +0000
Message-ID: <Y/TLLFPMRl6vzuO0@casper.infradead.org>
In-Reply-To: <CA+CK2bD5gztcyTm18cnznNi48o_G7H-F+LG=TSk=0WSGL39hrA@mail.gmail.com>

On Mon, Feb 20, 2023 at 02:10:24PM -0500, Pasha Tatashin wrote:
> Within Google the vast majority of memory, over 90%, has a single
> owner. This is because most of the jobs are not multi-process but
> instead multi-threaded. The examples of single owner memory
> allocations are all tcmalloc()/malloc() allocations, and
> mmap(MAP_ANONYMOUS | MAP_PRIVATE) allocations without forks. On the
> other hand, the struct page metadata that is shared for all types of
> memory takes 1.6% of the system memory. It would be reasonable to find
> ways to optimize memory such that the common som case has a reduced
> amount of metadata.
> 
> This would be similar to HugeTLB and DAX that are treated as special
> cases, and can release struct pages for the subpages back to the
> system.

DAX can't, unless something's changed recently.  You're referring to
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP.

> The proposal is to discuss a new som driver that would use HugeTLB as
> a source of 2M chunks. When a user creates som memory, i.e.:
> 
> mmap(MAP_ANONYMOUS | MAP_PRIVATE);
> madvise(mem, length, MADV_DONTFORK);
> 
> A vma from the som driver is used instead of regular anon vma.

That's going to be "interesting".  The VMA is already created with
the call to mmap(), and madvise has not traditionally allowed drivers
to replace a VMA.  You might be better off creating a /dev/som and
hacking the malloc libraries to pass an fd from that instead of passing
MAP_ANONYMOUS.
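
A minimal sketch of the two approaches discussed above, for
illustration only.  The device node /dev/som does not exist in any
kernel; it is the hypothetical name suggested in this message, and the
helper names map_anon_dontfork() and map_som_or_fallback() are made up
for this example.  On a system without such a driver the open() fails
and the code falls back to the mmap() + madvise() sequence from the
proposal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * The sequence quoted in the proposal: an anonymous private mapping
 * that is then marked MADV_DONTFORK.  The VMA already exists by the
 * time madvise() runs, which is the difficulty pointed out above.
 */
static void *map_anon_dontfork(size_t length)
{
    assert(length > 0); /* mmap() rejects zero-length mappings */

    void *mem = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    if (madvise(mem, length, MADV_DONTFORK) != 0) {
        munmap(mem, length);
        return NULL;
    }
    return mem;
}

/*
 * The alternative: an allocator passes an fd for a driver-backed
 * mapping instead of MAP_ANONYMOUS, so the driver owns the VMA from
 * the start.  "/dev/som" is a HYPOTHETICAL path, not a real device.
 */
static void *map_som_or_fallback(size_t length)
{
    int fd = open("/dev/som", O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return map_anon_dontfork(length); /* no such driver: fall back */

    void *mem = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return mem == MAP_FAILED ? NULL : mem;
}
```

A malloc library would call something like map_som_or_fallback() where
it currently calls mmap() with MAP_ANONYMOUS; both helpers return NULL
on failure and the caller releases the region with munmap().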

> The discussion should include the following topics:
> - Interaction with folio and the proposed struct page {memdesc}.
> - Handling for migrate_pages() and friends.
> - Handling for FOLL_PIN and FOLL_LONGTERM.
> - What type of madvise() properties the som memory should handle.

Obviously once we get to dynamically allocated memdescs, this whole
thing goes away, so I'm not excited about making big changes to the
kernel to support this.

The savings you'll see are 6 pages (24kB) per 2MB allocated (1.2%).
That's not nothing, but it's not huge either.
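
The arithmetic behind those figures can be sketched as below.  The
constants are assumptions for a typical x86-64 configuration (4kB base
pages, 64-byte struct page) and are not stated in the thread; the
function names are made up for this example.

```c
#include <assert.h>

/* Assumed x86-64 defaults; not taken from the thread. */
#define BASE_PAGE_SIZE   4096UL      /* 4kB base page */
#define HUGE_PAGE_SIZE   (2UL << 20) /* 2MB huge page */
#define STRUCT_PAGE_SIZE 64UL        /* typical sizeof(struct page) */

/* vmemmap pages needed to hold the struct pages of one 2MB huge page. */
static unsigned long vmemmap_pages_per_huge_page(void)
{
    unsigned long nr_struct_pages = HUGE_PAGE_SIZE / BASE_PAGE_SIZE; /* 512 */

    return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;      /* 8 */
}

/* Bytes returned to the system when `freed` vmemmap pages are released. */
static unsigned long saved_bytes(unsigned long freed)
{
    assert(freed <= vmemmap_pages_per_huge_page());
    return freed * BASE_PAGE_SIZE;
}

/* Savings relative to the huge page, in hundredths of a percent. */
static unsigned long saved_basis_points(unsigned long freed)
{
    return saved_bytes(freed) * 10000UL / HUGE_PAGE_SIZE;
}
```

Under these assumptions each 2MB page needs 8 vmemmap pages (32kB,
about 1.56%, in line with the 1.6% quoted in the proposal).  Freeing 6
of them gives saved_bytes(6) == 24576 (24kB) and
saved_basis_points(6) == 117, i.e. roughly 1.2%.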

