linux-mm.kvack.org archive mirror
From: Alexey Alexandrov <aalexand@google.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, ccross@google.com,
	sumit.semwal@linaro.org, mhocko@suse.com, dave.hansen@intel.com,
	keescook@chromium.org, willy@infradead.org,
	kirill.shutemov@linux.intel.com, vbabka@suse.cz,
	hannes@cmpxchg.org, corbet@lwn.net, viro@zeniv.linux.org.uk,
	rdunlap@infradead.org, kaleshsingh@google.com, peterx@redhat.com,
	rppt@kernel.org, peterz@infradead.org, catalin.marinas@arm.com,
	vincenzo.frascino@arm.com, chinwen.chang@mediatek.com,
	axelrasmussen@google.com, aarcange@redhat.com, jannh@google.com,
	apopple@nvidia.com, jhubbard@nvidia.com, yuzhao@google.com,
	will@kernel.org, fenghua.yu@intel.com,
	thunder.leizhen@huawei.com, hughd@google.com,
	feng.tang@intel.com, jgg@ziepe.ca, guro@fb.com,
	tglx@linutronix.de, krisman@collabora.com,
	chris.hyser@oracle.com, pcc@google.com, ebiederm@xmission.com,
	axboe@kernel.dk, legion@kernel.org, eb@emlix.com,
	gorcunov@gmail.com, pavel@ucw.cz, songmuchun@bytedance.com,
	viresh.kumar@linaro.org, thomascedeno@google.com,
	sashal@kernel.org, cxfcosmos@gmail.com, linux@rasmusvillemoes.dk,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@android.com
Subject: Re: [PATCH v11 2/3] mm: add a field to store names for private anonymous memory
Date: Wed, 27 Oct 2021 11:35:01 -0700
Message-ID: <89664270-4B9F-45E0-AC0B-8A185ED1F531@google.com>
In-Reply-To: <20211019215511.3771969-2-surenb@google.com>

> On Oct 19, 2021, at 2:55 PM, Suren Baghdasaryan <surenb@google.com> wrote:
> 
> From: Colin Cross <ccross@google.com>
> 
> In many userspace applications, and especially in the VM-based applications
> that Android uses heavily, there are multiple different allocators in use.
> At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage;
> malloc by compiling a debug version, the VM through heap inspection tools,
> and for direct syscalls there is usually no way to track them.
> 
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs.  unique
> mappings, backing, etc.  This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages.  It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
> 
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
> 
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes.  It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace.  Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace.  It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
> 
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas.  The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].
> 
> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it. The name is limited to 80 bytes
> including the NUL terminator and is checked to contain only printable
> ASCII characters (including space), except '[', ']', '\', '$' and '`'.
> ASCII strings are used to provide descriptive identifiers for vmas, which
> can be understood by the users reading /proc/pid/maps or /proc/pid/smaps.
> Names can be standardized for a given system and they can include some
> variable parts such as the name of the allocator or a library, tid of
> the thread using it, etc.
> 
> The name is stored in a pointer in the shared union in vm_area_struct
> that points to a null-terminated string. Anonymous vmas that have the same
> name (equivalent strings) and are otherwise mergeable will be merged.
> The name pointers are not shared between vmas even if they contain the
> same name. The name pointer is stored in a union with fields that are
> only used on file-backed mappings, so it does not increase memory usage.
> 
> CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
> feature. It keeps the feature disabled by default to prevent any
> additional memory overhead and to avoid confusing procfs parsers on
> systems which are not ready to support named anonymous vmas.
> 
> The patch is based on the original patch developed by Colin Cross, more
> specifically on its latest version [1] posted upstream by Sumit Semwal.
> It used a userspace pointer to store vma names. In that design, name
> pointers could be shared between vmas. However, during the last upstreaming
> attempt, Kees Cook raised concerns [2] about this approach and suggested
> to copy the name into kernel memory space, perform validity checks [3]
> and store as a string referenced from vm_area_struct.
> One big concern is about fork() performance which would need to strdup
> anonymous vma names. Dave Hansen suggested experimenting with worst-case
> scenario of forking a process with 64k vmas having longest possible names
> [4]. I ran this experiment on an ARM64 Android device and recorded a
> worst-case regression of almost 40% when forking such a process. This
> regression is addressed in the followup patch which replaces the pointer
> to a name with a refcounted structure that allows sharing the name pointer
> between vmas of the same name. Instead of duplicating the string during
> fork() or when splitting a vma it increments the refcount.
> 
> [1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
> [2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
> [3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
> [4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/
> 
> Changes for prctl(2) manual page (in the options section):
> 
> PR_SET_VMA
> 	Sets an attribute specified in arg2 for virtual memory areas
> 	starting from the address specified in arg3 and spanning the
> 	size specified in arg4. arg5 specifies the value of the attribute
> 	to be set. Note that assigning an attribute to a virtual memory
> 	area might prevent it from being merged with adjacent virtual
> 	memory areas due to the difference in that attribute's value.
> 
> 	Currently, arg2 must be one of:
> 
> 	PR_SET_VMA_ANON_NAME
> 		Set a name for anonymous virtual memory areas. arg5 should
> 		be a pointer to a null-terminated string containing the
> 		name. The name length including null byte cannot exceed
> 		80 bytes. If arg5 is NULL, the name of the appropriate
> 		anonymous virtual memory areas will be reset. The name
> 		can contain only printable ascii characters (including
> 		space), except '[',']','\','$' and '`'.
> 
> 		This feature is available only if the kernel is built with
> 		the CONFIG_ANON_VMA_NAME option enabled.
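
To make the naming rules quoted above concrete, here is a minimal userspace
sketch of the check as I read it from the description: at most 80 bytes
including the NUL, printable ASCII only, and none of '[', ']', '\', '$', '`'.
The function name anon_name_is_valid() and the macro ANON_NAME_MAX_LEN are
made up for illustration; this is not the kernel's code, and details such as
how an empty string is treated are not covered by the description.

```c
#include <stddef.h>
#include <string.h>

/* Limit quoted in the patch description: 80 bytes including the NUL.
 * The macro name is hypothetical, chosen for this sketch only. */
#define ANON_NAME_MAX_LEN 80

/*
 * Hypothetical helper: returns 1 if `name` satisfies the rules quoted in
 * the patch description, 0 otherwise. NULL is accepted because the
 * description says a NULL name clears the existing one.
 */
int anon_name_is_valid(const char *name)
{
    size_t i;

    if (name == NULL)
        return 1;

    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];

        /* At most 79 characters, leaving room for the NUL terminator. */
        if (i >= ANON_NAME_MAX_LEN - 1)
            return 0;

        /* Printable ASCII only; space (0x20) is allowed. */
        if (c < 0x20 || c > 0x7e)
            return 0;

        /* Characters explicitly excluded by the description. */
        if (strchr("[]\\$`", c) != NULL)
            return 0;
    }
    return 1;
}
```

A caller could run this before prctl() to get a clearer error than whatever
the kernel returns for a rejected name.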

For what it’s worth, it’s definitely interesting to see this going upstream.
In particular, we would use it for high-level grouping of the data in
production profiling when proper symbolization is not available:

* The JVM could associate a name with the memory regions it uses for JIT
  code so that Linux perf data is attributed to a high-level name like
  "Java JIT" even if proper Java JIT profiling is not enabled.
* Similarly, other JIT engines such as V8 could annotate the memory
  regions they manage and use.
* Traditional memory allocators like tcmalloc could use this as well, so
  that the associated name shows up in data access profiling via Linux perf.
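
As a rough illustration of the use cases listed above, the sketch below shows
how a runtime might name a region and how a tool might recover the name from a
/proc/pid/maps line. The helper names map_named_region() and parse_anon_name()
are hypothetical. The prctl() call follows the form given in the patch
description; the fallback constant values are my assumption for toolchains
whose headers do not define them yet, so please check them against your
kernel headers.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed fallback values for headers that predate this feature. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical helper: map `len` bytes of private anonymous memory and try
 * to attach `name` to it. Naming is best effort: on a kernel built without
 * CONFIG_ANON_VMA_NAME the prctl() call is expected to fail, and the
 * mapping is still returned unnamed. Returns NULL only if mmap() fails.
 */
void *map_named_region(size_t len, const char *name)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                (unsigned long)p, len, (unsigned long)name);
    return p;
}

/*
 * Hypothetical helper: copy <name> out of a maps line that contains
 * "[anon:<name>]". Returns 1 on success, 0 if the line carries no such
 * tag or `out` is too small. Searching for the first ']' is enough here
 * because ']' is not allowed inside a name.
 */
int parse_anon_name(const char *line, char *out, size_t outlen)
{
    const char *start = strstr(line, "[anon:");
    const char *end;
    size_t n;

    if (start == NULL)
        return 0;
    start += strlen("[anon:");

    end = strchr(start, ']');
    if (end == NULL)
        return 0;

    n = (size_t)(end - start);
    if (n >= outlen)
        return 0;

    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

A JIT could call map_named_region(size, "Java JIT") when it allocates its code
cache, and a profiler could feed each line of /proc/pid/maps through
parse_anon_name() to group samples by the returned name.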
