From: Pratik Sampat <psampat@linux.ibm.com>
To: Alexey Makhalov <amakhalov@vmware.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Dennis Zhou <dennis@kernel.org>, Roman Gushchin <guro@fb.com>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@linux.com>,
ldufour@linux.ibm.com, Tejun Heo <tj@kernel.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
pratik.r.sampat@gmail.com
Subject: Re: Percpu allocator: CPU hotplug support
Date: Thu, 29 Apr 2021 17:09:19 +0530 [thread overview]
Message-ID: <832bd0f9-eefb-9f63-828d-dc81b9a21eb9@linux.ibm.com> (raw)
In-Reply-To: <8E7F3D98-CB68-4418-8E0E-7287E8273DA9@vmware.com>
Hello,
On 22/04/21 6:14 am, Alexey Makhalov wrote:
> The current implementation of the percpu allocator uses the total possible number of CPUs
> (nr_cpu_ids) to determine the number of units to allocate per chunk. Every alloc_percpu() request
> of N bytes will allocate N*nr_cpu_ids bytes even if the number of present CPUs is much smaller.
> The percpu allocator grows by adding chunks while keeping the number of units per chunk constant.
> This is done to simplify CPU hotplug/remove, since the per-cpu areas are preallocated.
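
To make the cost concrete, here is a minimal sketch (illustrative only, not
taken from mm/percpu.c) of how a typical caller pays for every possible CPU
even though only the online ones ever touch their unit:

    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    /* illustrative only: a typical percpu user */
    static u64 __percpu *counter;

    static int example_init(void)
    {
            unsigned int cpu;
            u64 total = 0;

            /* one u64 unit is reserved for each of nr_cpu_ids CPUs */
            counter = alloc_percpu(u64);
            if (!counter)
                    return -ENOMEM;

            /* only online CPUs ever read or write their unit */
            for_each_online_cpu(cpu)
                    total += *per_cpu_ptr(counter, cpu);

            pr_info("sum across online CPUs: %llu\n",
                    (unsigned long long)total);
            return 0;
    }

So each 8-byte alloc_percpu(u64) costs 8 * nr_cpu_ids bytes of percpu space,
i.e. 1024 bytes on the 128-CPU configuration below, of which only 16 bytes
belong to present CPUs.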
>
> Problem: This behavior can lead to inefficient memory usage for big server machines and VMs,
> where nr_cpu_ids is huge.
>
> Example from my experiment:
> 2 vCPU VM with hotplug support (up to 128):
> [ 0.105989] smpboot: Allowing 128 CPUs, 126 hotplug CPUs
> By creating a huge number of active and/or dying memory cgroups, I can generate active percpu
> allocations of 100 MB (per single CPU), including fragmentation overhead. But in that case total
> percpu memory consumption (reported in /proc/meminfo) will be 12.8 GB. BTW, chunks are
> filled to ~75% in my experiment, so fragmentation is not a concern.
> Out of 12.8 GB:
> - 0.2 GB are actually used by present vCPUs, and
> - 12.6 GB are "wasted"!
>
> I've seen production VMs where percpu consumes 16-20 GB of memory; Roman reported 100 GB.
> There are ways to reduce the "wasted" memory overhead, such as disabling CPU hotplug, reducing
> the maximum number of CPUs reported by the hypervisor and/or firmware, or using the
> possible_cpus= kernel parameter. But none of these eliminates the fundamental issue of
> "wasted" memory.
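
For reference, the possible_cpus= workaround is just a kernel command line
setting, e.g. booting the 2-vCPU guest above with possible_cpus=2 would cap
the possible mask, assuming the extra CPUs will really never be hot-added.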
>
> Suggestion: support scaling percpu chunks by their number of units, i.e. allocate/deallocate
> units for existing chunks on CPU hotplug/remove events.
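
To make the suggestion concrete, here is a very rough sketch of the shape such
a hook could take. None of this exists today: the chunk list and the
pcpu_populate_unit()/pcpu_depopulate_unit() helpers are hypothetical names and
would have to live inside mm/percpu.c next to struct pcpu_chunk:

    #include <linux/cpuhotplug.h>

    /* hypothetical: grow/shrink every existing chunk by one unit */
    static int percpu_unit_cpu_online(unsigned int cpu)
    {
            struct pcpu_chunk *chunk;

            list_for_each_entry(chunk, &pcpu_all_chunks, list) /* hypothetical list */
                    pcpu_populate_unit(chunk, cpu);            /* hypothetical helper */
            return 0;
    }

    static int percpu_unit_cpu_dead(unsigned int cpu)
    {
            struct pcpu_chunk *chunk;

            list_for_each_entry(chunk, &pcpu_all_chunks, list)
                    pcpu_depopulate_unit(chunk, cpu);          /* hypothetical helper */
            return 0;
    }

    /* registered once during percpu init */
    cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "percpu:unit",
                      percpu_unit_cpu_online, percpu_unit_cpu_dead);

The hard part is of course everything hidden behind those helpers: growing a
chunk's unit map, keeping per-unit offsets stable for pointers that have
already been handed out, and doing it without racing against concurrent
allocations.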
>
> Any thoughts? Thanks! --Alexey
>
>
I've run some traces around memory cgroups to determine the memory consumed by
the percpu allocator and the major contributors to these allocations, by either
creating an empty memory cgroup or an empty container.

There are 4 memcg percpu allocation charges that I see when I create a cgroup
attached to the memory controller. They seem to belong to mm/memcontrol.c's
lruvec_stat and vmstats.
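
For reference, these appear to correspond to the following allocation sites
(paraphrased from mm/memcontrol.c around v5.11, so treat the exact type and
field names as approximate):

    /* per memcg, in mem_cgroup_alloc() */
    memcg->vmstats_local  = alloc_percpu_gfp(struct memcg_vmstats_percpu,
                                             GFP_KERNEL_ACCOUNT);
    memcg->vmstats_percpu = alloc_percpu_gfp(struct memcg_vmstats_percpu,
                                             GFP_KERNEL_ACCOUNT);

    /* per memcg per node, in alloc_mem_cgroup_per_node_info() */
    pn->lruvec_stat_local = alloc_percpu_gfp(struct lruvec_stat,
                                             GFP_KERNEL_ACCOUNT);
    pn->lruvec_stat_cpu   = alloc_percpu_gfp(struct lruvec_stat,
                                             GFP_KERNEL_ACCOUNT);

Each of these is charged for all possible CPUs, which is what the numbers
below measure.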
I've run this experiment in 2 configurations on a POWER9 box:
1. cpus=16 (present), maxcpus=16 (possible)
2. cpus=16 (present), maxcpus=1024 (possible)

On system boot:

  Maxcpus   Sum of percpu charges (MB)
  16        2.4979
  1024      159.86

Empty container setup (parallel containers that just spawn and spin,
allocating nothing themselves):

  Maxcpus   Avg percpu charge per container (MB)
  16        0.0398
  1024      2.5507
Although the per-cgroup difference is small in absolute terms, the wasted memory
grows proportionally once the cgroup or container setup is scaled to, say,
10,000 containers.
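(Roughly: 10,000 containers * ~2.55 MB each is ~25 GB of percpu memory with
maxcpus=1024, versus ~0.4 GB with maxcpus=16.)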
If the memory cgroup is the main point of focus, would it make sense to
optimize only those callers to be hotplug aware, rather than attempting to
optimize the whole percpu allocator?
Thanks,
Pratik