Date: Thu, 22 Apr 2021 01:33:24 +0000
From: Dennis Zhou
To: Alexey Makhalov
Cc: "linux-mm@kvack.org", Tejun Heo, Christoph Lameter, Roman Gushchin
Subject: Re: Percpu allocator: CPU hotplug support

Hello,

On Thu, Apr 22, 2021 at 12:44:37AM +0000, Alexey Makhalov wrote:
> The current implementation of the percpu allocator uses the total number of possible CPUs
> (nr_cpu_ids) to determine the number of units to allocate per chunk. Every alloc_percpu()
> request of N bytes will allocate N*nr_cpu_ids bytes even if the number of present CPUs is
> much smaller. The percpu allocator grows by number of chunks, keeping the number of units
> per chunk constant. It is done this way to simplify CPU hotplug/remove by having the
> per-CPU area preallocated.
>
> Problem: This behavior can lead to inefficient memory usage on big server machines and VMs,
> where nr_cpu_ids is huge.
>
> Example from my experiment:
> 2 vCPU VM with hotplug support (up to 128):
> [ 0.105989] smpboot: Allowing 128 CPUs, 126 hotplug CPUs
> By creating a huge number of active and/or dying memory cgroups, I can generate active percpu
> allocations of 100 MB (per single CPU), including fragmentation overhead. But in that case the
> total percpu memory consumption (reported in /proc/meminfo) will be 12.8 GB. BTW, chunks are
> ~75% full in my experiment, so fragmentation is not a concern.
> Out of 12.8 GB:
> - 0.2 GB are actually used by present vCPUs, and
> - 12.6 GB are "wasted"!
>
> I've seen production VMs consuming 16-20 GB of memory for percpu. Roman reported 100 GB.
> There are ways to reduce the "wasted" memory overhead, such as: disabling CPU hotplug;
> reducing the maximum number of CPUs reported by the hypervisor and/or firmware; using the
> possible_cpus= kernel parameter. But they won't eliminate the fundamental issue with
> "wasted" memory.
>
> Suggestion: support scaling percpu chunks by number of units, i.e. allocate/deallocate
> new units for existing chunks on CPU hotplug/remove events.
>

Idk. In theory it sounds doable. In practice I'm not so sure.
The two problems off the top of my head:
1) What happens if we can't allocate new pages when a CPU is onlined?
2) It's possible users set particular conditions in percpu variables that go beyond simple
statistics summing (such as the CPU runqueues). Users would have to provide online init and
exit functions, which could get weird.

As Roman mentioned, I think it would be much better to avoid the large discrepancy between
the cpu_online_mask and the cpu_possible_mask.

Thanks,
Dennis