Subject: Re: Percpu allocator: CPU hotplug support
From: Pratik Sampat
To: Alexey Makhalov, "linux-mm@kvack.org"
Cc: Dennis Zhou, Roman Gushchin, Vlastimil Babka, Christoph Lameter, ldufour@linux.ibm.com, Tejun Heo, "Aneesh Kumar K.V", Srikar Dronamraju, pratik.r.sampat@gmail.com
Date: Thu, 29 Apr 2021 17:09:19 +0530
Message-ID: <832bd0f9-eefb-9f63-828d-dc81b9a21eb9@linux.ibm.com>
In-Reply-To: <8E7F3D98-CB68-4418-8E0E-7287E8273DA9@vmware.com>
Hello,

On 22/04/21 6:14 am, Alexey Makhalov wrote:
> The current implementation of the percpu allocator uses the total possible
> number of CPUs (nr_cpu_ids) to determine the number of units to allocate per
> chunk. Every alloc_percpu() request of N bytes allocates N*nr_cpu_ids bytes,
> even if the number of present CPUs is much smaller. The percpu allocator grows
> by the number of chunks, keeping the number of units per chunk constant. It is
> done this way to simplify CPU hotplug/remove by having the per-cpu area
> preallocated.
>
> Problem: this behavior can lead to inefficient memory usage on big server
> machines and VMs, where nr_cpu_ids is huge.
>
> Example from my experiment, a 2-vCPU VM with hotplug support (up to 128):
> [    0.105989] smpboot: Allowing 128 CPUs, 126 hotplug CPUs
> By creating a huge number of active and/or dying memory cgroups, I can
> generate active percpu allocations of 100 MB (per single CPU), including
> fragmentation overhead. But in that case the total percpu memory consumption
> (reported in /proc/meminfo) will be 12.8 GB. BTW, the chunks are ~75% full in
> my experiment, so fragmentation is not a concern.
> Out of 12.8 GB:
> - 0.2 GB are actually used by the present vCPUs, and
> - 12.6 GB are "wasted"!
>
> I've seen production VMs consuming 16-20 GB of memory in percpu. Roman
> reported 100 GB. There are ways to reduce the "wasted" memory overhead, such
> as: disabling CPU hotplug; reducing the maximum number of CPUs reported by the
> hypervisor and/or firmware; or using the possible_cpus= kernel parameter. But
> they won't eliminate the fundamental issue with "wasted" memory.
>
> Suggestion: support scaling percpu chunks by the number of units in them,
> i.e. allocate/deallocate units for existing chunks on CPU hotplug/remove
> events.
>
> Any thoughts? Thanks!
> --Alexey

I've run some traces around memory cgroups to determine the memory consumed by
the percpu allocator and the major contributors to these allocations, by
creating either an empty memory cgroup or an empty container.

There are 4 memcg percpu allocation charges that I see when I create a cgroup
attached to the memory controller. They seem to belong to mm/memcontrol.c's
lruvec_stat and vmstats.

I've run this experiment in 2 configurations on a POWER9 box:
1. cpus=16 (present), maxcpus=16 (possible)
2. cpus=16 (present), maxcpus=1024 (possible)

On system boot:
  Maxcpus    Sum of percpu charges (MB)
  16         2.4979
  1024       159.86

0 MB container setup (an empty parallel container setup that just spawns and
spins):
  Maxcpus    Per-container avg (MB)
  16         0.0398
  1024       2.5507

Although the difference in cgroup charges is quite small in absolute numbers,
it wastes memory proportionally when the cgroup or container setup is scaled
to, say, 10,000 containers.

If memory cgroups are the main point of focus, would it make more sense to
optimize only those callers to be hotplug aware, rather than to attempt to
optimize the whole percpu allocator?

Thanks,
Pratik
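P.S. For reference, the arithmetic behind the 12.8 GB figure quoted above, as a
small Python sketch: every alloc_percpu() request is sized for all possible
CPUs (nr_cpu_ids), so the "wasted" share is (possible - present)/possible of
the total. The helper name is made up for illustration; the inputs are the
numbers from Alexey's 2-vCPU/128-possible experiment.

```python
# Illustrative-only sketch of the percpu over-allocation described above:
# each per-CPU allocation is backed for every possible CPU, not just the
# present ones.

def percpu_footprint_mb(per_cpu_mb, present_cpus, possible_cpus):
    """Return (total, used, wasted) percpu memory in MB."""
    total = per_cpu_mb * possible_cpus   # what /proc/meminfo "Percpu" reports
    used = per_cpu_mb * present_cpus     # backing CPUs that actually exist
    return total, used, total - used

total, used, wasted = percpu_footprint_mb(per_cpu_mb=100,
                                          present_cpus=2,
                                          possible_cpus=128)
print(total, used, wasted)   # 12800 200 12600 -> 12.8 GB, 0.2 GB, 12.6 GB
```

The same arithmetic reproduces the maxcpus=1024 scaling seen in the container
measurements: the per-unit cost is constant, so the footprint grows linearly
with the possible-CPU count.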