Subject: Re: Percpu allocator: CPU hotplug support
To: Dennis Zhou, Alexey Makhalov
Cc: "linux-mm@kvack.org", Tejun Heo, Christoph Lameter, Roman Gushchin, "Aneesh Kumar K.V", Srikar Dronamraju
References: <8E7F3D98-CB68-4418-8E0E-7287E8273DA9@vmware.com>
From: Laurent Dufour
Message-ID: <3320a36c-9270-a7f7-88da-0a9bfa13c774@linux.ibm.com>
Date: Thu, 22 Apr 2021 09:45:32 +0200

On 22/04/2021 at 03:33, Dennis Zhou wrote:
> Hello,
>
> On Thu, Apr 22, 2021 at 12:44:37AM +0000, Alexey Makhalov wrote:
>> Current implementation of percpu allocator uses total possible number of CPUs (nr_cpu_ids) to
>> get number of units to allocate per chunk. Every alloc_percpu() request of N bytes will allocate
>> N*nr_cpu_ids bytes even if the number of present CPUs is much less. Percpu allocator grows by
>> number of chunks keeping number of units per chunk constant. This is done in that way to
>> simplify CPU hotplug/remove to have per-cpu area preallocated.
>>
>> Problem: This behavior can lead to inefficient memory usage for big server machines and VMs,
>> where nr_cpu_ids is huge.
>>
>> Example from my experiment:
>> 2 vCPU VM with hotplug support (up to 128):
>> [ 0.105989] smpboot: Allowing 128 CPUs, 126 hotplug CPUs
>> By creating huge amount of active or/and dying memory cgroups, I can generate active percpu
>> allocations of 100 MB (per single CPU) including fragmentation overhead. But in that case total
>> percpu memory consumption (reported in /proc/meminfo) will be 12.8 GB. BTW, chunks are
>> filled by ~75% in my experiment, so fragmentation is not a concern.
>> Out of 12.8 GB:
>> - 0.2 GB are actually used by present vCPUs, and
>> - 12.6 GB are "wasted"!
>>
>> I've seen production VMs consuming 16-20 GB of memory by Percpu. Roman reported 100 GB.
>> There are solutions to reduce "wasted" memory overhead such as: disabling CPU hotplug; reducing
>> number of maximum CPUs reported by hypervisor or/and firmware; using possible_cpus= kernel
>> parameter. But it won't eliminate fundamental issue with "wasted" memory.
>>
>> Suggestion: To support percpu chunks scaling by number of units there. To allocate/deallocate new
>> units for existing chunks on CPU hotplug/remove event.
>>
>
> Idk. In theory it sounds doable. In practice I'm not so sure. The two
> problems off the top of my head:
> 1) What happens if we can't allocate new pages when a cpu is onlined?
> 2) It's possible users set particular conditions in percpu variables
> that are not tied to just statistics summing (such as the cpu
> runqueues). Users would have to provide online init and exit functions
> which could get weird.
>
> As Roman mentioned, I think it would be much better to not have the
> large discrepancy between the cpu_online_mask and the cpu_possible_mask.

Indeed, it is quite common on PowerPC to configure a VM with a high number of
possible CPUs but a reasonable number of online CPUs. This allows users to
scale up their VM when needed.

For instance, we may see up to 1024 possible CPUs while the online number is
*only* 128.

Cheers,
Laurent.
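
Illustration of the arithmetic in the quoted example, as a minimal Python sketch. It assumes only what the quoted text states: every percpu allocation reserves one unit per possible CPU, so the footprint scales with nr_cpu_ids rather than with the CPUs actually present. The helper names percpu_footprint and idle_fraction and the decimal MB/GB units are assumptions chosen here to match the quoted figures; nothing below is kernel code.

```python
MB = 10**6  # decimal units, to match the figures quoted in the thread
GB = 10**9


def percpu_footprint(per_cpu_bytes, possible_cpus, present_cpus):
    """Return (total, used, idle) bytes when one unit is reserved
    for every possible CPU but only present CPUs use theirs."""
    if not 0 <= present_cpus <= possible_cpus:
        raise ValueError("present_cpus must be between 0 and possible_cpus")
    total = per_cpu_bytes * possible_cpus
    used = per_cpu_bytes * present_cpus
    return total, used, total - used


def idle_fraction(possible_cpus, online_cpus):
    """Fraction of reserved percpu units that belong to offline CPUs."""
    if not 0 <= online_cpus <= possible_cpus or possible_cpus == 0:
        raise ValueError("need 0 <= online_cpus <= possible_cpus, possible_cpus > 0")
    return (possible_cpus - online_cpus) / possible_cpus


if __name__ == "__main__":
    # Quoted experiment: 100 MB per CPU, 128 possible CPUs, 2 present.
    total, used, idle = percpu_footprint(100 * MB, 128, 2)
    print(f"total {total / GB:.1f} GB, used {used / GB:.1f} GB, "
          f"idle {idle / GB:.1f} GB")
    # prints: total 12.8 GB, used 0.2 GB, idle 12.6 GB

    # PowerPC case from the reply: 1024 possible CPUs, 128 online.
    print(f"idle units: {idle_fraction(1024, 128):.1%}")
    # prints: idle units: 87.5%
```

With the PowerPC numbers from the reply, seven out of every eight reserved units would belong to CPUs that are not online, whatever the per-CPU allocation size turns out to be.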