Date: Thu, 8 Sep 2022 16:25:47 +0800
Subject: Re: [PATCH v4] ipc/msg: mitigate the lock contention with percpu counter
To: Andrew Morton, Tim Chen
Cc: vasily.averin@linux.dev, shakeelb@google.com, dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com, legion@kernel.org, manfred@colorfullife.com, alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com, tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
References: <20220907172516.1210842-1-jiebin.sun@intel.com> <20220907143427.0ce54bbf096943ffca197fee@linux-foundation.org>
From: "Sun, Jiebin"
In-Reply-To: <20220907143427.0ce54bbf096943ffca197fee@linux-foundation.org>

On 9/8/2022 5:34 AM, Andrew Morton wrote:
> On Wed, 07 Sep 2022 09:01:53 -0700 Tim Chen wrote:
>
>> On Thu, 2022-09-08 at 01:25 +0800, Jiebin Sun wrote:
>>> The msg_bytes and msg_hdrs atomic counters are frequently
>>> updated when the IPC msg queue is in heavy use, causing heavy
>>> cache bouncing and overhead. Changing them to percpu_counter
>>> greatly improves the performance. Since there is one percpu
>>> struct per namespace, the additional memory cost is minimal.
>>> Reading of the count is done in the msgctl call, which is infrequent,
>>> so the need to sum up the counts in each CPU is infrequent.
>>>
>>> Apply the patch and test the pts/stress-ng-1.4.0
>>> -- System V message passing (160 threads).
>>>
>>> Score gain: 3.17x
>>>
>> ...
>>>
>>> +/* large batch size could reduce the times to sum up percpu counter */
>>> +#define MSG_PERCPU_COUNTER_BATCH 1024
>>> +
>> Jiebin,
>>
>> 1024 is a small size (1/4 page).
>> The local per-cpu counter could overflow to the global count quickly
>> if it is limited to this size, since our count tracks msg size.
>>
>> I'll suggest something larger, say 8*1024*1024, about
>> 8MB to accommodate about 2 large pages' worth of data. Maybe that
>> will further improve throughput on stress-ng by reducing contention
>> on adding to the global count.
>>
> I think this concept of a percpu_counter_add() which is massively
> biased to the write side and with very rare reading is a legitimate
> use-case. Perhaps it should become an addition to the formal interface.
> Something like
>
> /*
>  * comment goes here
>  */
> static inline void percpu_counter_add_local(struct percpu_counter *fbc,
>                                             s64 amount)
> {
>         percpu_counter_add_batch(fbc, amount, INT_MAX);
> }
>
> and percpu_counter_sub_local(), I guess.
>
> The only instance I can see is
> block/blk-cgroup-rwstat.h:blkg_rwstat_add() which is using INT_MAX/2
> because it always uses percpu_counter_sum_positive() on the read side.
>
> But that makes two!

Yes. Using INT_MAX or INT_MAX/2 could greatly improve performance when
writes are heavy and reads are rare. In our case, if the local percpu
counter is close to INT_MAX and a large msgsz arrives, the overflow
issue could happen. So I think INT_MAX/2, which is used in
blkg_rwstat_add(), might be a better choice:

        percpu_counter_add_batch(&ns->percpu_msg_bytes, msgsz, batch);

I will send out the performance data and a draft patch for discussion.

Jiebin
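For anyone weighing the batch sizes discussed above (1024, 8MB, INT_MAX, INT_MAX/2), a minimal userspace sketch of the write/read trade-off may help. It is a simplified model, not the kernel's percpu_counter implementation. It has no preemption, interrupt or locking handling. The names sim_percpu_counter, sim_add_batch(), sim_add_local(), sim_sum() and SIM_NR_CPUS are all hypothetical. sim_add_local() follows the shape of the proposed percpu_counter_add_local() but uses INT_MAX/2 as the batch, as blkg_rwstat_add() does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical model of a percpu counter: one shared total plus one
 * small per-CPU slot that absorbs most updates. */
struct sim_percpu_counter {
	int64_t count;              /* shared total (64-bit) */
	int32_t local[SIM_NR_CPUS]; /* per-CPU deltas not yet folded (32-bit) */
	uint64_t folds;             /* how many times the shared total was written */
};

/* Batch-folding rule: keep the delta CPU-local until its magnitude
 * reaches @batch, then move it into the shared total. */
static void sim_add_batch(struct sim_percpu_counter *fbc, int cpu,
			  int64_t amount, int32_t batch)
{
	/* New slot value, computed in 64 bits so the check itself cannot wrap. */
	int64_t c = (int64_t)fbc->local[cpu] + amount;

	if (llabs((long long)c) >= batch) {
		/* Slow path: touch the shared total (done under a lock in the kernel). */
		fbc->count += c;
		fbc->local[cpu] = 0;
		fbc->folds++;
	} else {
		/* Fast path: CPU-local update, no shared cache line written. */
		fbc->local[cpu] = (int32_t)c;
	}
	/* Invariant: after any add, a slot holds a value below @batch in magnitude. */
	assert(llabs((long long)fbc->local[cpu]) < batch);
}

/* Hypothetical write-biased helper using INT_MAX/2 as the batch. */
static void sim_add_local(struct sim_percpu_counter *fbc, int cpu,
			  int64_t amount)
{
	sim_add_batch(fbc, cpu, amount, INT32_MAX / 2);
}

/* Read side: shared total plus every CPU's slot. This visits all CPUs,
 * which is acceptable only because reads (e.g. msgctl) are rare. */
static int64_t sim_sum(const struct sim_percpu_counter *fbc)
{
	int64_t sum = fbc->count;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sum += fbc->local[cpu];
	return sum;
}
```

Feeding the same stream of message sizes through a batch of 1024 and through sim_add_local() should give identical sums. The large batch should write the shared total far less often. With a batch of INT_MAX/2, each slot stays below INT_MAX/2 in magnitude. Adding a msgsz smaller than INT_MAX/2 therefore keeps the pre-fold value below INT_MAX, even in 32-bit arithmetic.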