From: "Sun, Jiebin"
Date: Tue, 20 Sep 2022 13:50:20 +0800
Subject: Re: [PATCH v6 2/2] ipc/msg: mitigate the lock contention with percpu counter
To: Manfred Spraul, akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com, dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com, legion@kernel.org, alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com, tianyou.li@intel.com, wangyang.guo@intel.com, Tim Chen
In-Reply-To: <8d74a7d4-b80f-2a0f-ee95-243bdbd51ccd@colorfullife.com>
References: <20220902152243.479592-1-jiebin.sun@intel.com> <20220913192538.3023708-1-jiebin.sun@intel.com> <20220913192538.3023708-3-jiebin.sun@intel.com> <6ed22478-0c89-92ea-a346-0349be2dd99c@intel.com> <8d74a7d4-b80f-2a0f-ee95-243bdbd51ccd@colorfullife.com>
On 9/20/2022 12:53 PM, Manfred Spraul wrote:
> On 9/20/22 04:36, Sun, Jiebin wrote:
>>
>> On 9/18/2022 8:53 PM, Manfred Spraul wrote:
>>> Hi Jiebin,
>>>
>>> On 9/13/22 21:25, Jiebin Sun wrote:
>>>> The msg_bytes and msg_hdrs atomic counters are frequently
>>>> updated when the IPC msg queue is in heavy use, causing heavy
>>>> cache bounce and overhead. Changing them to percpu_counter
>>>> greatly improves the performance. Since there is one percpu
>>>> struct per namespace, the additional memory cost is minimal.
>>>> Reading of the count is done in the msgctl call, which is
>>>> infrequent. So the need to sum up the counts in each CPU is
>>>> infrequent.
>>>>
>>>> Apply the patch and test the pts/stress-ng-1.4.0
>>>> -- system v message passing (160 threads).
>>>>
>>>> Score gain: 3.99x
>>>>
>>>> CPU: ICX 8380 x 2 sockets
>>>> Core number: 40 x 2 physical cores
>>>> Benchmark: pts/stress-ng-1.4.0
>>>> -- system v message passing (160 threads)
>>>>
>>>> Signed-off-by: Jiebin Sun
>>>> Reviewed-by: Tim Chen
>>> Reviewed-by: Manfred Spraul
>>>> @@ -495,17 +496,18 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
>>>>       msginfo->msgssz = MSGSSZ;
>>>>       msginfo->msgseg = MSGSEG;
>>>>       down_read(&msg_ids(ns).rwsem);
>>>> -    if (cmd == MSG_INFO) {
>>>> +    if (cmd == MSG_INFO)
>>>>           msginfo->msgpool = msg_ids(ns).in_use;
>>>> -        msginfo->msgmap = atomic_read(&ns->msg_hdrs);
>>>> -        msginfo->msgtql = atomic_read(&ns->msg_bytes);
>>>> +    max_idx = ipc_get_maxidx(&msg_ids(ns));
>>>> +    up_read(&msg_ids(ns).rwsem);
>>>> +    if (cmd == MSG_INFO) {
>>>> +        msginfo->msgmap = percpu_counter_sum(&ns->percpu_msg_hdrs);
>>>> +        msginfo->msgtql = percpu_counter_sum(&ns->percpu_msg_bytes);
>>>
>>> Not caused by your change, it just now becomes obvious:
>>>
>>> msginfo->msgmap and ->msgtql are type int, i.e. signed 32-bit, and
>>> the actual counters are 64-bit. This can overflow - and I think the
>>> code should handle this. Just clamp the values to INT_MAX.
>>>
>> Hi Manfred,
>>
>> Thanks for your advice. But I'm not sure we could fully fix the
>> overflow issue in ipc/msg with clamp(val, low, INT_MAX). If the
>> value exceeds s32, we would avoid the sign reversal, but we still
>> could not report the accurate value.
>
> I think just clamping it to INT_MAX is the best approach.
> Reporting negative values is worse than clamping.
> If (and only if) there are real users that need to know the total
> amount of memory allocated for message queues in one namespace, then
> we could add a MSG_INFO64 with long values. But I would not add that
> right now; I do not see a real use case where the value would be
> needed.
>
> Any other opinions?
>
> --
>
>     Manfred
>

OK. I will work on it and send it out for review.