From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 May 2023 16:24:47 +0000
References: <20230512171702.923725-1-shakeelb@google.com>
Message-ID: <20230517162447.dztfzmx3hhetfs2q@google.com>
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
From: Shakeel Butt
To: Oliver Sang
Cc: Zhang Cathy, Yin Fengwei, Feng Tang, Eric Dumazet, Linux MM, Cgroups,
	Paolo Abeni, davem@davemloft.net, kuba@kernel.org, Brandeburg Jesse,
	Srinivas Suresh, Chen Tim C, You Lizhen, eric.dumazet@gmail.com,
	netdev@vger.kernel.org, philip.li@intel.com, yujie.liu@intel.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
On Tue, May 16, 2023 at 01:46:55PM +0800, Oliver Sang wrote:
> hi Shakeel,
>
> On Mon, May 15, 2023 at 12:50:31PM -0700, Shakeel Butt wrote:
> > +Feng, Yin and Oliver
> >
> > > > Thanks a lot Cathy for testing. Do you see any performance improvement
> > > > for the memcached benchmark with the patch?
> > >
> > > Yep, absolutely :-) RPS (with/without patch) = +1.74
> >
> > Thanks a lot Cathy.
> >
> > Feng/Yin/Oliver, can you please test the patch at [1] with other
> > workloads used by the test robot? Basically I wanted to know if it has
> > any positive or negative impact on other perf benchmarks.
>
> is it possible for you to resend the patch with a Signed-off-by? without
> it, the test robot will regard the patch as informal, so it cannot be fed
> into the auto test process. and could you tell us the base of this patch?
> it will help us apply it correctly.
>
> on the other hand, due to resource constraints, we normally cannot support
> this type of on-demand test of a single patch, patch set, or branch.
> instead, we try to merge them into so-called hourly kernels, then
> distribute tests and auto-bisects to various platforms. after we apply
> your patch and merge it into the hourly kernels successfully, if it really
> causes some performance changes, the test robot could spot this patch as
> 'fbc' (first bad commit) and we will send a report to you. this could
> happen within several weeks of applying. but due to the complexity of the
> whole process (and limited resources, e.g. we cannot run all tests on all
> platforms), we cannot guarantee capturing all possible performance impacts
> of this patch, and it is hard for us to provide a big picture of its
> general performance impact. this may not be exactly what you want. is that
> ok for you?

Yes, that is fine and thanks for the help.
The patch is below:

>From 93b3b4c5f356a5090551519522cfd5740ae7e774 Mon Sep 17 00:00:00 2001
From: Shakeel Butt
Date: Tue, 16 May 2023 20:30:26 +0000
Subject: [PATCH] memcg: skip stock refill in irq context

The Linux kernel processes incoming packets in softirq context on a given
CPU, and those packets may belong to different jobs. This is very common on
large systems running multiple workloads. With memcg enabled, network memory
for such packets is charged to the corresponding memcgs of the jobs.

Memcg charging can be a costly operation, so the memcg code implements a
per-cpu charge caching optimization to reduce its cost. More specifically,
the kernel charges the given memcg for more memory than requested and keeps
the remaining charge in a local per-cpu cache. The insight behind this
heuristic is that more charge requests for that memcg are likely in the
near future. This optimization works well when a specific job runs on a CPU
for a long time and the majority of the charging requests happen in process
context.

However, the kernel's incoming packet processing does not work well with
this optimization. Recently Cathy Zhang has shown [1] that memcg charge
flushing within the memcg charge path can become a performance bottleneck
for the memcg charging of network traffic. Perf profile:

     8.98%  mc-worker  [kernel.vmlinux]  [k] page_counter_cancel
            |
            --8.97%--page_counter_cancel
                      |
                      --8.97%--page_counter_uncharge
                                drain_stock
                                __refill_stock
                                refill_stock
                                |
                                --8.91%--try_charge_memcg
                                          mem_cgroup_charge_skmem
                                          |
                                          --8.91%--__sk_mem_raise_allocated
                                                    __sk_mem_schedule
                                                    |
                                                    |--5.41%--tcp_try_rmem_schedule
                                                    |          tcp_data_queue
                                                    |          tcp_rcv_established
                                                    |          tcp_v4_do_rcv
                                                    |          tcp_v4_rcv

The simplest way to solve this issue is to not refill the memcg charge
stock in irq context. Since networking is the main source of memcg charging
in irq context, other users will not be impacted. In addition, this
preserves the memcg charge cache of the application running on that CPU.
There are also potential side effects. What if all the packets belong to
the same application and memcg? More specifically, users can use Receive
Flow Steering (RFS) to make sure the kernel processes an application's
packets on the CPU where the application is running. This change may cause
the kernel to take the memcg charging slowpath more often in irq context.

Link: https://lore.kernel.org/all/IA0PR11MB73557DEAB912737FD61D2873FC749@IA0PR11MB7355.namprd11.prod.outlook.com [1]
Signed-off-by: Shakeel Butt
---
 mm/memcontrol.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..2635aae82b3e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2652,6 +2652,14 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool raised_max_event = false;
 	unsigned long pflags;
 
+	/*
+	 * Skip the refill in irq context as it may flush the charge cache of
+	 * the process running on the CPU, or the kernel may have to process
+	 * incoming packets for different memcgs.
+	 */
+	if (!in_task())
+		batch = nr_pages;
+
 retry:
 	if (consume_stock(memcg, nr_pages))
 		return 0;
-- 
2.40.1.606.ga4b1b128d6-goog