From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 17 May 2023 18:33:24 +0200
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
To: Shakeel Butt
Cc: Oliver Sang, Zhang Cathy, Yin Fengwei, Feng Tang, Linux MM, Cgroups,
	Paolo Abeni, davem@davemloft.net, kuba@kernel.org, Brandeburg Jesse,
	Srinivas Suresh, Chen Tim C, You Lizhen, eric.dumazet@gmail.com,
	netdev@vger.kernel.org, philip.li@intel.com, yujie.liu@intel.com
In-Reply-To: <20230517162447.dztfzmx3hhetfs2q@google.com>
References: <20230512171702.923725-1-shakeelb@google.com>
	<20230517162447.dztfzmx3hhetfs2q@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Wed, May 17, 2023 at 6:24 PM Shakeel Butt wrote:
>
> On Tue, May 16, 2023 at 01:46:55PM +0800, Oliver Sang wrote:
> > hi Shakeel,
> >
> > On Mon, May 15, 2023 at 12:50:31PM -0700, Shakeel Butt wrote:
> > > +Feng, Yin and Oliver
> > >
> > > >
> > > > > Thanks a lot Cathy for testing. Do you see any performance improvement for
> > > > > the memcached benchmark with the patch?
> > > >
> > > > Yep, absolutely :-) RPS (with/without patch) = +1.74
> > >
> > > Thanks a lot Cathy.
> > >
> > > Feng/Yin/Oliver, can you please test the patch at [1] with other
> > > workloads used by the test robot? Basically I wanted to know if it has
> > > any positive or negative impact on other perf benchmarks.
> >
> > is it possible for you to resend the patch with a Signed-off-by?
> > without it, the test robot will regard the patch as informal, and it cannot be fed
> > into the auto test process.
> > and could you tell us the base of this patch? it will help us apply it
> > correctly.
> >
> > on the other hand, due to resource constraints, we normally cannot support
> > this type of on-demand test upon a single patch, patch set, or a branch.
> > instead, we try to merge them into so-called hourly-kernels, then distribute
> > tests and auto-bisects to various platforms.
> > after we apply your patch and merge it into hourly-kernels successfully,
> > if it really causes some performance changes, the test robot could spot
> > this patch as 'fbc' and we will send a report to you. this could happen within
> > several weeks after applying.
> > but due to the complexity of the whole process (and limited resources, since
> > we cannot run all tests on all platforms), we cannot guarantee capturing all
> > possible performance impacts of this patch. and it's hard for us to provide
> > a big picture of the general performance impact of this patch.
> > this may not be exactly what you want. is it ok for you?
> >
>
> Yes, that is fine and thanks for the help. The patch is below:
>
> From 93b3b4c5f356a5090551519522cfd5740ae7e774 Mon Sep 17 00:00:00 2001
> From: Shakeel Butt
> Date: Tue, 16 May 2023 20:30:26 +0000
> Subject: [PATCH] memcg: skip stock refill in irq context
>
> The Linux kernel processes incoming packets in softirq on a given CPU,
> and those packets may belong to different jobs. This is very normal on
> large systems running multiple workloads. With memcg enabled, network
> memory for such packets is charged to the corresponding memcgs of the
> jobs.
>
> Memcg charging can be a costly operation, and the memcg code implements
> a per-cpu memcg charge caching optimization to reduce the cost of
> charging. More specifically, the kernel charges the given memcg for more
> memory than requested and keeps the remaining charge in a local per-cpu
> cache. The insight behind this heuristic is that there will be more
> charge requests for that memcg in the near future. This optimization works
> well when a specific job runs on a CPU for a long time and the majority of the
> charging requests happen in process context. However, the kernel's
> incoming packet processing does not work well with this optimization.
>
> Recently Cathy Zhang has shown [1] that memcg charge flushing within the
> memcg charge path can become a performance bottleneck for the memcg
> charging of network traffic.
>
> Perf profile:
>     8.98%  mc-worker  [kernel.vmlinux]  [k] page_counter_cancel
>            |
>            --8.97%--page_counter_cancel
>                      |
>                      --8.97%--page_counter_uncharge
>                                drain_stock
>                                __refill_stock
>                                refill_stock
>                                |
>                                --8.91%--try_charge_memcg
>                                          mem_cgroup_charge_skmem
>                                          |
>                                          --8.91%--__sk_mem_raise_allocated
>                                                    __sk_mem_schedule
>                                                    |
>                                                    |--5.41%--tcp_try_rmem_schedule
>                                                    |          tcp_data_queue
>                                                    |          tcp_rcv_established
>                                                    |          tcp_v4_do_rcv
>                                                    |          tcp_v4_rcv
>
> The simplest way to solve this issue is to not refill the memcg charge
> stock in the irq context. Since networking is the main source of memcg
> charging in the irq context, other users will not be impacted. In
> addition, this will preserve the memcg charge cache of the application
> running on that CPU.
>
> There are also potential side effects. What if all the packets belong to
> the same application and memcg? More specifically, users can use Receive
> Flow Steering (RFS) to make sure the kernel processes the packets of the
> application on the CPU where the application is running. This change may
> cause the kernel to do slowpath memcg charging more often in irq
> context.

Could we have per-memcg per-cpu caches, instead of one set of per-cpu caches
needing to be drained every time a CPU deals with 'another memcg'?

> Link: https://lore.kernel.org/all/IA0PR11MB73557DEAB912737FD61D2873FC749@IA0PR11MB7355.namprd11.prod.outlook.com [1]
> Signed-off-by: Shakeel Butt
> ---
>  mm/memcontrol.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..2635aae82b3e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2652,6 +2652,14 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>         bool raised_max_event = false;
>         unsigned long pflags;
>
> +       /*
> +        * Skip the refill in irq context as it may flush the charge cache of
> +        * the process running on the CPU or the kernel may have to process
> +        * incoming packets for different memcgs.
> +        */
> +       if (!in_task())
> +               batch = nr_pages;
> +
>  retry:
>         if (consume_stock(memcg, nr_pages))
>                 return 0;
> --
> 2.40.1.606.ga4b1b128d6-goog
>
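
As a rough illustration of the stock behaviour the commit message describes
and that the hunk above changes, here is a minimal userspace C sketch. It is
a toy model under simplifying assumptions, not the mm/memcontrol.c code:
struct stock, consume_stock(), charge_slowpath() and CHARGE_BATCH below only
mirror the idea, and locking, per-cpu plumbing, gfp handling and the real
page_counter accounting are all omitted.

/*
 * Toy model of the per-cpu memcg charge "stock": one cache per CPU holds
 * surplus charge for a single memcg; a slow-path charge for a different
 * memcg drains it. The in_task flag models the patch: irq-context charges
 * take exactly what they need and never touch the cache.
 */
#include <stdbool.h>
#include <stdio.h>

#define CHARGE_BATCH 64	/* pages charged ahead of demand; the kernel uses a similar batch */

struct stock {			/* one instance per CPU in the real kernel */
	int cached_memcg;	/* memcg currently owning the cached charge */
	unsigned int nr_pages;	/* surplus charge left in the cache */
};

static struct stock stock;	/* pretend we are pinned to one CPU */

/* Fast path: reuse cached charge if it belongs to the same memcg. */
static bool consume_stock(int memcg, unsigned int nr_pages)
{
	if (stock.cached_memcg == memcg && stock.nr_pages >= nr_pages) {
		stock.nr_pages -= nr_pages;
		return true;
	}
	return false;
}

/*
 * Slow path: "charge" batch pages and cache the surplus, draining any
 * charge cached for a different memcg (the costly work that shows up as
 * page_counter_cancel in the perf profile above).
 */
static void charge_slowpath(int memcg, unsigned int nr_pages, bool in_task)
{
	/* The patch: in irq context, charge only what is needed so the
	 * task's cached charge on this CPU is left alone. */
	unsigned int batch = in_task ? CHARGE_BATCH : nr_pages;

	if (batch > nr_pages) {
		if (stock.cached_memcg != memcg) {
			if (stock.nr_pages)
				printf("drained %u cached page(s) of memcg %d\n",
				       stock.nr_pages, stock.cached_memcg);
			stock.nr_pages = 0;
			stock.cached_memcg = memcg;
		}
		stock.nr_pages += batch - nr_pages;
	}
	printf("charged %u page(s) to memcg %d (batch %u)\n",
	       nr_pages, memcg, batch);
}

static void try_charge(int memcg, unsigned int nr_pages, bool in_task)
{
	if (consume_stock(memcg, nr_pages)) {
		printf("fast path: memcg %d served from cached charge\n", memcg);
		return;
	}
	charge_slowpath(memcg, nr_pages, in_task);
}

int main(void)
{
	try_charge(1, 1, true);		/* app on this CPU fills the stock for memcg 1 */
	try_charge(2, 1, false);	/* softirq packet for memcg 2: no refill, stock kept */
	try_charge(1, 1, true);		/* app's memcg still hits the fast path afterwards */
	return 0;
}

Running the toy shows the point of the change: the irq-context charge for the
second memcg no longer drains the cache that the task's memcg built up on
that CPU, so the third request is still served from the cached charge.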