From: Shakeel Butt
Date: Wed, 17 May 2023 10:04:42 -0700
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
To: Eric Dumazet, Andrew Morton
Cc: Oliver Sang, Zhang Cathy, Yin Fengwei, Feng Tang, Linux MM, Cgroups,
 Paolo Abeni, davem@davemloft.net, kuba@kernel.org, Brandeburg Jesse,
 Srinivas Suresh, Chen Tim C, You Lizhen, eric.dumazet@gmail.com,
 netdev@vger.kernel.org, philip.li@intel.com, yujie.liu@intel.com
+Andrew

On Wed, May 17, 2023 at 9:33 AM Eric Dumazet wrote:
>
> On Wed, May 17, 2023 at 6:24 PM Shakeel Butt wrote:
> >
> > On Tue, May 16, 2023 at 01:46:55PM +0800, Oliver Sang wrote:
> > > hi Shakeel,
> > >
> > > On Mon, May 15, 2023 at 12:50:31PM -0700, Shakeel Butt wrote:
> > > > +Feng, Yin and Oliver
> > > >
> > > > > > Thanks a lot Cathy for testing. Do you see any performance
> > > > > > improvement for the memcached benchmark with the patch?
> > > > >
> > > > > Yep, absolutely :-) RPS (with/without patch) = +1.74
> > > >
> > > > Thanks a lot Cathy.
> > > >
> > > > Feng/Yin/Oliver, can you please test the patch at [1] with the other
> > > > workloads used by the test robot? Basically I wanted to know if it
> > > > has any positive or negative impact on other perf benchmarks.
> > >
> > > is it possible for you to resend the patch with a Signed-off-by?
> > > without it, the test robot will regard the patch as informal, so it
> > > cannot feed into the auto test process.
> > > and could you tell us the base of this patch? it will help us apply it
> > > correctly.
> > >
> > > on the other hand, due to resource constraints, we normally cannot
> > > support this type of on-demand test of a single patch, patch set, or
> > > branch. instead, we try to merge them into so-called hourly-kernels,
> > > then distribute tests and auto-bisects to various platforms.
> > > after we apply your patch and merge it into the hourly-kernels
> > > successfully, if it really causes some performance changes, the test
> > > robot could spot this patch as the 'fbc' and we will send a report to
> > > you. this could happen within several weeks after applying.
> > > but due to the complexity of the whole process (and limited resources,
> > > e.g. we cannot run all tests on all platforms), we cannot guarantee
> > > capturing all possible performance impacts of this patch, and it is
> > > hard for us to provide a big picture of the general performance impact
> > > of this patch. this may not be exactly what you want. is that ok for
> > > you?
> > >
> >
> > Yes, that is fine and thanks for the help. The patch is below:
> >
> >
> > From 93b3b4c5f356a5090551519522cfd5740ae7e774 Mon Sep 17 00:00:00 2001
> > From: Shakeel Butt
> > Date: Tue, 16 May 2023 20:30:26 +0000
> > Subject: [PATCH] memcg: skip stock refill in irq context
> >
> > The Linux kernel processes incoming packets in softirq on a given CPU,
> > and those packets may belong to different jobs. This is very common on
> > large systems running multiple workloads. With memcg enabled, network
> > memory for such packets is charged to the corresponding memcgs of the
> > jobs.
> >
> > Memcg charging can be a costly operation, so the memcg code implements
> > a per-cpu charge caching optimization to reduce that cost. More
> > specifically, the kernel charges the given memcg for more memory than
> > requested and keeps the remaining charge in a local per-cpu cache. The
> > insight behind this heuristic is that there will be more charge
> > requests for that memcg in the near future. This optimization works
> > well when a specific job runs on a CPU for a long time and the majority
> > of the charging requests happen in process context. However, the
> > kernel's incoming packet processing does not work well with this
> > optimization.
> >
> > Recently Cathy Zhang has shown [1] that memcg charge flushing within
> > the memcg charge path can become a performance bottleneck for the
> > memcg charging of network traffic.
> >
> > Perf profile:
> >
> >     8.98%  mc-worker  [kernel.vmlinux]  [k] page_counter_cancel
> >            |
> >            --8.97%--page_counter_cancel
> >                      |
> >                      --8.97%--page_counter_uncharge
> >                                drain_stock
> >                                __refill_stock
> >                                refill_stock
> >                                |
> >                                --8.91%--try_charge_memcg
> >                                          mem_cgroup_charge_skmem
> >                                          |
> >                                          --8.91%--__sk_mem_raise_allocated
> >                                                    __sk_mem_schedule
> >                                                    |
> >                                                    |--5.41%--tcp_try_rmem_schedule
> >                                                    |         tcp_data_queue
> >                                                    |         tcp_rcv_established
> >                                                    |         tcp_v4_do_rcv
> >                                                    |         tcp_v4_rcv
> >
> > The simplest way to solve this issue is to not refill the memcg charge
> > stock in irq context. Since networking is the main source of memcg
> > charging in irq context, other users will not be impacted. In
> > addition, this will preserve the memcg charge cache of the application
> > running on that CPU.
> >
> > There are also potential side effects. What if all the packets belong
> > to the same application and memcg? More specifically, users can use
> > Receive Flow Steering (RFS) to make sure the kernel processes the
> > packets of an application on the CPU where the application is running.
> > This change may cause the kernel to do slowpath memcg charging more
> > often in irq context.
>
> Could we have per-memcg per-cpu caches, instead of one set of per-cpu
> caches needing to be drained every time a cpu deals with 'another
> memcg'?
>

The hierarchical nature of memcg makes that a bit complicated. We have
something similar for memcg stats, the rstat infra, where the stats are
kept per-memcg per-cpu and get accumulated hierarchically every 2
seconds. This works fine for stats, but for limits there would be a
need for some additional restrictions.

Also, some time ago Andrew asked me to explore replacing the atomic
counter in page_counter with a percpu_counter. The intuition is that
most of the time the usage is not hitting the limit, so we can use
__percpu_counter_compare() for enforcement.

Let me spend some time exploring a per-memcg per-cpu cache and whether
a percpu_counter would be better. For now, this patch is more like an
RFC.
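Just to make the percpu_counter idea concrete, below is a rough sketch
of what the enforcement side could look like. This is only an
illustration, not the existing page_counter code: struct pc_counter and
pc_try_charge() are made-up names.

#include <linux/percpu_counter.h>

/* Illustrative stand-in for a percpu_counter based page_counter. */
struct pc_counter {
	struct percpu_counter usage;	/* usage spread across per-cpu counters */
	s64 limit;			/* hard limit, in pages */
};

static bool pc_try_charge(struct pc_counter *c, s64 nr_pages)
{
	/*
	 * Fast path: while usage is far from the limit, the comparison
	 * only reads the cheap approximate count.
	 * __percpu_counter_compare() falls back to a precise
	 * percpu_counter_sum() only when the approximate value is
	 * within batch * num_online_cpus() of the rhs.
	 */
	if (__percpu_counter_compare(&c->usage, c->limit - nr_pages,
				     percpu_counter_batch) > 0)
		return false;	/* charging nr_pages would exceed the limit */

	/*
	 * The check and the add are not atomic, so concurrent chargers
	 * can overshoot the limit by a small amount; the limit becomes
	 * approximate rather than hard.
	 */
	percpu_counter_add_batch(&c->usage, nr_pages, percpu_counter_batch);
	return true;
}

Uncharging would just be percpu_counter_add_batch() with -nr_pages. The
overshoot/precision trade-off in the second comment is exactly the kind
of additional restriction I mentioned above for limits.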