From: Eric Dumazet
Date: Thu, 11 May 2023 18:35:03 +0200
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
To: Shakeel Butt
Cc: "Zhang, Cathy", Linux MM, Cgroups, Paolo Abeni,
    "davem@davemloft.net", "kuba@kernel.org", "Brandeburg, Jesse",
    "Srinivas, Suresh", "Chen, Tim C", "You, Lizhen",
    "eric.dumazet@gmail.com", "netdev@vger.kernel.org"
References: <20230508020801.10702-1-cathy.zhang@intel.com>
    <20230508020801.10702-2-cathy.zhang@intel.com>
    <3887b08ac0e55e27a24d2f66afcfff1961ed9b13.camel@redhat.com>

On Thu, May 11, 2023 at 6:24 PM Shakeel Butt wrote:
>
> On Thu, May 11, 2023 at 2:27 AM Zhang, Cathy wrote:
> >
> > > >
> [...]
> >
> > Here is the output with the command you paste, it's from system wide,
> > I only show pieces of memcached records, and it seems to be a
> > callee -> caller stack trace:
> >
> >      9.02%  mc-worker  [kernel.vmlinux]  [k] page_counter_try_charge
> >             |
> >              --9.00%--page_counter_try_charge
> >                       |
> >                        --9.00%--try_charge_memcg
> >                                 mem_cgroup_charge_skmem
> >                                 |
> >                                  --9.00%--__sk_mem_raise_allocated
> >                                           __sk_mem_schedule
> >                                           |
> >                                           |--5.32%--tcp_try_rmem_schedule
> >                                           |         tcp_data_queue
> >                                           |         tcp_rcv_established
> >                                           |         tcp_v4_do_rcv
> >                                           |         tcp_v4_rcv
> >                                           |         ip_protocol_deliver_rcu
> >                                           |         ip_local_deliver_finish
> >                                           |         ip_local_deliver
> >                                           |         ip_rcv
> >                                           |         __netif_receive_skb_one_core
> >                                           |         __netif_receive_skb
> >                                           |         process_backlog
> >                                           |         __napi_poll
> >                                           |         net_rx_action
> >                                           |         __do_softirq
> >                                           |         |
> >                                           |          --5.32%--do_softirq.part.0
> >                                           |                   __local_bh_enable_ip
> >                                           |                   __dev_queue_xmit
> >                                           |                   ip_finish_output2
> >                                           |                   __ip_finish_output
> >                                           |                   ip_finish_output
> >                                           |                   ip_output
> >                                           |                   ip_local_out
> >                                           |                   __ip_queue_xmit
> >                                           |                   ip_queue_xmit
> >                                           |                   __tcp_transmit_skb
> >                                           |                   tcp_write_xmit
> >                                           |                   __tcp_push_pending_frames
> >                                           |                   tcp_push
> >                                           |                   tcp_sendmsg_locked
> >                                           |                   tcp_sendmsg
> >                                           |                   inet_sendmsg
> >                                           |                   sock_sendmsg
> >                                           |                   ____sys_sendmsg
> >
> >      8.98%  mc-worker  [kernel.vmlinux]  [k] page_counter_cancel
> >             |
> >              --8.97%--page_counter_cancel
> >                       |
> >                        --8.97%--page_counter_uncharge
> >                                 drain_stock
> >                                 __refill_stock
> >                                 refill_stock
> >                                 |
> >                                  --8.91%--try_charge_memcg
> >                                           mem_cgroup_charge_skmem
> >                                           |
> >                                            --8.91%--__sk_mem_raise_allocated
> >                                                     __sk_mem_schedule
> >                                                     |
> >                                                     |--5.41%--tcp_try_rmem_schedule
> >                                                     |         tcp_data_queue
> >                                                     |         tcp_rcv_established
> >                                                     |         tcp_v4_do_rcv
> >                                                     |         tcp_v4_rcv
> >                                                     |         ip_protocol_deliver_rcu
> >                                                     |         ip_local_deliver_finish
> >                                                     |         ip_local_deliver
> >                                                     |         ip_rcv
> >                                                     |         __netif_receive_skb_one_core
> >                                                     |         __netif_receive_skb
> >                                                     |         process_backlog
> >                                                     |         __napi_poll
> >                                                     |         net_rx_action
> >                                                     |         __do_softirq
> >                                                     |         do_softirq.part.0
> >                                                     |         __local_bh_enable_ip
> >                                                     |         __dev_queue_xmit
> >                                                     |         ip_finish_output2
> >                                                     |         __ip_finish_output
> >                                                     |         ip_finish_output
> >                                                     |         ip_output
> >                                                     |         ip_local_out
> >                                                     |         __ip_queue_xmit
> >                                                     |         ip_queue_xmit
> >                                                     |         __tcp_transmit_skb
> >                                                     |         tcp_write_xmit
> >                                                     |         __tcp_push_pending_frames
> >                                                     |         tcp_push
> >                                                     |         tcp_sendmsg_locked
> >                                                     |         tcp_sendmsg
> >                                                     |         inet_sendmsg
> >
> >      8.78%  mc-worker  [kernel.vmlinux]  [k] try_charge_memcg
> >             |
> >              --8.77%--try_charge_memcg
> >                       |
> >                        --8.76%--mem_cgroup_charge_skmem
> >                                 |
> >                                  --8.76%--__sk_mem_raise_allocated
> >                                           __sk_mem_schedule
> >                                           |
> >                                           |--5.21%--tcp_try_rmem_schedule
> >                                           |         tcp_data_queue
> >                                           |         tcp_rcv_established
> >                                           |         tcp_v4_do_rcv
> >                                           |         |
> >                                           |          --5.21%--tcp_v4_rcv
> >                                           |                   ip_protocol_deliver_rcu
> >                                           |                   ip_local_deliver_finish
> >                                           |                   ip_local_deliver
> >                                           |                   ip_rcv
> >                                           |                   __netif_receive_skb_one_core
> >                                           |                   __netif_receive_skb
> >                                           |                   process_backlog
> >                                           |                   __napi_poll
> >                                           |                   net_rx_action
> >                                           |                   __do_softirq
> >                                           |                   |
> >                                           |                    --5.21%--do_softirq.part.0
> >                                           |                             __local_bh_enable_ip
> >                                           |                             __dev_queue_xmit
> >                                           |                             ip_finish_output2
> >                                           |                             __ip_finish_output
> >                                           |                             ip_finish_output
> >                                           |                             ip_output
> >                                           |                             ip_local_out
> >                                           |                             __ip_queue_xmit
> >                                           |                             ip_queue_xmit
> >                                           |                             __tcp_transmit_skb
> >                                           |                             tcp_write_xmit
> >                                           |                             __tcp_push_pending_frames
> >                                           |                             tcp_push
> >                                           |                             tcp_sendmsg_locked
> >                                           |                             tcp_sendmsg
> >                                           |                             inet_sendmsg
> >                                           |                             sock_sendmsg
> >                                           |                             ____sys_sendmsg
> >                                           |                             ___sys_sendmsg
> >                                           |                             __sys_sendmsg
> >
>
> I am suspecting we are doing a lot of charging for a specific memcg on
> one CPU (or a set of CPUs) and a lot of uncharging on the different
> CPU (or a different set of CPUs) and thus both of these code paths are
> hitting the slow path a lot.
>
> Eric, I remember we have an optimization in the networking stack that
> tries to free the memory on the same CPU where the allocation
> happened. Is that optimization enabled for this code path? Or maybe we
> should do something similar in memcg code (with the assumption that my
> suspicion is correct).

The suspect part is really:

>      8.98%  mc-worker  [kernel.vmlinux]  [k] page_counter_cancel
>             |
>              --8.97%--page_counter_cancel
>                       |
>                        --8.97%--page_counter_uncharge
>                                 drain_stock
>                                 __refill_stock
>                                 refill_stock
>                                 |
>                                  --8.91%--try_charge_memcg
>                                           mem_cgroup_charge_skmem
>                                           |
>                                            --8.91%--__sk_mem_raise_allocated
>                                                     __sk_mem_schedule

Shakeel, networking has a per-cpu cache of +/- 1MB.

Even with asymmetric alloc/free, this would mean that a 100Gbit NIC
would require something like 25,000 operations on the shared cache
line per second.

Hardly an issue, I think.

memcg does not seem to have an equivalent strategy?
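
For reference, the 25,000 estimate above can be reproduced with a rough
model. This is only a sketch, not kernel code: every name in it
(shared_ops_per_sec, PerCpuReserve, simulate) is made up for
illustration, and it assumes the worst case where one CPU only charges
and another only uncharges, so each side touches the shared counter
once per reserve worth of bytes.

```python
# Back-of-envelope model, not kernel code. All names are hypothetical.
LINK_BITS_PER_SEC = 100 * 10**9   # 100Gbit NIC, as in the estimate above
PER_CPU_RESERVE_BYTES = 1 << 20   # assumption: "+/- 1MB" read as 1 MiB


def shared_ops_per_sec(link_bits_per_sec, reserve_bytes, directions=2):
    """Worst-case touches of the shared counter per second.

    Assumes fully asymmetric traffic: each direction spills to the
    shared counter once per reserve_bytes of payload. directions=2
    counts the charge side plus the uncharge side.
    """
    bytes_per_sec = link_bits_per_sec // 8
    return directions * bytes_per_sec // reserve_bytes


class PerCpuReserve:
    """Toy per-CPU reserve that spills once it drifts past +/- limit."""

    def __init__(self, limit):
        self.limit = limit
        self.local = 0        # bytes accounted locally, may go negative
        self.shared_ops = 0   # times the shared counter was touched

    def adjust(self, delta):
        self.local += delta
        if abs(self.local) >= self.limit:
            self.shared_ops += 1   # one update of the shared cache line
            self.local = 0


def simulate(total_bytes, chunk, limit):
    """Charge on one CPU, uncharge on another; return shared updates."""
    charger, freer = PerCpuReserve(limit), PerCpuReserve(limit)
    for _ in range(total_bytes // chunk):
        charger.adjust(+chunk)
        freer.adjust(-chunk)
    return charger.shared_ops + freer.shared_ops


if __name__ == "__main__":
    # With 1 MB taken as 10**6 bytes this prints 25000.
    print(shared_ops_per_sec(LINK_BITS_PER_SEC, 10**6))
    # With 1 MiB the figure is slightly lower, same order of magnitude.
    print(shared_ops_per_sec(LINK_BITS_PER_SEC, PER_CPU_RESERVE_BYTES))
```

The toy simulate() gives the same count as the closed form when the
chunk size divides the reserve evenly. Symmetric alloc/free on the same
CPU would mostly cancel out locally and touch the shared line far less
often.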