Date: Tue, 9 May 2023 17:19:10 +0000
From: Shakeel Butt
To: Cathy Zhang
Cc: edumazet@google.com, davem@davemloft.net, kuba@kernel.org,
 pabeni@redhat.com, jesse.brandeburg@intel.com, suresh.srinivas@intel.com,
 tim.c.chen@intel.com, lizhen.you@intel.com, eric.dumazet@gmail.com,
 netdev@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
Message-ID: <20230509171910.yka3hucbwfnnq5fq@google.com>
In-Reply-To: <20230508020801.10702-2-cathy.zhang@intel.com>
References: <20230508020801.10702-1-cathy.zhang@intel.com>
 <20230508020801.10702-2-cathy.zhang@intel.com>

On Sun, May 07, 2023 at 07:08:00PM -0700, Cathy Zhang wrote:
> Before commit 4890b686f408 ("net: keep
> sk->sk_forward_alloc as small as
> possible"), each TCP can forward allocate up to 2 MB of memory and
> tcp_memory_allocated might hit tcp memory limitation quite soon.

It is not just the system-level TCP memory limit: we have actually seen
unneeded and unexpected memcg OOMs in production, and commit 4890b686f408
fixes those OOMs as well.

> To
> reduce the memory pressure, that commit keeps sk->sk_forward_alloc as
> small as possible, which will be less than 1 page size if SO_RESERVE_MEM
> is not specified.
>
> However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
> small as possible"), memcg charge hot paths are observed while system is
> stressed with a large amount of connections. That is because
> sk->sk_forward_alloc is too small and it's always less than
> sk->truesize, network handlers like tcp_rcv_established() should jump to
> slow path more frequently to increase sk->sk_forward_alloc. Each memory
> allocation will trigger memcg charge, then perf top shows the following
> contention paths on the busy system.
>
>     16.77%  [kernel]  [k] page_counter_try_charge
>     16.56%  [kernel]  [k] page_counter_cancel
>     15.65%  [kernel]  [k] try_charge_memcg
>
> In order to avoid the memcg overhead and performance penalty,

IMO this is not the right place to fix the memcg performance overhead,
specifically because it will reintroduce the memcg OOM issue. Please fix
the memcg overhead in the memcg code.

Please share the detailed profile of the memcg code. I can help with
brainstorming and reviewing the fix.

> sk->sk_forward_alloc should be kept with a proper size instead of as
> small as possible. Keep memory up to 64KB from reclaims when uncharging
> sk_buff memory, which is closer to the maximum size of sk_buff. It will
> help reduce the frequency of allocating memory during TCP connection.
> The original reclaim threshold for reserved memory per-socket is 2MB, so
> the extraneous memory reserved now is about 32 times less than before
> commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as
> possible").
>
> Run memcached with memtier_benchmark to verify the optimization fix. 8
> server-client pairs are created with bridge network on localhost, server
> and client of the same pair share 28 logical CPUs.
>
> Results (Average for 5 run)
> RPS (with/without patch)	+2.07x
>

Do you have regression data from any production workload? Please keep in
mind that many times we (the MM subsystem) accept regressions in
microbenchmarks over complicated optimizations. So, if there is a real
production regression, please be very explicit about it.
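
For readers following the thread, the trade-off discussed above can be
sketched with a toy model. This is not kernel code and does not reproduce
sk_mem_charge()/sk_mem_uncharge(): the names simulate_socket, keep_bytes and
worst_case_idle, the 4096-byte page size, the buffer sizes and the socket
count are all assumptions made for illustration. It only shows why a larger
per-socket reserve cuts trips to the shared counter while raising the
memory that can sit idle across many sockets.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def simulate_socket(keep_bytes, skb_sizes):
    """Toy per-socket reserve; 'reserve' loosely stands in for sk_forward_alloc.

    Returns (charges, peak_idle): the number of trips to the shared counter
    and the largest reserve left idle after a buffer was freed.
    """
    reserve = 0
    charges = 0
    peak_idle = 0
    for size in skb_sizes:
        if reserve < size:
            # "Slow path": charge whole pages from the shared counter.
            pages = -(-(size - reserve) // PAGE_SIZE)  # ceiling division
            reserve += pages * PAGE_SIZE
            charges += 1
        reserve -= size  # buffer in use
        reserve += size  # buffer freed, bytes return to the reserve
        if reserve > keep_bytes:
            # Give back whole pages above the keep threshold.
            excess_pages = (reserve - keep_bytes) // PAGE_SIZE
            reserve -= excess_pages * PAGE_SIZE
        peak_idle = max(peak_idle, reserve)
    return charges, peak_idle


def worst_case_idle(keep_bytes, sockets):
    """Upper bound on idle reserved bytes if every socket holds keep_bytes."""
    return keep_bytes * sockets


if __name__ == "__main__":
    sizes = [1500, 65536] * 500  # hypothetical mix of small and large buffers
    for keep in (0, 64 * 1024, 2 * 1024 * 1024):
        charges, idle = simulate_socket(keep, sizes)
        print(keep, charges, idle, worst_case_idle(keep, 100_000))
```

In this model a keep threshold of 0 charges on every buffer, while 64KB
charges only until the reserve covers the largest buffer. The cost is that
each socket may then hold up to 64KB idle, which is the aggregate memory the
memcg OOM concern is about. The "32 times" figure in the quoted text is
simply 2MB / 64KB.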