Date: Fri, 28 Jul 2023 20:45:00 +0800
From: Abel Wu
Subject: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
To: Roman Gushchin
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Ahern, Yosry Ahmed, "Matthew Wilcox (Oracle)", Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima, Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao, David Howells, Jason Xing, Xin Long, Michal Hocko, Alexei Starovoitov, open list, "open list:NETWORKING [GENERAL]", "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)", "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
References: <20230711124157.97169-1-wuyun.abel@bytedance.com> <58e75f44-16e3-a40a-4c8a-0f61bbf393f9@bytedance.com> <29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com>

On 7/27/23 8:19 AM, Roman Gushchin wrote:
> On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote:
>> On 7/26/23 10:56 AM, Roman Gushchin wrote:
>>> On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
>>>> Hi Roman, thanks for taking time to have a look!
>>>>>
>>>>> Overall I think it's a good idea to clean these things up and thank you
>>>>> for working on this. But I wonder if we can make the next step and leave only
>>>>> one mechanism for both cgroup v1 and v2 instead of having this weird setup
>>>>> where memcg->socket_pressure is set differently from different paths on cgroup
>>>>> v1 and v2.
>>>>
>>>> There is some difficulty in unifying the mechanism for both cgroup
>>>> designs. Throttling socket memory allocation when memcg is under
>>>> pressure only makes sense when socket memory and other usages are
>>>> sharing the same limit, which is not true for cgroupv1. Thoughts?
>>>
>>> I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
>>> as it is, except when it creates an unnecessary complexity in the code.
>>
>> Are you suggesting that the 2nd patch can be ignored and keep
>> ->tcpmem_pressure as it is? Or keep the 2nd patch and add some
>> explanation around as you suggested in last reply?
>
> I suggest to split a code refactoring (which is not expected to bring any
> functional changes) and an actual change of the behavior on cgroup v1.
> Re the refactoring: I see a lot of value in adding comments and make the
> code more readable, I don't see that much value in merging two variables.
> But if it comes organically with the code simplification - nice.

I see, thanks for the clarification!

>
>>> I'm curious, was your work driven by some real-world problem or a desire to clean
>>> up the code? Both are valid reasons of course.
>>
>> We (a cloud service provider) are migrating users to cgroupv2,
>> but encountered some problems among which the socket memory
>> really puts us in a difficult situation. There is no specific
>> threshold for socket memory in cgroupv2 and relies largely on
>> workloads doing traffic control themselves.
>>
>> Say one workload behaves fine in cgroupv1 with 10G of ->memory
>> and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
>> with 11G of ->memory due to burst memory usage on socket.
>>
>> It's rational for the workloads to build some traffic control
>> to better utilize the resources they bought, but from kernel's
>> point of view it's also reasonable to suppress the allocation
>> of socket memory once there is a shortage of free memory, given
>> that performance degradation is better than failure.
>
> Yeah, I can see it. But Idk if it's too workload-specific to have
> a single-policy-fits-all-cases approach.
> E.g. some workloads might prefer to have a portion of pagecache
> being reclaimed.
> What do you think?

Now the memcg is considered to be under pressure if the number of
pages reclaimed is much less than desired. I doubt it could be a
win in such case to spend more time on reclaiming while letting
socket continue to allocate memory (which could make things worse),
compared to relieving reclaim pressure and putting time on its real
work.

Best,
	Abel