Message-ID: <29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com>
Date: Wed, 26 Jul 2023 16:44:24 +0800
Subject: Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
To: Roman Gushchin
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
 Andrew Morton, David Ahern, Yosry Ahmed, "Matthew Wilcox (Oracle)",
 Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
 Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao, David Howells,
 Jason Xing, Xin Long, Michal Hocko, Alexei Starovoitov, open list,
 "open list:NETWORKING [GENERAL]",
 "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)",
 "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
References: <20230711124157.97169-1-wuyun.abel@bytedance.com> <58e75f44-16e3-a40a-4c8a-0f61bbf393f9@bytedance.com>
From: Abel Wu

On 7/26/23 10:56 AM, Roman Gushchin wrote:
> On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
>> Hi Roman, thanks for taking time to have a look!
>>>
>>>> When in legacy mode, a.k.a. cgroupv1, the socket memory is charged
>>>> into a separate counter memcg->tcpmem rather than ->memory, so
>>>> the reclaim pressure of the memcg has nothing to do with socket's
>>>> pressure at all.
>>>
>>> But we still might set memcg->socket_pressure and propagate the pressure,
>>> right?
>>
>> Yes, but the pressure that comes from memcg->socket_pressure does not
>> mean pressure in socket memory in cgroupv1, which might lead to premature
>> reclamation or throttling on socket memory allocation. As the following
>> example shows:
>>
>>                  ->memory        ->tcpmem
>>      limit       10G             10G
>>      usage       9G              4G
>>      pressure    true            false
>
> Yes, now it makes sense to me. Thank you for the explanation.

Cheers!

>
> Then I'd organize the patchset in the following way:
> 1) cgroup v1-only fix to not throttle tcpmem based on the vmpressure
> 2) a formal code refactoring

OK, I will try to reorganize it this way in the next version.

>>>
>>> Overall I think it's a good idea to clean these things up and thank you
>>> for working on this.
>>> But I wonder if we can make the next step and leave only
>>> one mechanism for both cgroup v1 and v2 instead of having this weird setup
>>> where memcg->socket_pressure is set differently from different paths on cgroup
>>> v1 and v2.
>>
>> There is some difficulty in unifying the mechanism for both cgroup
>> designs. Throttling socket memory allocation when memcg is under
>> pressure only makes sense when socket memory and other usages are
>> sharing the same limit, which is not true for cgroupv1. Thoughts?
>
> I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
> as it is, except when it creates an unnecessary complexity in the code.

Are you suggesting that the 2nd patch can be dropped, keeping
->tcpmem_pressure as it is? Or should I keep the 2nd patch and add
some explanation around it, as you suggested in your last reply?

>
> I'm curious, was your work driven by some real-world problem or a desire to clean
> up the code? Both are valid reasons of course.

We (a cloud service provider) are migrating users to cgroupv2, but have
encountered some problems, among which socket memory really puts us in
a difficult situation. There is no specific threshold for socket memory
in cgroupv2, so it relies largely on workloads doing traffic control
themselves.

Say one workload behaves fine in cgroupv1 with 10G of ->memory and 1G
of ->tcpmem, but will suck (or even be OOMed) in cgroupv2 with 11G of
->memory due to bursts of socket memory usage.

It's rational for the workloads to build some traffic control to better
utilize the resources they bought, but from the kernel's point of view
it's also reasonable to suppress the allocation of socket memory once
there is a shortage of free memory, given that performance degradation
is better than failure.

Currently the net-memcg pressure mechanism doesn't work as we expected;
please see the discussion in [1].
Besides this, we are also working on mitigating the priority inversion
issue introduced by the net protocols' global shared thresholds [2],
which has something to do with the net-memcg's pressure. This patchset,
and maybe some others, are byproducts of the above work.

[1] https://lore.kernel.org/netdev/20230602081135.75424-1-wuyun.abel@bytedance.com/
[2] https://lore.kernel.org/netdev/20230609082712.34889-1-wuyun.abel@bytedance.com/

Thanks!
	Abel
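
For readers following the ->memory / ->tcpmem example in this thread, the
difference between an unscoped and a scoped pressure check can be sketched
in plain userspace C. This is only an illustration under stated assumptions,
not the patch and not kernel code: struct counter_model, struct memcg_model
and all function names below are hypothetical, and the numbers are the ones
from the example table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one counter; not kernel code. */
struct counter_model {
    unsigned long limit_gb;
    unsigned long usage_gb;
    bool pressure;              /* is this counter under pressure? */
};

/* Hypothetical stand-in for a memcg and its two counters. */
struct memcg_model {
    bool legacy;                    /* true: cgroupv1, false: cgroupv2 */
    struct counter_model memory;    /* models ->memory */
    struct counter_model tcpmem;    /* models ->tcpmem (cgroupv1 only) */
};

/* Unscoped check: pressure on either counter throttles socket memory. */
static bool sockmem_pressure_unscoped(const struct memcg_model *m)
{
    return m->memory.pressure || m->tcpmem.pressure;
}

/* Scoped check: look only at the counter socket memory is charged to. */
static bool sockmem_pressure_scoped(const struct memcg_model *m)
{
    return m->legacy ? m->tcpmem.pressure : m->memory.pressure;
}

/* The numbers from the example table in the thread. */
static struct memcg_model example_memcg(bool legacy)
{
    struct memcg_model m = {
        .legacy = legacy,
        .memory = { .limit_gb = 10, .usage_gb = 9, .pressure = true },
        .tcpmem = { .limit_gb = 10, .usage_gb = 4, .pressure = false },
    };
    return m;
}

/* Sanity checks for the example; call from a test or from main(). */
static void check_example(void)
{
    struct memcg_model v1 = example_memcg(true);
    struct memcg_model v2 = example_memcg(false);

    /* cgroupv1, unscoped: ->memory pressure leaks into socket throttling. */
    assert(sockmem_pressure_unscoped(&v1));
    /* cgroupv1, scoped: ->tcpmem has headroom, so no throttling. */
    assert(!sockmem_pressure_scoped(&v1));
    /* cgroupv2: sockets share ->memory, so its pressure does apply. */
    assert(sockmem_pressure_scoped(&v2));
}
```

In this model the cgroupv1 memcg is throttled by the unscoped check even
though ->tcpmem is at 4G of 10G, which is the premature throttling described
above; the scoped check reports pressure only for the cgroupv2 case, where
socket memory and other usages share the same limit.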