From: Eric Dumazet
Date: Fri, 9 Jun 2023 11:07:05 +0200
Subject: Re: [RFC PATCH net-next] sock: Propose socket.urgent for sockmem isolation
To: Abel Wu
Cc: Tejun Heo, Christian Warloe, Wei Wang, "David S. Miller",
    Jakub Kicinski, Paolo Abeni, Johannes Weiner, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    David Ahern, Yosry Ahmed, "Matthew Wilcox (Oracle)", Yu Zhao,
    Vasily Averin, Kuniyuki Iwashima, Martin KaFai Lau, Xin Long,
    Jason Xing, Michal Hocko, Alexei Starovoitov, open list,
    "open list:NETWORKING [GENERAL]",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
References: <20230609082712.34889-1-wuyun.abel@bytedance.com>
In-Reply-To: <20230609082712.34889-1-wuyun.abel@bytedance.com>

On Fri, Jun 9, 2023 at 10:28 AM Abel Wu wrote:
>
> This is just a PoC patch intended to resume the discussion about
> tcpmem isolation opened by Google in LPC'22 [1].
>
> We are facing the same problem that the global shared threshold can
> cause isolation issues. Low priority jobs can hog TCP memory and
> adversely impact higher priority jobs. What's worse is that these
> low priority jobs usually have smaller cpu weights leading to poor
> ability to consume rx data.
>
> To tackle this problem, an interface for non-root cgroup memory
> controller named 'socket.urgent' is proposed. It determines whether
> the sockets of this cgroup and its descendants can escape from the
> constraints or not under global socket memory pressure.
>
> The 'urgent' semantics will not take effect under memcg pressure in
> order to protect against worse memstalls, thus will be the same as
> before without this patch.
>
> This proposal doesn't remove protocol's threshold as we found it
> useful in restraining memory defragment. As aforementioned the low
> priority jobs can hog lots of memory, which is unreclaimable and
> unmovable, for some time due to small cpu weight.
>
> So in practice we allow high priority jobs with net-memcg accounting
> enabled to escape the global constraints if the net-memcg itself is
> not under pressure. While for lower priority jobs, the budget will
> be tightened as the memory usage of 'urgent' jobs increases. In this
> way we can finally achieve:
>
>  - Important jobs won't be priority inversed by the background
>    jobs in terms of socket memory pressure/limit.
>
>  - Global constraints are still effective, but only on non-urgent
>    jobs, useful for admins on policy decision on defrag.
>
> Comments/Ideas are welcomed, thanks!
>

This seems to go in the complete opposite direction of what memcg promises.

Can we fix memcg, so that:

Each group can use the memory it was provisioned (this includes TCP buffers)

Global tcp_memory can disappear (set tcp_mem to infinity)
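
For readers following the thread, the escape rule described in the quoted proposal can be modeled roughly as below. This is only a toy sketch in Python, not kernel code and not taken from the patch: the names Cgroup, is_urgent and escapes_global_pressure are hypothetical, memcg pressure is assumed to be a simple per-cgroup flag, and the tightening of the budget for non-urgent jobs is not modeled at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cgroup:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel structure)."""
    name: str
    urgent: bool = False          # models the proposed 'socket.urgent' knob
    under_pressure: bool = False  # assumption: memcg pressure as a plain flag
    parent: Optional["Cgroup"] = None


def is_urgent(cg: Optional[Cgroup]) -> bool:
    """The knob covers the cgroup and its descendants, so walk up the tree."""
    while cg is not None:
        if cg.urgent:
            return True
        cg = cg.parent
    return False


def escapes_global_pressure(cg: Cgroup) -> bool:
    """True if sockets of `cg` may ignore global socket memory pressure."""
    # 'urgent' has no effect while the memcg itself is under pressure;
    # in that case behaviour is the same as without the proposal.
    return is_urgent(cg) and not cg.under_pressure


root = Cgroup("root")
high = Cgroup("high", urgent=True, parent=root)
child = Cgroup("high/child", parent=high)
low = Cgroup("low", parent=root)

for cg in (high, child, low):
    print(cg.name, escapes_global_pressure(cg))
```

In this model the "high" group and its child escape the global constraint while "low" does not, and setting under_pressure on a group turns the escape off for that group again.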