From: Shakeel Butt
Date: Fri, 9 Jun 2023 22:53:50 +0500
Subject: Re: [RFC PATCH net-next] sock: Propose socket.urgent for sockmem isolation
To: Eric Dumazet
Cc: Abel Wu, Tejun Heo, Christian Warloe, Wei Wang, "David S. Miller",
 Jakub Kicinski, Paolo Abeni, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Andrew Morton, David Ahern, Yosry Ahmed,
 "Matthew Wilcox (Oracle)", Yu Zhao, Vasily Averin, Kuniyuki Iwashima,
 Martin KaFai Lau, Xin Long, Jason Xing, Michal Hocko,
 Alexei Starovoitov, open list, "open list:NETWORKING [GENERAL]",
 "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)",
 "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
References: <20230609082712.34889-1-wuyun.abel@bytedance.com>

On Fri, Jun 9, 2023 at 2:07 PM Eric Dumazet wrote:
>
> On Fri, Jun 9, 2023 at 10:28 AM Abel Wu wrote:
> >
> > This is just a PoC patch intended to resume the discussion about
> > tcpmem isolation opened by Google in LPC'22 [1].
> >
> > We are facing the same problem that the global shared threshold can
> > cause isolation issues. Low priority jobs can hog TCP memory and
> > adversely impact higher priority jobs. What's worse is that these
> > low priority jobs usually have smaller cpu weights leading to poor
> > ability to consume rx data.
> >
> > To tackle this problem, an interface for non-root cgroup memory
> > controller named 'socket.urgent' is proposed. It determines whether
> > the sockets of this cgroup and its descendants can escape from the
> > constraints or not under global socket memory pressure.
> >
> > The 'urgent' semantics will not take effect under memcg pressure in
> > order to protect against worse memstalls, thus will be the same as
> > before without this patch.
> >
> > This proposal doesn't remove protocol's threshold as we found it
> > useful in restraining memory defragment. As aforementioned the low
> > priority jobs can hog lots of memory, which is unreclaimable and
> > unmovable, for some time due to small cpu weight.
> >
> > So in practice we allow high priority jobs with net-memcg accounting
> > enabled to escape the global constraints if the net-memcg itself is
> > not under pressure. While for lower priority jobs, the budget will
> > be tightened as the memory usage of 'urgent' jobs increases. In this
> > way we can finally achieve:
> >
> >   - Important jobs won't be priority inversed by the background
> >     jobs in terms of socket memory pressure/limit.
> >
> >   - Global constraints are still effective, but only on non-urgent
> >     jobs, useful for admins on policy decision on defrag.
> >
> > Comments/Ideas are welcomed, thanks!
> >
>
> This seems to go in a complete opposite direction than memcg promises.
>
> Can we fix memcg, so that:
>
> Each group can use the memory it was provisioned (this includes TCP buffers)
>
> Global tcp_memory can disappear (set tcp_mem to infinity)

I agree with Eric and this is exactly how we at Google overcome the
isolation issue. We have set tcp_mem to unlimited and enabled memcg
accounting of network memory (by surgically incorporating v2 semantics
of network memory accounting in our v1 environment).

I do have one question though:

> This proposal doesn't remove protocol's threshold as we found it
> useful in restraining memory defragment.

Can you explain how you find the global tcp limit useful? What does
memory defragment mean?
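
For anyone skimming the thread, the rule in the quoted proposal can be
modelled roughly as below. This is only a toy sketch of the semantics as
described in the RFC text, not code from the patch: the Cgroup class and
both function names are hypothetical, and it ignores inheritance by
descendant cgroups and the tightening of the budget for non-urgent jobs.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    # Hypothetical stand-in for the proposed 'socket.urgent' knob.
    urgent: bool
    # True when this cgroup's own memcg is under memory pressure.
    memcg_pressure: bool


def escapes_global_limit(cg: Cgroup) -> bool:
    """An urgent cgroup escapes the global constraint, but only while
    its own memcg is not under pressure (assumed reading of the RFC)."""
    return cg.urgent and not cg.memcg_pressure


def throttled_by_global_pressure(cg: Cgroup, global_pressure: bool) -> bool:
    """Global socket memory pressure applies to every cgroup that does
    not escape it; without global pressure nobody is throttled by it."""
    return global_pressure and not escapes_global_limit(cg)


if __name__ == "__main__":
    high_prio = Cgroup(urgent=True, memcg_pressure=False)
    background = Cgroup(urgent=False, memcg_pressure=False)
    print(throttled_by_global_pressure(high_prio, True))
    print(throttled_by_global_pressure(background, True))
```

With tcp_mem set to unlimited, as described above, global_pressure in
this model would never be true and only the per-memcg limits would
remain in effect.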