Date: Tue, 13 Jun 2023 14:46:32 +0800
Subject: Re: Re: [RFC PATCH net-next] sock: Propose socket.urgent for sockmem isolation
To: Eric Dumazet
Cc: Tejun Heo, Christian Warloe, Wei Wang, "David S. Miller",
 Jakub Kicinski, Paolo Abeni, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, David Ahern,
 Yosry Ahmed, "Matthew Wilcox (Oracle)", Yu Zhao, Vasily Averin,
 Kuniyuki Iwashima, Martin KaFai Lau, Xin Long, Jason Xing, Michal Hocko,
 Alexei Starovoitov, open list, "open list:NETWORKING [GENERAL]",
 "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)",
 "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
References: <20230609082712.34889-1-wuyun.abel@bytedance.com>
From: Abel Wu <wuyun.abel@bytedance.com>

On 6/9/23 5:07 PM, Eric Dumazet wrote:
> On Fri, Jun 9, 2023 at 10:28 AM Abel Wu wrote:
>>
>> This is just a PoC patch intended to resume the discussion about
>> tcpmem isolation opened by Google in LPC'22 [1].
>>
>> We are facing the same problem that the global shared threshold can
>> cause isolation issues. Low priority jobs can hog TCP memory and
>> adversely impact higher priority jobs. What's worse is that these
>> low priority jobs usually have smaller cpu weights leading to poor
>> ability to consume rx data.
>>
>> To tackle this problem, an interface for non-root cgroup memory
>> controller named 'socket.urgent' is proposed. It determines whether
>> the sockets of this cgroup and its descendants can escape from the
>> constraints or not under global socket memory pressure.
>>
>> The 'urgent' semantics will not take effect under memcg pressure in
>> order to protect against worse memstalls, thus will be the same as
>> before without this patch.
>>
>> This proposal doesn't remove protocol's threshold as we found it
>> useful in restraining memory defragment. As aforementioned the low
>> priority jobs can hog lots of memory, which is unreclaimable and
>> unmovable, for some time due to small cpu weight.
>>
>> So in practice we allow high priority jobs with net-memcg accounting
>> enabled to escape the global constraints if the net-memcg itself is
>> not under pressure. While for lower priority jobs, the budget will
>> be tightened as the memory usage of 'urgent' jobs increases. In this
>> way we can finally achieve:
>>
>>   - Important jobs won't be priority-inverted by the background
>>     jobs in terms of socket memory pressure/limit.
>>
>>   - Global constraints are still effective, but only on non-urgent
>>     jobs, useful for admins on policy decision on defrag.
>>
>> Comments/Ideas are welcomed, thanks!
>>
>
> This seems to go in a complete opposite direction than memcg promises.
>
> Can we fix memcg, so that :
>
> Each group can use the memory it was provisioned (this includes TCP buffers)

Yes, but this might not be easy once memory gets over-committed (which
is common in modern data centers). So as a tradeoff, we intend to put a
harder constraint on memory allocation for low priority jobs. Otherwise,
if every job can use its provisioned memory, then there will be more
memstalls blocking random jobs, which could be the important ones.
Either way hurts performance; the difference is whose performance
gets hurt.

Memory protection (memory.{min,low}) helps keep the important jobs less
affected by memstalls. But once low priority jobs use lots of kernel
memory like sockmem, the protection might become much less effective.

>
> Global tcp_memory can disappear (set tcp_mem to infinity)