From: Abel Wu <wuyun.abel@bytedance.com>
Date: Fri, 8 Sep 2023 15:55:18 +0800
Message-ID: <1d935bfc-50b0-54f3-22f0-d360f8a7c1ac@bytedance.com>
Subject: Re: [RFC PATCH net-next 0/3] sock: Be aware of memcg pressure on alloc
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Andrew Morton , Shakeel Butt , Roman Gushchin , Michal Hocko , Johannes Weiner , Yosry Ahmed , "Matthew Wilcox (Oracle)" , Yu Zhao , Kefeng Wang , Yafang Shao , Kuniyuki Iwashima , Martin KaFai Lau , Breno Leitao , Alexander Mikhalitsyn , David Howells , Jason Xing Cc: open list , "open list:NETWORKING [GENERAL]" , "open list:MEMORY MANAGEMENT" References: <20230901062141.51972-1-wuyun.abel@bytedance.com> Content-Language: en-US From: Abel Wu In-Reply-To: <20230901062141.51972-1-wuyun.abel@bytedance.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 996801C001B X-Rspam-User: X-Stat-Signature: gymioxp4p9pasjskxx1jnoiq4nzdz9s4 X-Rspamd-Server: rspam01 X-HE-Tag: 1694159730-376206 X-HE-Meta: U2FsdGVkX1/eHHtmNgKUD8k+YRLcQRXS5PL+QsizsOQCv0jVtzB/nB2DX1/OtO7oJUeywAomyLtIoeGXIgszmKN1x01OBc8lByrlmHbk7unqGmLl3PkSuyd9HVr+/y2VVkIPEyj86VxoIJYPdaYUc5GIj+GNtgPxqHLuwnuhexBUM9K6jhuHl+emBnvHBVea8Nsqcrmdqchp92vq6xKlzdMZOLGJfs0UJSRlYVcYij6Modc4FLT41tQIRpDfmeAyH+WiX4XDZHc1W+z43s6mhmPlsEJrReAqVCfdy1caxnJ+NJgid+tXWjd/NjBdaa3ortZWuijJgwvLh68wqkudIoSHZDaFunckljuaEbomfPl3FJoU2jiATJOyFnvhEYbqtTJ1Mvb2eLNQE7+sN+xMFnG0CWP72pt1KhG36U2RA0W4O1YxB2tfu9eIoxAsUHLK1QtjQxuFp+aaRMhvDPEHirqOltSfXTccmBOVzDepaUpcRBv29zmLO6NnC5QoChmguH3Rqe6IAW3pNnhmKwtBxF6lWrvhXhqjVHdm8wFacvSJYo32n/6T6jsqSUFNC55Q4WHiirsAraOynrNIsFoJV1h8igTISD8EcEG+JHlfTgcpaPXuN8M3fLdba8u/RuvkJf9LHkkT5p1XB+Sk97HOgr6edjvfKc4aUMS9fcFsBAtb2dkqI/pdTLhkX8NW3inhMCJAJe+iCaG7H+zeniSPxEW0MdmnfuwoF70qhrAYBhxil/Ev9VKHpRjWggSBrfIlo9hsmBpJ812xx7DdeH+fN2kZWD0x7bE6SPAVGrGrmV8moSAykwImpGchDUiVFliktpUSi/DmQ/wJM2qstVyUAMeskigtPQcSJ3K0OS/y4mBFJFxUDSfPFutdrTc3LK/p8VnIr5392tG3GOyzQVuJsGRBMSn3ajbjqWkWC3n73LxK9m1H9BAOdyLhNxRocABz/jFpZOvw6FNlDYBQ4Jv F8zQNf23 lDfbCefutoiewGO8lCSmj35+7Nk/5u59BVRiddtSpPmPPsbcSyqEsGNgF4Y0t2/8zbIAsvQUO6j6K+E4hERZJ8c1fX1XAshsNQv9rwTQF8yyhm+TBcznb7WSIvKrlBMo5+y7ashNBj2U0zYm11/AaPdT7Ops6hXqbGK/iicdbXin7ruYINtpf9xA7LtgtIjvAh4Kps+FTjwWIpe0DC8bWiJmhBqZOAUCQ6fIRQEeMheuDon4wSSco+sJHgESDZyXHnfV3raz6P5vU6TAv+I9RZ05NHhafV+1HOZN5sGwKym6IDp0iKfeVyY5NiHTWgsbNsvcBu4S+WJdVdxvKbwYIhTwmdcv/knEPevjKXjZ5k8XHdooaw6y6CeJ+f8soaXclLwjoT7KdCFlw9VB658P+KSyV3u7CYRKUDQM302d7VVlP8VFyQLPwrXheXm3htEAJiYN8 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Friendly ping :) On 9/1/23 2:21 PM, Abel Wu wrote: > As a cloud service provider, we encountered a problem in our production > environment during the transition from cgroup v1 to v2 (partly due to the > heavy taxes of accounting socket memory in v1). Say one workload behaves > fine in cgroupv1 with memcg limit configured to 10GB memory and another > 1GB tcpmem, but will suck (or even be OOM-killed) in v2 with 11GB memory > due to burst memory usage on socket, since there is no specific limit for > socket memory in cgroupv2 and relies largely on workloads doing traffic > control themselves. > > It's rational for the workloads to build some traffic control to better > utilize the resources they bought, but from kernel's point of view it's > also reasonable to suppress the allocation of socket memory once there is > a shortage of free memory, given that performance degradation is usually > better than failure. > > This patchset aims to be more conservative on alloc for pressure-aware > sockets under global and/or memcg pressure, to avoid further memstall or > possibly OOM in such case. 
> The patchset includes:
>
> 1/3: simple code cleanup, no functional change intended.
> 2/3: record memcg pressure level to enable fine-grained control.
> 3/3: throttle alloc for pressure-aware sockets under pressure.
>
> The whole patchset focuses on the pressure-aware protocols, and should
> have little or no impact on pressure-unaware protocols like UDP.
>
> Tested on an Intel Xeon(R) Platinum 8260, a dual-socket machine
> containing 2 NUMA nodes, each with 24C/48T. All benchmarks were run
> inside a separate memcg on a clean host.
>
> baseline: net-next c639a708a0b8
> compare:  baseline + patchset
>
> case             load         baseline(std%)  compare%( std%)
> tbench-loopback  thread-24     1.00 (  0.50)   -0.98 (  0.87)
> tbench-loopback  thread-48     1.00 (  0.76)   -0.29 (  0.92)
> tbench-loopback  thread-72     1.00 (  0.75)   +1.51 (  0.14)
> tbench-loopback  thread-96     1.00 (  4.11)   +1.29 (  3.73)
> tbench-loopback  thread-192    1.00 (  3.52)   +1.44 (  3.30)
> TCP_RR           thread-24     1.00 (  1.87)   -0.87 (  2.40)
> TCP_RR           thread-48     1.00 (  0.92)   -0.22 (  1.61)
> TCP_RR           thread-72     1.00 (  2.35)   +2.42 (  2.27)
> TCP_RR           thread-96     1.00 (  2.66)   -1.37 (  3.02)
> TCP_RR           thread-192    1.00 ( 13.25)   +0.29 ( 11.80)
> TCP_STREAM       thread-24     1.00 (  1.26)   -0.75 (  0.87)
> TCP_STREAM       thread-48     1.00 (  0.29)   -1.55 (  0.14)
> TCP_STREAM       thread-72     1.00 (  0.05)   -1.59 (  0.05)
> TCP_STREAM       thread-96     1.00 (  0.19)   -0.06 (  0.29)
> TCP_STREAM       thread-192    1.00 (  0.23)   -0.01 (  0.28)
> UDP_RR           thread-24     1.00 (  2.27)   +0.33 (  2.82)
> UDP_RR           thread-48     1.00 (  1.25)   -0.30 (  1.21)
> UDP_RR           thread-72     1.00 (  2.54)   +2.99 (  2.34)
> UDP_RR           thread-96     1.00 (  4.76)   +2.49 (  2.19)
> UDP_RR           thread-192    1.00 ( 14.43)   -0.02 ( 12.98)
> UDP_STREAM       thread-24     1.00 (107.41)   -0.48 (106.93)
> UDP_STREAM       thread-48     1.00 (100.85)   +1.38 (100.59)
> UDP_STREAM       thread-72     1.00 (103.43)   +1.40 (103.48)
> UDP_STREAM       thread-96     1.00 ( 99.91)   -0.25 (100.06)
> UDP_STREAM       thread-192    1.00 (109.83)   -3.67 (104.12)
>
> Since patch 3 moves the traversal of the cgroup hierarchy forward for
> pressure-aware protocols, which could turn a conditional overhead into
> a constant one, tests running inside cgroups nested 5 levels deep were
> also performed.
>
> case             load         baseline(std%)  compare%( std%)
> tbench-loopback  thread-24     1.00 (  0.59)   +0.68 (  0.09)
> tbench-loopback  thread-48     1.00 (  0.16)   +0.01 (  0.26)
> tbench-loopback  thread-72     1.00 (  0.34)   -0.67 (  0.48)
> tbench-loopback  thread-96     1.00 (  4.40)   -3.27 (  4.84)
> tbench-loopback  thread-192    1.00 (  0.49)   -1.07 (  1.18)
> TCP_RR           thread-24     1.00 (  2.40)   -0.34 (  2.49)
> TCP_RR           thread-48     1.00 (  1.62)   -0.48 (  1.35)
> TCP_RR           thread-72     1.00 (  1.26)   +0.46 (  0.95)
> TCP_RR           thread-96     1.00 (  2.98)   +0.13 (  2.64)
> TCP_RR           thread-192    1.00 ( 13.75)   -0.20 ( 15.42)
> TCP_STREAM       thread-24     1.00 (  0.21)   +0.68 (  1.02)
> TCP_STREAM       thread-48     1.00 (  0.20)   -1.41 (  0.01)
> TCP_STREAM       thread-72     1.00 (  0.09)   -1.23 (  0.19)
> TCP_STREAM       thread-96     1.00 (  0.01)   +0.01 (  0.01)
> TCP_STREAM       thread-192    1.00 (  0.20)   -0.02 (  0.25)
> UDP_RR           thread-24     1.00 (  2.20)   +0.84 ( 17.45)
> UDP_RR           thread-48     1.00 (  1.34)   -0.73 (  1.12)
> UDP_RR           thread-72     1.00 (  2.32)   +0.49 (  2.11)
> UDP_RR           thread-96     1.00 (  2.36)   +0.53 (  2.42)
> UDP_RR           thread-192    1.00 ( 16.34)   -0.67 ( 14.06)
> UDP_STREAM       thread-24     1.00 (106.55)   -0.70 (107.13)
> UDP_STREAM       thread-48     1.00 (105.11)   +1.60 (103.48)
> UDP_STREAM       thread-72     1.00 (100.60)   +1.98 (101.13)
> UDP_STREAM       thread-96     1.00 ( 99.91)   +2.59 (101.04)
> UDP_STREAM       thread-192    1.00 (135.39)   -2.51 (108.00)
>
> As expected, no obvious performance gain or loss was observed.
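
To make the "conditional overhead into constant" concern above
concrete: checking memcg socket pressure conceptually walks up the
cgroup hierarchy, so its cost grows with nesting depth, which is why
the 5-level-deep configuration is tested. Below is a rough sketch of
such a walk, loosely modeled on mainline's
mem_cgroup_under_socket_pressure() (illustrative only, not the code in
this patchset; assumes <linux/memcontrol.h> and <linux/jiffies.h>).

	static bool memcg_tree_under_socket_pressure(struct mem_cgroup *memcg)
	{
		/* Walk from the given memcg up to the root. Each level
		 * keeps a socket_pressure timestamp (in jiffies) that
		 * marks it as under pressure until that time passes.
		 */
		do {
			if (time_before(jiffies,
					READ_ONCE(memcg->socket_pressure)))
				return true;
		} while ((memcg = parent_mem_cgroup(memcg)));

		return false;
	}
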
> As for the issue we encountered, this patchset provides better
> worst-case behavior in that such OOM cases are reduced to some extent,
> while further fine-grained traffic control remains something for the
> workloads themselves to think about.
>
> Comments are welcome! Thanks!
>
> Abel Wu (3):
>   sock: Code cleanup on __sk_mem_raise_allocated()
>   net-memcg: Record pressure level when under pressure
>   sock: Throttle pressure-aware sockets under pressure
>
>  include/linux/memcontrol.h | 39 +++++++++++++++++++++++++----
>  include/net/sock.h         |  2 +-
>  include/net/tcp.h          |  2 +-
>  mm/vmpressure.c            |  9 ++++++-
>  net/core/sock.c            | 51 +++++++++++++++++++++++++++---------
>  5 files changed, 83 insertions(+), 20 deletions(-)
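
One more pointer for reviewers of patch 2/3: since that patch touches
mm/vmpressure.c, the "pressure level" it records presumably maps onto
the existing vmpressure levels. For reference, mainline defines them in
mm/vmpressure.c as follows (this is my reading of the cover letter, not
a claim about the exact implementation):

	enum vmpressure_levels {
		VMPRESSURE_LOW = 0,
		VMPRESSURE_MEDIUM,
		VMPRESSURE_CRITICAL,
		VMPRESSURE_NUM_LEVELS,
	};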