From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4E31C28B2E for ; Mon, 10 Mar 2025 10:19:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B4F70280003; Mon, 10 Mar 2025 06:19:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B01C3280001; Mon, 10 Mar 2025 06:19:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9C9FB280003; Mon, 10 Mar 2025 06:19:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 787A7280001 for ; Mon, 10 Mar 2025 06:19:03 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 68255A8309 for ; Mon, 10 Mar 2025 10:19:03 +0000 (UTC) X-FDA: 83205243366.17.146783A Received: from mail-wm1-f53.google.com (mail-wm1-f53.google.com [209.85.128.53]) by imf26.hostedemail.com (Postfix) with ESMTP id 74E3F14000B for ; Mon, 10 Mar 2025 10:19:01 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="Isnx//5F"; spf=pass (imf26.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.128.53 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741601941; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tYIXrxRXI3mrwzjgczaAzsl5AVZ+PHyTsjrGME/HqJY=; b=jAA3XCL/lKzMlUOwiSMWtv5xAorNfJ6Sy+Bn5iemD1V8+KKxOsNo8GD1T1knRSDIzW/kOq fTRFSjuo/41Q8GO6ZngYqVWdCdl9f4fui4XQl+wWJpVPJEIb+NDEVjL5eCXPXA8qNUdkQT 8XW50Pk2GcGvG3bph0wi+zfdawbwImk= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="Isnx//5F"; spf=pass (imf26.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.128.53 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741601941; a=rsa-sha256; cv=none; b=3QyyRqjlL6ocnDFdcKToGeethdZv0Vop+h1cAZQih4NDqGV/fcpmcx4Q6XR4qgXDO89U+l MwUc1ZGiISfeF5AsYBU4b+AatxscfQAB1mQsiz6eX0Kp9s/EQgTpNw98vx1S/lKKxanBh8 NOsWB43grcUIGWBwYctyEd+dEexCZWg= Received: by mail-wm1-f53.google.com with SMTP id 5b1f17b1804b1-43cf848528aso6244255e9.2 for ; Mon, 10 Mar 2025 03:19:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741601940; x=1742206740; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=tYIXrxRXI3mrwzjgczaAzsl5AVZ+PHyTsjrGME/HqJY=; b=Isnx//5F/yFiQmhaWNqnuw9s8c3TNeBYg+Lq17zKdO0ezHmRAJ9pb8m9rkaX1jhrfw LOYawinrvyy7kCwZ7AVAEa+o0eMmM2rABSrKxue6AavynfQ2d3ldmaAHApZEn2L7zF7E 0I1OYvevewhMUwwVAXl6u8iwyWCkbH8V8jv/IROqfgBRiEtrhM7f3kzlm9CeB0Hl5wsk Fekgycw5Mzx8ORxmGbrDGi8SoLuzNqSaERe/DxitYuukHaIY1wG94V0Z5gjxuH1dD4Li cEmOCOyIjXaV4UqllePT+QeQ98R6UfDmWwz9exFJCPKJr8eK5u+lDY1fAtSq4SHB5+Bl y91w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741601940; x=1742206740; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tYIXrxRXI3mrwzjgczaAzsl5AVZ+PHyTsjrGME/HqJY=; b=MkaDt5mQ9GACiNVMPFpGNbrKCEofiCBo9hCUdKKNaFqeSCt53xvl1+ALfhYMjqyCLa CSELtwVWpssCdYpqukrNYlstgA/Xx4zMdFBgsHuo/GlpNTPdDQKsFLjtd7xkisaUt4P6 QzhpoT9QpG7MJnSzEppigldjmqoA1DTo9dYIsx2WrHiYap3Kumlu1p1uQ1MANHCzvejB f9r4Lac+JOrkP9HotLwwTTntJCeU+HHhDZbJOnwcLujrZbjjnd2V0nckV6sL7zO7jLnO 7U/ikf2PRDOO9YHetnqG+ou2QIAdl3SlKTvaDnp0wlzPm0s4ylXKKxMaFC26C3VaPBoq Z/ig== X-Forwarded-Encrypted: i=1; AJvYcCWwKey/HhEpJYA15xsqxkGmUamhs46UF0NKFOofdVw6cERkYNEGrawoAvdEdpkCtnr2M2+i3v2Xlg==@kvack.org X-Gm-Message-State: AOJu0Ywofv//kd7QFdPMCqx4qvVPdt+sUucUcTO8QnvsV4KFpefxAA9E pgdaXYHT+n2Gye+rG+yCVEeOYjcBHluFeaNllADGVEcng4RGnZeyB5tGGtdJ+RB3lbvOZJ2y7I6 zSdPLxL84Mg4ztcjuFpFQ0dhDZ9A= X-Gm-Gg: ASbGncsaxpdUQKgpqxgmpVkGUpZt3Brh4755sepxME3/m7L6GzBulpJfyxuy7jpXBjJ f5zjtGxP1STkOqTYl01N9Z+Kmv5vvoXmVnEPm31AZHAwgJH6HLTi6hkuV+5cveX+TSiXSF9wKyM xbYQpBSNGO4Hq+GIZoyOMeGZ9h0Q== X-Google-Smtp-Source: AGHT+IG3KKR3sztBqM854chtE6ZigydmE5r2mOU3fRk0CgwZyiSgnHBPqXoRnnAMaab3bvuMcyOuCUbobZKBc/m9An0= X-Received: by 2002:a05:600c:5108:b0:43b:c0fa:f9cd with SMTP id 5b1f17b1804b1-43c5ca74af6mr80960235e9.7.1741601939328; Mon, 10 Mar 2025 03:18:59 -0700 (PDT) MIME-Version: 1.0 References: <202503101254.cfd454df-lkp@intel.com> <7c41d8d7-7d5a-4c3d-97b3-23642e376ff9@suse.cz> In-Reply-To: <7c41d8d7-7d5a-4c3d-97b3-23642e376ff9@suse.cz> From: Alexei Starovoitov Date: Mon, 10 Mar 2025 11:18:45 +0100 X-Gm-Features: AQ5f1JqFqw3tvfIj8rggR3xxiFMFdt52DuLrMnDSxkTVXR-gJz_FZwsVzLZ6uWs Message-ID: Subject: Re: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression To: Vlastimil Babka Cc: kernel test robot , Alexei Starovoitov , oe-lkp@lists.linux.dev, kbuild test robot , Michal Hocko , Shakeel Butt , "open list:CONTROL GROUP (CGROUP)" , linux-mm Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: pqgn6ddietf8sbp5qo7hjifz91kbg133 X-Rspam-User: X-Rspamd-Queue-Id: 74E3F14000B X-Rspamd-Server: rspam01 X-HE-Tag: 1741601941-562116 X-HE-Meta: U2FsdGVkX1+0hiriZ5RmZGxSPiV1kCbveg/P/OB/O4z7fziHD9YEMWeghrFvwM2B5/Va2nAKtlL0U2ReK5tFKuCSAi70rp9TlQW7QKna1BO0UE4yYBHg76Tn7PrYIWp7mrlF46PJ/QOMJgZJ5aBfxYZtZ+FAbQWxDa5u+wForQNXc/SsGpgm84CC6Bv+j+zwoXi9yNoSUKagYKoJwudOfMvYFhKRpygsNnEJh/UZEVkxprpn6zvnTFdkBpSqKq1Vx62plxpkRQ9zudR4yvL45AX+oMRwohyWCcU5vtbiniZ8QwD+yr3zJt6exiVa2/Vgtrjka4pBAOBi3AtV8div8eTczRJBKEW0RuR6ZOMzz7VOV+2GrDV2QSO6bCRXkBK6mj5I25P001gkJJFCQOKDDfDQBH18LYDAG3g9dsjjkYe5BR8D3uXKIsFaedLWQuCw/tT1GvOWQNvC+ZrK91WLX2r/b9bNjRqI/Q79ndLhJrpbq2Ofx0y8zxJ/9KxfuVb9HvBu+te/G2qq15xQU1GQ9DjmpgYvcJwt0g3PD8A5aNbTNcqyvvcGp3Q2Jsg567vsuzmOqNqwWUmKpNMNzhUMKaD2vHnBB7I9vrtuhqw7AL9eJEmxIHIxn6YATt3/51WhG71bo5PBv8r5N4Rq8Q3SssDKlcOo40BCoX1Ne7vtIuH7+2+EzzW9K6HURSs6ueBpYRBQX1VDx55Y5LPV9/7tF9fNL+kW8aVZOfSgytqGPUjcBHDmqahZkL2gEfrgvCFcFv+UT+gcZzc0WXcrHgqBMxfmkyXCxb5QNJq3Pi6Waj4FSJPsNhm93yI0BOA/n5qzJpwNdeu5RYI74xKwXuEczzx7JhFp+s8lpWiDYREtUuFU+y0GzQ4xeIfW2HnjoG212OHvMSUmdDJSiCFZZ8q1RHryP5nKn5timHibcr3CHhFaas68yxSqhvvvIhZdGRe92IIROxPq/VIB4tpNcn8 0g7TGJlC KxCOP8GAlbtr/GAPR8wnG/AKLNYXhbpObnYXjeQzqFrX55p6lWH4ewBjK/z0RV3q8gvc1j9ONosSyuQODpSmM6esPlEMQ/vF8S0l9tPpIeFhLxQ+ko67sYHs8rAu/P7MXOp4hKuMANYccy+tv8Qsb31P3AqfiH1vBNCqQ5ed4dV22wP+CTJWOaR3nXtJ75x9VFBKhGUcqW7cFY3ulR7greBL1kxy8H7pS60Vpw4tmRO6k8JVlPL1I02LvZKOM9ctxBuhJBMvXBf+mXxXhpu2qWTWXDm7yfgs+nbJizZiudyq5T4o6G+XxrSpKOY/6H5Hr1EkyTiIsC0AZkFDIPrUmdm5GI/m+mGHMsRNLyLCM+vqkH2c8Gsvywx82/ra0uCOmgQ2/e3H5AXuz5VQlybH9xlnm2A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Mar 10, 2025 at 10:55=E2=80=AFAM Vlastimil Babka w= rote: > > On 3/10/25 06:50, kernel test robot wrote: > > > > > > Hello, > > > > kernel test robot noticed a 37.9% regression of netperf.Throughput_Mbps= on: > > I assume this is some network receive context where gfpflags do not allow > blocking. gfpflags_allow_spinning() should be true for all current callers including networking. > > commit: 01d37228d331047a0bbbd1026cec2ccabef6d88d ("memcg: Use trylock t= o access memcg stock_lock.") > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > > > [test failed on linux-next/master 7ec162622e66a4ff886f8f28712ea1b13069e= 1aa] > > > > testcase: netperf > > config: x86_64-rhel-9.4 > > compiler: gcc-12 > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.= 00GHz (Ice Lake) with 256G memory > > parameters: > > > > ip: ipv4 > > runtime: 300s > > nr_threads: 50% > > cluster: cs-localhost > > test: TCP_MAERTS > > cpufreq_governor: performance > > > > > > In addition to that, the commit also has significant impact on the foll= owing tests: > > > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > | testcase: change | stress-ng: stress-ng.mmapfork.ops_per_sec 63.5% r= egression | > > Hm interesting, this one at least from the name would be a GFP_KERNEL con= text? weird indeed. > > | test machine | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 85= 92+ (Emerald Rapids) with 256G memory | > > | test parameters | cpufreq_governor=3Dperformance = | > > | | nr_threads=3D100% = | > > | | test=3Dmmapfork = | > > | | testtime=3D60s = | > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > | testcase: change | hackbench: hackbench.throughput 26.6% regression = | > > | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 C= PU @ 2.00GHz (Ice Lake) with 256G memory | > > | test parameters | cpufreq_governor=3Dperformance = | > > | | ipc=3Dsocket = | > > | | iterations=3D4 = | > > | | mode=3Dthreads = | > > | | nr_threads=3D100% = | > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.64B.MB/sec= 33.0% regression | > > | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 84= 80CTDX (Sapphire Rapids) with 512G memory | > > | test parameters | cpufreq_governor=3Dperformance = | > > | | mode=3Ddevelopment = | > > | | nr_threads=3D100% = | > > | | test=3DTCP = | > > | | test_memory_size=3D50% = | > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > | testcase: change | vm-scalability: vm-scalability.throughput 86.8% r= egression | > > | test machine | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 85= 92+ (Emerald Rapids) with 256G memory | > > | test parameters | cpufreq_governor=3Dperformance = | > > | | runtime=3D300s = | > > | | size=3D1T = | > > | | test=3Dlru-shm = | > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > | testcase: change | netperf: netperf.Throughput_Mbps 39.9% improvement= | > > An improvement? Weird. Even more weird and makes no sense to me so far. > > > | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 C= PU @ 2.00GHz (Ice Lake) with 256G memory | > > | test parameters | cluster=3Dcs-localhost = | > > | | cpufreq_governor=3Dperformance = | > > | | ip=3Dipv4 = | > > | | nr_threads=3D200% = | > > | | runtime=3D300s = | > > | | test=3DTCP_MAERTS = | > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > | testcase: change | will-it-scale: will-it-scale.per_thread_ops 68.8%= regression | > > | test machine | 104 threads 2 sockets (Skylake) with 192G memory = | > > | test parameters | cpufreq_governor=3Dperformance = | > > | | mode=3Dthread = | > > | | nr_task=3D100% = | > > | | test=3Dfallocate1 = | > > +------------------+---------------------------------------------------= -------------------------------------------------+ > > Some of those as well. > > Anyway we should not be expecting the localtry_trylock_irqsave() itself b= e > failing and resulting in a slow path, as that woulre require an allocatio= n > attempt from a nmi. So what else the commit does? > > > 0.10 =C2=B1 4% +11.3 11.43 =C2=B1 3% perf-profile.se= lf.cycles-pp.try_charge_memcg > > 0.00 +13.7 13.72 =C2=B1 2% perf-profile.self.cy= cles-pp.page_counter_try_charge > > This does suggest more time spent in try_charge_memcg() because consume_s= tock() has failed. > > And I suspect this: > > + if (!gfpflags_allow_spinning(gfp_mask)) > + /* Avoid the refill and flush of the older stock */ > + batch =3D nr_pages; > > because this will affect the refill even if consume_stock() fails not due= to > a trylock failure (which should not be happening), but also just because = the > stock was of a wrong memcg or depleted. So in the nowait context we deny = the > refill even if we have the memory. Attached patch could be used to see if= it > if fixes things. I'm not sure about the testcases where it doesn't look l= ike > nowait context would be used though, let's see. Not quite. GFP_NOWAIT includes __GFP_KSWAPD_RECLAIM, so gfpflags_allow_spinning() will return true. So 'batch' won't change. > I've also found this: > https://lore.kernel.org/all/7s6fbpwsynadnzybhdqg3jwhls4pq2sptyxuyghxpaufh= issj5@iadb6ibzscjj/ Right. And notice Shakeel's suggestion doesn't include '!' in the condition. I assumed it's a typo. Hence added it as "if (!gfpflags_allow_spinning(gfp_mask))" > > > > BTW after the done_restock tag in try_charge_memcg(), we will another > > gfpflags_allow_spinning() check to avoid schedule_work() and > > mem_cgroup_handle_over_high(). Maybe simply return early for > > gfpflags_allow_spinning() without checking high marks. > > looks like a small possible optimization that was forgotten? > ----8<---- > From 29e7d18645577ce13d8a0140c0df050ce1ce0f95 Mon Sep 17 00:00:00 2001 > From: Vlastimil Babka > Date: Mon, 10 Mar 2025 10:32:14 +0100 > Subject: [PATCH] memcg: Avoid stock refill only if stock_lock can't be > acquired > > Since commit 01d37228d331 ("memcg: Use trylock to access memcg > stock_lock.") consume_stock() can fail if it can't obtain > memcg_stock.stock_lock. In that case try_charge_memcg() also avoids > refilling or flushing the stock when gfp flags indicate we are in the > context where obtaining the lock could fail. > > However consume_stock() can also fail because the stock was depleted, or > belonged to a different memcg. Avoiding the stock refill then reduces > the caching efficiency, as the refill could still succeed with memory > available. This has caused various regressions to be reported by the > kernel test robot. > > To fix this, make the decision to avoid stock refill more precise by > making consume_stock() return -EBUSY when it fails to obtain stock_lock, > and using that for the no-refill decision. > > Fixes: 01d37228d331 ("memcg: Use trylock to access memcg stock_lock.") > Reported-by: kernel test robot > Closes: https://lore.kernel.org/oe-lkp/202503101254.cfd454df-lkp@intel.co= m > > Signed-off-by: Vlastimil Babka > --- > mm/memcontrol.c | 22 ++++++++++++++-------- > 1 file changed, 14 insertions(+), 8 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 092cab99dec7..a8371a22c7f4 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1772,22 +1772,23 @@ static bool obj_stock_flush_required(struct memcg= _stock_pcp *stock, > * stock, and at least @nr_pages are available in that stock. Failure t= o > * service an allocation will refill the stock. > * > - * returns true if successful, false otherwise. > + * returns 0 if successful, -EBUSY if lock cannot be acquired, or -ENOME= M > + * if the memcg does not match or there are not enough pages > */ > -static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_page= s, > +static int consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages= , > gfp_t gfp_mask) > { > struct memcg_stock_pcp *stock; > unsigned int stock_pages; > unsigned long flags; > - bool ret =3D false; > + bool ret =3D -ENOMEM; > > if (nr_pages > MEMCG_CHARGE_BATCH) > return ret; > > if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) { > if (!gfpflags_allow_spinning(gfp_mask)) > - return ret; > + return -EBUSY; > localtry_lock_irqsave(&memcg_stock.stock_lock, flags); > } > > @@ -1795,7 +1796,7 @@ static bool consume_stock(struct mem_cgroup *memcg,= unsigned int nr_pages, > stock_pages =3D READ_ONCE(stock->nr_pages); > if (memcg =3D=3D READ_ONCE(stock->cached) && stock_pages >=3D nr_= pages) { > WRITE_ONCE(stock->nr_pages, stock_pages - nr_pages); > - ret =3D true; > + ret =3D 0; > } > > localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags); > @@ -2228,13 +2229,18 @@ int try_charge_memcg(struct mem_cgroup *memcg, gf= p_t gfp_mask, > bool drained =3D false; > bool raised_max_event =3D false; > unsigned long pflags; > + int consume_ret; > > retry: > - if (consume_stock(memcg, nr_pages, gfp_mask)) > + consume_ret =3D consume_stock(memcg, nr_pages, gfp_mask); > + if (!consume_ret) > return 0; > > - if (!gfpflags_allow_spinning(gfp_mask)) > - /* Avoid the refill and flush of the older stock */ > + /* > + * Avoid the refill and flush of the older stock if we failed to = acquire > + * the stock_lock > + */ > + if (consume_ret =3D=3D -EBUSY) > batch =3D nr_pages; Sure. I think it's a good optimization, but I don't think it will make any difference here. Fixes tag is not appropriate and the commit log is off too. I have strong suspicion that this bot report is bogus. I'll try to repro anyway.