Date: Sun, 13 Jul 2025 12:57:18 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Shakeel Butt, Lorenzo Stoakes,
    Michal Hocko, David Hildenbrand, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: skip lru_note_cost() when scanning only file or anon
In-Reply-To: <8734b21tie.fsf@linux.dev>
Message-ID: <21100102-51b6-79d5-03db-1bb7f97fa94c@google.com>
References: <20250711155044.137652-1-roman.gushchin@linux.dev>
    <8734b21tie.fsf@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Fri, 11 Jul 2025, Roman Gushchin wrote:

> Sorry, I sent a wrong version with a trivial bug. Below is the correct
> one. The only difference is s/&sc/sc when calling scan_balance_biased().
> 
> --
> 
> From c06530edfb8a11139f2d7878ce3956b9238cc702 Mon Sep 17 00:00:00 2001
> From: Roman Gushchin
> Subject: [PATCH] mm: skip lru_note_cost() when scanning only file or anon
> 
> lru_note_cost() records the relative cost of incurring io and cpu spent
> on lru rotations, which is used to balance the pressure on file and
> anon memory. The applied pressure is inversely proportional to the
> recorded cost of reclaiming, but only within 2/3 of the range
> (swappiness aside).
> 
> This is useful when both anon and file memory are reclaimable, but in
> many cases that is not so: e.g. there might be no swap, proactive
> reclaim can target anon memory specifically, the memory pressure can
> come from cgroup v1's memsw limit, etc. In all these cases recording
> the cost will only bias all following reclaim, potentially even
> outside the scope of the original memcg.
> 
> So it's better not to record the cost when it comes from reclaim that
> was biased to begin with.
> 
> lru_note_cost() is also a relatively expensive function, which
> traverses the memcg tree up to the root and takes the lruvec lock on
> each level. Overall it's responsible for about 50% of cycles spent on
> the lruvec lock, which might be a non-trivial number under heavy
> memory pressure. So optimizing out a large number of lru_note_cost()
> calls is beneficial from the performance perspective as well.
> 
> Signed-off-by: Roman Gushchin
> ---
>  mm/vmscan.c | 34 +++++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c86a2495138a..7d08606b08ea 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -71,6 +71,13 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/vmscan.h>
> 
> +enum scan_balance {
> +        SCAN_EQUAL,
> +        SCAN_FRACT,
> +        SCAN_ANON,
> +        SCAN_FILE,
> +};
> +
>  struct scan_control {
>          /* How many pages shrink_list() should reclaim */
>          unsigned long nr_to_reclaim;
> @@ -90,6 +97,7 @@ struct scan_control {
>          /*
>           * Scan pressure balancing between anon and file LRUs
>           */
> +        enum scan_balance scan_balance;
>          unsigned long anon_cost;
>          unsigned long file_cost;
> 
> @@ -1988,6 +1996,17 @@ static int current_may_throttle(void)
>          return !(current->flags & PF_LOCAL_THROTTLE);
>  }
> 
> +static bool scan_balance_biased(struct scan_control *sc)
> +{
> +        switch (sc->scan_balance) {
> +        case SCAN_EQUAL:
> +        case SCAN_FRACT:
> +                return false;
> +        default:
> +                return true;
> +        }
> +}
> +
>  /*
>   * shrink_inactive_list() is a helper for shrink_node(). It returns the number
>   * of reclaimed pages
> @@ -2054,7 +2073,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>          __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
>          spin_unlock_irq(&lruvec->lru_lock);
> 
> -        lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
> +        if (!scan_balance_biased(sc))
> +                lru_note_cost(lruvec, file, stat.nr_pageout,
> +                              nr_scanned - nr_reclaimed);
> 
>          /*
>           * If dirty folios are scanned that are not queued for IO, it
> @@ -2202,7 +2223,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
>          __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
>          spin_unlock_irq(&lruvec->lru_lock);
> 
> -        if (nr_rotated)
> +        if (nr_rotated && !scan_balance_biased(sc))
>                  lru_note_cost(lruvec, file, 0, nr_rotated);
>          trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
>                          nr_deactivate, nr_rotated, sc->priority, file);
> @@ -2327,13 +2348,6 @@ static bool inactive_is_low(struct lruvec *lruvec, enum lru_list inactive_lru)
>          return inactive * inactive_ratio < active;
>  }
> 
> -enum scan_balance {
> -        SCAN_EQUAL,
> -        SCAN_FRACT,
> -        SCAN_ANON,
> -        SCAN_FILE,
> -};
> -
>  static void prepare_scan_control(pg_data_t *pgdat, struct scan_control *sc)
>  {
>          unsigned long file;
> @@ -2613,6 +2627,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>          calculate_pressure_balance(sc, swappiness, fraction, &denominator);
> 
>  out:
> +        sc->scan_balance = scan_balance;
> +
>          for_each_evictable_lru(lru) {
>                  bool file = is_file_lru(lru);
>                  unsigned long lruvec_size;
> -- 
> 2.50.0
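
(An aside, in case the balancing math is unfamiliar: the costs that
lru_note_cost() records feed the anon/file split in get_scan_count()'s
fractional mode roughly as below. This is a simplified sketch that
paraphrases mm/vmscan.c -- the name scan_fractions() and the exact
shape are mine, not the kernel's:

        /*
         * Scan pressure on the anon and file lists is applied in
         * proportion ap:fp.  Folding the combined cost into each
         * denominator is what limits how far the recorded cost can
         * tilt the balance, per the "2/3 of the range" note in the
         * commit message above.
         */
        static void scan_fractions(unsigned long anon_cost,
                                   unsigned long file_cost, int swappiness,
                                   unsigned long *ap, unsigned long *fp)
        {
                unsigned long total = anon_cost + file_cost;

                *ap = swappiness * (total + 1) /
                        (total + anon_cost + 1);
                *fp = (200 - swappiness) * (total + 1) /
                        (total + file_cost + 1);
        }

The point of the patch above is that when the scan is already pinned to
one list, ap:fp is never consulted, so feeding costs into it can only
distort later, unrelated reclaim.)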

Roman, I'm expressing no opinion on your patch above, but please may I
throw the patch below (against 6.16-rc) over the wall to you, to add as
a 1/2 or 2/2 to yours (as it stands, it does conflict slightly with
yours).

My attention needs to be on other things; but five years ago in
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2009211440570.5214@eggly.anvils/
I noted how lru_note_cost() became more costly with per-memcg lru_lock,
but did nothing about it at the time. Apparently now is the time.

Thanks,
Hugh

[PATCH] mm: lru_note_cost_unlock_irq()

Dropping a lock, just to demand it again for an afterthought, cannot
be good if contended: convert lru_note_cost() to
lru_note_cost_unlock_irq().
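
In outline, the conversion is (a sketch only, simplified from the diff
that follows, not code to be applied):

        /* Before: drop the lock, only for lru_note_cost() to retake it */
        spin_unlock_irq(&lruvec->lru_lock);
        lru_note_cost(lruvec, file, nr_io, nr_rotated);

        /* After: the callee inherits the lock... */
        lru_note_cost_unlock_irq(lruvec, file, nr_io, nr_rotated);

        /*
         * ...and hands it off level by level up the memcg hierarchy,
         * releasing the last one instead of dropping and retaking:
         */
        for (;;) {
                /* account cost into this lruvec, lock already held */
                spin_unlock_irq(&lruvec->lru_lock);
                lruvec = parent_lruvec(lruvec);
                if (!lruvec)
                        break;
                spin_lock_irq(&lruvec->lru_lock);
        }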

Signed-off-by: Hugh Dickins
---
 include/linux/swap.h |  5 +++--
 mm/swap.c            | 26 +++++++++++++++++++-------
 mm/vmscan.c          |  8 +++-----
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index bc0e1c275fc0..a64a87cda960 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -376,8 +376,9 @@
 extern unsigned long totalreserve_pages;
 
 /* linux/mm/swap.c */
-void lru_note_cost(struct lruvec *lruvec, bool file,
-                   unsigned int nr_io, unsigned int nr_rotated);
+void lru_note_cost_unlock_irq(struct lruvec *lruvec, bool file,
+                unsigned int nr_io, unsigned int nr_rotated)
+                __releases(lruvec->lru_lock);
 void lru_note_cost_refault(struct folio *);
 void folio_add_lru(struct folio *);
 void folio_add_lru_vma(struct folio *, struct vm_area_struct *);
diff --git a/mm/swap.c b/mm/swap.c
index 4fc322f7111a..37053f222a6e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -237,8 +237,9 @@ void folio_rotate_reclaimable(struct folio *folio)
         folio_batch_add_and_move(folio, lru_move_tail, true);
 }
 
-void lru_note_cost(struct lruvec *lruvec, bool file,
-                   unsigned int nr_io, unsigned int nr_rotated)
+void lru_note_cost_unlock_irq(struct lruvec *lruvec, bool file,
+                unsigned int nr_io, unsigned int nr_rotated)
+                __releases(lruvec->lru_lock)
 {
         unsigned long cost;
 
@@ -250,8 +251,12 @@ void lru_note_cost(struct lruvec *lruvec, bool file,
          * different between them, adjust scan balance for CPU work.
          */
         cost = nr_io * SWAP_CLUSTER_MAX + nr_rotated;
+        if (!cost) {
+                spin_unlock_irq(&lruvec->lru_lock);
+                return;
+        }
 
-        do {
+        for (;;) {
                 unsigned long lrusize;
 
                 /*
@@ -261,7 +266,6 @@ void lru_note_cost(struct lruvec *lruvec, bool file,
                  * rcu lock, so would be safe even if the page was on the LRU
                  * and could move simultaneously to a new lruvec).
                  */
-                spin_lock_irq(&lruvec->lru_lock);
                 /* Record cost event */
                 if (file)
                         lruvec->file_cost += cost;
@@ -285,14 +289,22 @@ void lru_note_cost(struct lruvec *lruvec, bool file,
                         lruvec->file_cost /= 2;
                         lruvec->anon_cost /= 2;
                 }
+                spin_unlock_irq(&lruvec->lru_lock);
 
-        } while ((lruvec = parent_lruvec(lruvec)));
+                lruvec = parent_lruvec(lruvec);
+                if (!lruvec)
+                        break;
+                spin_lock_irq(&lruvec->lru_lock);
+        }
 }
 
 void lru_note_cost_refault(struct folio *folio)
 {
-        lru_note_cost(folio_lruvec(folio), folio_is_file_lru(folio),
-                      folio_nr_pages(folio), 0);
+        struct lruvec *lruvec;
+
+        lruvec = folio_lruvec_lock_irq(folio);
+        lru_note_cost_unlock_irq(lruvec, folio_is_file_lru(folio),
+                                 folio_nr_pages(folio), 0);
 }
 
 static void lru_activate(struct lruvec *lruvec, struct folio *folio)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f8dfd2864bbf..5ba49f884bc0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2059,9 +2059,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
         __count_vm_events(item, nr_reclaimed);
         count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
         __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
-        spin_unlock_irq(&lruvec->lru_lock);
 
-        lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
+        lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout,
+                                 nr_scanned - nr_reclaimed);
 
         /*
          * If dirty folios are scanned that are not queued for IO, it
@@ -2207,10 +2207,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
         count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
         __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-        spin_unlock_irq(&lruvec->lru_lock);
 
-        if (nr_rotated)
-                lru_note_cost(lruvec, file, 0, nr_rotated);
+        lru_note_cost_unlock_irq(lruvec, file, 0, nr_rotated);
         trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
                         nr_deactivate, nr_rotated, sc->priority, file);
 }
 
-- 
2.43.0
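
P.S. The __releases() annotation above is for sparse's lock-context
checking (make C=1): it declares that the function exits holding one
lock fewer than it entered with. A minimal illustration of the idiom --
the lock and function names here are hypothetical, not from the patch:

        static DEFINE_SPINLOCK(demo_lock);

        static void demo_unlock(void) __releases(demo_lock)
        {
                spin_unlock(&demo_lock);  /* balances the declared release */
        }

        static void demo_caller(void)
        {
                spin_lock(&demo_lock);    /* context +1 */
                demo_unlock();            /* context -1, as annotated */
        }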