From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 13 Apr 2023 03:45:42 -0700
Subject: Re: [PATCH v6 1/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
To: Andrew Morton
Cc: Alexander Viro, "Darrick J. Wong", Christoph Lameter, David Rientjes,
 Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Matthew Wilcox (Oracle)",
 Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu, NeilBrown,
 Shakeel Butt, Michal Hocko, Yu Zhao, Dave Chinner, Tim Chen,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-xfs@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20230413104034.1086717-2-yosryahmed@google.com>
References: <20230413104034.1086717-1-yosryahmed@google.com> <20230413104034.1086717-2-yosryahmed@google.com>
On Thu, Apr 13, 2023 at 3:40 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> We keep track of different types of reclaimed pages through
> reclaim_state->reclaimed_slab, and we add them to the reported number
> of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg
> reclaim, we have no clue if those pages are charged to the memcg under
> reclaim.
>
> Slab pages are shared by different memcgs, so a freed slab page may have
> only been partially charged to the memcg under reclaim. The same goes for
> clean file pages from pruned inodes (on highmem systems) or xfs buffer
> pages; there is currently no simple way to link them to the memcg under
> reclaim.
>
> Stop reporting those freed pages as reclaimed pages during memcg reclaim.
> This should make the return value of writing to memory.reclaim more
> accurate, and may help reduce unnecessary reclaim retries during memcg
> charging. Writing to memory.reclaim on the root memcg is considered
> cgroup_reclaim(), but for this case we want to include any freed pages,
> so use the global_reclaim() check instead of !cgroup_reclaim().
>
> Generally, this should make the return value of
> try_to_free_mem_cgroup_pages() more accurate. In some limited cases
> (e.g. we freed a slab page that was mostly charged to the memcg under
> reclaim), the return value of try_to_free_mem_cgroup_pages() can be
> underestimated, but this should be fine. The freed pages will be
> uncharged anyway, and we can charge the memcg the next time around as we
> usually do memcg reclaim in a retry loop.
>
> Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> instead of pages")

Andrew, I removed the CC: stable as you were sceptical about the need
for a backport, but left the Fixes tag so that it's easy to identify
where to backport it if you and/or the stable maintainers decide
otherwise.

>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  mm/vmscan.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c1c5e8b24b8..be657832be48 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -511,6 +511,46 @@ static bool writeback_throttling_sane(struct scan_control *sc)
>  }
>  #endif
>
> +/*
> + * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
> + * scan_control->nr_reclaimed.
> + */
> +static void flush_reclaim_state(struct scan_control *sc)
> +{
> +	/*
> +	 * Currently, reclaim_state->reclaimed includes three types of pages
> +	 * freed outside of vmscan:
> +	 * (1) Slab pages.
> +	 * (2) Clean file pages from pruned inodes (on highmem systems).
> +	 * (3) XFS freed buffer pages.
> +	 *
> +	 * For all of these cases, we cannot universally link the pages to a
> +	 * single memcg. For example, a memcg-aware shrinker can free one object
> +	 * charged to the target memcg, causing an entire page to be freed.
> +	 * If we count the entire page as reclaimed from the memcg, we end up
> +	 * overestimating the reclaimed amount (potentially under-reclaiming).
> +	 *
> +	 * Only count such pages for global reclaim to prevent under-reclaiming
> +	 * from the target memcg; preventing unnecessary retries during memcg
> +	 * charging and false positives from proactive reclaim.
> +	 *
> +	 * For uncommon cases where the freed pages were actually mostly
> +	 * charged to the target memcg, we end up underestimating the reclaimed
> +	 * amount. This should be fine. The freed pages will be uncharged
> +	 * anyway, even if they are not counted here properly, and we will be
> +	 * able to make forward progress in charging (which is usually in a
> +	 * retry loop).
> +	 *
> +	 * We can go one step further, and report the uncharged objcg pages in
> +	 * memcg reclaim, to make reporting more accurate and reduce
> +	 * underestimation, but it's probably not worth the complexity for now.
> +	 */
> +	if (current->reclaim_state && global_reclaim(sc)) {
> +		sc->nr_reclaimed += current->reclaim_state->reclaimed;
> +		current->reclaim_state->reclaimed = 0;
> +	}
> +}
> +
>  static long xchg_nr_deferred(struct shrinker *shrinker,
>  			     struct shrink_control *sc)
>  {
> @@ -5346,8 +5386,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
>  		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
>  			   sc->nr_reclaimed - reclaimed);
>
> -	sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
> -	current->reclaim_state->reclaimed_slab = 0;
> +	flush_reclaim_state(sc);
>
>  	return success ? MEMCG_LRU_YOUNG : 0;
>  }
> @@ -6450,7 +6489,6 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>
>  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  {
> -	struct reclaim_state *reclaim_state = current->reclaim_state;
>  	unsigned long nr_reclaimed, nr_scanned;
>  	struct lruvec *target_lruvec;
>  	bool reclaimable = false;
> @@ -6472,10 +6510,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>
>  	shrink_node_memcgs(pgdat, sc);
>
> -	if (reclaim_state) {
> -		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> -		reclaim_state->reclaimed_slab = 0;
> -	}
> +	flush_reclaim_state(sc);
>
>  	/* Record the subtree's reclaim efficiency */
>  	if (!sc->proactive)
> --
> 2.40.0.577.gac1e443424-goog
>