From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4F4CC678D5 for ; Wed, 8 Mar 2023 20:24:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7F1A06B0071; Wed, 8 Mar 2023 15:24:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7A0D76B0074; Wed, 8 Mar 2023 15:24:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 64189280001; Wed, 8 Mar 2023 15:24:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 51F806B0071 for ; Wed, 8 Mar 2023 15:24:49 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 27225404FE for ; Wed, 8 Mar 2023 20:24:49 +0000 (UTC) X-FDA: 80546859498.19.0A84479 Received: from mail-ed1-f53.google.com (mail-ed1-f53.google.com [209.85.208.53]) by imf18.hostedemail.com (Postfix) with ESMTP id 492501C000B for ; Wed, 8 Mar 2023 20:24:47 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=pCaJbe3D; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf18.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.53 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1678307087; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lJfFbUmgGUScXMrHUtC6nJ4LuBCrcr+wpYFFKWYzLa4=; b=rvKJX4ICqNxtzVpJnfsZXafMQIkK0s5wUXhDkbQq9k2p4h9PHBlI3gNth0g1Fw5JNjAlPR pYpuwR8/K/7esc3Nw6NGcErFzG9TFikAa2S1R25QRrVZOay8w7feaVAuEMw9gyvHGNICsB 5Ii47Z0FmVnCOEr2glE+r79Z+kUruxY= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=pCaJbe3D; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf18.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.53 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1678307087; a=rsa-sha256; cv=none; b=NegkuA6jPKXivSNtILZqxRjkhti8RXxQwk+lHidiwDqt3bgmsOkewsLEkWJ1BKP5KOGu17 v07W4oJR22T6MPUd2jA2hBYRfnIp5fugszFm8P6C1MChQzQIux+gz1ADYadt+ii5VaMvpZ KxZlwSmLBb5HtOuzVg6X/74eatgp7rk= Received: by mail-ed1-f53.google.com with SMTP id j11so51098704edq.4 for ; Wed, 08 Mar 2023 12:24:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678307086; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lJfFbUmgGUScXMrHUtC6nJ4LuBCrcr+wpYFFKWYzLa4=; b=pCaJbe3DmW0dyXfacjDPlJY5/4U/ixkj3XablpiH1L6RdZms5pExSKySI+k3rFZ0W+ XlgY9bNlROzdIEj4gUOPml8wQHiAZv4RBcvlEls+/I48YmqhZzU4lVSQp1y/Xt/hVSvC JCPoUk0K9747ZtqzB1JeG9f0KR1fBQH9Ik/SyRhoZD4RON4rAiIRSZLdVIwSgl9IGowk Pa7zqoOz/+RndMfT9O3ofxye9Yi8yNIfSBKY/FE5arKW9G9FhxZk0gBqQvC3XaUmPM3D VaFgmyCxkZNyYrJ1Gcfug6wHmKW64Win8yJ0deJXYmWzelnxKBid7hCZHMJP8F/xl8LD utqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678307086; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lJfFbUmgGUScXMrHUtC6nJ4LuBCrcr+wpYFFKWYzLa4=; b=2Yhq4dDKmKmw4WLjmYK5aPUuVx3idMfAUh5GVsahvYhn2DDAkU0xjZMk0/Z5O95VBL Xf04yFPbnly6efXGeIBLgkIxZE1RMmnX3zhf4lUZ1AISqA9Jkb+3wuJxS7JrcN1pc/Qd 4xJMlXHORiS8aqtW+r54B4lnKjW0lIZE3tHtrfL6I653fEnx7xg0TH0kufi4n/8jEMcn 8mEGiWkLkq/klvTfq3wi94kgWAmlZ6be7OQpQtQacIWYABjhA3svVVySYdKqtHwgW0W/ o3+OT3m9ookr+6WFr5w1cE0eiskIx9pSdQfB0fd8PSgTNNSpFaWpOS/lV9+HwfFXPsE8 xYwQ== X-Gm-Message-State: AO0yUKW4u8b7oRUhC5qq80ApsvFIQufoN4yHa+et5Y1z6vOJpcRch2Oh 1X8neTTm6LjKzQHXrMH9OqwO4I5wCOno2qRleo6MLw== X-Google-Smtp-Source: AK7set+DV9RxenSdhDCOFgakZZcsnB5QX/El2Wvkj9fjCVlQsjhbhU0QFrcMSUEA/hLFDWBIFA8W0HOSXVt6Xx+KiFY= X-Received: by 2002:a17:906:b11a:b0:8f8:edfc:b68b with SMTP id u26-20020a170906b11a00b008f8edfcb68bmr9931531ejy.6.1678307085464; Wed, 08 Mar 2023 12:24:45 -0800 (PST) MIME-Version: 1.0 References: <20230228085002.2592473-1-yosryahmed@google.com> <20230308160056.GA414058@cmpxchg.org> <20230308201629.GB476158@cmpxchg.org> In-Reply-To: <20230308201629.GB476158@cmpxchg.org> From: Yosry Ahmed Date: Wed, 8 Mar 2023 12:24:08 -0800 Message-ID: Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim To: Johannes Weiner Cc: Alexander Viro , "Darrick J. Wong" , Christoph Lameter , David Rientjes , Joonsoo Kim , Vlastimil Babka , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Matthew Wilcox (Oracle)" , Miaohe Lin , David Hildenbrand , Peter Xu , NeilBrown , Shakeel Butt , Michal Hocko , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 492501C000B X-Stat-Signature: yf74w6yenx99pd3ur4s115paptxfm3ak X-HE-Tag: 1678307087-626649 X-HE-Meta: U2FsdGVkX1+XA42b1DFoifbQ+Javb5ClxF4cfY+OAHNQ1Uca1zYlra7dCLaKWQLcWL50zPnNebnE/oG5L3gD/VMR78QKew3VpzYW/9JjjSJoi4sqznwc5F2GbVaJuMUT3MABWUig13IrhvLBqVgUQtRRixtJdiW8a9nYDxJfMfH24QqaNMiw8JePQVirVx2TCjwLrE5YVM+R5TAT2iSospuB211hZ63WnqQAzjLc4Rgh7UF2yi0v5A0LEG9FTLJfEhq1XvJ+x5Cysct4xTxNF3UUrDiB70NC/hbQX6PoQ9uEvlYHxlq2Wyl3qxwRJ72hEzD3m5K7S1rodpQp9plftNNRfLxSeAZa35rGNXnJhofGQAdcPfQ1m02pTwdsU9bWQb3PJVwe+ZqS9u88448j7Zmt39CS2d82AuUIQhoy6ATEP3D6XTnQ6/xk3yhTZM1Kw3yYjATzlk117g4sQKjos1fouI/UlP+jWvcfpH9BtdCw2CsHU2ohZxycnUV1EZLGdZJuRypCGvLl/DPQpY6y7f3dfQpRY6rVZysRmmNr+vCco+F4IBGk/cih3/tZf17iahTGqBCFx9m+L0XmHf7IxiIPlFEhACwAQoORw329K2eNq+EdGU5PG0CJZLb9/x1v/s+PGhPK2Ah7fUgD+Kp3peALTZtar4kLqK05aENIHq5ijy2axVWyG7ly1tW76JhfnQOPT26HhGbdjkSzFM+5tiUZ3yqL96Rba134wn11O2GMLg+AcZr2GRVQ/CaymB1mEg0FvBKO5WCZKcOTcn+LPZTuSInPnQKf12GpcWnSkV0gB/8RBpNLIHuG+n+xqN1Q6FOcG7M/wE5CII1fyKbmb4bLTrt3jyL1K6wyCU7YiWbmvB8h3UC7PaNgHYu0nFKxzAw4l0GknWexGSR3ERO3d8DDDrpruXMzCk2Xu2DE+PBDunRNw29eMPyqemVqx09jJGrdqhYfkU1ocRCIvXR HiDN8FS8 5uROTjIV6Yl6Ij9EWey6VCGOPSlaT/YaRSEJJ5ezESJyBUBzCmkAxam8bf56EZI3W1/STomyXEx/DoiB1TkY8FiuqUXwY9Fw4Fxh2VI1XpPc69uFGfe5RnWRn3jeFhzTrk/JCPm4D1MPjcdX2s8XismtOEwv7pUqgBcURDgneLjU32T1i2BGBgtTeBDc512E46/J+ce2ZLgd+E0xEUDYlPjcoST2vnfaeKs7or86878fwOZROtyxFepWsB45eSLoQieuiG1cj3HAf0B5KcWWcjTRkRYQ1PO30JaOx6p3G/t7V2xGbD5jrZAsuk2g61LF1L0SKdVaq1+nLwk5c3ogDNSU0eI8n2HbFRIM7NSes7aVPR8GTfE0qj0z7aA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 8, 2023 at 12:16=E2=80=AFPM Johannes Weiner wrote: > > On Wed, Mar 08, 2023 at 10:01:24AM -0800, Yosry Ahmed wrote: > > On Wed, Mar 8, 2023 at 8:00=E2=80=AFAM Johannes Weiner wrote: > > > > > > Hello Yosry, > > > > > > On Tue, Feb 28, 2023 at 08:50:00AM +0000, Yosry Ahmed wrote: > > > > Reclaimed pages through other means than LRU-based reclaim are trac= ked > > > > through reclaim_state in struct scan_control, which is stashed in > > > > current task_struct. These pages are added to the number of reclaim= ed > > > > pages through LRUs. For memcg reclaim, these pages generally cannot= be > > > > linked to the memcg under reclaim and can cause an overestimated co= unt > > > > of reclaimed pages. This short series tries to address that. > > > > > > Could you please add more details on how this manifests as a problem > > > with real workloads? > > > > We haven't observed problems in production workloads, but we have > > observed problems in testing using memory.reclaim when sometimes a > > write to memory.reclaim would succeed when we didn't fully reclaim the > > requested amount. This leads to tests flaking sometimes, and we have > > to look into the failures to find out if there is a real problem or > > not. > > Ah, that would be great to have in the cover letter. Thanks! Will do in the next version. > > Have you also tested this patch against prod without memory.reclaim? > Just to make sure there are no problems with cgroup OOMs or > similar. There shouldn't be, but, you know... Honestly, no. I was debugging a test flake and I spotted that this was the cause, came up with patches to address it, and sent it to the mailing list for feedback. We did not want to merge it internally if it's not going to land upstream -- with the rationale that making the test more tolerant might be better than maintaining the patch internally, although that is not ideal of course (as it can hide actual failures from different sources). > > > > > Patch 1 is just refactoring updating reclaim_state into a helper > > > > function, and renames reclaimed_slab to just reclaimed, with a comm= ent > > > > describing its true purpose. > > > > > > Looking through the code again, I don't think these helpers add value= . > > > > > > report_freed_pages() is fairly vague. Report to who? It abstracts onl= y > > > two lines of code, and those two lines are more descriptive of what's > > > happening than the helper is. Just leave them open-coded. > > > > I agree the name is not great, I am usually bad at naming things and > > hope people would point that out (like you're doing now). The reason I > > added it is to contain the logic within mm/vmscan.c such that future > > changes do not have to add noisy diffs to a lot of unrelated files. If > > you have a better name that makes more sense to you please let me > > know, otherwise I'm fine dropping the helper as well, no strong > > opinions here. > > I tried to come up with something better, but wasn't happy with any of > the options, either. So I defaulted to just leaving it alone :-) > > It's part of the shrinker API and the name hasn't changed since the > initial git import of the kernel tree. It should be fine, churn-wise. Last attempt, just update_reclaim_state() (corresponding to flush_reclaim_state() below). It doesn't tell a story, but neither does incrementing a counter in current->reclaim_state. If that doesn't make you happy I'll give up now and leave it as-is :) > > > > add_non_vmanscan_reclaimed() may or may not add anything. But let's > > > take a step back. It only has two callsites because lrugen duplicates > > > the entire reclaim implementation, including the call to shrink_slab(= ) > > > and the transfer of reclaim_state to sc->nr_reclaimed. > > > > > > IMO the resulting code would overall be simpler, less duplicative and > > > easier to follow if you added a common shrink_slab_reclaim() that > > > takes sc, handles the transfer, and documents the memcg exception. > > > > IIUC you mean something like: > > > > void shrink_slab_reclaim(struct scan_control *sc, pg_data_t *pgdat, > > struct mem_cgroup *memcg) > > { > > shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority); > > > > /* very long comment */ > > if (current->reclaim_state && !cgroup_reclaim(sc)) { > > sc->nr_reclaimed +=3D current->reclaim_state->reclaimed; > > current->reclaim_state->reclaimed =3D 0; > > } > > } > > Sorry, I screwed up, that doesn't actually work. > > reclaim_state is used by buffer heads freed in shrink_folio_list() -> > filemap_release_folio(). So flushing the count cannot be shrink_slab() > specific. Bummer. Your patch had it right by making a helper for just > flushing the reclaim state. But add_non_vmscan_reclaimed() is then > also not a great name because these frees are directly from vmscan. > > Maybe simply flush_reclaim_state()? Sounds good and simple enough, I will use that for the next version. > > As far as the name reclaimed_slab, I agree it's not optimal, although > 90% accurate ;-) I wouldn't mind a rename to just 'reclaimed'. Got it.