From: Yosry Ahmed <yosryahmed@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J. Wong" <djwong@kernel.org>,
	 Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 Miaohe Lin <linmiaohe@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Peter Xu <peterx@redhat.com>, NeilBrown <neilb@suse.de>,
	 Shakeel Butt <shakeelb@google.com>,
	Michal Hocko <mhocko@kernel.org>, Yu Zhao <yuzhao@google.com>,
	 Dave Chinner <david@fromorbit.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v6 1/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
Date: Thu, 13 Apr 2023 03:45:42 -0700
Message-ID: <CAJD7tkbnsSbZ2+Rf5NQKgBtH_JdN4AKMCuh8jasbQ-hcOOz-KA@mail.gmail.com>
In-Reply-To: <20230413104034.1086717-2-yosryahmed@google.com>

On Thu, Apr 13, 2023 at 3:40 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> We keep track of different types of reclaimed pages through
> reclaim_state->reclaimed_slab, and we add them to the reported number
> of reclaimed pages.  For non-memcg reclaim, this makes sense. For memcg
> reclaim, we have no clue if those pages are charged to the memcg under
> reclaim.
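
For context, the counter in question is bumped by the slab allocator (and a
couple of other paths mentioned below) whenever whole pages are freed while a
reclaimer is running. A condensed sketch of the SLUB side around this kernel
version (trimmed, treat as illustrative rather than exact):

  /* mm/slub.c (condensed): freeing the folio backing a slab */
  static void __free_slab(struct kmem_cache *s, struct slab *slab)
  {
          struct folio *folio = slab_folio(slab);
          int order = folio_order(folio);
          int pages = 1 << order;

          /* ... clear slab state on the folio ... */

          /* Credit the freed pages to whoever is currently reclaiming. */
          if (current->reclaim_state)
                  current->reclaim_state->reclaimed_slab += pages;
          unaccount_slab(slab, order, s);
          __free_pages(folio_page(folio, 0), order);
  }

Nothing in that path knows which memcg(s) the freed objects were charged to,
which is the problem described below.
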
>
> Slab pages are shared by different memcgs, so a freed slab page may have
> only been partially charged to the memcg under reclaim.  The same goes for
> clean file pages from pruned inodes (on highmem systems) or xfs buffer
> pages, there is no simple way to currently link them to the memcg under
> reclaim.
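
As an example of the pruned-inode case, the inode shrinker credits the page
cache it drops from a pruned inode to the same counter, roughly as follows
(condensed from fs/inode.c's inode_lru_isolate() around this version, for
illustration only):

  if (remove_inode_buffers(inode)) {
          unsigned long reap;

          reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
          /* ... vm event accounting elided ... */
          /* These file pages may be charged to any number of memcgs. */
          if (current->reclaim_state)
                  current->reclaim_state->reclaimed_slab += reap;
  }
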
>
> Stop reporting those freed pages as reclaimed pages during memcg reclaim.
> This should make the return value of writing to memory.reclaim more
> accurate, and may help reduce unnecessary reclaim retries during memcg
> charging.  Writing to memory.reclaim on the root memcg is considered
> cgroup_reclaim(), but for this case we want to include any freed pages,
> so use the global_reclaim() check instead of !cgroup_reclaim().
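
To make the global_reclaim() vs. !cgroup_reclaim() distinction concrete, the
two helpers in mm/vmscan.c look roughly like this at the time (sketch of the
CONFIG_MEMCG variants):

  /* Reclaim targeted at a specific memcg, including the root memcg. */
  static bool cgroup_reclaim(struct scan_control *sc)
  {
          return sc->target_mem_cgroup;
  }

  /* True global reclaim, or reclaim targeting the root memcg. */
  static bool global_reclaim(struct scan_control *sc)
  {
          return !sc->target_mem_cgroup ||
                  mem_cgroup_is_root(sc->target_mem_cgroup);
  }

so a write to the root's memory.reclaim is cgroup_reclaim() but still
satisfies global_reclaim(), which is the case the paragraph above wants to
keep counting freed pages for.
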
>
> Generally, this should make the return value of
> try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g.
> freed a slab page that was mostly charged to the memcg under reclaim),
> the return value of try_to_free_mem_cgroup_pages() can be underestimated,
> but this should be fine. The freed pages will be uncharged anyway, and we
> can charge the memcg the next time around as we usually do memcg reclaim
> in a retry loop.
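
For the retry-loop point: the charge path consumes the return value of
try_to_free_mem_cgroup_pages() roughly like this (heavily condensed from the
try_charge path in mm/memcontrol.c; checks and variables simplified for
illustration):

  retry:
          if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
                  return 0;       /* the charge fits, done */

          nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
                                                      gfp_mask, reclaim_options);

          /* Room opened up, or some progress was made: try again. */
          if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
                  goto retry;
          if (nr_reclaimed && nr_retries--)
                  goto retry;

An under-reported nr_reclaimed at worst costs another trip around this loop;
the freed pages were uncharged either way.
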
>
> Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages")


Andrew, I removed the Cc: stable as you were sceptical about the need for a
backport, but left the Fixes tag so that it's easy to identify where to
backport this if you and/or the stable maintainers decide otherwise.

>
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  mm/vmscan.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c1c5e8b24b8..be657832be48 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -511,6 +511,46 @@ static bool writeback_throttling_sane(struct scan_control *sc)
>  }
>  #endif
>
> +/*
> + * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
> + * scan_control->nr_reclaimed.
> + */
> +static void flush_reclaim_state(struct scan_control *sc)
> +{
> +       /*
> +        * Currently, reclaim_state->reclaimed includes three types of pages
> +        * freed outside of vmscan:
> +        * (1) Slab pages.
> +        * (2) Clean file pages from pruned inodes (on highmem systems).
> +        * (3) XFS freed buffer pages.
> +        *
> +        * For all of these cases, we cannot universally link the pages to a
> +        * single memcg. For example, a memcg-aware shrinker can free one object
> +        * charged to the target memcg, causing an entire page to be freed.
> +        * If we count the entire page as reclaimed from the memcg, we end up
> +        * overestimating the reclaimed amount (potentially under-reclaiming).
> +        *
> +        * Only count such pages for global reclaim to prevent under-reclaiming
> +        * from the target memcg; preventing unnecessary retries during memcg
> +        * charging and false positives from proactive reclaim.
> +        *
> +        * For uncommon cases where the freed pages were actually mostly
> +        * charged to the target memcg, we end up underestimating the reclaimed
> +        * amount. This should be fine. The freed pages will be uncharged
> +        * anyway, even if they are not counted here properly, and we will be
> +        * able to make forward progress in charging (which is usually in a
> +        * retry loop).
> +        *
> +        * We can go one step further, and report the uncharged objcg pages in
> +        * memcg reclaim, to make reporting more accurate and reduce
> +        * underestimation, but it's probably not worth the complexity for now.
> +        */
> +       if (current->reclaim_state && global_reclaim(sc)) {
> +               sc->nr_reclaimed += current->reclaim_state->reclaimed;
> +               current->reclaim_state->reclaimed = 0;
> +       }
> +}
> +
>  static long xchg_nr_deferred(struct shrinker *shrinker,
>                              struct shrink_control *sc)
>  {
> @@ -5346,8 +5386,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
>                 vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
>                            sc->nr_reclaimed - reclaimed);
>
> -       sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
> -       current->reclaim_state->reclaimed_slab = 0;
> +       flush_reclaim_state(sc);
>
>         return success ? MEMCG_LRU_YOUNG : 0;
>  }
> @@ -6450,7 +6489,6 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>
>  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  {
> -       struct reclaim_state *reclaim_state = current->reclaim_state;
>         unsigned long nr_reclaimed, nr_scanned;
>         struct lruvec *target_lruvec;
>         bool reclaimable = false;
> @@ -6472,10 +6510,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>
>         shrink_node_memcgs(pgdat, sc);
>
> -       if (reclaim_state) {
> -               sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> -               reclaim_state->reclaimed_slab = 0;
> -       }
> +       flush_reclaim_state(sc);
>
>         /* Record the subtree's reclaim efficiency */
>         if (!sc->proactive)
> --
> 2.40.0.577.gac1e443424-goog
>


Thread overview: 20+ messages
2023-04-13 10:40 [PATCH v6 0/3] Ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
2023-04-13 10:40 ` [PATCH v6 1/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
2023-04-13 10:45   ` Yosry Ahmed [this message]
2023-04-13 11:16   ` David Hildenbrand
2023-04-13 11:25     ` Yosry Ahmed
2023-04-14  8:15   ` Michal Hocko
2023-05-01 10:12   ` Yosry Ahmed
2023-04-13 10:40 ` [PATCH v6 2/3] mm: vmscan: move set_task_reclaim_state() near flush_reclaim_state() Yosry Ahmed
2023-04-13 11:19   ` David Hildenbrand
2023-04-13 11:26     ` Yosry Ahmed
2023-04-14  8:16   ` Michal Hocko
2023-04-13 10:40 ` [PATCH v6 3/3] mm: vmscan: refactor updating current->reclaim_state Yosry Ahmed
2023-04-13 11:20   ` David Hildenbrand
2023-04-13 11:29     ` Yosry Ahmed
2023-04-13 11:31       ` David Hildenbrand
2023-04-13 21:00       ` Dave Chinner
2023-04-13 21:38         ` Yosry Ahmed
2023-04-14 21:47           ` Andrew Morton
2023-04-14 23:11             ` Yosry Ahmed
2023-04-14  8:18   ` Michal Hocko
