Re: [PATCH 1/1] mm/vmscan: prevent MGLRU reclaim from pinning address space

linux-mm.kvack.org archive mirror

From: Suren Baghdasaryan <surenb@google.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, david@kernel.org,
	 mhocko@kernel.org, zhengqi.arch@bytedance.com,
	yuzhao@google.com,  shakeel.butt@linux.dev, willy@infradead.org,
	Liam.Howlett@oracle.com,  axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] mm/vmscan: prevent MGLRU reclaim from pinning address space
Date: Mon, 23 Mar 2026 09:19:04 -0700
Message-ID: <CAJuCfpFqz3OvOrJOnYG8NF0gjzkzmSaiM0LZKD_aLoW9Br2srA@mail.gmail.com>
In-Reply-To: <f22cb9d9-7fc8-4a79-ada8-02d66a1155b2@lucifer.local>

On Mon, Mar 23, 2026 at 6:43 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Sun, Mar 22, 2026 at 12:08:43AM -0700, Suren Baghdasaryan wrote:
> > When shrinking an lruvec, MGLRU pins the address space before walking it.
> > This is excessive since all it needs for walking the page range is
> > a stable mm_struct to be able to take and release mmap_read_lock and
> > a stable mm->mm_mt tree to walk. This address space pinning results
>
> Hmm, I guess exit_mmap() calls __mt_destroy(), but that'll just destroy
> allocated state and leave the tree empty, right? So traversal of that tree
> at that point would just do nothing?

Correct. And __mt_destroy() happens under mmap_write_lock, while the
traversal happens under mmap_read_lock, so they should not race.

>
> > in delays when releasing the memory of a dying process. This also
> > prevents mm reapers (both the in-kernel oom-reaper and userspace
> > process_mrelease()) from doing their job during an MGLRU scan because
> > they check task_will_free_mem(), which will yield a negative result due
> > to the elevated mm->mm_users.
> >
> > Replace unnecessary address space pinning with mm_struct pinning by
> > replacing mmget/mmput with mmgrab/mmdrop calls. mm_mt is contained
> > within mm_struct itself, therefore it won't be freed as long as
> > mm_struct is stable and it won't change during the walk because
> > mmap_read_lock is being held.
> >
> > Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/vmscan.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 33287ba4a500..68e8e90e38f5 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2863,8 +2863,9 @@ static struct mm_struct *get_next_mm(struct lru_gen_mm_walk *walk)
> >               return NULL;
>
> Not related to this series, but I really don't like how coupled MGLRU is to
> the rest of the 'classic' reclaim code.
>
> Just in the middle of vmscan you walk into generic mm walker logic and the
> only hint it's MGLRU is you see lru_gen_xxx stuff (I'm also annoyed that we
> call it MGLRU but it's called lru_gen_xxx in the kernel :)

I don't have a strong opinion on this. Perhaps the naming can be
changed outside of this series.

>
> >
> >       clear_bit(key, &mm->lru_gen.bitmap);
> > +     mmgrab(mm);
>
> Is the mm somehow pinned here, or would destruction remove it from the mm
> list, meaning that we can safely assume we have something sane in mm-> to
> grab? I guess this must have already been the case for mmget_not_zero() to
> have been used before, though.

Yes, mm is stable because it's fetched from mm_list. When mm is added
to this list via lru_gen_add_mm(mm), it is referenced, and that
reference is dropped only after lru_gen_del_mm(mm) removes the mm from
this list (see https://elixir.bootlin.com/linux/v7.0-rc4/source/kernel/fork.c#L1185
and https://elixir.bootlin.com/linux/v7.0-rc4/source/kernel/fork.c#L1187).
Addition to, removal from and retrieval from that list all happen under
mm_list->lock, which prevents races.

>
> >
> > -     return mmget_not_zero(mm) ? mm : NULL;
> > +     return mm;
> >  }
> >
> >  void lru_gen_add_mm(struct mm_struct *mm)
> > @@ -3064,7 +3065,7 @@ static bool iterate_mm_list(struct lru_gen_mm_walk *walk, struct mm_struct **ite
> >               reset_bloom_filter(mm_state, walk->seq + 1);
> >
> >       if (*iter)
> > -             mmput_async(*iter);
> > +             mmdrop(*iter);
>
> This will now be a blocking call that could free the mm (via __mmdrop()),
> so it could take a while. Is that ok?

mmdrop() should not be a heavy-weight operation. It simply destroys
the metadata associated with mm_struct. mmput() OTOH will call
exit_mmap() if it drops the last reference and that can take a while
because that's when we free the memory of the process. I believe
that's why mmput_async() was used here.

>
> If before the code was intentionally deferring work here, doesn't that
> imply that being slow here might be an issue, somehow? Or was it just
> because they could? :)

I think the reason was the possibility of calling mmput() -> __mmput()
-> exit_mmap(mm) which could indeed block us for a while.

>
> >
> >       *iter = mm;
> >
> >
> > base-commit: 8c65073d94c8b7cc3170de31af38edc9f5d96f0e
> > --
> > 2.53.0.1018.g2bb0e51243-goog
> >
>
> Thanks, Lorenzo

Thread overview: 12+ messages
2026-03-22  7:08 Suren Baghdasaryan
2026-03-23 13:43 ` Lorenzo Stoakes (Oracle)
2026-03-23 16:19   ` Suren Baghdasaryan [this message]
2026-03-23 17:06     ` Lorenzo Stoakes (Oracle)
2026-03-23 17:24       ` Suren Baghdasaryan
2026-03-27 15:20       ` Suren Baghdasaryan
2026-03-27 19:53         ` Andrew Morton
2026-03-27 20:12           ` Suren Baghdasaryan
2026-03-23 13:43 ` Lorenzo Stoakes (Oracle)
2026-03-23 16:26   ` Suren Baghdasaryan
2026-03-23 17:02     ` Lorenzo Stoakes (Oracle)
2026-03-23 17:43       ` Suren Baghdasaryan
