linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kairui Song <ryncsn@gmail.com>
To: Zhang Peng <bruzzhang@tencent.com>, Usama Arif <usama.arif@linux.dev>
Cc: Zhang Peng via B4 Relay
	<devnull+zippermonkey.icloud.com@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm, vmscan: flush TLB for every 31 folios evictions
Date: Mon, 9 Mar 2026 21:19:29 +0800	[thread overview]
Message-ID: <CAMgjq7C-gisq454qu9Eo62baTpf+cLFcpMnEbv3Z6ZH9jH9kmw@mail.gmail.com> (raw)
In-Reply-To: <20260309122939.723610-1-usama.arif@linux.dev>

On Mon, Mar 9, 2026 at 8:42 PM Usama Arif <usama.arif@linux.dev> wrote:
>
> On Mon, 09 Mar 2026 16:17:42 +0800 Zhang Peng via B4 Relay <devnull+zippermonkey.icloud.com@kernel.org> wrote:
>
> > From: bruzzhang <bruzzhang@tencent.com>
> >
> > Currently we flush TLB for every dirty folio, which is a bottleneck for
> > systems with many cores as this causes heavy IPI usage.
> >
> > So instead, batch the folios, and flush once for every 31 folios (one
> > folio_batch). These folios will be held in a folio_batch releasing their
> > lock, then when folio_batch is full, do following steps:
> >
> > - For each folio: lock - check still evictable - unlock
> >   - If no longer evictable, return the folio to the caller.
> > - Flush TLB once for the batch
> > - Pageout the folios (refcount freeze happens in the pageout path)
> >
> > Note we can't hold a frozen folio in folio_batch for long as it will
> > cause filemap/swapcache lookup to livelock. Fortunately pageout usually
> > won't take too long; sync IO is fast, and non-sync IO will be issued
> > with the folio marked writeback.
> >
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: bruzzhang <bruzzhang@tencent.com>
> > ---
> >  mm/vmscan.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 61 insertions(+), 7 deletions(-)

...

> >       folio_batch_init(&free_folios);
> > +     folio_batch_init(&flush_folios);
> > +
> >       memset(stat, 0, sizeof(*stat));
> >       cond_resched();
> >       do_demote_pass = can_demote(pgdat->node_id, sc, memcg);
> > @@ -1578,15 +1624,19 @@ static void shrink_folio_list(struct list_head *folio_list,
> >                               goto keep_locked;
> >                       if (!sc->may_writepage)
> >                               goto keep_locked;
> > -
> >                       /*
> > -                      * Folio is dirty. Flush the TLB if a writable entry
> > -                      * potentially exists to avoid CPU writes after I/O
> > -                      * starts and then write it out here.
> > +                      * For anon, we should only see swap cache (anon) and
> > +                      * the list pinning the page. For file page, the filemap
> > +                      * and the list pins it. Combined with the page_ref_freeze
> > +                      * in pageout_batch ensure nothing else touches the page
> > +                      * during lock unlocked.
> >                        */
>
> page_ref_freeze happens inside pageout_one() -> pageout() -> __remove_mapping(),
> which runs after the folio is re-locked and after the TLB flush.  During
> the unlocked window, the refcount is not frozen. Right?
>
> With this patch, the folio is unlocked before try_to_unmap_flush_dirty() runs
> in pageout_batch(). During this window, TLB entries on other CPUs could allow
> writes to the folio after it has been selected for pageout. My understanding
> is that the original code intentionally flushed TLB while the folio was locked
> to prevent this? Could there be data corruption can result if a write through
> a stale TLB entry races with the pageout I/O?

Hi Usama,

Thanks for the review. Yeah the comment here seems wrong, I agree with you.

Hi, Peng, I think you might have used some stall comment, at least
page_ref_freeze doesn't exist here and that doesn't seem to be how
this patch works currently. Can you help double check and update?

These folios are kept in the batch unlocked and unfreeze. Also,
unmapped. They could get mapped again or touched, so the batch flush
should relocks the folios and redo some routines before that unmap
before, and if they are still in a ready to be freed status, then
flush and do the IO, then free.

BTW some checks seem missing in the batch check? eg. folio_maybe_dma_pinned.


  reply	other threads:[~2026-03-09 13:20 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-09  8:17 [PATCH 0/2] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng via B4 Relay
2026-03-09  8:17 ` [PATCH 1/2] mm/vmscan: refactor shrink_folio_list for readability and maintainability Zhang Peng via B4 Relay
2026-03-09  8:17 ` [PATCH 2/2] mm, vmscan: flush TLB for every 31 folios evictions Zhang Peng via B4 Relay
2026-03-09 12:29   ` Usama Arif
2026-03-09 13:19     ` Kairui Song [this message]
2026-03-09 14:56     ` Zhang Peng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMgjq7C-gisq454qu9Eo62baTpf+cLFcpMnEbv3Z6ZH9jH9kmw@mail.gmail.com \
    --to=ryncsn@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=bruzzhang@tencent.com \
    --cc=david@kernel.org \
    --cc=devnull+zippermonkey.icloud.com@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=weixugc@google.com \
    --cc=yuanchu@google.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox