linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 Liu Shixin <liushixin2@huawei.com>
Subject: Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing
Date: Mon, 28 Jul 2025 11:16:40 +0200	[thread overview]
Message-ID: <khtydy2lpwip3cysfjiw7sa7effaagpx7tcbvgopqhsdb2h3fc@4xveh6pjxuoq> (raw)
In-Reply-To: <87jz3vdf9e.fsf@linux.dev>

On Fri 25-07-25 16:25:49, Roman Gushchin wrote:
> Roman Gushchin <roman.gushchin@linux.dev> writes:
> > Jan Kara <jack@suse.cz> writes:
> >> On Thu 10-07-25 12:52:32, Roman Gushchin wrote:
> >>> We've noticed in production that under a very heavy memory pressure
> >>> the readahead behavior becomes unstable causing spikes in memory
> >>> pressure and CPU contention on zone locks.
> >>> 
> >>> The current mmap_miss heuristics considers minor pagefaults as a
> >>> good reason to decrease mmap_miss and conditionally start async
> >>> readahead. This creates a vicious cycle: asynchronous readahead
> >>> loads more pages, which in turn causes more minor pagefaults.
> >>> This problem is especially pronounced when multiple threads of
> >>> an application fault on consecutive pages of an evicted executable,
> >>> aggressively lowering the mmap_miss counter and preventing readahead
> >>> from being disabled.
> >>
> >> I think you're talking about filemap_map_pages() logic of handling
> >> mmap_miss. It would be nice to mention it in the changelog. There's one
> >> thing that doesn't quite make sense to me: When there's memory pressure,
> >> I'd expect the pages to be reclaimed from memory and not just unmapped. 
> >> Also given your solution uses !uptodate folios suggests the pages were
> >> actually fully reclaimed and the problem really is that filemap_map_pages()
> >> treats as minor page fault (i.e., cache hit) what is in fact a major page
> >> fault (i.e., cache miss)?
> >>
> >> Actually, now that I digged deeper I've remembered that based on Liu
> >> Shixin's report
> >> (https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/)
> >> which sounds a lot like what you're reporting, we have eventually merged his
> >> fixes (ended up as commits 0fd44ab213bc ("mm/readahead: break read-ahead
> >> loop if filemap_add_folio return -ENOMEM"), 5c46d5319bde ("mm/filemap:
> >> don't decrease mmap_miss when folio has workingset flag")). Did you test a
> >> kernel with these fixes (6.10 or later)? In particular after these fixes
> >> the !folio_test_workingset() check in filemap_map_folio_range() and
> >> filemap_map_order0_folio() should make sure we don't decrease mmap_miss
> >> when faulting fresh pages. Or was in your case page evicted so long ago
> >> that workingset bit is already clear?
> >>
> >> Once we better understand the situation, let me also mention that I have
> >> two patches which I originally proposed to fix Liu's problems. They didn't
> >> quite fix them so his patches got merged in the end but the problems
> >> described there are still somewhat valid:
> >
> > Ok, I got a better understanding of the situation now. Basically we have
> > a multi-threaded application which is under very heavy memory pressure.
> > I multiple threads are faulting simultaneously into the same page,
> > do_sync_mmap_readahead() can be called multiple times for the same page.
> > This creates a negative pressure on the mmap_miss counter, which can't be
> > matched by do_sync_mmap_readahead(), which is be called only once
> > for every page. This basically keeps the readahead on, despite the heavy
> > memory pressure.
> >
> > The following patch solves the problem, at least in my test scenario.
> > Wdyt?
> 
> Actually, a better version is below. We don't have to avoid the actual
> readahead, just not decrease mmap_miss if the page is locked.
> 
> --
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 0d0369fb5fa1..1756690dd275 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3323,9 +3323,15 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
>         if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
>                 return fpin;
>  
> -       mmap_miss = READ_ONCE(ra->mmap_miss);
> -       if (mmap_miss)
> -               WRITE_ONCE(ra->mmap_miss, --mmap_miss);
> +       /* If folio is locked, we're likely racing against another fault,
> +        * don't decrease the mmap_miss counter to avoid decreasing it
> +        * multiple times for the same page and break the balance.
> +        */
> +       if (likely(!folio_test_locked(folio))) {

I like this, although even more understandable to me would be to have

	  if (likely(folio_test_uptodate(folio)))

which should be more or less equivalent for your situation but would better
express, whether this is indeed a cache hit or not. But I can live with
either variant.

								Honza

> +               mmap_miss = READ_ONCE(ra->mmap_miss);
> +               if (mmap_miss)
> +                       WRITE_ONCE(ra->mmap_miss, --mmap_miss);
> +       }
>  
>         if (folio_test_readahead(folio)) {
>                 fpin = maybe_unlock_mmap_for_io(vmf, fpin);
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


      reply	other threads:[~2025-07-28  9:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-10 19:52 Roman Gushchin
2025-07-10 20:57 ` Andrew Morton
2025-07-10 22:54   ` Roman Gushchin
2025-07-10 21:43 ` Matthew Wilcox
2025-07-11 16:29   ` Roman Gushchin
2025-07-14 15:16 ` Jan Kara
2025-07-14 20:12   ` Roman Gushchin
2025-07-25 22:42   ` Roman Gushchin
2025-07-25 23:25     ` Roman Gushchin
2025-07-28  9:16       ` Jan Kara [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=khtydy2lpwip3cysfjiw7sa7effaagpx7tcbvgopqhsdb2h3fc@4xveh6pjxuoq \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liushixin2@huawei.com \
    --cc=roman.gushchin@linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox