Re: [PATCH] mm/readahead: Skip fully overlapped range

linux-mm.kvack.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Matthew Wilcox <willy@infradead.org>,
	Nanhai Zou <nanhai.zou@intel.com>,
	 Gang Deng <gang.deng@intel.com>,
	Tianyou Li <tianyou.li@intel.com>,
	 Vinicius Gomes <vinicius.gomes@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	 Chen Yu <yu.c.chen@intel.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	 Roman Gushchin <roman.gushchin@linux.dev>
Subject: Re: [PATCH] mm/readahead: Skip fully overlapped range
Date: Tue, 23 Sep 2025 11:57:21 +0200
Message-ID: <cghebadvzchca3lo2cakcihwyoexx7fdqtibfywfm4xjo7eyp2@vbccezepgtoe>
In-Reply-To: <93f7e2ad-563b-4db5-bab6-4ce2e994dbae@linux.intel.com>

On Tue 23-09-25 13:11:37, Aubrey Li wrote:
> On 9/23/25 11:49, Andrew Morton wrote:
> > On Tue, 23 Sep 2025 11:59:46 +0800 Aubrey Li <aubrey.li@linux.intel.com> wrote:
> > 
> >> RocksDB sequential read benchmark under high concurrency shows severe
> >> lock contention. Multiple threads may issue readahead on the same file
> >> simultaneously, which leads to heavy contention on the xas spinlock in
> >> filemap_add_folio(). Perf profiling indicates 30%~60% of CPU time spent
> >> there.
> >>
> >> To mitigate this issue, a readahead request will be skipped if its
> >> range is fully covered by an ongoing readahead. This avoids redundant
> >> work and significantly reduces lock contention. In one-second sampling,
> >> contention on xas spinlock dropped from 138,314 times to 2,144 times,
> >> resulting in a large performance improvement in the benchmark.
> >>
> >> 				w/o patch       w/ patch
> >> RocksDB-readseq (ops/sec)
> >> (32-threads)			1.2M		2.4M
> > 
> > On which kernel version?  In recent times we've made a few readahead
> > changes to address issues with high concurrency and a quick retest on
> > mm.git's current mm-stable branch would be interesting please.
> 
> I'm on v6.16.7. Thanks Andrew for the information, let me check with mm.git.
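
The skip condition in the quoted patch description ("fully covered by an
ongoing readahead") can be pictured with a small userspace sketch. The patch
itself is not quoted in this message, so this is not its implementation;
`struct page_range` and `range_fully_covered()` are hypothetical names, and
ranges are assumed to be half-open page-index intervals.

```c
#include <stdbool.h>

/* Hypothetical half-open page range [start, start + nr). */
struct page_range {
	unsigned long start;	/* first page index */
	unsigned long nr;	/* number of pages */
};

/*
 * Return true when 'req' lies entirely inside 'ongoing'.
 * An empty request that starts within (or at the end of) 'ongoing'
 * counts as covered.
 */
static bool range_fully_covered(struct page_range req,
				struct page_range ongoing)
{
	unsigned long off;

	if (req.start < ongoing.start)
		return false;

	/* Compare offsets instead of end points so start + nr cannot overflow. */
	off = req.start - ongoing.start;
	if (off > ongoing.nr)
		return false;

	return req.nr <= ongoing.nr - off;
}
```

With an ongoing range of 64 pages starting at page 100, a request for pages
110..129 would be skipped under this check, while a request for pages
150..169 would not, because it extends past the end of the ongoing range.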

I don't expect much of a change for this load, but getting a test result
with mm.git as a confirmation would be nice. Also, given that the patch
you propose helps, it looks like there are many threads sharing one
struct file which race to read the same content. That is actually rather
problematic for the current readahead code because there's *no
synchronization* on updating the file's readahead state. So threads can
race and corrupt the state in interesting ways under one another's hands.
On rare occasions I've observed this with a heavy NFS workload where the
NFS server is multithreaded. Since the practical outcome is "just" reduced
read throughput / reading too much, it was never high enough on my priority
list to fix properly (I do have a preliminary patch for that lying around,
but there are some open questions that require deeper thinking - like how
to handle a situation where one thread does readahead, the filesystem
requests some alignment of the request size after the fact, so we'd like to
update the readahead state, but another thread has modified the shared
readahead state in the meantime).  But if we're going to work on improving
the behavior of readahead for multiple threads sharing readahead state,
fixing the code so that the readahead state is at least consistent is IMO
the first necessary step. And then we can pile more complex logic on top of
that.
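
To make the "modified in the meantime" problem concrete, here is a minimal
userspace sketch of one possible way to keep a shared window consistent and
to detect a concurrent change before a late update. It is not the
preliminary patch mentioned above and not kernel code: `struct ra_window`,
`struct ra_shared` and the helper names are hypothetical, and squeezing the
window into two 32-bit fields is a simplifying assumption (the kernel's
struct file_ra_state tracks more than this).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a readahead window; not struct file_ra_state. */
struct ra_window {
	uint32_t start;	/* first page index of the window */
	uint32_t size;	/* number of pages in the window */
};

/* Both fields live in one atomic word, so they always change together. */
struct ra_shared {
	_Atomic uint64_t packed;	/* start in the high half, size in the low half */
};

static uint64_t ra_pack(struct ra_window w)
{
	return ((uint64_t)w.start << 32) | w.size;
}

static struct ra_window ra_unpack(uint64_t v)
{
	struct ra_window w = { (uint32_t)(v >> 32), (uint32_t)v };
	return w;
}

/* Read a snapshot whose start and size come from the same update. */
static struct ra_window ra_snapshot(struct ra_shared *s)
{
	return ra_unpack(atomic_load(&s->packed));
}

/*
 * Publish 'next' only if the shared state still equals 'seen'.
 * Returns false when the state no longer matches the snapshot, i.e.
 * another thread changed the window in the meantime.
 */
static bool ra_try_update(struct ra_shared *s, struct ra_window seen,
			  struct ra_window next)
{
	uint64_t expected = ra_pack(seen);

	return atomic_compare_exchange_strong(&s->packed, &expected,
					      ra_pack(next));
}
```

A thread would take a snapshot before submitting readahead and, if the
filesystem adjusts the request size afterwards, call ra_try_update() with
that snapshot. A false return only says that someone else moved the window;
what to do then (retry, merge, or drop the adjustment) is exactly the open
question, and this sketch does not answer it. A plain compare-and-swap also
cannot tell if the window was changed and then changed back to the same
value.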

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Thread overview: 9+ messages
2025-09-23  3:59 Aubrey Li
2025-09-23  3:49 ` Andrew Morton
2025-09-23  5:11   ` Aubrey Li
2025-09-23  9:57     ` Jan Kara [this message]
2025-09-24  0:27       ` Aubrey Li
2025-09-30  5:35       ` Aubrey Li
2025-10-11 22:20         ` Andrew Morton
2025-10-16 16:21           ` Jan Kara
2025-11-07 10:28             ` Aubrey Li