linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Kalesh Singh <kaleshsingh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Jan Kara <jack@suse.cz>,
	lsf-pc@lists.linux-foundation.org,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	David Hildenbrand <david@redhat.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Juan Yescas <jyescas@google.com>,
	android-mm <android-mm@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
	"Cc: Android Kernel" <kernel-team@android.com>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Optimizing Page Cache Readahead Behavior
Date: Tue, 25 Feb 2025 10:56:21 +1100	[thread overview]
Message-ID: <Z70HJWliB4wXE-DD@dread.disaster.area> (raw)
In-Reply-To: <CAC_TJvcnD731xyudgapjHx=dvVHY+cxoO1--2us7oo9TqA9-_g@mail.gmail.com>

On Mon, Feb 24, 2025 at 01:36:50PM -0800, Kalesh Singh wrote:
> Another possible way we can look at this: in the regressions shared
> above by the ELF padding regions, we are able to make these regions
> sparse (for *almost* all cases) -- solving the shared-zero page
> problem for file mappings, would also eliminate much of this overhead.
> So perhaps we should tackle this angle? If that's a more tangible
> solution ?
> 
> From the previous discussions that Matthew shared [7], it seems like
> Dave proposed an alternative to moving the extents to the VFS layer to
> invert the IO read path operations [8]. Maybe this is a move
> approachable solution since there is precedence for the same in the
> write path?
> 
> [7] https://lore.kernel.org/linux-fsdevel/Zs97qHI-wA1a53Mm@casper.infradead.org/
> [8] https://lore.kernel.org/linux-fsdevel/ZtAPsMcc3IC1VaAF@dread.disaster.area/

Yes, if we are going to optimise away redundant zeros being stored
in the page cache over holes, we need to know where the holes in the
file are before the page cache is populated.

As for efficient hole tracking in the mapping tree, I suspect that
we should be looking at using exceptional entries in the mapping
tree for holes, not inserting mulitple references to the zero folio.
i.e. the important information for data storage optimisation is that
the region covers a hole, not that it contains zeros.

For buffered reads, all that is required when such an exceptional
entry is returned is a memset of the user buffer. For buffered
writes, we simply treat it like a normal folio allocating write and
replace the exceptional entry with the allocated (and zeroed) folio.

For read page faults, the zero page gets mapped (and maybe
accounted) via the vma rather than the mapping tree entry. For write
faults, a folio gets allocated and the exception entry replaced
before we call into ->page_mkwrite().

Invalidation simply removes the exceptional entries.

This largely gets rid of needing to care about the zero page outside
of mmap() context where something needs to be mapped into the
userspace mm context. Let the page fault/mm context substitute the
zero page in the PTE mappings where necessary, but we don't need to
use and/or track the zero page in the page cache itself....

FWIW, this also lends itself to storing unwritten extent information
in exceptional entries. One of the problems we have is unwritten
extents can contain either zeros (been read) and data (been
overwritten in memory, but not flushed to disk). This is the problem
that SEEK_DATA has to navigate - it has to walk the page cache over
unwritten extents to determine if there is data over the unwritten
extent or not.

In this case, an exceptional entry gets added on read, which is then
replaced with an actual folio on write. Now SEEK_DATA can easily and
safely determine where the data actually lies over the unwritten
extent with a mapping tree walk instead of having to load and lock
each folio to check it is dirty or not....

-Dave.
-- 
Dave Chinner
david@fromorbit.com


  parent reply	other threads:[~2025-02-24 23:56 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-21 21:13 Kalesh Singh
2025-02-22 18:03 ` Kent Overstreet
2025-02-23  5:36   ` Kalesh Singh
2025-02-23  5:42     ` Kalesh Singh
2025-02-23  9:30     ` Lorenzo Stoakes
2025-02-23 12:24       ` Matthew Wilcox
2025-02-23  5:34 ` Ritesh Harjani
2025-02-23  6:50   ` Kalesh Singh
2025-02-24 12:56   ` David Sterba
2025-02-24 14:14 ` [Lsf-pc] " Jan Kara
2025-02-24 14:21   ` Lorenzo Stoakes
2025-02-24 16:31     ` Jan Kara
2025-02-24 16:52       ` Lorenzo Stoakes
2025-02-24 21:36         ` Kalesh Singh
2025-02-24 21:55           ` Kalesh Singh
2025-02-24 23:56           ` Dave Chinner [this message]
2025-02-25  6:45             ` Kalesh Singh
2025-02-27 22:12             ` Matthew Wilcox
2025-02-28  1:12               ` Dave Chinner
2025-02-28  9:07               ` David Hildenbrand
2025-04-02  0:13                 ` Kalesh Singh
2025-02-25  5:44           ` Lorenzo Stoakes
2025-02-25  6:59             ` Kalesh Singh
2025-02-25 16:36           ` Jan Kara
2025-02-26  0:49             ` Kalesh Singh
2025-02-25 16:21         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z70HJWliB4wXE-DD@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=android-mm@google.com \
    --cc=david@redhat.com \
    --cc=jack@suse.cz \
    --cc=jyescas@google.com \
    --cc=kaleshsingh@google.com \
    --cc=kernel-team@android.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mhocko@suse.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox