From: David Hildenbrand <david@redhat.com>
To: Mike Snitzer <snitzer@kernel.org>, Dave Chinner <david@fromorbit.com>
Cc: Ming Lei <ming.lei@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Don Dutile <ddutile@redhat.com>,
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
linux-block@vger.kernel.org
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
Date: Tue, 30 Jan 2024 11:43:21 +0100
Message-ID: <a754add2-de29-4c91-b4f4-cbd7eb888cb6@redhat.com>
In-Reply-To: <Zbgq3B8nmMuJooEl@redhat.com>

On 29.01.24 23:46, Mike Snitzer wrote:
> On Mon, Jan 29 2024 at 5:12P -0500,
> Dave Chinner <david@fromorbit.com> wrote:
>
>> On Mon, Jan 29, 2024 at 12:19:02PM -0500, Mike Snitzer wrote:
>>> While I'm sure this legacy application would love to not have to
>>> change its code at all, I think we can all agree that we need to just
>>> focus on how best to advise applications that have mixed workloads
>>> accomplish efficient mmap+read of both sequential and random.
>>>
>>> To that end, I heard Dave clearly suggest 2 things:
>>>
>>> 1) update MADV/FADV_SEQUENTIAL to set file->f_ra.ra_pages to
>>> bdi->io_pages, not bdi->ra_pages * 2
>>>
>>> 2) Have the application first issue MADV_SEQUENTIAL to convey that for
>>> the following MADV_WILLNEED is for sequential file load (so it is
>>> desirable to use larger ra_pages)
>>>
>>> This overrides the default of bdi->ra_pages and _should_ provide the
>>> required per-file duality of control for readahead, correct?
>>
>> I just discovered MADV_POPULATE_READ - see my reply to Ming
>> up-thread about that. The application should use that instead of
>> MADV_WILLNEED because it gives cache population guarantees that
>> WILLNEED doesn't. Then we can look at optimising the performance of
>> MADV_POPULATE_READ (if needed) as there is constrained scope we can
>> optimise within in ways that we cannot do with WILLNEED.
>
> Nice find! Given commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables"), I've cc'd David
> Hildenbrand just so he's in the loop.

Thanks for CCing me.

MADV_POPULATE_READ is indeed different; it doesn't give hints (not
"might be a good idea to read some pages" like MADV_WILLNEED documents),
it forces swapin/read/.../.

In a sense, MADV_POPULATE_READ is similar to simply reading one byte
from the page behind each PTE, triggering page faults, but without
actually reading from the target pages.
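
To make the comparison concrete, here is a minimal user-space sketch. The
helper name populate_read() is made up for illustration, the fallback
define assumes the value from the generic uapi headers (Linux 5.14+), and
addr is assumed to be page-aligned. It tries MADV_POPULATE_READ first and,
if the kernel rejects the advice with EINVAL, emulates it by reading one
byte per page:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* assumed: generic uapi value, Linux 5.14+ */
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) for reading.
 * Returns 1 if MADV_POPULATE_READ populated the range, 0 if we fell
 * back to touching one byte per page, -1 on any other error.
 */
static int populate_read(void *addr, size_t len)
{
	long page;
	size_t off;
	volatile const unsigned char *p = addr;

	if (madvise(addr, len, MADV_POPULATE_READ) == 0)
		return 1;
	if (errno != EINVAL)
		return -1;

	/* Advice not supported: trigger one read fault per page instead. */
	page = sysconf(_SC_PAGESIZE);
	for (off = 0; off < len; off += (size_t)page)
		(void)p[off];
	return 0;
}
```

The fallback loop gives the kernel no range information at all: each
fault arrives on its own, which is exactly the guessing described next.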

MADV_POPULATE_READ has a conceptual benefit: we know exactly how much
memory user space wants to have populated (which range). In contrast,
page faults contain no such hints and we have to guess based on
historical behavior. One could use that range information to *not* do
any faultaround/readahead when we come via MADV_POPULATE_READ, and
really only populate the range of interest.

Further, one can use that range information to allocate larger folios,
without having to guess where placement of a large folio is reasonable,
and which size we should use.
>
> FYI, I proactively raised feedback and questions to the reporter of
> this issue:
>
> CONTEXT: madvise(WILLNEED) doesn't convey the nature of the access,
> sequential vs random, just the range that may be accessed.

Indeed. The "problem" with MADV_SEQUENTIAL/MADV_RANDOM is that the advice
is tracked per VMA, so applying it to only part of a mapping will
fragment/split VMAs. So applying it to smaller chunks (like one would do
with MADV_WILLNEED) is likely not a good option.
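
A rough way to observe the splitting from user space, as a sketch only:
count the lines in /proc/self/maps before and after applying
MADV_SEQUENTIAL to the middle of a mapping. The helper names count_vmas()
and vma_split_demo() are made up for this illustration, and the 16-page
anonymous mapping is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count lines (one per VMA) in /proc/self/maps; returns -1 on error. */
static int count_vmas(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	int c, n = 0;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			n++;
	fclose(f);
	return n;
}

/*
 * Hypothetical demo: map 16 pages, advise only pages 4..7, and report
 * the VMA count before and after. Returns 0 on success, -1 on error.
 */
static int vma_split_demo(int *before, int *after)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = 16 * page;
	char *p;

	/* Warm up stdio so its own allocations do not skew the counts. */
	(void)count_vmas();

	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*before = count_vmas();
	if (madvise(p + 4 * page, 4 * page, MADV_SEQUENTIAL) != 0) {
		munmap(p, len);
		return -1;
	}
	*after = count_vmas();

	munmap(p, len);
	return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a regular Linux kernel I would expect "after" to be larger than
"before" (typically by two: head, advised middle, tail), but that is an
expectation, not something measured for this thread.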
--
Cheers,
David / dhildenb