linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, Joel Fernandes <joel@joelfernandes.org>,
	Brian Norris <briannorris@chromium.org>,
	Ching-lin Yu <chinglinyu@google.com>
Subject: [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
Date: Thu, 12 Jan 2023 13:21:53 -0500	[thread overview]
Message-ID: <20230112132153.38d52708@gandalf.local.home> (raw)


Title: Tracing mapped pages for quicker boot performance

Description:

ChromeOS currently uses ureadahead that will periodically trace files that
are opened by the processes during the boot sequence. Then it will use this
information to call the readahead() system call in order to prefetch pages
before they are needed and speed up the applications. We have seen upward
towards 60% (and even higher is certain cases) performance gains when it's
working properly.

The ureadahead program comes from Canonical, and has not been updated since
2009 (although we've been adding patches on top of it since).

  https://launchpad.net/ubuntu/+source/ureadahead

The only changes Ubuntu has been doing with it is forward porting it to the
next release. But no code actually has changed. The 0.100.0 release was
last done in 2009.

Another problem with ureadahead is that it requires kernel modifications.
It adds in two tracepoints into the open paths so that it can see what
files have been opened (and it doesn't handle relative paths). These
tracepoints have been rejected upstream. We've been carrying them in our
ChromeOS kernel to use ureadahead.

ureadahead only looks at the files that are opened during boot, and then
reads the extents to see what parts of the file are interesting. It stores
this information into a "pack" file. Then on subsequent boots, instead of
tracing, it reads the pack file, calls the readahead() system call on the
locations it has in that pack file, to make sure they are in cache when the
applications need them.

One issue is that it can pick too much of the file, where it's reading
ahead portions of the file that will never be read, and hence, waste system
resources.

I've been looking into other approaches. I wrote a simple program that
reads the page_fault_user trace event, and every time it sees a new PID, it
reads the /proc/<pid>/maps file. And using the page fault trace event's
address, it can see exactly where in the file it is mapped to.

There's several issues with this approach. The main one being the race
condition between reading the pid and the /proc/<pid>/maps file. As the pid
may no longer exist, or it does an exec where the page faults no longer map
to the right location. But even with that, it does surprisingly well
(especially since we care more about long running applications than short
ones).

  https://rostedt.org/code/file-mapping.c

The above is just a toy application that tries this out, but could be used
as a starting point to replace ureadahead.

What I would like to discuss, is if there could be a way to add some sort
of trace events that can tell an application exactly what pages in a file
are being read from disk, where there is no such races. Then an application
would simply have to read this information and store it, and then it can
use this information later to call readahead() on these locations of the
file so that they are available when needed.

Note, in our use case boot ups do not change much. But I'm sure this could
be useful for other distributions.

This topic will require coordination with File systems, Storage, and MM.

I'm also open to having BPF help with this. One issue I want to make sure
we avoid, is any ABI we come up with that will hinder development later on.

-- Steve


             reply	other threads:[~2023-01-12 18:22 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12 18:21 Steven Rostedt [this message]
2023-01-12 20:35 ` Matthew Wilcox
2023-01-12 22:17   ` Steven Rostedt
2023-01-12 22:24     ` Matthew Wilcox
2023-01-12 22:30       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230112132153.38d52708@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=bpf@vger.kernel.org \
    --cc=briannorris@chromium.org \
    --cc=chinglinyu@google.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox