linux-mm.kvack.org archive mirror
* [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
@ 2023-01-12 18:21 Steven Rostedt
  2023-01-12 20:35 ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2023-01-12 18:21 UTC (permalink / raw)
  To: lsf-pc
  Cc: linux-fsdevel, linux-mm, bpf, Joel Fernandes, Brian Norris, Ching-lin Yu


Title: Tracing mapped pages for quicker boot performance

Description:

ChromeOS currently uses ureadahead, which periodically traces the files
opened by processes during the boot sequence. It then uses this
information to call the readahead() system call to prefetch pages before
they are needed, which speeds up the applications. We have seen performance
gains approaching 60% (and even higher in certain cases) when it's working
properly.

The ureadahead program comes from Canonical, and has not been updated since
2009 (although we've been adding patches on top of it since).

  https://launchpad.net/ubuntu/+source/ureadahead

The only change Ubuntu has been making is forward porting it to the next
release; no code has actually changed. The 0.100.0 release was last done
in 2009.

Another problem with ureadahead is that it requires kernel modifications.
It needs two tracepoints added to the open paths so that it can see what
files have been opened (and it doesn't handle relative paths). These
tracepoints have been rejected upstream, so we've been carrying them in our
ChromeOS kernel to use ureadahead.

ureadahead only looks at the files that are opened during boot, and then
reads the extents to see what parts of the file are interesting. It stores
this information into a "pack" file. Then on subsequent boots, instead of
tracing, it reads the pack file, calls the readahead() system call on the
locations it has in that pack file, to make sure they are in cache when the
applications need them.
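
The replay step described above could be sketched roughly as follows.
This is only an illustration, not ureadahead's code: the pack is assumed
to be a plain list of (path, offset, length) tuples, which is a
hypothetical format, and posix_fadvise() with POSIX_FADV_WILLNEED stands
in for readahead() because Python's os module does not wrap readahead()
directly.

```python
import os


def prefetch(entries):
    """Hint the kernel to start reading each (path, offset, length) extent.

    `entries` is a hypothetical in-memory pack format, not ureadahead's.
    Returns the number of extents for which a hint was issued.
    """
    issued = 0
    for path, offset, length in entries:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # the file may have gone away since it was recorded
        try:
            # Stand-in for readahead(fd, offset, length).
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            issued += 1
        except OSError:
            pass  # prefetching is best effort, a failed hint is not fatal
        finally:
            os.close(fd)
    return issued
```

Files that no longer exist are skipped, since a stale entry should not
stop the rest of the pack from being prefetched.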

One issue is that it can pick too much of the file: it reads ahead
portions of the file that will never be used, and hence wastes system
resources.

I've been looking into other approaches. I wrote a simple program that
reads the page_fault_user trace event, and every time it sees a new PID, it
reads the /proc/<pid>/maps file. Using the address from the page fault
trace event, it can then see exactly which offset in which file the
faulting address maps to.
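
The address lookup can be illustrated with a small sketch. This is not
the program linked in this message; the names Mapping, parse_maps and
fault_to_file_offset are made up for the example, and a 4096 byte page
size is assumed. It parses the text of a maps file and translates a
faulting address into a page-aligned file offset.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int   # first virtual address of the mapping
    end: int     # one past the last virtual address
    offset: int  # file offset that `start` corresponds to
    path: str    # backing file


def parse_maps(text):
    """Parse /proc/<pid>/maps text, keeping only file-backed mappings."""
    mappings = []
    for line in text.splitlines():
        # Fields: address-range perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue  # anonymous memory, [heap], [stack], and so on
        start, end = (int(x, 16) for x in fields[0].split("-"))
        mappings.append(Mapping(start, end, int(fields[2], 16), fields[5]))
    return mappings


def fault_to_file_offset(mappings, addr, page_size=4096):
    """Return (path, page-aligned file offset) for addr, or None."""
    for m in mappings:
        if m.start <= addr < m.end:
            offset = m.offset + (addr - m.start)
            return m.path, offset - offset % page_size
    return None
```

For example, a fault at 0x7f1c2a401234 inside a mapping that starts at
0x7f1c2a400000 with file offset 0x22000 resolves to file offset 0x23000.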

There are several issues with this approach. The main one is the race
between seeing the pid and reading its /proc/<pid>/maps file: the pid
may no longer exist, or it may have done an exec, in which case the page
faults no longer map to the right location. But even with that, it does
surprisingly well (especially since we care more about long running
applications than short ones).

  https://rostedt.org/code/file-mapping.c

The above is just a toy application that tries this out, but could be used
as a starting point to replace ureadahead.

What I would like to discuss is whether there could be a way to add some
sort of trace events that can tell an application exactly what pages in a
file are being read from disk, without such races. Then an application
would simply have to read this information and store it, and later use it
to call readahead() on these locations of the file so that they are
available when needed.
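
As a rough sketch of the "read this information and store it" side,
assume purely for illustration that each event could be reduced to a
(path, page index) pair. No such trace event exists today, so the record
format and the names coalesce and build_pack are hypothetical. The
records can then be folded into per-file byte extents that only cover
pages that were actually read:

```python
def coalesce(pages, page_size=4096):
    """Merge page indexes into sorted (offset, length) byte extents."""
    extents = []
    for page in sorted(set(pages)):
        offset = page * page_size
        if extents and extents[-1][0] + extents[-1][1] == offset:
            # This page directly follows the previous extent, so grow it.
            extents[-1] = (extents[-1][0], extents[-1][1] + page_size)
        else:
            extents.append((offset, page_size))
    return extents


def build_pack(records, page_size=4096):
    """Group hypothetical (path, page index) records into extents per file."""
    by_file = {}
    for path, page in records:
        by_file.setdefault(path, []).append(page)
    return {path: coalesce(pages, page_size)
            for path, pages in by_file.items()}
```

Recording only the pages that were touched, and merging adjacent ones,
keeps the number of later prefetch calls small without pulling in parts
of a file that were never used.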

Note, in our use case boot ups do not change much. But I'm sure this could
be useful for other distributions.

This topic will require coordination with File systems, Storage, and MM.

I'm also open to having BPF help with this. One thing I want to make sure
we avoid is coming up with an ABI that will hinder development later on.

-- Steve


