On Wed, Apr 07, 2010 at 12:06:07PM +0800, Minchan Kim wrote:
> On Wed, Apr 7, 2010 at 11:54 AM, Taras Glek wrote:
> > On 04/06/2010 07:24 PM, Wu Fengguang wrote:
> >>
> >> Hi Taras,
> >>
> >> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
> >>
> >>>
> >>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
> >>>
> >>>>
> >>>> Hello,
> >>>> I am working on improving Mozilla startup times. It turns out that page
> >>>> faults(caused by lack of cooperation between user/kernelspace) are the
> >>>> main cause of slow startup. I need some insights from someone who
> >>>> understands linux vm behavior.
> >>>>
> >>
> >> How about improve Fedora (and other distros) to preload Mozilla (and
> >> other apps the user run at the previous boot) with fadvise() at boot
> >> time? This sounds like the most reasonable option.
> >>
> >
> > That's a slightly different usecase. I'd rather have all large apps startup
> > as efficiently as possible without any hacks. Though until we get there,
> > we'll be using all of the hacks we can.
> >>
> >> As for the kernel readahead, I have a patchset to increase default
> >> mmap read-around size from 128kb to 512kb (except for small memory
> >> systems). This should help your case as well.
> >>
> >
> > Yes. Is the current readahead really doing read-around(ie does it read pages
> > before the one being faulted)? From what I've seen, having the dynamic
> > linker read binary sections backwards causes faults.
> > http://sourceware.org/bugzilla/show_bug.cgi?id=11447
> >>
> >>
> >>>>
> >>>> Current Situation:
> >>>> The dynamic linker mmap()s executable and data sections of our
> >>>> executable but it doesn't call madvise().
> >>>> By default page faults trigger 131072byte reads. To make matters worse,
> >>>> the compile-time linker + gcc lay out code in a manner that does not
> >>>> correspond to how the resulting executable will be executed(ie the
> >>>> layout is basically random).
> >>>> This means that during startup 15-40mb
> >>>> binaries are read in basically random fashion. Even if one orders the
> >>>> binary optimally, throughput is still suboptimal due to the puny
> >>>> readahead.
> >>>>
> >>>> IO Hints:
> >>>> Fortunately when one specifies madvise(WILLNEED) pagefaults trigger 2mb
> >>>> reads and a binary that tends to take 110 page faults(ie program stops
> >>>> execution and waits for disk) can be reduced down to 6. This has the
> >>>> potential to double application startup of large apps without any clear
> >>>> downsides.
> >>>>
> >>>> Suse ships their glibc with a dynamic linker patch to fadvise()
> >>>> dynamic libraries(not sure why they switched from doing madvise
> >>>> before).
> >>>>
> >>
> >> This is interesting. I wonder how SuSE implements the policy.
> >> Do you have the patch or some strace output that demonstrates the
> >> fadvise() call?
> >>
> >
> > glibc-2.3.90-ld.so-madvise.diff in
> > http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0:
> >
> > As I recall they just fadvise the filedescriptor before accessing it.
> >>
> >>
> >>>>
> >>>> I filed a glibc bug about this at
> >>>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
> >>>> with his concern about wasting memory resources. What is the impact of
> >>>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
> >>>> pressure? Does the kernel simply start ignoring these hints?
> >>>>
> >>>
> >>> It will throttle based on memory pressure. In idle situations it will
> >>> eat your file cache, however, to satisfy the request.
> >>>
> >>> Now, the file cache should be much bigger than the amount of unneeded
> >>> pages you prefault with the hint over the whole library, so I guess the
> >>> benefit of prefaulting the right pages outweighs the downside of evicting
> >>> some cache for unused library pages.
> >>>
> >>> Still, it's a workaround for deficits in the demand-paging/readahead
> >>> heuristics and thus a bit ugly, I feel. Maybe Wu can help.
> >>>
> >>
> >> Program page faults are inherently random, so the straightforward
> >> solution would be to increase the mmap read-around size (for desktops
> >> with reasonable large memory), rather than to improve program layout
> >> or readahead heuristics :)
> >>
> >
> > Program page faults may exhibit random behavior once they've started.
> >
> > During startup page-in pattern of over-engineered OO applications is very
> > predictable. Programs are laid out based on compilation units, which have no
> > relation to how they are executed. Another problem is that any large old
> > application will have lots of code that is either rarely executed or
> > completely dead. Random sprinkling of live code among mostly unneeded code
> > is a problem.
> > I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB with
> > proper binary layout. Even if one lays out a program wrongly, the worst-case
> > pagein pattern will be pretty similar to what it is by default.
> >
> > But yes, I completely agree that it would be awesome to increase the
> > readahead size proportionally to available memory. It's a little silly to be
> > reading tens of megabytes in 128kb increments :) You rock for trying to
> > modernize this.
>
> Hi, Wu and Taras.
>
> I have been watched at this thread.
> That's because I had a experience on reducing startup latency of application
> in embedded system.
>
> I think sometime increasing of readahead size wouldn't good in embedded.
> Many of embedded system has nand as storage and compression file system.
> About nand, as you know, random read effect isn't rather big than hdd.
> About compression file system, as one has a big compression,
> it would make startup late(big block read and decompression).
> We had to disable readahead of code page with kernel hacking.
> And it would make application slow as time goes by.
> But at that time we thought latency is more important than performance
> on our application.
>
> Of course, it is different whenever what is file system and
> compression ratio we use .
> So I think increasing of readahead size might always be not good.
>
> Please, consider embedded system when you have a plan to tweak
> readahead, too. :)

Minchan, glad to know that you have experience with embedded Linux.

While increasing the general readahead size from 128kb to 512kb, I
also added a limit for mmap read-around: if system memory size is less
than X MB, then limit read-around size to X KB. For example, do only
128KB read-around for a 128MB embedded box, and 32KB read-around for a
32MB box.

Do you think it is a reasonable safety guard?

Patch attached.

Thanks,
Fengguang
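
For illustration only, the limit described above can be sketched as a
plain min() of the 512KB default and the memory size in MB. This is a
guess at the rule's shape, not the attached patch: the helper name
mmap_read_around_kb and the constant DEFAULT_READ_AROUND_KB are
hypothetical, and the real kernel code works in pages, not KB.

```python
# Hypothetical sketch of the "X MB of RAM -> at most X KB read-around" rule.
# Names and units are assumptions for illustration, not taken from the patch.

DEFAULT_READ_AROUND_KB = 512  # proposed new default mentioned in the message


def mmap_read_around_kb(system_memory_mb: int,
                        default_kb: int = DEFAULT_READ_AROUND_KB) -> int:
    """Return the read-around size in KB for a box with the given RAM in MB."""
    if system_memory_mb <= 0:
        raise ValueError("system_memory_mb must be positive")
    # Small-memory boxes get X KB for X MB; larger boxes keep the default.
    return min(default_kb, system_memory_mb)


if __name__ == "__main__":
    for mem_mb in (32, 128, 4096):
        print(f"{mem_mb} MB RAM: {mmap_read_around_kb(mem_mb)} KB read-around")
```

With this shape, any box with 512MB or more keeps the full 512KB
default, and only smaller systems see a reduced read-around window.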