Christian Ehrhardt wrote:
>
>
> Rik van Riel wrote:
>> On 04/20/2010 11:32 AM, Johannes Weiner wrote:
>>
>>> The idea is that it pans out on its own. If the workload changes, new
>>> pages get activated and when that set grows too large, we start
>>> shrinking it again.
>>>
>>> Of course, right now this unscanned set is way too large and we can end
>>> up wasting up to 50% of usable page cache on false active pages.
>>
>> Thing is, changing workloads often change back.
>>
>> Specifically, think of a desktop system that is doing
>> work for the user during the day and gets backed up
>> at night.
>>
>> You do not want the backup to kick the working set
>> out of memory, because when the user returns in the
>> morning the desktop should come back quickly after
>> the screensaver is unlocked.
>
> IMHO it is fine to prevent that nightly backup job from not being
> finished when the user arrives at morning because we didn't give him
> some more cache - and e.g. a 30 sec transition from/to both optimized
> states is fine.
> But eventually I guess the point is that both behaviors are reasonable
> to achieve - depending on the users needs.
>
> What we could do is combine all our thoughts we had so far:
> a) Rik could create an experimental patch that excludes the in flight pages
> b) Johannes could create one for his suggestion to "always scan active
> file pages but only deactivate them when the ratio is off and otherwise
> strip buffers of clean pages"
> c) I would extend the patch from Johannes setting the ratio of
> active/inactive pages to be a userspace tunable

A first revision of patch c is attached.
I tested assigning different percentages; so far e.g. 50 really behaves
like before, and 25 protects ~42M of Buffers in my example, which matches
the intended behavior - see the patch for more details.
Checkpatch and some basic function tests went fine. While it may not be
perfect yet, I think it is ready for feedback now.

> a,b,a+b would then need to be tested if they achieve a better behavior.
>
> c on the other hand would be a fine tunable to let administrators
> (knowing their workloads) or distributions (e.g. different values for
> Desktop/Server defaults) adapt their installations.
>
> In theory a,b and c should work fine together in case we need all of them.
>
>> The big question is, what workload suffers from
>> having the inactive list at 50% of the page cache?
>>
>> So far the only big problem we have seen is on a
>> very unbalanced virtual machine, with 256MB RAM
>> and 4 fast disks. The disks simply have more IO
>> in flight at once than what fits in the inactive
>> list.
>
> Did I get you right that this means the write case - explaining why it
> is building up buffers to the 50% max?
> Thinking about it I wondered for what these Buffers are protected.

If the intention of keeping these buffers is reuse by similar loads, I
wonder why I "need" three iozone runs to build up the 85M in my case.
Buffers start at ~0; after iozone run 1 they are at ~35M, after run #2 at
~65M, and after run #3 at ~85M.
Shouldn't it either allocate 85M for the first run directly, in case that
much is needed for a single run - or, if not, shouldn't the second and
third run just "reuse" the 35M of Buffers still held from the first run?

Note - "1 iozone run" means "iozone ... -i 0", which sequentially writes
and then rewrites a 2GB file on 16 disks in my current case.
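(Purely for illustration, not part of the attached patch: a tiny userspace
helper of the kind one could use to watch these numbers while the iozone
runs execute. It just polls Buffers, Active(file) and Inactive(file) from
/proc/meminfo once per interval; the latter two field names assume a kernel
with the split file/anon LRU lists, i.e. 2.6.28 or later, and the program
name below is only a suggestion.)

/* memwatch.c - print Buffers / Active(file) / Inactive(file) periodically */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the value (in kB) of one /proc/meminfo field, or -1 if absent. */
static long meminfo_kb(const char *key)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long val = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, key, strlen(key))) {
			sscanf(line + strlen(key), " %ld", &val);
			break;
		}
	}
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	int interval = argc > 1 ? atoi(argv[1]) : 5;	/* seconds between samples */

	for (;;) {	/* stop with Ctrl-C */
		printf("Buffers: %7ld kB  Active(file): %7ld kB  Inactive(file): %7ld kB\n",
		       meminfo_kb("Buffers:"),
		       meminfo_kb("Active(file):"),
		       meminfo_kb("Inactive(file):"));
		sleep(interval);
	}
	return 0;
}

Built with e.g. "gcc -o memwatch memwatch.c" and left running in a second
terminal, it makes the ~35M/~65M/~85M Buffers steps between the individual
iozone runs easy to see.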
Looking forward especially to patch b, as I'd really like to see a kernel
able to win back these buffers if they are not used for a longer period,
while still allowing them to grow & be protected while needed.

--
Grüße / regards,
Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance