From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35])
        by kanga.kvack.org (Postfix) with ESMTP id 820646B00EE
        for ; Mon, 15 Aug 2011 14:56:18 -0400 (EDT)
Received: from hpaq2.eem.corp.google.com (hpaq2.eem.corp.google.com [172.25.149.2])
        by smtp-out.google.com with ESMTP id p7FIuAUm007969
        for ; Mon, 15 Aug 2011 11:56:10 -0700
Received: from qyk2 (qyk2.prod.google.com [10.241.83.130])
        by hpaq2.eem.corp.google.com with ESMTP id p7FIu8b3000354
        (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT)
        for ; Mon, 15 Aug 2011 11:56:09 -0700
Received: by qyk2 with SMTP id 2so2886818qyk.15
        for ; Mon, 15 Aug 2011 11:56:08 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20110815184023.GA16369@quack.suse.cz>
References: <1313189245-7197-1-git-send-email-curtw@google.com>
        <1313189245-7197-2-git-send-email-curtw@google.com>
        <20110815134846.GB13534@localhost>
        <20110815184023.GA16369@quack.suse.cz>
Date: Mon, 15 Aug 2011 11:56:08 -0700
Message-ID:
Subject: Re: [PATCH 2/2 v2] writeback: Add writeback stats for pages written
From: Curt Wohlgemuth
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jan Kara
Cc: Wu Fengguang , Christoph Hellwig , Andrew Morton , Dave Chinner , Michael Rubin , "linux-fsdevel@vger.kernel.org" , "linux-mm@kvack.org"

Hi Jan:

On Mon, Aug 15, 2011 at 11:40 AM, Jan Kara wrote:
> On Mon 15-08-11 10:16:38, Curt Wohlgemuth wrote:
>> On Mon, Aug 15, 2011 at 6:48 AM, Wu Fengguang wrote:
>> > Curt,
>> >
>> > Some thoughts about the interface..before dipping into the code.
>> >
>> > On Sat, Aug 13, 2011 at 06:47:25AM +0800, Curt Wohlgemuth wrote:
>> >> Add a new file, /proc/writeback/stats, which displays
>> >
>> > That's creating a new top directory in /proc. Do you have plans for
>> > adding more files under it?
>>
>> Good question.
>> We have several files under /proc/writeback in our
>> kernels that we created at various times, some of which are probably
>> no longer useful, but others seem to be.  For example:
>>   - congestion: prints # of calls, # of jiffies slept in
>> congestion_wait() / io_schedule_timeout() from various call points
>>   - threshold_dirty : prints the current global FG threshold
>>   - threshold_bg : prints the current global BG threshold
>>   - pages_cleaned : prints the # pages sent to writeback -- same as
>> 'nr_written' in /proc/vmstat (ours was earlier :-( )
>>   - pages_dirtied (same as nr_dirtied in /proc/vmstat)
>>   - prop_vm_XXX : print shift/events from vm_completions and vm_dirties
>>
>> I'm not sure right now if global FG/BG thresholds appear anywhere in a
>> 3.1 kernel; if so, the two threshold files above are superfluous.  So
>> are the pages_cleaned/dirtied.  The prop_vm files have not proven
>> useful to me.  I think the congestion file has a lot of value,
>> especially in an IO-less throttling world...
>  /sys/kernel/debug/bdi//stats has BdiDirtyThresh, DirtyThresh, and
> BackgroundThresh. So we should already expose all you have in the threshold
> files.

Ah, right, I knew that and overlooked it.  I get confused looking at
lots of kernel versions and patches at the same time :-) .

> Regarding congestion_wait() statistics - do I get right that the numbers
> gathered actually depend on the number of threads using the congested
> device? They are something like
>  \sum_{over threads} time_waited_for_bdi
> How do you interpret the resulting numbers then?

I don't have it by thread; just stupidly as totals, like this:

calls: ttfp                                    11290
time:  ttfp                                   558191
calls: shrink_inactive_list isolated             xxx
time : shrink_inactive_list isolated             xxx
calls: shrink_inactive_list lumpy reclaim        xxx
time : shrink_inactive_list lumpy reclaim        xxx
calls: balance_pgdat                             xxx
time : balance_pgdat                             xxx
calls: alloc_pages_high_priority                 xxx
time : alloc_pages_high_priority                 xxx
calls: alloc_pages_slowpath                      xxx
time : alloc_pages_slowpath                      xxx
calls: throttle_vm_writeout                      xxx
time : throttle_vm_writeout                      xxx
calls: balance_dirty_pages                       xxx
time : balance_dirty_pages                       xxx

Note that the "call" points above are from a very old (2.6.34 +
backports) kernel, but you get the idea.  We just wrap
congestion_wait() with a routine that takes a 'type' parameter; does
the congestion_wait(); and increments the appropriate 'call' stat, and
adds to the appropriate 'time' stat the return value from
congestion_wait().

For a given workload, you can get an idea for where congestion is
adding to delays.  I really think that for IO-less
balance_dirty_pages(), we need some insight into how long writer
threads are being throttled.  And tracepoints are great, but not
sufficient, IMHO.

Thanks,
Curt

>
>                                                                 Honza
>
>> >> machine global data for how many pages were cleaned for
>> >> which reasons.  It also displays some additional counts for
>> >> various writeback events.
>> >>
>> >> These data are also available for each BDI, in
>> >> /sys/block//bdi/writeback_stats .
>> >
>> >> Sample output:
>> >>
>> >>    page: balance_dirty_pages           2561544
>> >>    page: background_writeout              5153
>> >>    page: try_to_free_pages                   0
>> >>    page: sync                                0
>> >>    page: kupdate                        102723
>> >>    page: fdatawrite                    1228779
>> >>    page: laptop_periodic                     0
>> >>    page: free_more_memory                    0
>> >>    page: fs_free_space                       0
>> >>    periodic writeback                      377
>> >>    single inode wait                         0
>> >>    writeback_wb wait                         1
>> >
>> > That's already useful data, and could be further extended (in
>> > future patches) to answer questions like "what's the writeback
>> > efficiency in terms of effective chunk size?"
>> >
>> > So in future there could be lines like
>> >
>> >    pages: balance_dirty_pages           2561544
>> >    chunks: balance_dirty_pages          XXXXXXX
>> >    works: balance_dirty_pages           XXXXXXX
>> >
>> > or even derived lines like
>> >
>> >    pages_per_chunk: balance_dirty_pages         XXXXXXX
>> >    pages_per_work: balance_dirty_pages          XXXXXXX
>> >
>> > Another question is, how can the display format be script friendly?
>> > The current form looks not easily parse-able at least for "cut"..
>>
>> I suppose you mean because of the variable number of tokens.  Yeah,
>> this can be hard.  Of course, I always just use "awk '{print $NF}'"
>> and it works for me :-) .
>> But I'd be happy to change these to use a
>> consistent # of args.
>>
>> Thanks,
>> Curt
>>
>>
>> > Thanks,
>> > Fengguang
>> >
>
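
On the script-friendliness question raised in the thread: the sample
output has a variable number of label tokens per line ("page: sync"
vs. "periodic writeback"), which is why "cut" struggles and why taking
the last field, as awk's $NF does, works.  A minimal Python sketch of
the same idea follows.  It is an illustration only: the function name
parse_writeback_stats is hypothetical, and the input is assumed to
look like the sample output quoted above, not any finalized
/proc/writeback/stats format.

```python
# Sketch only: parse "label ... value" lines shaped like the sample
# writeback stats output quoted in this thread. parse_writeback_stats is
# a hypothetical helper name, not part of any kernel or userspace tool.

def parse_writeback_stats(text):
    """Return {label: count}, treating the last token of each line as the value."""
    stats = {}
    for line in text.splitlines():
        # Split once from the right on any whitespace: everything before
        # the last token is the label, however many words it contains.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank line or a single token: nothing to record
        label, value = parts
        try:
            count = int(value)
        except ValueError:
            continue  # placeholder such as "XXXXXXX": skip it
        # Collapse the alignment padding inside the label.
        stats[" ".join(label.split())] = count
    return stats


# A few lines taken from the sample output in the patch description.
SAMPLE = """\
   page: balance_dirty_pages           2561544
   page: background_writeout              5153
   periodic writeback                      377
   writeback_wb wait                         1
"""

if __name__ == "__main__":
    for label, count in parse_writeback_stats(SAMPLE).items():
        print(f"{label}\t{count}")
```

Splitting from the right keeps the parser indifferent to how many
words a label has, so it would keep working if the format later moved
to a consistent number of fields per line.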