linux-mm.kvack.org archive mirror
* [RFC][PATCH] IO wait accounting
@ 2002-05-09  0:55 Rik van Riel
  2002-05-09 14:30 ` Bill Davidsen
  2002-05-14 19:49 ` Kurtis D. Rader
  0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2002-05-09  0:55 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

Hi,

the following patch implements simple IO wait accounting, with the
following two oddities:

1) only page fault IO is currently counted
2) while idle, a tick can be counted as both system time
   and iowait time, hence IO wait time is not subtracted
   from idle time (also to ensure backwards compatibility
   with procps)

I'm not sure whether to change these two issues, and if they
should be changed, how should they behave instead?

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.383.26.6 -> 1.383.26.7
#	  include/linux/mm.h	1.43    -> 1.44
#	 fs/proc/proc_misc.c	1.14    -> 1.15
#	      kernel/timer.c	1.3     -> 1.3.1.1
#	include/linux/kernel_stat.h	1.3     -> 1.4
#	         mm/memory.c	1.52    -> 1.53
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/05/08	riel@mirkwood.rielhome.conectiva	1.383.26.7
# iowait accounting, note that iowait is not subtracted from idle time
# since a jiffy can be counted as both system and iowait
# (still untested, -pre7 doesn't boot on my test box)
# --------------------------------------------
#
diff -Nru a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
--- a/fs/proc/proc_misc.c	Wed May  8 21:51:32 2002
+++ b/fs/proc/proc_misc.c	Wed May  8 21:51:32 2002
@@ -266,7 +266,7 @@
 	int i, len;
 	extern unsigned long total_forks;
 	unsigned long jif = jiffies;
-	unsigned int sum = 0, user = 0, nice = 0, system = 0;
+	unsigned int sum = 0, user = 0, nice = 0, system = 0, iowait = 0;
 	int major, disk;

 	for (i = 0 ; i < smp_num_cpus; i++) {
@@ -275,23 +275,26 @@
 		user += kstat.per_cpu_user[cpu];
 		nice += kstat.per_cpu_nice[cpu];
 		system += kstat.per_cpu_system[cpu];
+		iowait += kstat.per_cpu_iowait[cpu];
 #if !defined(CONFIG_ARCH_S390)
 		for (j = 0 ; j < NR_IRQS ; j++)
 			sum += kstat.irqs[cpu][j];
 #endif
 	}

-	len = sprintf(page, "cpu  %u %u %u %lu\n", user, nice, system,
-		      jif * smp_num_cpus - (user + nice + system));
+	len = sprintf(page, "cpu  %u %u %u %lu %lu\n", user, nice, system,
+		      jif * smp_num_cpus - (user + nice + system),
+		      iowait);
 	for (i = 0 ; i < smp_num_cpus; i++)
-		len += sprintf(page + len, "cpu%d %u %u %u %lu\n",
+		len += sprintf(page + len, "cpu%d %u %u %u %lu %u\n",
 			i,
 			kstat.per_cpu_user[cpu_logical_map(i)],
 			kstat.per_cpu_nice[cpu_logical_map(i)],
 			kstat.per_cpu_system[cpu_logical_map(i)],
-			jif - (  kstat.per_cpu_user[cpu_logical_map(i)] \
-				   + kstat.per_cpu_nice[cpu_logical_map(i)] \
-				   + kstat.per_cpu_system[cpu_logical_map(i)]));
+			jif - (  kstat.per_cpu_user[cpu_logical_map(i)]
+				   + kstat.per_cpu_nice[cpu_logical_map(i)]
+				   + kstat.per_cpu_system[cpu_logical_map(i)]),
+			kstat.per_cpu_iowait[cpu_logical_map(i)]);
 	len += sprintf(page + len,
 		"page %u %u\n"
 		"swap %u %u\n"
diff -Nru a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
--- a/include/linux/kernel_stat.h	Wed May  8 21:51:32 2002
+++ b/include/linux/kernel_stat.h	Wed May  8 21:51:32 2002
@@ -18,7 +18,8 @@
 struct kernel_stat {
 	unsigned int per_cpu_user[NR_CPUS],
 	             per_cpu_nice[NR_CPUS],
-	             per_cpu_system[NR_CPUS];
+	             per_cpu_system[NR_CPUS],
+		     per_cpu_iowait[NR_CPUS];
 	unsigned int dk_drive[DK_MAX_MAJOR][DK_MAX_DISK];
 	unsigned int dk_drive_rio[DK_MAX_MAJOR][DK_MAX_DISK];
 	unsigned int dk_drive_wio[DK_MAX_MAJOR][DK_MAX_DISK];
diff -Nru a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h	Wed May  8 21:51:32 2002
+++ b/include/linux/mm.h	Wed May  8 21:51:32 2002
@@ -18,6 +18,7 @@
 extern unsigned long num_mappedpages;
 extern void * high_memory;
 extern int page_cluster;
+extern atomic_t pagefaults_in_progress;

 #include <asm/page.h>
 #include <asm/pgtable.h>
diff -Nru a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c	Wed May  8 21:51:32 2002
+++ b/kernel/timer.c	Wed May  8 21:51:32 2002
@@ -592,8 +592,16 @@
 		else
 			kstat.per_cpu_user[cpu] += user_tick;
 		kstat.per_cpu_system[cpu] += system;
-	} else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
-		kstat.per_cpu_system[cpu] += system;
+	} else {
+		/*
+		 * No process is running, but if we're handling interrupts
+		 * or processes are waiting on disk IO, we're not really idle.
+		 */
+		if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
+			kstat.per_cpu_system[cpu] += system;
+		if (atomic_read(&pagefaults_in_progress) > 0)
+			kstat.per_cpu_iowait[cpu] += system;
+	}
 }

 /*
diff -Nru a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c	Wed May  8 21:51:32 2002
+++ b/mm/memory.c	Wed May  8 21:51:32 2002
@@ -57,6 +57,7 @@
 unsigned long num_mappedpages;
 void * high_memory;
 struct page *highmem_start_page;
+atomic_t pagefaults_in_progress = ATOMIC_INIT(0);

 /*
  * We special-case the C-O-W ZERO_PAGE, because it's such
@@ -1381,6 +1382,7 @@
 {
 	pgd_t *pgd;
 	pmd_t *pmd;
+	int ret = -1;

 	current->state = TASK_RUNNING;
 	pgd = pgd_offset(mm, address);
@@ -1397,16 +1399,20 @@
 	 * We need the page table lock to synchronize with kswapd
 	 * and the SMP-safe atomic PTE updates.
 	 */
+	atomic_inc(&pagefaults_in_progress);
 	spin_lock(&mm->page_table_lock);
 	pmd = pmd_alloc(mm, pgd, address);

 	if (pmd) {
 		pte_t * pte = pte_alloc(mm, pmd, address);
 		if (pte)
-			return handle_pte_fault(mm, vma, address, write_access, pte);
+			ret = handle_pte_fault(mm, vma, address, write_access, pte);
+		else
+			spin_unlock(&mm->page_table_lock);
 	}
-	spin_unlock(&mm->page_table_lock);
-	return -1;
+
+	atomic_dec(&pagefaults_in_progress);
+	return ret;
 }

 /*


* Re: [RFC][PATCH] IO wait accounting
  2002-05-09  0:55 [RFC][PATCH] IO wait accounting Rik van Riel
@ 2002-05-09 14:30 ` Bill Davidsen
  2002-05-09 19:08   ` Rik van Riel
  2002-05-14 19:49 ` Kurtis D. Rader
  1 sibling, 1 reply; 13+ messages in thread
From: Bill Davidsen @ 2002-05-09 14:30 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Wed, 8 May 2002, Rik van Riel wrote:

> the following patch implements simple IO wait accounting, with the
> following two oddities:
> 
> 1) only page fault IO is currently counted
> 2) while idle, a tick can be counted as both system time
>    and iowait time, hence IO wait time is not substracted
>    from idle time (also to ensure backwards compatability
>    with procps)
> 
> I'm doubting whether or not to change these two issues and if
> they should be changed, how should they behave instead ?

I'm delighted that someone else is looking at this as well. I've been
trying to do a similar thing, to determine how well disk tuning, the VM
of the moment, and various RAID configs perform.

I have been simply counting WaitIO ticks when there is (a) no runnable
process in the system, and (b) at least one process blocked for disk I/O,
either page or program. And instead of presenting it properly I just
stuffed it in a variable and read it from kmem.
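
A minimal sketch of that tick-time rule, in kernel-style C; the two
counters below are invented names and are assumed to be maintained
elsewhere (scheduler and block layer respectively):

/* Hypothetical per-tick check: (a) nothing runnable, (b) at least one
 * task blocked on disk IO -> count a WaitIO tick.  Neither counter is
 * a real kernel symbol. */
extern int nr_runnable_tasks;		/* assumed: runnable tasks */
extern int nr_tasks_blocked_on_disk;	/* assumed: tasks sleeping on disk IO */

static unsigned long waitio_ticks;	/* read back via /dev/kmem for now */

static void account_waitio_tick(void)
{
	if (nr_runnable_tasks == 0 && nr_tasks_blocked_on_disk > 0)
		waitio_ticks++;
}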

While I don't defend my data presentation (I didn't want to break any
/proc-reading tools), what I was trying to measure is how often the
system would run faster if it had faster disks. Unfortunately the answer
was that with my typical load, disk is not a problem if the system has
enough memory; there is always something which wants the CPU. However,
disk speed does make a big difference in responsiveness, even though the
CPU stays busy.

I think what is useful is both what I measured, idle time due to disk,
and also some responsiveness value, which would be the sum of wait time
for all processes waiting on I/O (ticks times processes waiting). You can
consider whether two processes each waiting 50ms is more or less desirable
than one waiting 100ms, of course.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [RFC][PATCH] IO wait accounting
  2002-05-09 14:30 ` Bill Davidsen
@ 2002-05-09 19:08   ` Rik van Riel
  2002-05-12 19:05     ` Zlatko Calusic
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2002-05-09 19:08 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-mm, linux-kernel

On Thu, 9 May 2002, Bill Davidsen wrote:

> I have been simply counting WaitIO ticks when there is (a) no runable
> process in the system, and (b) at least one process blocked for disk i/o,
> either page or program. And instead of presenting it properly I just
> stuffed it in a variable and read it from kmem.

OK, how did you measure this?

And should we measure read() waits as well as page faults, or
just page faults?

regards,

Rik
-- 
	http://www.linuxsymposium.org/2002/
"You're one of those condescending OLS attendants"
"Here's a nickle kid.  Go buy yourself a real t-shirt"

http://www.surriel.com/		http://distro.conectiva.com/


* Re: [RFC][PATCH] IO wait accounting
  2002-05-09 19:08   ` Rik van Riel
@ 2002-05-12 19:05     ` Zlatko Calusic
  2002-05-12 21:14       ` Rik van Riel
  2002-05-13 16:08       ` Bill Davidsen
  0 siblings, 2 replies; 13+ messages in thread
From: Zlatko Calusic @ 2002-05-12 19:05 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Bill Davidsen, linux-mm, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:
>
> And should we measure read() waits as well as page faults or
> just page faults ?
>

Definitely both. Somewhere on the web there was a nice document explaining
how Solaris measures iowait%; I read it a few years ago and it was
great stuff (quite a nice explanation).

I'll try to find it, as it could be helpful.
-- 
Zlatko

* Re: [RFC][PATCH] IO wait accounting
  2002-05-12 19:05     ` Zlatko Calusic
@ 2002-05-12 21:14       ` Rik van Riel
  2002-05-13 11:40         ` BALBIR SINGH
  2002-05-13 11:45         ` Zlatko Calusic
  2002-05-13 16:08       ` Bill Davidsen
  1 sibling, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2002-05-12 21:14 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: Bill Davidsen, linux-mm, linux-kernel

On Sun, 12 May 2002, Zlatko Calusic wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
> >
> > And should we measure read() waits as well as page faults or
> > just page faults ?
>
> Definitely both.

OK, I'll look at a way to implement these stats so that
every IO wait counts as iowait time ... preferably in a
way that doesn't touch the code in too many places ;)
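
A purely hypothetical sketch of such a low-touch hook (the names below are
invented; the patch above only does something like this for page faults):
bracket the central IO sleeping primitives with a counter, the way the
patch brackets the page fault path with pagefaults_in_progress, and let
the timer tick test that counter:

atomic_t nr_iowait_tasks = ATOMIC_INIT(0);	/* invented name */

static inline void io_wait_begin(void)		/* call before sleeping on IO */
{
	atomic_inc(&nr_iowait_tasks);
}

static inline void io_wait_end(void)		/* call after the IO completes */
{
	atomic_dec(&nr_iowait_tasks);
}

/*
 * Example use around an existing wait, e.g. in the buffered read path:
 *
 *	io_wait_begin();
 *	wait_on_page(page);
 *	io_wait_end();
 *
 * The timer tick would then charge per_cpu_iowait whenever the CPU is
 * idle and atomic_read(&nr_iowait_tasks) > 0, just as the patch does
 * for pagefaults_in_progress.
 */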

> Somewhere on the web was a nice document explaining
> how Solaris measures iowait%, I read it few years ago and it was a
> great stuff (quite nice explanation).
>
> I'll try to find it, as it could be helpful.

Please do; it would be useful to make our info compatible with
theirs so sysadmins can read the statistics the same way on
both systems.

kind regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


* RE: [RFC][PATCH] IO wait accounting
  2002-05-12 21:14       ` Rik van Riel
@ 2002-05-13 11:40         ` BALBIR SINGH
  2002-05-13 13:58           ` Zlatko Calusic
  2002-05-13 14:32           ` Rik van Riel
  2002-05-13 11:45         ` Zlatko Calusic
  1 sibling, 2 replies; 13+ messages in thread
From: BALBIR SINGH @ 2002-05-13 11:40 UTC (permalink / raw)
  To: Rik van Riel, Zlatko Calusic; +Cc: Bill Davidsen, linux-mm, linux-kernel

I found a URL that you might find useful.

http://sunsite.uakom.sk/sunworldonline/swol-08-1997/swol-08-insidesolaris.html

Simple and straightforward implementation of a per-cpu iowait statistics
counter.

Balbir

|-----Original Message-----
|From: linux-kernel-owner@vger.kernel.org
|[mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Rik van Riel
|Sent: Monday, May 13, 2002 2:44 AM
|To: Zlatko Calusic
|Cc: Bill Davidsen; linux-mm@kvack.org; linux-kernel@vger.kernel.org
|Subject: Re: [RFC][PATCH] IO wait accounting
|
|
|On Sun, 12 May 2002, Zlatko Calusic wrote:
|> Rik van Riel <riel@conectiva.com.br> writes:
|> >
|> > And should we measure read() waits as well as page faults or
|> > just page faults ?
|>
|> Definitely both.
|
|OK, I'll look at a way to implement these stats so that
|every IO wait counts as iowait time ... preferably in a
|way that doesn't touch the code in too many places ;)
|
|> Somewhere on the web was a nice document explaining
|> how Solaris measures iowait%, I read it few years ago and it was a
|> great stuff (quite nice explanation).
|>
|> I'll try to find it, as it could be helpful.
|
|Please, it would be useful to get our info compatible with
|theirs so sysadmins can read their statistics the same on
|both systems.
|
|kind regards,
|
|Rik
|--
|Bravely reimplemented by the knights who say "NIH".
|
|http://www.surriel.com/		http://distro.conectiva.com/
|
|-
|To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
|the body of a message to majordomo@vger.kernel.org
|More majordomo info at  http://vger.kernel.org/majordomo-info.html
|Please read the FAQ at  http://www.tux.org/lkml/



* Re: [RFC][PATCH] IO wait accounting
  2002-05-12 21:14       ` Rik van Riel
  2002-05-13 11:40         ` BALBIR SINGH
@ 2002-05-13 11:45         ` Zlatko Calusic
  2002-05-13 13:34           ` Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Zlatko Calusic @ 2002-05-13 11:45 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Bill Davidsen, linux-mm, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:

> On Sun, 12 May 2002, Zlatko Calusic wrote:
>> Rik van Riel <riel@conectiva.com.br> writes:
>> >
>> > And should we measure read() waits as well as page faults or
>> > just page faults ?
>>
>> Definitely both.
>
> OK, I'll look at a way to implement these stats so that
> every IO wait counts as iowait time ... preferably in a
> way that doesn't touch the code in too many places ;)
>
>> Somewhere on the web was a nice document explaining
>> how Solaris measures iowait%, I read it few years ago and it was a
>> great stuff (quite nice explanation).
>>
>> I'll try to find it, as it could be helpful.
>
> Please, it would be useful to get our info compatible with
> theirs so sysadmins can read their statistics the same on
> both systems.
>

Yes, that would be nice. Anyway, finding the document I mentioned will
be much harder than I thought; googling for the last 15 minutes didn't
make any progress. But I'll keep trying.

Anyway, here is how AIX defines it:

 Average percentage of CPU time that the CPUs were idle during which
 the system had an outstanding disk I/O request. This value may be
 inflated if the actual number of I/O requesting threads is less than
 the number of idling processors.

(http://support.bull.de/download/redbooks/Performance/OptimizingYourSystemPerformance.pdf)

Also, Sun has a nice collection of articles at
http://www.sun.com/sun-on-net/itworld/, and among them
http://www.sun.com/sun-on-net/itworld/UIR981001perf.html which speaks
about wait time, but I'm still searching for a more technical document...
-- 
Zlatko

* Re: [RFC][PATCH] IO wait accounting
  2002-05-13 11:45         ` Zlatko Calusic
@ 2002-05-13 13:34           ` Rik van Riel
  0 siblings, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2002-05-13 13:34 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: Bill Davidsen, linux-mm, linux-kernel

On Mon, 13 May 2002, Zlatko Calusic wrote:

> Anyway, here is how Aix defines it:
>
>  Average percentage of CPU time that the CPUs were idle during which
>  the system had an outstanding disk I/O request. This value may be
>  inflated if the actual number of I/O requesting threads is less than
>  the number of idling processors.

Ohhh, I ran into this implementation detail, too ;)

I hope that means I'm doing something right.

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


* Re: [RFC][PATCH] IO wait accounting
  2002-05-13 11:40         ` BALBIR SINGH
@ 2002-05-13 13:58           ` Zlatko Calusic
  2002-05-13 14:32           ` Rik van Riel
  1 sibling, 0 replies; 13+ messages in thread
From: Zlatko Calusic @ 2002-05-13 13:58 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: Rik van Riel, Bill Davidsen, linux-mm, linux-kernel

"BALBIR SINGH" <balbir.singh@wipro.com> writes:

> I found a URL that you might find useful.
>
> http://sunsite.uakom.sk/sunworldonline/swol-08-1997/swol-08-insidesolaris.html
>
> Simple and straight forward implementation of a per-cpu iowait statistics
> counter.
>

Yes, yes, yes, that's the article I was talking about.

Thanks Balbir!
-- 
Zlatko

* RE: [RFC][PATCH] IO wait accounting
  2002-05-13 11:40         ` BALBIR SINGH
  2002-05-13 13:58           ` Zlatko Calusic
@ 2002-05-13 14:32           ` Rik van Riel
  1 sibling, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2002-05-13 14:32 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: Zlatko Calusic, Bill Davidsen, linux-mm, linux-kernel

On Mon, 13 May 2002, BALBIR SINGH wrote:

> http://sunsite.uakom.sk/sunworldonline/swol-08-1997/swol-08-insidesolaris.html
>
> Simple and straight forward implementation of a per-cpu iowait statistics
> counter.

Hehe, so straightforward that I already did this part last
week, before searching around for papers like this.

At least it means the stats will be fully compatible and
sysadmins won't get lost (like they do with the different
meanings of the load average).

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


* Re: [RFC][PATCH] IO wait accounting
  2002-05-12 19:05     ` Zlatko Calusic
  2002-05-12 21:14       ` Rik van Riel
@ 2002-05-13 16:08       ` Bill Davidsen
  1 sibling, 0 replies; 13+ messages in thread
From: Bill Davidsen @ 2002-05-13 16:08 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: Rik van Riel, linux-mm, linux-kernel

On Sun, 12 May 2002, Zlatko Calusic wrote:

> Rik van Riel <riel@conectiva.com.br> writes:
> >
> > And should we measure read() waits as well as page faults or
> > just page faults ?
> >
> 
> Definitely both. Somewhere on the web was a nice document explaining
> how Solaris measures iowait%, I read it few years ago and it was a
> great stuff (quite nice explanation).

  I'm out of town so I missed a bit of this, but I agree: what you want is
time waiting for IO, total.

  That said, it would probably be useful to keep the information from the
first patch, since overall disk performance shows up in total IOwait,
while VM wait is useful for comparing the several flavors of VM tuning and
enhancement, both to the implementors and to the users, who may have
unusual configurations.

  I hope that write blocking falls into place as well, because even
though it is less common, you still get programs which build ugly stuff
like a full 700MB CD image in memory and do that last write (or close, or
fsync, etc). This is bad with large memory, and unspeakable with small,
where stuff is being paged in and written out.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [RFC][PATCH] IO wait accounting
  2002-05-09  0:55 [RFC][PATCH] IO wait accounting Rik van Riel
  2002-05-09 14:30 ` Bill Davidsen
@ 2002-05-14 19:49 ` Kurtis D. Rader
  2002-05-14 21:38   ` Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Kurtis D. Rader @ 2002-05-14 19:49 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Wed, 2002-05-08 21:55:21, Rik van Riel wrote:
> I'm doubting whether or not to change these two issues and if
> they should be changed, how should they behave instead ?

On the topic of how this is defined by other UNIXes: below I've included
a section from a technical note I wrote which explains how various sar(1)
statistics are derived for DYNIX/ptx. Since the DYNIX/ptx implementation
was derived from SysVR3, I suspect this is how many other UNIXes define it.
You may find this information useful in designing a Linux implementation
of this metric.

SAR -U: REPORT CPU UTILIZATION
------------------------------
    %usr  - percentage of time spent executing user code

    %sys  - percentage of time spent executing kernel code

    %wio  - percentage of time spent waiting for I/O to complete

    %idle - percentage of time spent in the idle loop

Separate user mode, system mode, and idle time counters are maintained for
each CPU and updated 100 times each second via the hardclock() interrupt
handler (see the v_time member of the vmmeter structure).  It is assumed
that the current state of the CPU was in effect for the entire preceding
interval.

Because "sar -u" assumes a uniprocessor configuration sadc(1) sums the
value of each counter across all CPUs then divides by the number of online
CPUs to normalize the values.  Thus the statistics for an interval will be
incorrect if the number of online CPUs changes.

Notice that %wio does not appear in the previous paragraph.  This is
because the %usr, %sys and %idle counters are maintained in the per-engine
vmmeter structure, while %wio is maintained in the global procstat structure
and is updated one hundred times a second by the todclock() interrupt
handler.

On each todclock() interrupt (100 times per second) the sum of the

    1) number of processes currently waiting on the swap in queue,
    2) number of processes waiting for a page to be brought into memory,
    3) number of processes waiting on filesystem I/O, and
    4) number of processes waiting on physical/raw I/O

is calculated.  The smaller of that value and the number of CPUs currently
idle is added to the procstat.ps_cpuwait counter (sar's %wio).  This means
that wait time is a subset of idle time.  To put it another way: if there
are 10 CPUs and only 1 was executing the idle loop at the last hardclock()
interrupt the idle percentage would be 10%.  If there was also one process
waiting for disk I/O to complete then %wio would be 10% (1 waiting process
over 10 online CPUs).  Sar would report %wio = 10 and %idle = 0.  If there
were no idle CPUs then %wio would be reported as 0 even though there was
one process waiting for I/O to complete.  If there were two processes
waiting for I/O to complete and 10 CPUs online, the %wio would be 20%, or
the number of idle CPUs divided by the number online, whichever is smaller.
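
A minimal sketch of that accounting rule in C (all names below are
invented for illustration; the real DYNIX/ptx code keeps these values in
its vmmeter and procstat structures):

struct procstat_sketch {
	unsigned long ps_cpuwait;	/* ticks later reported as %wio */
};

static struct procstat_sketch procstat;

/* called once per clock tick; the waiter counts and idle_cpus are
 * assumed to be gathered elsewhere */
static void wio_tick(int swapin_waiters, int pagein_waiters,
		     int fs_io_waiters, int raw_io_waiters, int idle_cpus)
{
	int waiters = swapin_waiters + pagein_waiters +
		      fs_io_waiters + raw_io_waiters;

	/* charge at most one tick per idle CPU, so wait time remains a
	 * subset of idle time */
	procstat.ps_cpuwait += (waiters < idle_cpus) ? waiters : idle_cpus;
}

In the ten-CPU example above, one idle CPU and one waiter add one tick per
interrupt (10% once normalized by the number of online CPUs); two waiters
with a single idle CPU would still add only one.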

Note that the calculation of wait time is performed asynchronously to the
collection of user/sys/idle time.  Furthermore, because of the way wait
time is calculated it may actually be larger than the idle time.  Sar(1)
deals with this by forcing the wait time to be less than or equal to the
idle time.  It then subtracts the wait time from the idle time.

The rationale for separating out I/O wait time is that since an I/O
operation may complete at any instant, and the process will be marked
runnable and begin consuming CPU cycles, the CPUs should not really be
considered idle.  The %wio metric most definitely does not tell you
anything about how busy the disk subsystem is or whether the disks are
overloaded. It can indicate whether or not the workload is I/O bound.  Or,
to look at it another way, %wio is good for tracking how much busier the
CPUs would be if you could make the disk subsystem infinitely fast.
Finally, notice that this metric is reported by the sar(1) "-u" switch, not
the "-d" switch. Now that you understand what this metric indicates it
should be clear why that is.

-- 
Kurtis D. Rader, Systems Support Engineer    email: kdrader@us.ibm.com
IBM xSeries Integrated Technology Services   voice: +1 503-578-3714
15450 SW Koll Pkwy, MS RHE2-513              http://www.ibm.com
Beaverton, OR 97006-6063

* Re: [RFC][PATCH] IO wait accounting
  2002-05-14 19:49 ` Kurtis D. Rader
@ 2002-05-14 21:38   ` Rik van Riel
  0 siblings, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2002-05-14 21:38 UTC (permalink / raw)
  To: Kurtis D. Rader; +Cc: linux-mm, linux-kernel

On Tue, 14 May 2002, Kurtis D. Rader wrote:

> On the topic of how this is defined by other UNIXes ...


> On each todclock() interrupt (100 times per second) the sum of the
>
>     1) number of processes currently waiting on the swap in queue,
>     2) number of processes waiting for a page to be brought into memory,
>     3) number of processes waiting on filesystem I/O, and
>     4) number or processes waiting on physical/raw I/O
>
> is calculated.  The smaller of that value and the number of CPUs
> currently idle is added to the procstat.ps_cpuwait counter (sar's %wio).
> This means that wait time is a subset of idle time.

This is basically what my patch does, except that it doesn't take
the minimum of the number of threads waiting on IO and the number
of idle CPUs. I'm still thinking about a cheap way to make this
work ...
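
One possible cheap approximation, purely as a hypothetical and untested
sketch (the iowait_claimed bookkeeping is invented here and deliberately
imprecise): let each idle CPU claim at most one pending page fault per
jiffy, so the iowait ticks charged in a jiffy never exceed the number of
waiting tasks:

static atomic_t iowait_claimed = ATOMIC_INIT(0);	/* invented */
static unsigned long iowait_stamp;			/* invented */

/* would replace the unconditional iowait charge in the idle branch of
 * the timer tick in the patch; "ticks" plays the role of "system" there */
static void account_idle_iowait(int cpu, int ticks)
{
	if (iowait_stamp != jiffies) {		/* new jiffy, reset the claims */
		iowait_stamp = jiffies;
		atomic_set(&iowait_claimed, 0);
	}
	if (atomic_read(&iowait_claimed) <
	    atomic_read(&pagefaults_in_progress)) {
		atomic_inc(&iowait_claimed);
		kstat.per_cpu_iowait[cpu] += ticks;
	}
}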

> The rationale for separating out I/O wait time is that since an I/O
> operation may complete at any instant, and the process will be marked
> runable and begin consuming CPU cycles, the CPUs should not really be
> considered idle.  The %wio metric most definitely does not tell you
> anything about how busy the disk subsystem is or whether the disks are
> overloaded. It can indicate whether or not the workload is I/O bound.  Or,
> to look at it another way, %wio is good for tracking how much busier the
> CPUs would be if you could make the disk subsystem infinitely fast.

Indeed, this would be a good paragraph to copy into the procps
manual ;)

kind regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

