* Some issues + [PATCH] kanoj-mm8-2.2.9 Show statistics on alloc/free requests for each page free list
@ 1999-06-12 1:02 Kanoj Sarcar
1999-06-12 10:21 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Kanoj Sarcar @ 1999-06-12 1:02 UTC
To: linux-mm; +Cc: torvalds
Attached is a patch to mm/page_alloc.c that will report the cumulative
number of alloc and free requests for pages of each size via the
MagicSysRq 'm' command. To turn on the display, you need to add a
#define FREELIST_STAT
to mm/page_alloc.c before the #ifdef FREELIST_STAT line.
On my HP-Kayak 2p ia32 system, the relevant output right after I get a
login prompt is:
2*4kB (19651, 33223) 1*8kB (373, 299) 1*16kB (2, 0) 0*32kB (2, 0) 1*64kB (0, 0) 0*128kB (0, 0) 0*256kB (1, 0) 0*512kB (0, 0) 0*1024kB (0, 0) 26*2048kB (0, 0) = 53344kB)
And after running a 2.2.9 kernel compile:
183*4kB (510767, 515934) 19*8kB (2480, 2323) 10*16kB (2, 0) 3*32kB (2, 0) 0*64kB (0, 0) 0*128kB (0, 0) 2*256kB (1, 0) 2*512kB (0, 0) 0*1024kB (0, 0) 8*2048kB (0, 0) = 19060kB)
The first number in the bracketed pair is the cumulative count of alloc
requests; the second is the count of free requests. (Yes, don't ask me how
the frees outnumber the allocs for 4K pages; probably some code is asking
for bigger blocks and freeing them as single pages.)
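For instance, any code doing something like the following (a hypothetical
but plausible pattern; __get_free_pages() and free_page() are the stock 2.2
interfaces) bumps the 8K alloc counter once but the 4K free counter twice:

	unsigned long page = __get_free_pages(GFP_KERNEL, 1);	/* one order-1 (8K) alloc */
	if (page) {
		free_page(page);		/* counted as one 4K free */
		free_page(page + PAGE_SIZE);	/* counted as another 4K free */
	}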
Anyway, this raises some interesting questions about the buddy algorithm.
Is it really worth aggressively coalescing pages on each free? Wouldn't
it be better to coalesce pages lazily (maybe from a kernel thread), or even
on demand? By far the largest number of requests is for 4K pages, followed
by 8K (the task-struct/stack pair). A kernel compile is not a representative
application, but I would be surprised if many apps or drivers force bigger
page requests once kernel initialization is complete. Wouldn't it be better
to optimize the more common case?
Not to mention that if we do not coalesce aggressively, we could think
about keeping pages freed by shrink_mmap in the SwapCache/filecache, so
that those pages could be reclaimed from the free list on re-reference.
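To make that concrete, here is a toy user-space model of lazy coalescing (a
sketch only: free_order[], lazy_free() and coalesce_pass() are made-up names,
and the real thing would of course operate on mem_map under the
page_alloc_lock). Frees become O(1) pushes, and all buddy merging happens in
one deferred sweep:

#include <stdio.h>

#define NR_ORDERS  4
#define POOL_PAGES (1 << NR_ORDERS)

/* free_order[p] == -1: page p does not head a free block;
 * otherwise p heads a free block of that order. */
static int free_order[POOL_PAGES];

/* Lazy free: O(1), no buddy search at free time. */
static void lazy_free(int page, int order)
{
	free_order[page] = order;
}

/* Deferred sweep: merge free buddy pairs, bottom up.  This is what a
 * kernel thread (or an on-demand pass when a higher-order allocation
 * fails) would run instead of coalescing on every free. */
static void coalesce_pass(void)
{
	int order, page;

	for (order = 0; order < NR_ORDERS - 1; order++) {
		for (page = 0; page < POOL_PAGES; page += 2 << order) {
			int buddy = page + (1 << order);
			if (free_order[page] == order && free_order[buddy] == order) {
				free_order[buddy] = -1;
				free_order[page] = order + 1;	/* merged upward */
			}
		}
	}
}

int main(void)
{
	int i;

	for (i = 0; i < POOL_PAGES; i++)
		free_order[i] = -1;
	lazy_free(0, 0); lazy_free(1, 0);	/* two order-0 buddies */
	lazy_free(2, 1);			/* their order-1 neighbour */
	coalesce_pass();			/* ends up as one order-2 block */
	for (i = 0; i < POOL_PAGES; i++)
		if (free_order[i] >= 0)
			printf("free block: page %d, order %d\n", i, free_order[i]);
	return 0;
}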
Kanoj
kanoj@engr.sgi.com
--- mm/page_alloc.c	Fri May 14 14:33:29 1999
+++ mm/page_alloc.new	Fri Jun 11 17:24:14 1999
@@ -36,11 +36,32 @@
 #define NR_MEM_LISTS 10
 #endif
 
+#define FREELIST_STAT
+#ifdef FREELIST_STAT
+#define alloc_stat_field unsigned long alloc_stat;
+#define free_stat_field unsigned long free_stat;
+#define init_stat(area) (area)->alloc_stat = (area)->free_stat = 0
+#define alloc_stat_inc(area) (area)->alloc_stat++
+#define free_stat_inc(area) (area)->free_stat++
+#define alloc_stat_get(area) (area)->alloc_stat
+#define free_stat_get(area) (area)->free_stat
+#else
+#define alloc_stat_field
+#define free_stat_field
+#define init_stat(area)
+#define alloc_stat_inc(area)
+#define free_stat_inc(area)
+#define alloc_stat_get(area) (unsigned long)0
+#define free_stat_get(area) (unsigned long)0
+#endif
+
 /* The start of this MUST match the start of "struct page" */
 struct free_area_struct {
 	struct page *next;
 	struct page *prev;
 	unsigned int * map;
+	alloc_stat_field
+	free_stat_field
 };
 
 #define memory_head(x) ((struct page *)(x))
@@ -51,6 +72,7 @@
 {
 	head->next = memory_head(head);
 	head->prev = memory_head(head);
+	init_stat(head);
 }
 
 static inline void add_mem_queue(struct free_area_struct * head, struct page * entry)
@@ -99,6 +121,8 @@
 
 	spin_lock_irqsave(&page_alloc_lock, flags);
 
+	free_stat_inc(free_area + order);
+
 #define list(x) (mem_map+(x))
 
 	map_nr &= mask;
@@ -236,6 +260,7 @@
 	}
 ok_to_allocate:
 	spin_lock_irqsave(&page_alloc_lock, flags);
+	alloc_stat_inc(free_area + order);
 	RMQUEUE(order, gfp_mask);
 	spin_unlock_irqrestore(&page_alloc_lock, flags);
 
@@ -277,7 +302,7 @@
 			nr ++;
 		}
 		total += nr * ((PAGE_SIZE>>10) << order);
-		printk("%lu*%lukB ", nr, (unsigned long)((PAGE_SIZE>>10) << order));
+		printk("%lu*%lukB (%lu, %lu) ", nr, (unsigned long)((PAGE_SIZE>>10) << order), alloc_stat_get(free_area + order), free_stat_get(free_area + order));
 	}
 	spin_unlock_irqrestore(&page_alloc_lock, flags);
 	printk("= %lukB)\n", total);
* Re: Some issues + [PATCH] kanoj-mm8-2.2.9 Show statistics on alloc/free requests for each page free list
1999-06-12 1:02 Some issues + [PATCH] kanoj-mm8-2.2.9 Show statistics on alloc/free requests for each page free list Kanoj Sarcar
@ 1999-06-12 10:21 ` Andi Kleen
1999-06-14 17:34 ` Kanoj Sarcar
0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 1999-06-12 10:21 UTC
To: Kanoj Sarcar; +Cc: linux-mm, torvalds
On Sat, Jun 12, 1999 at 03:02:18AM +0200, Kanoj Sarcar wrote:
> Anyway, this raises some interesting questions about the buddy algorithm.
> Is it really worth aggressively coalescing pages on each free? Wouldn't
> it be better to coalesce pages lazily (maybe from a kernel thread), or even
> on demand? By far the largest number of requests is for 4K pages, followed
> by 8K (the task-struct/stack pair). A kernel compile is not a representative
> application, but I would be surprised if many apps or drivers force bigger
> page requests once kernel initialization is complete. Wouldn't it be better
> to optimize the more common case?
There is an important case at the moment that needs bigger blocks allocated
from bottom half context: NFS packet defragmenting. For an 8K wsize it even
needs 16K blocks (8K payload + the IP/UDP header pushes it into the next
buddy size). I guess your statistics would look very different on an nfsroot
machine. Until lazy defragmenting is supported for UDP, it is probably
better not to change it.
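Roughly, the size arithmetic is as follows (the header sizes below are
illustrative guesses; the point is only the power-of-two rounding the buddy
allocator does):

#include <stdio.h>

int main(void)
{
	unsigned long payload = 8192;		/* 8K wsize: NFS write payload */
	unsigned long headers = 20 + 8 + 120;	/* IP + UDP + RPC overhead (rough) */
	unsigned long need = payload + headers;
	unsigned long block = 4096;		/* PAGE_SIZE */
	int order = 0;

	while (block < need) {			/* buddy rounds up to the      */
		block <<= 1;			/* next power-of-two block     */
		order++;
	}
	printf("%lu bytes -> order %d, a %lukB block\n", need, order, block >> 10);
	return 0;
}

With any overhead at all on top of the 8K payload, the loop lands on an
order-2 (16K) block.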
-Andi
* Re: Some issues + [PATCH] kanoj-mm8-2.2.9 Show statistics on alloc/free requests for each page free list
1999-06-14 17:34 ` Kanoj Sarcar
@ 1999-06-14 16:45 ` Andi Kleen
0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 1999-06-14 16:45 UTC
To: Kanoj Sarcar; +Cc: Andi Kleen, linux-mm, torvalds
On Mon, Jun 14, 1999 at 07:34:49PM +0200, Kanoj Sarcar wrote:
> >
> > There is an important case at the moment that needs bigger blocks allocated
> > from bottom half context: NFS packet defragmenting. For an 8K wsize it even
> > needs 16K blocks (8K payload + the IP/UDP header pushes it into the next
> > buddy size). I guess your statistics would look very different on an nfsroot
> > machine. Until lazy defragmenting is supported for UDP, it is probably
> > better not to change it.
> >
>
> This is the experiment I tried: using automount, I cd'ed into an NFS
> mounted directory and copied the kernel sources over to the local (client)
> machine. The statistics before and after the copy on the client:
>
> Before:
>
> 10*4kB (20993, 34343) 3*8kB (398, 319) 0*16kB (2, 0) 0*32kB (2, 0) 0*64kB (0, 0) 1*128kB (0, 0) 0*256kB (1, 0) 0*512kB (0, 0) 1*1024kB (0, 0) 25*2048kB (0, 0) = 52416kB)
>
>
> After:
>
> 192*4kB (88737, 89889) 27*8kB (744, 405) 3*16kB (2, 0) 0*32kB (2, 0) 0*64kB (0,
> 0) 0*128kB (0, 0) 0*256kB (1, 0) 1*512kB (0, 0) 0*1024kB (0, 0) 0*2048kB (0, 0)
> = 1544kB)
>
> I am not sure about the wsize though ... maybe someone with access to
> an nfsroot machine can try a quick experiment and publish the results?
You probably used the default wsize of 4K (= 8K blocks). An 8K wsize often
performs better against other Linux servers.
BTW, I am a bit surprised that you don't have an nfsroot or at least an
NFS-mounted /usr machine - it is really handy for experimental kernel
testing: no fscks, no fs corruption ...
> Btw, if the NFS defrag code is running from bottom half context, it
> probably has logic to handle allocation failures? Andi, could you please
> send me a pointer to the relevant code?
The relevant code is in net/ipv4/ip_fragment.c; it is called from the IP
input path running in net_bh (net/core/dev.c:net_bh() -> net/ipv4/ip_input.c).
It simply drops the packet then. Remember that IP is unreliable, so this
always works.
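In outline the failure path amounts to this (a sketch only; struct fragq,
drop_fragments() and copy_fragments() are made-up helper names, not the
actual ip_fragment.c interfaces, though alloc_skb() is real):

/* Sketch of drop-on-failure during reassembly.  If the big linear buffer
 * for the full datagram cannot be had from bottom half context, the
 * fragment queue is discarded and the sender must retransmit. */
struct sk_buff *reassemble(struct fragq *q, unsigned int total_len)
{
	struct sk_buff *skb = alloc_skb(total_len, GFP_ATOMIC);

	if (skb == NULL) {
		drop_fragments(q);	/* no memory right now: drop; IP makes
					 * no delivery guarantee, so legal */
		return NULL;
	}
	copy_fragments(q, skb);		/* linearize the queued fragments */
	return skb;
}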
In theory it could set a short retry timer and try again later (because the
smaller fragments in the frag queue are still there), but this would need
some complicated backlog-refeed logic.
Fixing defragmenting to defragment directly into the target buffer is on my
list for 2.3; I assume it is on David's list too, so it'll probably change.
-Andi
--
This is like TV. I don't like TV.
* Re: Some issues + [PATCH] kanoj-mm8-2.2.9 Show statistics on alloc/free requests for each page free list
1999-06-12 10:21 ` Andi Kleen
@ 1999-06-14 17:34 ` Kanoj Sarcar
1999-06-14 16:45 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Kanoj Sarcar @ 1999-06-14 17:34 UTC
To: Andi Kleen; +Cc: linux-mm, torvalds
>
> There is an important case at the moment that needs bigger blocks allocated
> from bottom half context: NFS packet defragmenting. For an 8K wsize it even
> needs 16K blocks (8K payload + the IP/UDP header pushes it into the next
> buddy size). I guess your statistics would look very different on an nfsroot
> machine. Until lazy defragmenting is supported for UDP, it is probably
> better not to change it.
>
This is the experiment I tried: using automount, I cd'ed into an NFS
mounted directory and copied the kernel sources over to the local (client)
machine. The statistics before and after the copy on the client:
Before:
10*4kB (20993, 34343) 3*8kB (398, 319) 0*16kB (2, 0) 0*32kB (2, 0) 0*64kB (0, 0) 1*128kB (0, 0) 0*256kB (1, 0) 0*512kB (0, 0) 1*1024kB (0, 0) 25*2048kB (0, 0) = 52416kB)
After:
192*4kB (88737, 89889) 27*8kB (744, 405) 3*16kB (2, 0) 0*32kB (2, 0) 0*64kB (0,
0) 0*128kB (0, 0) 0*256kB (1, 0) 1*512kB (0, 0) 0*1024kB (0, 0) 0*2048kB (0, 0)
= 1544kB)
I am not sure about the wsize though ... maybe someone with access to
an nfsroot machine can try a quick experiment and publish the results?
Btw, if the NFS defrag code is running from bottom half context, it
probably has logic to handle allocation failures? Andi, could you please
send me a pointer to the relevant code?
Thanks.
Kanoj