linux-mm.kvack.org archive mirror
* [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
@ 2007-05-18  9:54 Eric Dumazet
  2007-05-18 18:21 ` Christoph Lameter
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eric Dumazet @ 2007-05-18  9:54 UTC (permalink / raw)
  To: Andrew Morton, linux-mm; +Cc: linux kernel, David Miller

alloc_large_system_hash() is called at boot time to allocate space for several large hash tables.

Lately, the TCP hash table was changed and its bucketsize is not a power-of-two anymore.

On most setups, alloc_large_system_hash() allocates one big page (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a power-of-two size, bigger than the needed size.

We can free all pages that won't be used by the hash table.

On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.

TCP established hash table entries: 32768 (order: 6, 393216 bytes)
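
The 128 KB figure can be checked by hand. The sketch below is a user-space illustration, not kernel code. It assumes 4 KiB pages (a PAGE_SHIFT of 12, as on i386), defines PAGE_SIZE and PAGE_ALIGN locally, and uses two hypothetical helpers, order_for_size() and tail_bytes(). Only the order loop is taken from the context lines of the patch.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on i386. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Smallest order whose block covers `size` bytes (same loop as in the patch context). */
static unsigned int order_for_size(unsigned long size)
{
	unsigned int order;

	assert(size > 0);
	for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
		;
	return order;
}

/* Bytes between the page-aligned end of the table and the end of the block. */
static unsigned long tail_bytes(unsigned long size)
{
	unsigned int order = order_for_size(size);

	return (PAGE_SIZE << order) - PAGE_ALIGN(size);
}
```

For size = 393216 the loop gives order 7, that is a 524288-byte block, so tail_bytes() returns 131072 bytes (128 KB), which matches the saving quoted above. The boot message reports order 6 for the same table, so the order it prints is not the order this loop computes.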

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ae96dd8..2e0ba08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3350,6 +3350,20 @@ void *__init alloc_large_system_hash(const char *tablename,
 			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
 				;
 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
+			/*
+			 * If bucketsize is not a power-of-two, we may free
+			 * some pages at the end of hash table.
+			 */
+			if (table) {
+				unsigned long alloc_end = (unsigned long)table +
+						(PAGE_SIZE << order);
+				unsigned long used = (unsigned long)table +
+						PAGE_ALIGN(size);
+				while (used < alloc_end) {
+					free_page(used);
+					used += PAGE_SIZE;
+				}
+			}
 		}
 	} while (!table && size > PAGE_SIZE && --log2qty);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-18  9:54 [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize Eric Dumazet
@ 2007-05-18 18:21 ` Christoph Lameter
  2007-05-19  8:37 ` Andrew Morton
  2007-05-19 18:21 ` William Lee Irwin III
  2 siblings, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2007-05-18 18:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrew Morton, linux-mm, linux kernel, David Miller

On Fri, 18 May 2007, Eric Dumazet wrote:

>  			table = (void*) __get_free_pages(GFP_ATOMIC, order);

ATOMIC? Is there some reason why we need atomic here?

> +			/*
> +			 * If bucketsize is not a power-of-two, we may free
> +			 * some pages at the end of hash table.
> +			 */
> +			if (table) {
> +				unsigned long alloc_end = (unsigned long)table +
> +						(PAGE_SIZE << order);
> +				unsigned long used = (unsigned long)table +
> +						PAGE_ALIGN(size);
> +				while (used < alloc_end) {
> +					free_page(used);

Isn't this going to interfere with the kernel_map_pages debug stuff?


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-18  9:54 [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize Eric Dumazet
  2007-05-18 18:21 ` Christoph Lameter
@ 2007-05-19  8:37 ` Andrew Morton
  2007-05-19 18:07   ` Eric Dumazet
  2007-05-19 18:21 ` William Lee Irwin III
  2 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-05-19  8:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-mm, linux kernel, David Miller

On Fri, 18 May 2007 11:54:54 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote:

> alloc_large_system_hash() is called at boot time to allocate space for several large hash tables.
> 
> Lately, the TCP hash table was changed and its bucketsize is not a power-of-two anymore.
> 
> On most setups, alloc_large_system_hash() allocates one big page (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a power-of-two size, bigger than the needed size.

Watch the 200-column text, please.

> We can free all pages that won't be used by the hash table.
> 
> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
> 
> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ae96dd8..2e0ba08 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3350,6 +3350,20 @@ void *__init alloc_large_system_hash(const char *tablename,
>  			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
>  				;
>  			table = (void*) __get_free_pages(GFP_ATOMIC, order);
> +			/*
> +			 * If bucketsize is not a power-of-two, we may free
> +			 * some pages at the end of hash table.
> +			 */
> +			if (table) {
> +				unsigned long alloc_end = (unsigned long)table +
> +						(PAGE_SIZE << order);
> +				unsigned long used = (unsigned long)table +
> +						PAGE_ALIGN(size);
> +				while (used < alloc_end) {
> +					free_page(used);
> +					used += PAGE_SIZE;
> +				}
> +			}
>  		}
>  	} while (!table && size > PAGE_SIZE && --log2qty);
>  

It went BUG.

static inline int put_page_testzero(struct page *page)
{
	VM_BUG_ON(atomic_read(&page->_count) == 0);
	return atomic_dec_and_test(&page->_count);
}

http://userweb.kernel.org/~akpm/s5000523.jpg
http://userweb.kernel.org/~akpm/config-vmm.txt


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-19  8:37 ` Andrew Morton
@ 2007-05-19 18:07   ` Eric Dumazet
  2007-05-19 18:54     ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2007-05-19 18:07 UTC (permalink / raw)
  To: Andrew Morton, David Howells; +Cc: linux-mm, linux kernel, David Miller

Andrew Morton wrote:
> On Fri, 18 May 2007 11:54:54 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
>> alloc_large_system_hash() is called at boot time to allocate space for several large hash tables.
>>
>> Lately, the TCP hash table was changed and its bucketsize is not a power-of-two anymore.
>>
>> On most setups, alloc_large_system_hash() allocates one big page (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a power-of-two size, bigger than the needed size.
> 
> Watch the 200-column text, please.
> 
>> We can free all pages that won't be used by the hash table.
>>
>> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
>>
>> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>> ---
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ae96dd8..2e0ba08 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3350,6 +3350,20 @@ void *__init alloc_large_system_hash(const char *tablename,
>>  			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
>>  				;
>>  			table = (void*) __get_free_pages(GFP_ATOMIC, order);
>> +			/*
>> +			 * If bucketsize is not a power-of-two, we may free
>> +			 * some pages at the end of hash table.
>> +			 */
>> +			if (table) {
>> +				unsigned long alloc_end = (unsigned long)table +
>> +						(PAGE_SIZE << order);
>> +				unsigned long used = (unsigned long)table +
>> +						PAGE_ALIGN(size);
>> +				while (used < alloc_end) {
>> +					free_page(used);
>> +					used += PAGE_SIZE;
>> +				}
>> +			}
>>  		}
>>  	} while (!table && size > PAGE_SIZE && --log2qty);
>>  
> 
> It went BUG.
> 
> static inline int put_page_testzero(struct page *page)
> {
> 	VM_BUG_ON(atomic_read(&page->_count) == 0);
> 	return atomic_dec_and_test(&page->_count);
> }
> 
> http://userweb.kernel.org/~akpm/s5000523.jpg
> http://userweb.kernel.org/~akpm/config-vmm.txt

I see :(

Maybe David has an idea how this can be done properly ?

ref : http://marc.info/?l=linux-netdev&m=117706074825048&w=2



* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-18  9:54 [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize Eric Dumazet
  2007-05-18 18:21 ` Christoph Lameter
  2007-05-19  8:37 ` Andrew Morton
@ 2007-05-19 18:21 ` William Lee Irwin III
  2007-05-19 18:41   ` Eric Dumazet
  2 siblings, 1 reply; 9+ messages in thread
From: William Lee Irwin III @ 2007-05-19 18:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrew Morton, linux-mm, linux kernel, David Miller

On Fri, May 18, 2007 at 11:54:54AM +0200, Eric Dumazet wrote:
> alloc_large_system_hash() is called at boot time to allocate space
> for several large hash tables.
> Lately, the TCP hash table was changed and its bucketsize is not a
> power-of-two anymore.
> On most setups, alloc_large_system_hash() allocates one big page
> (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single
> high-order page has a power-of-two size, bigger than the needed size.
> We can free all pages that won't be used by the hash table.
> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
> TCP established hash table entries: 32768 (order: 6, 393216 bytes)

The proper way to do this is to convert the large system hashtable
users to use some data structure / algorithm  other than hashing by
separate chaining.


-- wli


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-19 18:21 ` William Lee Irwin III
@ 2007-05-19 18:41   ` Eric Dumazet
  2007-05-21  8:11     ` William Lee Irwin III
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2007-05-19 18:41 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-mm, linux kernel, David Miller

William Lee Irwin III wrote:
> On Fri, May 18, 2007 at 11:54:54AM +0200, Eric Dumazet wrote:
>> alloc_large_system_hash() is called at boot time to allocate space
>> for several large hash tables.
>> Lately, the TCP hash table was changed and its bucketsize is not a
>> power-of-two anymore.
>> On most setups, alloc_large_system_hash() allocates one big page
>> (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single
>> high-order page has a power-of-two size, bigger than the needed size.
>> We can free all pages that won't be used by the hash table.
>> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
>> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
> 
> The proper way to do this is to convert the large system hashtable
> users to use some data structure / algorithm  other than hashing by
> separate chaining.

No thanks. This was already discussed to death on netdev. To date, hash tables
are a good compromise.

I don't mind losing some memory; I prefer to keep good performance when
handling 1,000,000 or more TCP sessions.


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-19 18:07   ` Eric Dumazet
@ 2007-05-19 18:54     ` David Miller
  2007-05-19 20:36       ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2007-05-19 18:54 UTC (permalink / raw)
  To: dada1; +Cc: akpm, dhowells, linux-mm, linux-kernel

> Maybe David has an idea how this can be done properly ?
> 
> ref : http://marc.info/?l=linux-netdev&m=117706074825048&w=2

You need to use __GFP_COMP or similar to make this splitting+freeing
thing work.

Otherwise the individual pages don't have page references, only
the head page of the high-order page will.
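
The reference-count problem described here can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: struct toy_page, toy_alloc() and toy_put_page_ok() are hypothetical names, and the model tracks nothing but a per-page count. It mirrors the check in the put_page_testzero() snippet quoted earlier in the thread, returning false where the kernel would hit VM_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PAGES 128

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int count;
};

/* Model a non-compound high-order allocation: only the head page holds a reference. */
static void toy_alloc(struct toy_page *pages, unsigned int order)
{
	unsigned int i, nr = 1U << order;

	assert(nr <= TOY_MAX_PAGES);
	pages[0].count = 1;
	for (i = 1; i < nr; i++)
		pages[i].count = 0;
}

/*
 * Model the check in put_page_testzero(): dropping a reference on a page
 * whose count is already zero is the case VM_BUG_ON() catches.
 * Returns false instead of crashing so the caller can observe it.
 */
static bool toy_put_page_ok(struct toy_page *page)
{
	if (page->count == 0)
		return false;	/* would be VM_BUG_ON() in the kernel */
	page->count--;
	return true;
}
```

With an order-7 block, calling toy_put_page_ok() on page 96 (the first page past a 96-page table) fails in this model because that page never received a reference of its own, while the same call on page 0 succeeds.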


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-19 18:54     ` David Miller
@ 2007-05-19 20:36       ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2007-05-19 20:36 UTC (permalink / raw)
  To: David Miller, akpm; +Cc: dhowells, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1257 bytes --]

David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Sat, 19 May 2007 20:07:11 +0200
> 
>> Maybe David has an idea how this can be done properly ?
>>
>> ref : http://marc.info/?l=linux-netdev&m=117706074825048&w=2
> 
> You need to use __GFP_COMP or similar to make this splitting+freeing
> thing work.
> 
> Otherwise the individual pages don't have page references, only
> the head page of the high-order page will.
> 

Oh thanks David for the hint.

I added a split_page() call and it seems to work now.
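
Why the split_page() call helps can be sketched with a small user-space model. Everything here is an illustrative assumption rather than kernel code: struct toy_page, toy_alloc(), toy_split_page() and toy_trim() are hypothetical names, and only the per-page reference count is modelled. In this model, splitting gives every page of the block its own reference, so the tail pages can then be released one at a time.

```c
#include <assert.h>

#define TOY_MAX_PAGES 128

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int count;
};

/* Non-compound high-order allocation: only the head page holds a reference. */
static void toy_alloc(struct toy_page *pages, unsigned int order)
{
	unsigned int i, nr = 1U << order;

	assert(nr <= TOY_MAX_PAGES);
	pages[0].count = 1;
	for (i = 1; i < nr; i++)
		pages[i].count = 0;
}

/* Model of split_page(): every page after the head gets its own reference. */
static void toy_split_page(struct toy_page *pages, unsigned int order)
{
	unsigned int i;

	for (i = 1; i < (1U << order); i++)
		pages[i].count = 1;
}

/*
 * Release pages [used, 1 << order), like the while loop in the patch.
 * Returns the number of pages released, or -1 if a page had no reference
 * to drop (the case that triggered the BUG without the split).
 */
static int toy_trim(struct toy_page *pages, unsigned int order, unsigned int used)
{
	unsigned int i, nr = 1U << order;
	int freed = 0;

	assert(used <= nr);
	for (i = used; i < nr; i++) {
		if (pages[i].count == 0)
			return -1;
		pages[i].count--;	/* count reaches zero: page is returned */
		freed++;
	}
	return freed;
}
```

For an order-7 block holding a 96-page table, toy_alloc() followed by toy_split_page() and toy_trim(pages, 7, 96) releases 32 pages and leaves pages 0 to 95 with one reference each. Skipping the split makes toy_trim() return -1 on the first tail page.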


[PATCH] MM : alloc_large_system_hash() can free some memory for non 
power-of-two bucketsize

alloc_large_system_hash() is called at boot time to allocate space for several 
large hash tables.

Lately, the TCP hash table was changed and its bucketsize is not a power-of-two
anymore.

On most setups, alloc_large_system_hash() allocates one big page (order > 0)
with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a
power-of-two size, bigger than the needed size.

We can free all pages that won't be used by the hash table.

On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.

TCP established hash table entries: 32768 (order: 6, 393216 bytes)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: alloc_large.patch --]
[-- Type: text/plain, Size: 823 bytes --]

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ae96dd8..7c219eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3350,6 +3350,21 @@ void *__init alloc_large_system_hash(const char *tablename,
 			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
 				;
 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
+			/*
+			 * If bucketsize is not a power-of-two, we may free
+			 * some pages at the end of hash table.
+			 */
+			if (table) {
+				unsigned long alloc_end = (unsigned long)table +
+						(PAGE_SIZE << order);
+				unsigned long used = (unsigned long)table +
+						PAGE_ALIGN(size);
+				split_page(virt_to_page(table), order);
+				while (used < alloc_end) {
+					free_page(used);
+					used += PAGE_SIZE;
+				}
+			}
 		}
 	} while (!table && size > PAGE_SIZE && --log2qty);
 


* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
  2007-05-19 18:41   ` Eric Dumazet
@ 2007-05-21  8:11     ` William Lee Irwin III
  0 siblings, 0 replies; 9+ messages in thread
From: William Lee Irwin III @ 2007-05-21  8:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrew Morton, linux-mm, linux kernel, David Miller

William Lee Irwin III wrote:
>> The proper way to do this is to convert the large system hashtable
>> users to use some data structure / algorithm  other than hashing by
>> separate chaining.

On Sat, May 19, 2007 at 08:41:01PM +0200, Eric Dumazet wrote:
> No thanks. This was already discussed to death on netdev. To date, hash
> tables are a good compromise.
> I don't mind losing some memory; I prefer to keep good performance when
> handling 1,000,000 or more TCP sessions.

The data structures perform well enough, but I suppose it's not worth
pushing the issue this way.


-- wli

