Re: [Fwd: [PATCH 2/4] cpusets new __GFP

linux-mm.kvack.org archive mirror

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
       [not found] <1121101013.15095.19.camel@localhost>
@ 2005-07-11 17:36 ` Joel Schopp
  2005-07-11 17:49   ` Dave Hansen
  2005-07-12  2:55   ` Paul Jackson
  0 siblings, 2 replies; 13+ messages in thread
From: Joel Schopp @ 2005-07-11 17:36 UTC
  To: Paul Jackson; +Cc: Dave Hansen, linux-mm, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 2946 bytes --]

Dave Hansen brought this to my attention.  I've attached the bit of the 
memory fragmentation avoidance you conflict with (I'm working with Mel 
on his patches).  I think we share similar goals, and I wouldn't mind 
changing __GFP_USERRCLM to __GFP_USERALLOC or some neutral name we could 
share.  Anything to increase the chances of fragmentation avoidance 
getting merged is good in my book.

-Joel


>>GFP_USER allocations, and distinguish them from GFP_KERNEL allocations.
>>
>>Allocations (such as GFP_USER) marked GFP_HARDWALL are constrained to
>>the current task's cpuset.  Other allocations (such as GFP_KERNEL) can
>>steal from the possibly larger nearest mem_exclusive cpuset ancestor,
>>if memory is tight on every node in the current cpuset.
>>
>>This patch collides with Mel Gorman's patch to reduce fragmentation
>>in the standard buddy allocator, which adds two GFP flags.  At first
>>glance, it seems that his added __GFP_USERRCLM flag could be used in
>>place of the following __GFP_HARDWALL, as they both seem to be set
>>the same way - for GFP_USER and GFP_HIGHUSER.  Perhaps we should call
>>this flag __GFP_USER, rather than some name dependent on its use(s).
> 
> 
> Does this make sense to integrate into your patches?
> 
> Index: linux-2.6-mem_exclusive/include/linux/gfp.h
> ===================================================================
> --- linux-2.6-mem_exclusive.orig/include/linux/gfp.h	2005-07-02 17:42:02.000000000 -0700
> +++ linux-2.6-mem_exclusive/include/linux/gfp.h	2005-07-02 17:43:00.000000000 -0700
> @@ -40,6 +40,7 @@ struct vm_area_struct;
>  #define __GFP_ZERO	0x8000u	/* Return zeroed page on success */
>  #define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
>  #define __GFP_NORECLAIM  0x20000u /* No zone reclaim during page_cache_alloc */
> +#define __GFP_HARDWALL   0x40000u /* Enforce hardwall cpuset memory allocs */
>  
>  #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
>  #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
> @@ -48,14 +49,15 @@ struct vm_area_struct;
>  #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
>  			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
>  			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
> -			__GFP_NOMEMALLOC|__GFP_NORECLAIM)
> +			__GFP_NOMEMALLOC|__GFP_NORECLAIM|__GFP_HARDWALL)
>  
>  #define GFP_ATOMIC	(__GFP_HIGH)
>  #define GFP_NOIO	(__GFP_WAIT)
>  #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
>  #define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
> -#define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS)
> -#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM)
> +#define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
> +#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
> +			 __GFP_HIGHMEM)
>  
>  /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
>     platforms, used as appropriate on others */
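
The hardwall rule in the quoted description can be sketched in plain C.
This is only an illustration under stated assumptions: a cpuset is reduced
to a node bitmask plus a parent pointer, and struct cpuset_sketch,
nearest_exclusive_ancestor() and node_allowed() are hypothetical names for
this sketch, not the interfaces of the cpuset patch.  Only the
__GFP_HARDWALL value comes from the diff above.  The sketch covers the
permission check alone, not the "memory is tight" fallback path in the
allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Flag value taken from the quoted patch; everything else is hypothetical. */
#define __GFP_HARDWALL 0x40000u

/* Hypothetical, simplified cpuset: a node bitmask plus a parent link. */
struct cpuset_sketch {
	unsigned long mems_allowed;		/* bit n set => node n allowed */
	int mem_exclusive;			/* nonzero if mem_exclusive */
	const struct cpuset_sketch *parent;	/* NULL at the root */
};

/*
 * Starting at cs itself, walk up until a mem_exclusive cpuset is found,
 * stopping at the root if none is.
 */
static inline const struct cpuset_sketch *
nearest_exclusive_ancestor(const struct cpuset_sketch *cs)
{
	while (!cs->mem_exclusive && cs->parent)
		cs = cs->parent;
	return cs;
}

/* Return 1 if an allocation with these gfp flags may use the given node. */
static inline int node_allowed(const struct cpuset_sketch *cur, int node,
			       unsigned int gfp_flags)
{
	unsigned long bit;

	assert(node >= 0 && node < (int)(sizeof(unsigned long) * 8));
	bit = 1UL << node;

	if (cur->mems_allowed & bit)
		return 1;	/* inside the current cpuset */
	if (gfp_flags & __GFP_HARDWALL)
		return 0;	/* hardwall: never leave the current cpuset */
	/* Not hardwalled: may borrow from the nearest exclusive ancestor. */
	return (nearest_exclusive_ancestor(cur)->mems_allowed & bit) != 0;
}
```

With a mem_exclusive root covering nodes 0-3 and a child cpuset covering
nodes 0-1, a __GFP_HARDWALL request from the child is refused on node 2,
while a request without the flag is allowed there.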


[-- Attachment #2: patch-defrag-flags --]
[-- Type: text/plain, Size: 5255 bytes --]

Index: 2.6.13-rc1-mhp1/fs/buffer.c
===================================================================
--- 2.6.13-rc1-mhp1.orig/fs/buffer.c	2005-06-29 15:11:40.%N -0500
+++ 2.6.13-rc1-mhp1/fs/buffer.c	2005-07-06 12:30:55.%N -0500
@@ -1110,7 +1110,8 @@ grow_dev_page(struct block_device *bdev,
 	struct page *page;
 	struct buffer_head *bh;
 
-	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, index,
+				   GFP_NOFS | __GFP_USERRCLM);
 	if (!page)
 		return NULL;
 
@@ -3079,7 +3080,8 @@ static void recalc_bh_state(void)
 	
 struct buffer_head *alloc_buffer_head(unsigned int __nocast gfp_flags)
 {
-	struct buffer_head *ret = kmem_cache_alloc(bh_cachep, gfp_flags);
+	struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+						   gfp_flags|__GFP_KERNRCLM);
 	if (ret) {
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
Index: 2.6.13-rc1-mhp1/fs/dcache.c
===================================================================
--- 2.6.13-rc1-mhp1.orig/fs/dcache.c	2005-06-29 15:11:18.%N -0500
+++ 2.6.13-rc1-mhp1/fs/dcache.c	2005-07-06 12:32:00.%N -0500
@@ -719,7 +719,7 @@ struct dentry *d_alloc(struct dentry * p
 	struct dentry *dentry;
 	char *dname;
 
-	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL); 
+	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL|__GFP_KERNRCLM);
 	if (!dentry)
 		return NULL;
 
Index: 2.6.13-rc1-mhp1/fs/ext2/super.c
===================================================================
--- 2.6.13-rc1-mhp1.orig/fs/ext2/super.c	2005-06-29 15:11:18.%N -0500
+++ 2.6.13-rc1-mhp1/fs/ext2/super.c	2005-07-06 12:34:16.%N -0500
@@ -138,7 +138,8 @@ static kmem_cache_t * ext2_inode_cachep;
 static struct inode *ext2_alloc_inode(struct super_block *sb)
 {
 	struct ext2_inode_info *ei;
-	ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep, SLAB_KERNEL);
+	ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep,
+						SLAB_KERNEL|__GFP_KERNRCLM);
 	if (!ei)
 		return NULL;
 #ifdef CONFIG_EXT2_FS_POSIX_ACL
Index: 2.6.13-rc1-mhp1/fs/ext3/super.c
===================================================================
--- 2.6.13-rc1-mhp1.orig/fs/ext3/super.c	2005-06-29 15:11:18.%N -0500
+++ 2.6.13-rc1-mhp1/fs/ext3/super.c	2005-06-29 16:02:25.%N -0500
@@ -440,7 +440,7 @@ static struct inode *ext3_alloc_inode(st
 {
 	struct ext3_inode_info *ei;
 
-	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS);
+	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS|__GFP_KERNRCLM);
 	if (!ei)
 		return NULL;
 #ifdef CONFIG_EXT3_FS_POSIX_ACL
Index: 2.6.13-rc1-mhp1/fs/ntfs/inode.c
===================================================================
--- 2.6.13-rc1-mhp1.orig/fs/ntfs/inode.c	2005-06-29 15:11:18.%N -0500
+++ 2.6.13-rc1-mhp1/fs/ntfs/inode.c	2005-07-06 13:10:49.%N -0500
@@ -317,8 +317,8 @@ struct inode *ntfs_alloc_big_inode(struc
 	ntfs_inode *ni;
 
 	ntfs_debug("Entering.");
-	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_big_inode_cache,
-			SLAB_NOFS);
+	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_big_inode_cache,
+					    SLAB_NOFS|__GFP_KERNRCLM);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return VFS_I(ni);
@@ -343,7 +343,8 @@ static inline ntfs_inode *ntfs_alloc_ext
 	ntfs_inode *ni;
 
 	ntfs_debug("Entering.");
-	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_inode_cache, SLAB_NOFS);
+	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_inode_cache,
+					    SLAB_NOFS|__GFP_KERNRCLM);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return ni;
Index: 2.6.13-rc1-mhp1/include/linux/gfp.h
===================================================================
--- 2.6.13-rc1-mhp1.orig/include/linux/gfp.h	2005-06-29 15:11:35.%N -0500
+++ 2.6.13-rc1-mhp1/include/linux/gfp.h	2005-07-06 12:39:56.%N -0500
@@ -40,22 +40,26 @@ struct vm_area_struct;
 #define __GFP_ZERO	0x8000u	/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
 #define __GFP_NORECLAIM  0x20000u /* No realy zone reclaim during allocation */
+#define __GFP_KERNRCLM 0x40000u /* Kernel page that is easily reclaimable */
+#define __GFP_USERRCLM 0x80000u /* User is a userspace user */
 
-#define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 22	/* Room for 22 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
 
 /* if you forget to add the bitmask here kernel will crash, period */
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC|__GFP_NORECLAIM)
+			__GFP_NOMEMALLOC|__GFP_NORECLAIM| \
+			__GFP_USERRCLM|__GFP_KERNRCLM)
 
 #define GFP_ATOMIC	(__GFP_HIGH)
 #define GFP_NOIO	(__GFP_WAIT)
 #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
 #define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
-#define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS)
-#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM)
+#define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_USERRCLM)
+#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM | \
+			 __GFP_USERRCLM)
 
 /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
    platforms, used as appropriate on others */
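
The collision between the two patches is concrete: the cpuset diff quoted
in attachment #1 defines __GFP_HARDWALL as 0x40000u, and this patch defines
__GFP_KERNRCLM as 0x40000u.  A minimal sketch of a check that would catch
this, assuming hypothetical helpers gfp_flags_disjoint() and gfp_flag_fits()
and a hypothetical renumbered value __GFP_HARDWALL_MOVED (none of these
appear in either patch):

```c
#include <assert.h>

/* Values as they appear in the two diffs. */
#define __GFP_HARDWALL   0x40000u	/* cpuset patch */
#define __GFP_KERNRCLM   0x40000u	/* fragmentation avoidance patch */
#define __GFP_USERRCLM   0x80000u	/* fragmentation avoidance patch */
#define __GFP_BITS_SHIFT 22
#define __GFP_BITS_MASK  ((1 << __GFP_BITS_SHIFT) - 1)

/* Hypothetical renumbering, used here only to show a non-colliding set. */
#define __GFP_HARDWALL_MOVED 0x100000u

/* Return 1 if no two of the n flags share a bit, 0 on any collision. */
static inline int gfp_flags_disjoint(const unsigned int *flags, int n)
{
	unsigned int seen = 0;
	int i;

	assert(flags && n >= 0);
	for (i = 0; i < n; i++) {
		if (seen & flags[i])
			return 0;
		seen |= flags[i];
	}
	return 1;
}

/* Return 1 if the flag lies entirely inside __GFP_BITS_MASK. */
static inline int gfp_flag_fits(unsigned int flag)
{
	return (flag & ~(unsigned int)__GFP_BITS_MASK) == 0;
}
```

Run over the flags as posted, the check reports a collision; with the
hypothetical renumbered value it passes, and that value still fits below
the 22-bit __GFP_BITS_SHIFT used in this patch.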

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-11 17:36 ` [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag] Joel Schopp
@ 2005-07-11 17:49   ` Dave Hansen
  2005-07-12  2:55   ` Paul Jackson
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2005-07-11 17:49 UTC
  To: Joel Schopp; +Cc: Paul Jackson, linux-mm, Mel Gorman

On Mon, 2005-07-11 at 12:36 -0500, Joel Schopp wrote:
> Dave Hansen brought this to my attention.  I've attached the bit of the 
> memory fragmentation avoidance you conflict with (I'm working with Mel 
> on his patches).  I think we share similar goals, and I wouldn't mind 
> changing __GFP_USERRCLM to __GFP_USERALLOC or some neutral name we could 
> share.  Anything to increase the chances of fragmentation avoidance 
> getting merged is good in my book.

The nice part about using __GFP_USER as the name is that it describes
how it's going to be used rather than how the kernel is going to treat
it.  Somebody making a random allocator call is much more likely to know
how they're going to use it than how the kernel _should_.

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-11 17:36 ` [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag] Joel Schopp
  2005-07-11 17:49   ` Dave Hansen
@ 2005-07-12  2:55   ` Paul Jackson
  2005-07-12  5:24     ` Dave Hansen
  2005-07-12 13:05     ` Mel Gorman
  1 sibling, 2 replies; 13+ messages in thread
From: Paul Jackson @ 2005-07-12  2:55 UTC
  To: Joel Schopp; +Cc: haveblue, linux-mm, mel

Joel wrote:
> I wouldn't mind  changing __GFP_USERRCLM to __GFP_USERALLOC
> or some neutral name we could share.

A neutral term would be good.  Since you are ahead of me (being
already in Andrew's tree, while I just made my first linux-mm post),
I figure that means you get to pick the name.  Unless it is seriously
defective for my purposes, I will just accept what is.

Dave wrote:
> The nice part about using __GFP_USER as the name is that it describes
> how it's going to be used rather than how the kernel is going to treat
> it.

Yup - agreed.  Though, in real life, that's hidden beneath the (no
underscore) GFP_USER flag, so it's only a few kernel memory hackers
we will be confusing, not the horde of driver writers.

One question.  I've not actually read the memory fragmentation
avoidance patch, so this might be a stupid question.  That
notwithstanding, do you really need two flags, one KERN and one USER?
Or would one flag be sufficient - to mark USER pages.  Unmarked pages
would be KERN, presumably.  One really only needs 2 bits if one has
3 or 4 states to track -- if that's the case, it's not clear to me
what those 3 or 4 states are (maybe if I actually read the patch it
would be clear ;).

I intended to CC Mel on the original post -- but then forgot to.
Thanks for passing it along to him, Dave.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-12  2:55   ` Paul Jackson
@ 2005-07-12  5:24     ` Dave Hansen
  2005-07-12  6:11       ` Paul Jackson
  2005-07-12 13:05     ` Mel Gorman
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2005-07-12  5:24 UTC
  To: Paul Jackson; +Cc: Joel Schopp, linux-mm, mel

On Mon, 2005-07-11 at 19:55 -0700, Paul Jackson wrote:
> One question.  I've not actually read the memory fragmentation
> avoidance patch, so this might be a stupid question.  That
> notwithstanding, do you really need two flags, one KERN and one USER?
> Or would one flag be sufficient - to mark USER pages.  Unmarked pages
> would be KERN, presumably.  One really only needs 2 bits if one has
> 3 or 4 states to track -- if that's the case, it's not clear to me
> what those 3 or 4 states are (maybe if I actually read the patch it
> would be clear ;).

There are four types, but it only consumes two GFP bits.  It's correctly
packed.  

-- Dave

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-12  5:24     ` Dave Hansen
@ 2005-07-12  6:11       ` Paul Jackson
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Jackson @ 2005-07-12  6:11 UTC
  To: Dave Hansen; +Cc: jschopp, linux-mm, mel

Dave wrote:
> four types ==> two GFP bits

Ok.  Guess I should read the patch to figure out
what these 4 types are (and which subsets thereof
map to my 2 types USER and !USER aka KERN.)

If there is not a surjective function from your
4 types to my 2 types, then I can't so easily
share your GFP bits.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-12  2:55   ` Paul Jackson
  2005-07-12  5:24     ` Dave Hansen
@ 2005-07-12 13:05     ` Mel Gorman
  2005-07-12 20:29       ` Paul Jackson
  1 sibling, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2005-07-12 13:05 UTC
  To: Paul Jackson; +Cc: Joel Schopp, haveblue, linux-mm

On Mon, 11 Jul 2005, Paul Jackson wrote:

> Joel wrote:
> > I wouldn't mind  changing __GFP_USERRCLM to __GFP_USERALLOC
> > or some neutral name we could share.
>
> A neutral term would be good.

I agree as having two flags for essentially the same thing is just
a waste. I guess there will be some discussion on what the other
fragmentation flags should be called.

For consistency, we might want to rethink the name of KERNRCLM. I am still
happy with the name as it says "this kernel allocation is something that
will be reclaimed shortly or can be reclaimed on demand" but others might
feel differently.

> Dave wrote:
> > The nice part about using __GFP_USER as the name is that it describes
> > how it's going to be used rather than how the kernel is going to treat
> > it.
>
> Yup - agreed.  Though, in real life, that's hidden beneath the (no
> underscore) GFP_USER flag, so it's only a few kernel memory hackers
> we will be confusing, not the horde of driver writers.
>
> One question.  I've not actually read the memory fragmentation
> avoidance patch, so this might be a stupid question.  That
> notwithstanding, do you really need two flags, one KERN and one USER?

There are two GFP flags to determine three types of allocation:

__GFP_USERRCLM => Allocation is a user page or a disk buffer page
__GFP_KERNRCLM => Kernel reclaimable allocation or one that is short-lived

Neither flag set implies a normal kernel allocation that is not expected
to be reclaimed.

Joel, when merging the patches, there is one hack you need to watch out
for. It is important for performance reasons but it is 100% obvious
either.

(gfp_flags & (__GFP_KERNRCLM | __GFP_USERRCLM)) >> __GFP_TYPE_SHIFT gives
the type of allocation as ALLOC_USERRCLM, ALLOC_KERNRCLM, ALLOC_KERNNORCLM
or ALLOC_FALLBACK. The ALLOC_* values are used to index into the array of
freelists in the struct zone.

If the USER flag is shared, it means that your patches will be adding the
__GFP_KERNRCLM flag which reverses the values. This means you will also
need to update the values of ALLOC_* to keep the __GFP_TYPE_SHIFT hack
working. Older patches used a switch statement on the flags but it takes a
surprising length of time in a critical path.
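
The shift hack can be sketched as follows.  The flag, shift and ALLOC_*
values are the ones quoted from the patch elsewhere in this thread; the
helper names alloc_type_shift(), alloc_type_switch() and free_slot() are
hypothetical, and the switch version is only a guess at what the older,
slower form might have looked like, written out for comparison.

```c
#include <assert.h>

/* Flag, shift and ALLOC_* values as quoted from the patch in this thread. */
#define __GFP_KERNRCLM   0x20000u
#define __GFP_USERRCLM   0x40000u
#define __GFP_TYPE_SHIFT 17

#define ALLOC_TYPES      4
#define ALLOC_KERNNORCLM 0
#define ALLOC_KERNRCLM   1
#define ALLOC_USERRCLM   2
#define ALLOC_FALLBACK   3

/* Fast form: mask the two flags, then shift them down to an index 0..3. */
static inline unsigned int alloc_type_shift(unsigned int gfp_flags)
{
	return (gfp_flags & (__GFP_KERNRCLM | __GFP_USERRCLM))
		>> __GFP_TYPE_SHIFT;
}

/* Hypothetical switch-based equivalent, for comparison only. */
static inline unsigned int alloc_type_switch(unsigned int gfp_flags)
{
	switch (gfp_flags & (__GFP_KERNRCLM | __GFP_USERRCLM)) {
	case __GFP_KERNRCLM:
		return ALLOC_KERNRCLM;
	case __GFP_USERRCLM:
		return ALLOC_USERRCLM;
	case __GFP_KERNRCLM | __GFP_USERRCLM:
		return ALLOC_FALLBACK;
	default:
		return ALLOC_KERNNORCLM;
	}
}

/* Stand-in for the per-zone free lists: one slot per allocation type. */
static inline unsigned long *free_slot(unsigned long slots[ALLOC_TYPES],
				       unsigned int gfp_flags)
{
	unsigned int type = alloc_type_shift(gfp_flags);

	assert(type < ALLOC_TYPES);
	return &slots[type];
}
```

The shift form only stays correct while the lower of the two flags sits
exactly at bit __GFP_TYPE_SHIFT, which is why moving or swapping the flags
means renumbering the ALLOC_* values as well.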

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-12 13:05     ` Mel Gorman
@ 2005-07-12 20:29       ` Paul Jackson
  2005-07-13 11:15         ` Mel Gorman
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Jackson @ 2005-07-12 20:29 UTC
  To: Mel Gorman; +Cc: jschopp, haveblue, linux-mm

Mel wrote:
> Joel, when merging the patches, there is one hack you need to watch out
> for. It is important for performance reasons but it is 100% obvious
> either.

I suspect you meant "it is _not_ 100% obvious" ...

Is there some way that the gfp.h changes could be reworked to make it
100% obvious that these two bits are not separate and independent
bits, but rather are a two bit field, counting an index from 0 to 3?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-12 20:29       ` Paul Jackson
@ 2005-07-13 11:15         ` Mel Gorman
  2005-07-14 11:06           ` Paul Jackson
  0 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2005-07-13 11:15 UTC
  To: Paul Jackson; +Cc: jschopp, haveblue, linux-mm

On Tue, 12 Jul 2005, Paul Jackson wrote:

> Mel wrote:
> > Joel, when merging the patches, there is one hack you need to watch out
> > for. It is important for performance reasons but it is 100% obvious
> > either.
>
> I suspect you meant "it is _not_ 100% obvious" ...

Sorry, yes, it is not 100% obvious

> Is there some way that the gfp.h changes could be reworked to make it
> 100% obvious that these two bits are not separate and independent
> bits, but rather are a two bit field, counting an index from 0 to 3?
>

Well, what would people feel is obvious? I will always think it is clear
as I am the source of the confusion.

The two flags are not a two bit field as such, they just get treated as
that to save a few cycles. It could also be done with something like:

index = (!!(gfp_flags & __GFP_KERNRCLM) << 1) | (!!(gfp_flags & __GFP_USERRCLM))

(untested) which means if the bits change position or value, the code
won't care. This is a slightly more expensive, but possibly clearer way to
do things.
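
A small sketch of the position-independent form, assuming deliberately
non-adjacent, made-up flag values to show that the bit positions no longer
matter; rclm_index() is a hypothetical helper name.  The two halves have to
be combined with a bitwise `|`; a logical `||` would collapse the result to
0 or 1.

```c
#include <assert.h>

/* Hypothetical, deliberately non-adjacent bit positions for this sketch. */
#define __GFP_USERRCLM 0x00400u
#define __GFP_KERNRCLM 0x80000u

/*
 * Normalise each flag to 0 or 1 with !!, then pack the two results into
 * a 2-bit index: KERNRCLM becomes bit 1, USERRCLM becomes bit 0.
 */
static inline unsigned int rclm_index(unsigned int gfp_flags)
{
	unsigned int index;

	index = ((unsigned int)!!(gfp_flags & __GFP_KERNRCLM) << 1) |
		(unsigned int)!!(gfp_flags & __GFP_USERRCLM);

	assert(index < 4);	/* two single-bit terms can never exceed 3 */
	return index;
}
```

With this encoding a USERRCLM request maps to 1 and a KERNRCLM request to
2, whatever numeric values the flags themselves have.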

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-13 11:15         ` Mel Gorman
@ 2005-07-14 11:06           ` Paul Jackson
  2005-07-18 12:32             ` Mel Gorman
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Jackson @ 2005-07-14 11:06 UTC
  To: Mel Gorman; +Cc: jschopp, haveblue, linux-mm

Mel wrote:
> Well, what would people feel is obvious?

The lines that you (Mel) add that I am puzzling over ways to clarify are
these added lines in gfp.h:

    +#define __GFP_KERNRCLM  0x20000u  /* Kernel page that is easily reclaimable */
    +#define __GFP_USERRCLM  0x40000u  /* User is a userspace user */

    +#define __GFP_TYPE_SHIFT 17     /* Translate RCLM flags to array index */

and perhaps these added lines in mmzone.h:

    +/* Page allocations are divided into these types */
    +#define ALLOC_TYPES 4
    +#define ALLOC_KERNNORCLM 0
    +#define ALLOC_KERNRCLM 1
    +#define ALLOC_USERRCLM 2
    +#define ALLOC_FALLBACK 3
    +
    +/* Number of bits required to encode the type */
    +#define BITS_PER_ALLOC_TYPE 2

It didn't jump out at me, first pass, that these two GFP bits
were a 2 bit field, not 2 separate and independent bits.  The name
GFP_TYPE_SHIFT is vague.  There are some redundant (interdependent)
defines here.

How about (just brainstorming here) something like the following:

    #define __GFP_RCLM_BITS 0x60000u	/* page reclaim types: see RCLM_* defines */

    /*
     * Reduce buddy heap fragmentation by keeping pages with similar
     * reclaimability behavior together.  The two bit field __GFP_RCLM_BITS
     * enumerates the following 4 kinds of page reclaimability:
     */
    #define RCLM_NONRECLAIMABLE 0	/* nonreclaimable kernel pages */
    #define RCLM_KERNEL 1		/* reclaimable kernel pages */
    #define RCLM_USER 2			/* reclaimable user pages */
    #define RCLM_FALLBACK 3		/* mark alloc requests when memory low */

    #define RCLM_SHIFT 17		/* Shift __GFP_RCLM_BITS to RCLM_* values */
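
As a rough sketch of how defines like these could be used, the field can be
extracted and rebuilt with a mask and a shift, and compile-time checks can
tie the mask and the shift together so they cannot drift apart.  The mask,
shift and RCLM_* values are the ones brainstormed above; gfp_to_rclm() and
rclm_to_gfp() are hypothetical helper names, and _Static_assert assumes a
C11 compiler.

```c
#include <assert.h>

#define __GFP_RCLM_BITS 0x60000u	/* two-bit reclaim-type field */

#define RCLM_NONRECLAIMABLE 0
#define RCLM_KERNEL 1
#define RCLM_USER 2
#define RCLM_FALLBACK 3

#define RCLM_SHIFT 17

/* The field must start exactly at RCLM_SHIFT and hold the values 0..3. */
_Static_assert((__GFP_RCLM_BITS & ((1u << RCLM_SHIFT) - 1)) == 0,
	       "__GFP_RCLM_BITS has bits below RCLM_SHIFT");
_Static_assert((__GFP_RCLM_BITS >> RCLM_SHIFT) == RCLM_FALLBACK,
	       "__GFP_RCLM_BITS is not a two-bit field at RCLM_SHIFT");

/* Extract the reclaim type (an RCLM_* value) from a set of gfp flags. */
static inline unsigned int gfp_to_rclm(unsigned int gfp_flags)
{
	return (gfp_flags & __GFP_RCLM_BITS) >> RCLM_SHIFT;
}

/* Build the gfp bits that encode a given RCLM_* value. */
static inline unsigned int rclm_to_gfp(unsigned int rclm_type)
{
	assert(rclm_type <= RCLM_FALLBACK);
	return (rclm_type << RCLM_SHIFT) & __GFP_RCLM_BITS;
}
```

If either the mask or the shift is changed without the other, the build
fails instead of silently indexing the wrong free list.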

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-14 11:06           ` Paul Jackson
@ 2005-07-18 12:32             ` Mel Gorman
  2005-07-18 20:08               ` Joel Schopp
  2005-07-27  8:29               ` Paul Jackson
  0 siblings, 2 replies; 13+ messages in thread
From: Mel Gorman @ 2005-07-18 12:32 UTC
  To: Paul Jackson; +Cc: jschopp, haveblue, Linux Memory Management List

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1425 bytes --]

On Thu, 14 Jul 2005, Paul Jackson wrote:

> Mel wrote:
> > Well, what would people feel is obvious?
>
> The lines that you (Mel) add that I am puzzling over ways to clarify are
> these added lines in gfp.h:
>
>     +#define __GFP_KERNRCLM  0x20000u  /* Kernel page that is easily reclaimable */
>     +#define __GFP_USERRCLM  0x40000u  /* User is a userspace user */
>
>     +#define __GFP_TYPE_SHIFT 17     /* Translate RCLM flags to array index */
>
> and perhaps these added lines in mmzone.h:
>
>     +/* Page allocations are divided into these types */
>     +#define ALLOC_TYPES 4
>     +#define ALLOC_KERNNORCLM 0
>     +#define ALLOC_KERNRCLM 1
>     +#define ALLOC_USERRCLM 2
>     +#define ALLOC_FALLBACK 3
>     +
>     +/* Number of bits required to encode the type */
>     +#define BITS_PER_ALLOC_TYPE 2
>
> It didn't jump out at me, first pass, that these two GFP bits
> were a 2 bit field, not 2 separate and independent bits.  The name
> GFP_TYPE_SHIFT is vague.  There are some redundant (interdependent)
> defines here.
>
> How about (just brainstorming here) something like the following:
>

That makes sense to me. Taking into account other threads, attached are
patches 01 and 02 from Joel's patchset with the different namings and
comments. The main changes are the renaming of __GFP_USERRCLM to
__GFP_USER to be neutral and comments explaining how the RCLM flags are
tied together.

Is this any better?

[-- Attachment #2: 01_gfp_flags --]
[-- Type: TEXT/PLAIN, Size: 5989 bytes --]

diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1/fs/buffer.c linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/buffer.c
--- linux-2.6.13-rc3-mhp1/fs/buffer.c	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/buffer.c	2005-07-18 11:43:11.000000000 +0100
@@ -1119,7 +1119,8 @@ grow_dev_page(struct block_device *bdev,
 	struct page *page;
 	struct buffer_head *bh;
 
-	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, index,
+				   GFP_NOFS | __GFP_USER);
 	if (!page)
 		return NULL;
 
@@ -3044,7 +3045,8 @@ static void recalc_bh_state(void)
 	
 struct buffer_head *alloc_buffer_head(unsigned int __nocast gfp_flags)
 {
-	struct buffer_head *ret = kmem_cache_alloc(bh_cachep, gfp_flags);
+	struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+						   gfp_flags|__GFP_KERNRCLM);
 	if (ret) {
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1/fs/dcache.c linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/dcache.c
--- linux-2.6.13-rc3-mhp1/fs/dcache.c	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/dcache.c	2005-07-18 11:43:11.000000000 +0100
@@ -719,7 +719,7 @@ struct dentry *d_alloc(struct dentry * p
 	struct dentry *dentry;
 	char *dname;
 
-	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL); 
+	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL|__GFP_KERNRCLM);
 	if (!dentry)
 		return NULL;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1/fs/ext2/super.c linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/ext2/super.c
--- linux-2.6.13-rc3-mhp1/fs/ext2/super.c	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/ext2/super.c	2005-07-18 11:43:11.000000000 +0100
@@ -138,7 +138,8 @@ static kmem_cache_t * ext2_inode_cachep;
 static struct inode *ext2_alloc_inode(struct super_block *sb)
 {
 	struct ext2_inode_info *ei;
-	ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep, SLAB_KERNEL);
+	ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep,
+						SLAB_KERNEL|__GFP_KERNRCLM);
 	if (!ei)
 		return NULL;
 #ifdef CONFIG_EXT2_FS_POSIX_ACL
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1/fs/ext3/super.c linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/ext3/super.c
--- linux-2.6.13-rc3-mhp1/fs/ext3/super.c	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/ext3/super.c	2005-07-18 11:43:11.000000000 +0100
@@ -440,7 +440,7 @@ static struct inode *ext3_alloc_inode(st
 {
 	struct ext3_inode_info *ei;
 
-	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS);
+	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS|__GFP_KERNRCLM);
 	if (!ei)
 		return NULL;
 #ifdef CONFIG_EXT3_FS_POSIX_ACL
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1/fs/ntfs/inode.c linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/ntfs/inode.c
--- linux-2.6.13-rc3-mhp1/fs/ntfs/inode.c	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-01_gfp_flags/fs/ntfs/inode.c	2005-07-18 11:43:11.000000000 +0100
@@ -318,7 +318,7 @@ struct inode *ntfs_alloc_big_inode(struc
 
 	ntfs_debug("Entering.");
 	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_big_inode_cache,
-			SLAB_NOFS);
+					    SLAB_NOFS|__GFP_KERNRCLM);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return VFS_I(ni);
@@ -343,7 +343,8 @@ static inline ntfs_inode *ntfs_alloc_ext
 	ntfs_inode *ni;
 
 	ntfs_debug("Entering.");
-	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_inode_cache, SLAB_NOFS);
+	ni = (ntfs_inode *)kmem_cache_alloc(ntfs_inode_cache,
+					    SLAB_NOFS|__GFP_KERNRCLM);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return ni;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1/include/linux/gfp.h linux-2.6.13-rc3-mhp1-01_gfp_flags/include/linux/gfp.h
--- linux-2.6.13-rc3-mhp1/include/linux/gfp.h	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-01_gfp_flags/include/linux/gfp.h	2005-07-18 11:43:11.000000000 +0100
@@ -41,21 +41,37 @@ struct vm_area_struct;
 #define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
 #define __GFP_NORECLAIM  0x20000u /* No realy zone reclaim during allocation */
 
-#define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
+/*
+ * Allocation type modifiers - Group the allocation types together if possible
+ *
+ * __GFP_USER: Allocation for a user page or a buffer page.
+ *
+ * __GFP_KERNRCLM: Kernel allocation that is either very short-lived or 
+ * 		reclaimable like inode caches
+ *
+ * __GFP_RCLM_BITS: Sum of all the reclaimable bits.
+ */
+				     
+#define __GFP_USER     0x40000u /* Easily reclaimable userspace page */
+#define __GFP_KERNRCLM 0x80000u /* Kernel page that is easily reclaimable */
+#define __GFP_RCLM_BITS (__GFP_USER|__GFP_KERNRCLM) 
+
+#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
 
 /* if you forget to add the bitmask here kernel will crash, period */
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC|__GFP_NORECLAIM)
+			__GFP_NOMEMALLOC|__GFP_KERNRCLM|__GFP_USER)
 
 #define GFP_ATOMIC	(__GFP_HIGH)
 #define GFP_NOIO	(__GFP_WAIT)
 #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
 #define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
-#define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS)
-#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM)
+#define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_USER)
+#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM | \
+			 __GFP_USER)
 
 /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
    platforms, used as appropriate on others */

[-- Attachment #3: 02_more_defines --]
[-- Type: TEXT/PLAIN, Size: 5859 bytes --]

diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1-01_gfp_flags/include/linux/mmzone.h linux-2.6.13-rc3-mhp1-02_more_defines/include/linux/mmzone.h
--- linux-2.6.13-rc3-mhp1-01_gfp_flags/include/linux/mmzone.h	2005-07-18 12:05:27.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-02_more_defines/include/linux/mmzone.h	2005-07-18 12:24:33.000000000 +0100
@@ -21,6 +21,20 @@
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif
 
+/*
+ * Reduce buddy heap fragmentation by keeping pages with similar
+ * reclaimability behavior together.  The two bit field __GFP_RECLAIMBITS
+ * enumerates the following 4 kinds of page reclaimability:
+ */
+#define RCLM_TYPES 4
+#define RCLM_NORCLM 0
+#define RCLM_USER 1
+#define RCLM_KERN 2
+#define RCLM_FALLBACK 3
+
+#define RCLM_SHIFT 17 /* Shift __GFP_RCLM_BITS to RCLM_* values */
+#define BITS_PER_RCLM_TYPE 2
+
 struct free_area {
 	struct list_head	free_list;
 	unsigned long		nr_free;
@@ -137,8 +151,47 @@ struct zone {
 	 * free areas of different sizes
 	 */
 	spinlock_t		lock;
-	struct free_area	free_area[MAX_ORDER];
 
+	/*
+	 * free_area to be removed in later patch  as it is replaced by
+	 * free_area_list
+	 */
+	struct free_area        free_area[MAX_ORDER];
+
+#ifndef CONFIG_SPARSEMEM
+	/*
+	 * The map tracks what each 2^(MAX_ORDER-1) sized block is being used
+	 * for. Each 2^(MAX_ORDER-1) block of pages has BITS_PER_RCLM_TYPE bits
+	 * in this map to remember what the block is for. When a page is freed,
+	 * its index within this bitmap is calculated in get_pageblock_type().
+	 * This means that pages will always be freed into the correct list in
+	 * free_area_lists.
+	 *
+	 * The bits are set when a 2^(MAX_ORDER-1) block of pages is split.
+	 */
+	unsigned long           *free_area_usemap;
+#endif
+
+	/*
+	 * free_area_lists contains buddies of split MAX_ORDER blocks indexed
+	 * by their intended allocation type, while free_area_global contains
+	 * whole MAX_ORDER blocks that can be used for any allocation type.
+	 */
+	struct free_area        free_area_lists[RCLM_TYPES][MAX_ORDER];
+
+	/*
+	 * A percentage of a zone is reserved for falling back to. Without
+	 * a fallback, memory will slowly fragment over time meaning the
+	 * placement policy only delays the fragmentation problem, not
+	 * fixes it
+	 */
+	unsigned long fallback_reserve;
+
+	/*
+	 * When negative, 2^MAX_ORDER-1 sized blocks of pages will be reserved
+	 * for fallbacks
+	 */
+	long fallback_balance;
 
 	ZONE_PADDING(_pad1_)
 
@@ -230,6 +283,18 @@ struct zone {
 } ____cacheline_maxaligned_in_smp;
 
 
+static inline void inc_reserve_count(struct zone* zone, int type)
+{
+	if(type == RCLM_FALLBACK)
+		zone->fallback_reserve++;
+}
+static inline void dec_reserve_count(struct zone* zone, int type)
+{
+	if(type == RCLM_FALLBACK && zone->fallback_reserve)
+		zone->fallback_reserve--;
+
+}
+
 /*
  * The "priority" of VM scanning is how much of the queues we will scan in one
  * go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
@@ -473,6 +538,9 @@ extern struct pglist_data contig_page_da
 #if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
+#if ((SECTION_SIZE_BITS - MAX_ORDER) * BITS_PER_RCLM_TYPE) > 64
+#error free_area_usemap is not big enough
+#endif
 
 struct page;
 struct mem_section {
@@ -485,6 +553,7 @@ struct mem_section {
 	 * before using it wrong.
 	 */
 	unsigned long section_mem_map;
+	u64 free_area_usemap;
 };
 
 extern struct mem_section mem_section[NR_MEM_SECTIONS];
@@ -536,6 +605,17 @@ static inline struct mem_section *__pfn_
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
+static inline unsigned long *pfn_to_usemap(struct zone *zone, unsigned long pfn)
+{
+	return &__pfn_to_section(pfn)->free_area_usemap;
+}
+
+static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn)
+{
+	pfn &= (PAGES_PER_SECTION-1);
+	return (int)((pfn >> (MAX_ORDER-1)) * BITS_PER_RCLM_TYPE);
+}
+
 #define pfn_to_page(pfn) 						\
 ({ 									\
 	unsigned long __pfn = (pfn);					\
@@ -572,6 +652,15 @@ static inline int pfn_valid(unsigned lon
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
+static inline unsigned long *pfn_to_usemap(struct zone *zone, unsigned long pfn)
+{
+	return (zone->free_area_usemap);
+}
+static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn)
+{
+	pfn = pfn - zone->zone_start_pfn;
+	return (int)((pfn >> (MAX_ORDER-1)) * BITS_PER_RCLM_TYPE);
+}
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.13-rc3-mhp1-01_gfp_flags/mm/page_alloc.c linux-2.6.13-rc3-mhp1-02_more_defines/mm/page_alloc.c
--- linux-2.6.13-rc3-mhp1-01_gfp_flags/mm/page_alloc.c	2005-07-13 05:46:46.000000000 +0100
+++ linux-2.6.13-rc3-mhp1-02_more_defines/mm/page_alloc.c	2005-07-18 12:27:09.000000000 +0100
@@ -65,6 +65,20 @@ EXPORT_SYMBOL(totalram_pages);
 EXPORT_SYMBOL(nr_swap_pages);
 
 /*
+ * fallback_allocs contains the fallback types for low memory conditions
+ * where the preferred allocation type is not available.
+ */
+int fallback_allocs[RCLM_TYPES][RCLM_TYPES+1] = {
+	{RCLM_NORCLM,    RCLM_FALLBACK,  RCLM_KERN,   RCLM_USER, -1},
+	{RCLM_KERN,      RCLM_FALLBACK,  RCLM_NORCLM, RCLM_USER, -1},
+	{RCLM_USER,      RCLM_FALLBACK,  RCLM_NORCLM, RCLM_KERN, -1},
+	{RCLM_FALLBACK,  RCLM_NORCLM,    RCLM_KERN,   RCLM_USER, -1}
+};
+static char *type_names[RCLM_TYPES] = { "Kernnel Unreclaimable",
+					 "Kernel Reclaimable",
+					 "User Reclaimable", "Fallback"};
+
+/*
  * Used by page_zone() to look up the address of the struct zone whose
  * id is encoded in the upper bits of page->flags
  */

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-18 12:32             ` Mel Gorman
@ 2005-07-18 20:08               ` Joel Schopp
  2005-07-27  8:29               ` Paul Jackson
  1 sibling, 0 replies; 13+ messages in thread
From: Joel Schopp @ 2005-07-18 20:08 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Paul Jackson, haveblue, Linux Memory Management List

> +static char *type_names[RCLM_TYPES] = { "Kernnel Unreclaimable",

You picked up my typo.  Otherwise I've integrated these two patches back 
into my own.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-18 12:32             ` Mel Gorman
  2005-07-18 20:08               ` Joel Schopp
@ 2005-07-27  8:29               ` Paul Jackson
  2005-07-27 11:10                 ` Mel Gorman
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Jackson @ 2005-07-27  8:29 UTC (permalink / raw)
  To: Mel Gorman; +Cc: jschopp, haveblue, linux-mm

Mel wrote:
> That makes sense to me. Taking into account other threads, attached are
> patches 01 and 02 from Joel's patchset with the different namings and
> comments. The main changes are the renaming of __GFP_USERRCLM to
> __GFP_USER to be neutral and comments explaining how the RCLM flags are
> tied together.

Ok - gaining.

One thing still confuses me here.  What would it mean to have a gfp
flag with both (__GFP_USER|__GFP_KERNRCLM) bits set?  Is this a valid
gfp flag, or is it just in pfn's that both bits can be set (meaning
FALLBACK)?

If both bits can be set at the same time in a gfp flag, then I don't
think either of the following two comments are accurate:

+#define __GFP_USER     0x40000u /* Easily reclaimable userspace page */
+#define __GFP_KERNRCLM 0x80000u /* Kernel page that is easily reclaimable */

Just looking at the __GFP_USER bit and seeing it is set doesn't tell me
for sure it's a userspace page request.  It might be a reclaimable
kernel page that we had to fall back on, right?  Similarly for the
__GFP_KERNRCLM bit.

And if both bits can be set in a gfp flag at the same time, then the
test that _I_ need, for my two flavors of cpuset allocation is not
possible, because I need to distinguish FALLBACK allocations for
USER space requests from FALLBACK allocations for KERNEL space
requests (USER space memory placement is confined more tightly).

Continuing this line of inquiry, what does it mean if neither bit
is set in a gfp flag?  I guess that's a valid gfp flag, and it means
that the request is for non-reclaimable kernel memory.  Is that
right?  If so, fine and this detail doesn't impact my intended use.

But the overloading of both bits set to mean FALLBACK, in the gfp
flag, if that's what you intend here, does seem to make the apparent
flagging userspace requests useless to my purposes, because I want
to treat userspace FALLBACK requests differently than kernelspace
FALLBACKs.  For me, they are still userspace and kernel space.  For
you, they are both FALLBACKs.  If my train of thought here hasn't
gone off the rails, this would mean that I would still need my own
GFP USER flag, and that I would encourage you to reinstate the
RCLM tag on your __GFP_USER* flag, to distinguish it from mine.
That, or perhaps it works to _not_ encode the fallback case in the
gfp flags using the USER|KERN bits both set, but rather have a
separate bit for the FALLBACK case.  I can appreciate that in pfn's
you have to encode this tightly for performance, but I'd be surprised
if you have to do so in gfp flags for performance.

And in any case, the asymmetry of the __GFP_USER and __GFP_KERNRCLM
names is a wart - one gets the RCLM tag and one doesn't.  And the
comment above for __GFP_USER still reflects solely the reclaim use
of this bit, not a more neutral use.

... Please don't send patches as base64 encodings of carriage
return terminated lines.  Patches should be plain text inline,
or at most plain text attachments.  In either case, they should
have newline terminated lines.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag]
  2005-07-27  8:29               ` Paul Jackson
@ 2005-07-27 11:10                 ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2005-07-27 11:10 UTC (permalink / raw)
  To: Paul Jackson; +Cc: jschopp, haveblue, linux-mm

On Wed, 27 Jul 2005, Paul Jackson wrote:

> Mel wrote:
> > That makes sense to me. Taking into account other threads, attached are
> > patches 01 and 02 from Joel's patchset with the different namings and
> > comments. The main changes are the renaming of __GFP_USERRCLM to
> > __GFP_USER to be neutral and comments explaining how the RCLM flags are
> > tied together.
>
> Ok - gaining.
>
> One thing still confuses me here.  What would it mean to have a gfp
> flag with both (__GFP_USER|__GFP_KERNRCLM) bits set?  Is this a valid
> gfp flag, or is it just in pfn's that both bits can be set (meaning
> FALLBACK)?
>

I would consider that combination a bug because it does not make sense.
The side-effect is that the allocation type starts as a fallback which
will work, but does not make sense.

> If both bits can be set at the same time in a gfp flag, then I don't
> think either of the following two comments are accurate:
>

They should not be, but this is not enforced.

> +#define __GFP_USER     0x40000u /* Easily reclaimable userspace page */
> +#define __GFP_KERNRCLM 0x80000u /* Kernel page that is easily reclaimable */
>
> Just looking at the __GFP_USER bit and seeing it is set doesn't tell me
> for sure it's a userspace page request.  It might be a reclaimable
> kernel page that we had to fall back on, right?  Similarly for the
> __GFP_KERNRCLM bit.
>

No, the bit is set by the caller. The caller should only have this bit set
if the allocation is a userspace allocation that can be reclaimed. That
said, I have since found that setting the __GFP_USERRCLM bit on GFP_USER
and GFP_HIGHUSER is not the correct thing to do as GFP_USER and
GFP_HIGHUSER can be for allocations that are not reclaimable.

> And if both bits can be set in a gfp flag at the same time, then the
> test that _I_ need, for my two flavors of cpuset allocation is not
> possible, because I need to distinguish FALLBACK allocations for
> USER space requests from FALLBACK allocations for KERNEL space
> requests (USER space memory placement is confined more tightly).
>

If you have to be sure the two bits are not both set, then a check can be
made and BUG() called.
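
A minimal userspace sketch of such a check, not taken from the posted
patches. The flag values are copied from the gfp.h hunk in this thread;
the helper names gfp_rclm_bits_valid() and gfp_check_rclm_bits() are
hypothetical, and assert() stands in for the kernel's BUG().

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the gfp.h hunk quoted in this thread. */
#define __GFP_USER      0x40000u
#define __GFP_KERNRCLM  0x80000u
#define __GFP_RCLM_BITS (__GFP_USER | __GFP_KERNRCLM)

/* Hypothetical helper: true unless both reclaim bits are set at once. */
static inline bool gfp_rclm_bits_valid(unsigned int gfp_flags)
{
	return (gfp_flags & __GFP_RCLM_BITS) != __GFP_RCLM_BITS;
}

/* Userspace stand-in for BUG(): abort on the invalid combination. */
static inline void gfp_check_rclm_bits(unsigned int gfp_flags)
{
	assert(gfp_rclm_bits_valid(gfp_flags));
}
```

Whether such a check belongs in the allocator fast path or only in a
debug build is a separate question this sketch does not try to answer.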

> Continuing this line of inquiry, what does it mean if neither bit
> is set in a gfp flag?  I guess that's a valid gfp flag, and it means
> that the request is for non-reclaimable kernel memory.  Is that
> right?  If so, fine and this detail doesn't impact my intended use.
>

That is correct.
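
As a rough illustration of how the four combinations line up with the
RCLM_* values, here is a userspace sketch. The flag and RCLM_* values
are from the patches in this thread; the helper gfp_to_rclm_type() and
the constant SKETCH_RCLM_SHIFT are hypothetical. The shift of 18 is an
assumption chosen because 0x40000 == 1 << 18, which is what makes the
two bits land on the values 1 and 2.

```c
#include <assert.h>

/* Values copied from the gfp.h and mmzone.h hunks in this thread. */
#define __GFP_USER      0x40000u
#define __GFP_KERNRCLM  0x80000u
#define __GFP_RCLM_BITS (__GFP_USER | __GFP_KERNRCLM)

#define RCLM_NORCLM   0
#define RCLM_USER     1
#define RCLM_KERN     2
#define RCLM_FALLBACK 3

/* Assumption: the bit position of __GFP_USER (0x40000 == 1 << 18). */
#define SKETCH_RCLM_SHIFT 18

/* Hypothetical helper: extract the two reclaim bits as an RCLM_* value. */
static inline int gfp_to_rclm_type(unsigned int gfp_flags)
{
	return (int)((gfp_flags & __GFP_RCLM_BITS) >> SKETCH_RCLM_SHIFT);
}
```

With neither bit set the result is RCLM_NORCLM, i.e. non-reclaimable
kernel memory, and with both set it is RCLM_FALLBACK, which is the
combination a caller should not pass.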

> But the overloading of both bits set to mean FALLBACK,

The reason the fallback bits are needed at all is to flag regions of
physical memory that can be used for any type of allocation, i.e. try to
place allocations in the right place but, failing that, use a fallback
region and, failing that, use anywhere at all.
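
The order described above could be sketched in userspace roughly as
follows. This is an illustration, not the allocator code from the
patchset: pick_rclm_type() and the nr_free[] array are hypothetical, and
the table is keyed with designated initializers on the assumption that
each row is meant to start with its own type.

```c
#include <assert.h>
#include <stddef.h>

#define RCLM_TYPES    4
#define RCLM_NORCLM   0
#define RCLM_USER     1
#define RCLM_KERN     2
#define RCLM_FALLBACK 3

/* Preferred type first, then the fallback region, then anything else. */
static const int sketch_fallback_allocs[RCLM_TYPES][RCLM_TYPES + 1] = {
	[RCLM_NORCLM]   = {RCLM_NORCLM,   RCLM_FALLBACK, RCLM_KERN,   RCLM_USER, -1},
	[RCLM_USER]     = {RCLM_USER,     RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, -1},
	[RCLM_KERN]     = {RCLM_KERN,     RCLM_FALLBACK, RCLM_NORCLM, RCLM_USER, -1},
	[RCLM_FALLBACK] = {RCLM_FALLBACK, RCLM_NORCLM,   RCLM_KERN,   RCLM_USER, -1},
};

/*
 * Hypothetical helper: return the first type in the fallback order for
 * 'wanted' that still has free blocks, or -1 if every list is empty.
 * nr_free[type] is a stand-in for the per-type free list counts.
 */
static inline int pick_rclm_type(int wanted,
				 const unsigned long nr_free[RCLM_TYPES])
{
	assert(wanted >= 0 && wanted < RCLM_TYPES);

	for (size_t i = 0; sketch_fallback_allocs[wanted][i] != -1; i++) {
		int type = sketch_fallback_allocs[wanted][i];

		if (nr_free[type] > 0)
			return type;
	}
	return -1;
}
```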

> in the gfp
> flag, if that's what you intend here, does seem to make the apparent
> flagging userspace requests useless to my purposes, because I want
> to treat userspace FALLBACK requests differently than kernelspace
> FALLBACKs.  For me, they are still userspace and kernel space.  For
> you, they are both FALLBACKs.  If my train of thought here hasn't
> gone off the rails, this would mean that I would still need my own
> GFP USER flag, and that I would encourage you to reinstate the
> RCLM tag on your __GFP_USER* flag, to distinguish it from mine.

I am not convinced we need the separate flag yet. Does it make a
difference that both flags should never be specified for an allocation?

We are using the same bit right now for different reasons. In our case, it
determines where in physical memory the allocated page comes from. In your
case, it determines if the page should be allocated at all.
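
To make the cpuset side of that concrete, here is a small sketch of a
second consumer reading the same bit. It assumes the behaviour described
at the start of this thread (userspace requests confined to the current
cpuset, other requests allowed to spill to the nearest mem_exclusive
ancestor when memory is tight). The enum and sketch_widest_scope() are
hypothetical names, not part of either patchset.

```c
#include <assert.h>

#define __GFP_USER 0x40000u /* value from the gfp.h hunk in this thread */

/* Hypothetical: how far a request may reach when memory is tight. */
enum sketch_cpuset_scope {
	SKETCH_SCOPE_CURRENT_CPUSET,     /* confined to the task's cpuset */
	SKETCH_SCOPE_EXCLUSIVE_ANCESTOR, /* may use nearest mem_exclusive ancestor */
};

/* Userspace requests are walled in; everything else may spill upward. */
static inline enum sketch_cpuset_scope sketch_widest_scope(unsigned int gfp_flags)
{
	if (gfp_flags & __GFP_USER)
		return SKETCH_SCOPE_CURRENT_CPUSET;
	return SKETCH_SCOPE_EXCLUSIVE_ANCESTOR;
}
```

The placement code only cares which free list the page comes from, while
this test only cares whether the request came from userspace, which is
why one shared bit can serve both as long as it is never combined with
__GFP_KERNRCLM.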

> That, or perhaps it works to _not_ encode the fallback case in the
> gfp flags using the USER|KERN bits both set, but rather have a
> separate bit for the FALLBACK case.  I can appreciate that in pfn's
> you have to encode this tightly for performance, but I'd be surprised
> if you have to do so in gfp flags for performance.
>

The bits are kept per block of 2^(MAX_ORDER-1) pages, not per page frame.
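
A userspace sketch of that indexing, modelled on the non-SPARSEMEM
pfn_to_bitidx() in the attached patch. The sketch_* names, the
MAX_ORDER value of 11 and the get/set helpers are assumptions made for
illustration; the real accessors in the patchset may differ.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_MAX_ORDER     11 /* assumption: a common default */
#define BITS_PER_RCLM_TYPE   2
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_TYPE_MASK     ((1UL << BITS_PER_RCLM_TYPE) - 1)

/* One 2-bit slot per 2^(MAX_ORDER-1) pages, counted from the zone start. */
static inline unsigned int sketch_pfn_to_bitidx(unsigned long pfn,
						unsigned long zone_start_pfn)
{
	pfn -= zone_start_pfn;
	return (unsigned int)((pfn >> (SKETCH_MAX_ORDER - 1)) *
			      BITS_PER_RCLM_TYPE);
}

/* Store a block's type; bitidx is even, so a slot never straddles words. */
static inline void sketch_set_block_type(unsigned long *usemap,
					 unsigned int bitidx,
					 unsigned int type)
{
	unsigned long *word = &usemap[bitidx / SKETCH_BITS_PER_LONG];
	unsigned int shift = (unsigned int)(bitidx % SKETCH_BITS_PER_LONG);

	assert(type <= SKETCH_TYPE_MASK);
	*word &= ~(SKETCH_TYPE_MASK << shift);
	*word |= (unsigned long)type << shift;
}

/* Read back the type recorded for the block containing bitidx. */
static inline unsigned int sketch_get_block_type(const unsigned long *usemap,
						 unsigned int bitidx)
{
	unsigned long word = usemap[bitidx / SKETCH_BITS_PER_LONG];
	unsigned int shift = (unsigned int)(bitidx % SKETCH_BITS_PER_LONG);

	return (unsigned int)((word >> shift) & SKETCH_TYPE_MASK);
}
```

With 4K pages and MAX_ORDER of 11 this works out to two bits per 4MB
block, so the map stays small even for large zones.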

Thread overview: 13+ messages
     [not found] <1121101013.15095.19.camel@localhost>
2005-07-11 17:36 ` [Fwd: [PATCH 2/4] cpusets new __GFP_HARDWALL flag] Joel Schopp
2005-07-11 17:49   ` Dave Hansen
2005-07-12  2:55   ` Paul Jackson
2005-07-12  5:24     ` Dave Hansen
2005-07-12  6:11       ` Paul Jackson
2005-07-12 13:05     ` Mel Gorman
2005-07-12 20:29       ` Paul Jackson
2005-07-13 11:15         ` Mel Gorman
2005-07-14 11:06           ` Paul Jackson
2005-07-18 12:32             ` Mel Gorman
2005-07-18 20:08               ` Joel Schopp
2005-07-27  8:29               ` Paul Jackson
2005-07-27 11:10                 ` Mel Gorman