* [patch][rfc] radix-tree: be a nice citizen
@ 2007-08-29 8:50 Nick Piggin
2007-08-29 8:57 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2007-08-29 8:50 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List
ISTR that last time I sent you a patch to do the same thing, you
had some objections. I can't remember what they were though, but
I guess you didn't end up merging it.
I was reminded by the problem after seeing an atomic allocation
failure trace from pagecache radix tree inesrtion.
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags. But
the preallocation is squandered when it comes time to allocate a node,
we default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we
don't need to be using.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/lib/radix-tree.c
===================================================================
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -90,11 +90,10 @@ static inline gfp_t root_gfp_mask(struct
static struct radix_tree_node *
radix_tree_node_alloc(struct radix_tree_root *root)
{
- struct radix_tree_node *ret;
+ struct radix_tree_node *ret = NULL;
gfp_t gfp_mask = root_gfp_mask(root);
- ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
- if (ret == NULL && !(gfp_mask & __GFP_WAIT)) {
+ if (!(gfp_mask & __GFP_WAIT)) {
struct radix_tree_preload *rtp;
rtp = &__get_cpu_var(radix_tree_preloads);
@@ -104,6 +103,8 @@ radix_tree_node_alloc(struct radix_tree_
rtp->nr--;
}
}
+ if (ret == NULL)
+ ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
BUG_ON(radix_tree_is_direct_ptr(ret));
return ret;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-29 8:50 [patch][rfc] radix-tree: be a nice citizen Nick Piggin
@ 2007-08-29 8:57 ` Andrew Morton
2007-08-29 9:03 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-08-29 8:57 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Wed, 29 Aug 2007 10:50:39 +0200 Nick Piggin <npiggin@suse.de> wrote:
> ISTR that last time I sent you a patch to do the same thing, you
> had some objections. I can't remember what they were though, but
> I guess you didn't end up merging it.
So you can't remember what the problem was, and I have to work iyut
out again. Is this efficient?
> I was reminded by the problem after seeing an atomic allocation
> failure trace from pagecache radix tree inesrtion.
wot? radix-tree node allocation for pagecache insertion doesn't fail.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-29 8:57 ` Andrew Morton
@ 2007-08-29 9:03 ` Nick Piggin
2007-08-29 9:20 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2007-08-29 9:03 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Wed, Aug 29, 2007 at 01:57:02AM -0700, Andrew Morton wrote:
> On Wed, 29 Aug 2007 10:50:39 +0200 Nick Piggin <npiggin@suse.de> wrote:
>
> > ISTR that last time I sent you a patch to do the same thing, you
> > had some objections. I can't remember what they were though, but
> > I guess you didn't end up merging it.
>
> So you can't remember what the problem was, and I have to work iyut
> out again. Is this efficient?
No, but better than leaving it unfixed. It was years ago. I did try
to find it.
> > I was reminded by the problem after seeing an atomic allocation
> > failure trace from pagecache radix tree inesrtion.
>
> wot? radix-tree node allocation for pagecache insertion doesn't fail.
The atomic allocation that's part of node allocation can.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-29 9:03 ` Nick Piggin
@ 2007-08-29 9:20 ` Andrew Morton
2007-08-29 9:45 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-08-29 9:20 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Wed, 29 Aug 2007 11:03:01 +0200 Nick Piggin <npiggin@suse.de> wrote:
> On Wed, Aug 29, 2007 at 01:57:02AM -0700, Andrew Morton wrote:
> > On Wed, 29 Aug 2007 10:50:39 +0200 Nick Piggin <npiggin@suse.de> wrote:
> >
> > > ISTR that last time I sent you a patch to do the same thing, you
> > > had some objections. I can't remember what they were though, but
> > > I guess you didn't end up merging it.
> >
> > So you can't remember what the problem was, and I have to work iyut
> > out again. Is this efficient?
>
> No, but better than leaving it unfixed. It was years ago. I did try
> to find it.
>
ho hum.
>
> > > I was reminded by the problem after seeing an atomic allocation
> > > failure trace from pagecache radix tree inesrtion.
> >
> > wot? radix-tree node allocation for pagecache insertion doesn't fail.
>
> The atomic allocation that's part of node allocation can.
radix-tree node allocations under add_to_page_cache() can't fail. Or if they
can, there's some bug. IOW, I don't have a clue what you're trying to tell me.
Can we start again?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-29 9:20 ` Andrew Morton
@ 2007-08-29 9:45 ` Nick Piggin
2007-08-29 22:45 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2007-08-29 9:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Wed, Aug 29, 2007 at 02:20:44AM -0700, Andrew Morton wrote:
> On Wed, 29 Aug 2007 11:03:01 +0200 Nick Piggin <npiggin@suse.de> wrote:
>
> > On Wed, Aug 29, 2007 at 01:57:02AM -0700, Andrew Morton wrote:
> > > On Wed, 29 Aug 2007 10:50:39 +0200 Nick Piggin <npiggin@suse.de> wrote:
> > >
> > > > ISTR that last time I sent you a patch to do the same thing, you
> > > > had some objections. I can't remember what they were though, but
> > > > I guess you didn't end up merging it.
> > >
> > > So you can't remember what the problem was, and I have to work iyut
> > > out again. Is this efficient?
> >
> > No, but better than leaving it unfixed. It was years ago. I did try
> > to find it.
> >
>
> ho hum.
>
> >
> > > > I was reminded by the problem after seeing an atomic allocation
> > > > failure trace from pagecache radix tree inesrtion.
> > >
> > > wot? radix-tree node allocation for pagecache insertion doesn't fail.
> >
> > The atomic allocation that's part of node allocation can.
>
> radix-tree node allocations under add_to_page_cache() can't fail. Or if they
> can, there's some bug. IOW, I don't have a clue what you're trying to tell me.
>
> Can we start again?
Oh, OK. Yeah I'm sure the radix_tree_insert isn't failing, but the
first kmem_cache_alloc in radix_tree_node_alloc is failing (page
allocator is giving the backtrace). Because it is GFP_ATOMIC and
being done under the spinlock.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-29 9:45 ` Nick Piggin
@ 2007-08-29 22:45 ` Andrew Morton
2007-08-30 1:22 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-08-29 22:45 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Wed, 29 Aug 2007 11:45:03 +0200 Nick Piggin <npiggin@suse.de> wrote:
> Yeah I'm sure the radix_tree_insert isn't failing, but the
> first kmem_cache_alloc in radix_tree_node_alloc is failing (page
> allocator is giving the backtrace). Because it is GFP_ATOMIC and
> being done under the spinlock.
OK, that's expected. Add a __GFP_NOWARN to the caller's gfp_t?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-29 22:45 ` Andrew Morton
@ 2007-08-30 1:22 ` Nick Piggin
2007-08-30 2:08 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2007-08-30 1:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Wed, Aug 29, 2007 at 03:45:31PM -0700, Andrew Morton wrote:
> On Wed, 29 Aug 2007 11:45:03 +0200 Nick Piggin <npiggin@suse.de> wrote:
>
> > Yeah I'm sure the radix_tree_insert isn't failing, but the
> > first kmem_cache_alloc in radix_tree_node_alloc is failing (page
> > allocator is giving the backtrace). Because it is GFP_ATOMIC and
> > being done under the spinlock.
>
> OK, that's expected. Add a __GFP_NOWARN to the caller's gfp_t?
It eats GFP_ATOMIC reserves (and yes, we could ad a ~__GFP_HIGH, but
the allocator still has a small reserve for non-sleeping GFP_KERNEL
allocations, so it would eat that).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-30 1:22 ` Nick Piggin
@ 2007-08-30 2:08 ` Andrew Morton
2007-08-30 3:16 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-08-30 2:08 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Thu, 30 Aug 2007 03:22:37 +0200 Nick Piggin <npiggin@suse.de> wrote:
> On Wed, Aug 29, 2007 at 03:45:31PM -0700, Andrew Morton wrote:
> > On Wed, 29 Aug 2007 11:45:03 +0200 Nick Piggin <npiggin@suse.de> wrote:
> >
> > > Yeah I'm sure the radix_tree_insert isn't failing, but the
> > > first kmem_cache_alloc in radix_tree_node_alloc is failing (page
> > > allocator is giving the backtrace). Because it is GFP_ATOMIC and
> > > being done under the spinlock.
> >
> > OK, that's expected. Add a __GFP_NOWARN to the caller's gfp_t?
>
> It eats GFP_ATOMIC reserves
Really? The caller does a great pile of GFP_HIGHUSER pagecache allocations
for each page which he allocates for ratnodes. I guess if we're a highmem
machine then we could be low on ZONE_NORMAL, but have plenty of
ZONE_HIGHMEM available, so maybe in that situation the kernel could end up
chewing away a significant amount of the lowmem reserve, dunno.
But I'm more suspecting that your ZONE_NORMAL got eaten by something else
(networking?) and the radix-tree allocation failure you saw was collateral
damage?
> (and yes, we could ad a ~__GFP_HIGH, but
> the allocator still has a small reserve for non-sleeping GFP_KERNEL
> allocations, so it would eat that).
spose so.
I'm still struggling to see whether the value of the proposed fix is worth
the additional overhead?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch][rfc] radix-tree: be a nice citizen
2007-08-30 2:08 ` Andrew Morton
@ 2007-08-30 3:16 ` Nick Piggin
0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-08-30 3:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Wed, Aug 29, 2007 at 07:08:04PM -0700, Andrew Morton wrote:
> On Thu, 30 Aug 2007 03:22:37 +0200 Nick Piggin <npiggin@suse.de> wrote:
>
> > On Wed, Aug 29, 2007 at 03:45:31PM -0700, Andrew Morton wrote:
> > > On Wed, 29 Aug 2007 11:45:03 +0200 Nick Piggin <npiggin@suse.de> wrote:
> > >
> > > > Yeah I'm sure the radix_tree_insert isn't failing, but the
> > > > first kmem_cache_alloc in radix_tree_node_alloc is failing (page
> > > > allocator is giving the backtrace). Because it is GFP_ATOMIC and
> > > > being done under the spinlock.
> > >
> > > OK, that's expected. Add a __GFP_NOWARN to the caller's gfp_t?
> >
> > It eats GFP_ATOMIC reserves
>
> Really? The caller does a great pile of GFP_HIGHUSER pagecache allocations
> for each page which he allocates for ratnodes. I guess if we're a highmem
> machine then we could be low on ZONE_NORMAL, but have plenty of
> ZONE_HIGHMEM available, so maybe in that situation the kernel could end up
> chewing away a significant amount of the lowmem reserve, dunno.
Yeah, and not just that. We can be allocating pages from ZONE_NORMAL but
nodes from ZONE_DMA, or allocating pages off-node and nodes from atomic
reserves.
> But I'm more suspecting that your ZONE_NORMAL got eaten by something else
> (networking?) and the radix-tree allocation failure you saw was collateral
> damage?
It was, but it reminded me it should be fixed. There is a reasonable
chance it will actually happen now and again with heavy loads and
more than a single zone and node. Barely noticable? Probably, but
every little bit helps.
> > (and yes, we could ad a ~__GFP_HIGH, but
> > the allocator still has a small reserve for non-sleeping GFP_KERNEL
> > allocations, so it would eat that).
>
> spose so.
>
> I'm still struggling to see whether the value of the proposed fix is worth
> the additional overhead?
What aditional overhead? Nothing jumps out at me... we've already touched
the per-cpu data in preload()...
If anything, I would have thought this behaviour is preferable because now
we don't have to worry about potentially taking a heavily contended
zone->lock and a lot of cachelines misses splitting up a huge buddy page to
4K, and filling a pcp->batch worth of pages, all while holding tree_lock ;)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-08-30 3:16 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-29 8:50 [patch][rfc] radix-tree: be a nice citizen Nick Piggin
2007-08-29 8:57 ` Andrew Morton
2007-08-29 9:03 ` Nick Piggin
2007-08-29 9:20 ` Andrew Morton
2007-08-29 9:45 ` Nick Piggin
2007-08-29 22:45 ` Andrew Morton
2007-08-30 1:22 ` Nick Piggin
2007-08-30 2:08 ` Andrew Morton
2007-08-30 3:16 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox