* ia64 needs to shake memory from quicklists when there is memory pressure.
@ 2005-03-09 17:09 Robin Holt
2005-03-09 17:15 ` Martin J. Bligh
2005-03-09 19:32 ` Andrew Morton
0 siblings, 2 replies; 7+ messages in thread
From: Robin Holt @ 2005-03-09 17:09 UTC (permalink / raw)
To: akpm; +Cc: tony.luck, linux-ia64, linux-mm
Andrew,
I am searching for some direction. I am in the process of pushing
changes to the ia64 page table cache (quicklist) code. One result of
the changes is that the algorithm for freeing pages from the quicklist
is no longer based on a boot-time calculation of a percentage of total
system memory; it is now based on the percentage of memory free on the
node (whole system for non-NUMA) at the time the shrink call is made.
Right now, there are two places that the shrink is invoked. One is
from the tlb_finish_mmu() code which would be immediately after the only
place that items are added to the list. The other is from cpu_idle which
appears to be a carry over from when x86 code was pulled over to ia64.
The purpose for that appears to have been making the sysctl (which has
been removed) take effect in situations where a cpu is never calling
tlb_finish_mmu().
The "ideal" would be to have a node aware slab cache. Since that
is probably a long time coming, I was wondering if there would be
any possibility of getting some sort of hook into wakeup_kswapd(),
kswapd(), or balance_pgdat(). Since the quicklists are maintained per
cpu, we would need to perform an smp_call_function_single() for other
cpus on this node. Is there some mechanism in place already to handle
anything similar to this? Is there a better way to accomplish this?
Can you offer any suggestions?
Thanks,
Robin Holt
PS: Some relevant links.
Discuss the shrink issues:
http://marc.theaimsgroup.com/?l=linux-ia64&m=110990848315823&w=2
The code change to do the free.
http://marc.theaimsgroup.com/?l=linux-ia64&m=110978917715909&w=2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org
* Re: ia64 needs to shake memory from quicklists when there is memory pressure.
2005-03-09 17:09 ia64 needs to shake memory from quicklists when there is memory pressure Robin Holt
@ 2005-03-09 17:15 ` Martin J. Bligh
2005-03-14 16:24 ` Robin Holt
2005-03-09 19:32 ` Andrew Morton
1 sibling, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2005-03-09 17:15 UTC (permalink / raw)
To: Robin Holt, akpm; +Cc: tony.luck, linux-ia64, linux-mm
> The "ideal" would be to have a node aware slab cache. Since that
> is probably a long time coming,
Manfred already did one. Perhaps we can get that going again? would
be useful for more than just this ...
M.
* Re: ia64 needs to shake memory from quicklists when there is memory pressure.
2005-03-09 17:09 ia64 needs to shake memory from quicklists when there is memory pressure Robin Holt
2005-03-09 17:15 ` Martin J. Bligh
@ 2005-03-09 19:32 ` Andrew Morton
2005-03-14 16:40 ` Robin Holt
1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2005-03-09 19:32 UTC (permalink / raw)
To: Robin Holt; +Cc: tony.luck, linux-ia64, linux-mm
Robin Holt <holt@sgi.com> wrote:
>
> Andrew,
>
> I am searching for some direction. I am in the process of pushing
> changes to the ia64 page table cache (quicklist) code. One result of
> the changes is that the algorithm for freeing pages from the quicklist
> is no longer based on a boot-time calculation of a percentage of total
> system memory; it is now based on the percentage of memory free on the
> node (whole system for non-NUMA) at the time the shrink call is made.
>
> Right now, there are two places that the shrink is invoked. One is
> from the tlb_finish_mmu() code which would be immediately after the only
> place that items are added to the list. The other is from cpu_idle which
> appears to be a carry over from when x86 code was pulled over to ia64.
> The purpose for that appears to have been making the sysctl (which has
> been removed) take effect in situations where a cpu is never calling
> tlb_finish_mmu().
>
> The "ideal" would be to have a node aware slab cache. Since that
> is probably a long time coming, I was wondering if there would be
> any possibility of getting some sort of hook into wakeup_kswapd(),
> kswapd(), or balance_pgdat(). Since the quicklists are maintained per
> cpu, we would need to perform an smp_call_function_single() for other
> cpus on this node. Is there some mechanism in place already to handle
> anything similar to this? Is there a better way to accomplish this?
> Can you offer any suggestions?
>
Suggest you hook into the existing set_shrinker() API.
Then, in the shrinker callback, perform reclaim of the calling CPU's
node's pages.
Try to return the right numbers from the shrinker callback so that
shrink_slab() will keep this cache balanced wrt all the other ones which it
is managing.
* Re: ia64 needs to shake memory from quicklists when there is memory pressure.
2005-03-09 17:15 ` Martin J. Bligh
@ 2005-03-14 16:24 ` Robin Holt
2005-03-14 16:37 ` Martin J. Bligh
0 siblings, 1 reply; 7+ messages in thread
From: Robin Holt @ 2005-03-14 16:24 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Robin Holt, akpm, tony.luck, linux-ia64, linux-mm
On Wed, Mar 09, 2005 at 09:15:10AM -0800, Martin J. Bligh wrote:
> > The "ideal" would be to have a node aware slab cache. Since that
> > is probably a long time coming,
>
> Manfred already did one. Perhaps we can get that going again? would
> be useful for more than just this ...
Is this the kmem_cache_alloc_node() stuff? If so, when I use that
for page table allocations, running AIM7 on a 32 processor machine
consistently live-locks the system. Everybody is waiting on the
cachep->spinlock inside kmem_cache_alloc_node().
Is there a different patch I should be looking for?
Thanks,
Robin
* Re: ia64 needs to shake memory from quicklists when there is memory pressure.
2005-03-14 16:24 ` Robin Holt
@ 2005-03-14 16:37 ` Martin J. Bligh
0 siblings, 0 replies; 7+ messages in thread
From: Martin J. Bligh @ 2005-03-14 16:37 UTC (permalink / raw)
To: Robin Holt; +Cc: akpm, tony.luck, linux-ia64, linux-mm
>> > The "ideal" would be to have a node aware slab cache. Since that
>> > is probably a long time coming,
>>
>> Manfred already did one. Perhaps we can get that going again? would
>> be useful for more than just this ...
>
> Is this the kmem_cache_alloc_node() stuff? If so, when I use that
Yes, I think so.
> for page table allocations, running AIM7 on a 32 processor machine
> consistently live-locks the system. Everybody is waiting inside
> kmem_cache_alloc_node() cachep->spinlock.
>
> Is there a different patch I should be looking for?
Don't know of one, but maybe we can get Manfred to fix the existing one ;-)
M.
* Re: ia64 needs to shake memory from quicklists when there is memory pressure.
2005-03-09 19:32 ` Andrew Morton
@ 2005-03-14 16:40 ` Robin Holt
2005-03-14 21:39 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Robin Holt @ 2005-03-14 16:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Robin Holt, tony.luck, linux-ia64, linux-mm
On Wed, Mar 09, 2005 at 11:32:27AM -0800, Andrew Morton wrote:
> Robin Holt <holt@sgi.com> wrote:
> >
> > Andrew,
> >
> > I am searching for some direction. I am in the process of pushing
> > changes to the ia64 page table cache (quicklist) code. One result of
> > the changes is that the algorithm for freeing pages from the quicklist
> > is no longer based on a boot-time calculation of a percentage of total
> > system memory; it is now based on the percentage of memory free on the
> > node (whole system for non-NUMA) at the time the shrink call is made.
> >
> > Right now, there are two places that the shrink is invoked. One is
> > from the tlb_finish_mmu() code which would be immediately after the only
> > place that items are added to the list. The other is from cpu_idle which
> > appears to be a carry over from when x86 code was pulled over to ia64.
> > The purpose for that appears to have been making the sysctl (which has
> > been removed) take effect in situations where a cpu is never calling
> > tlb_finish_mmu().
> >
> > The "ideal" would be to have a node aware slab cache. Since that
> > is probably a long time coming, I was wondering if there would be
> > any possibility of getting some sort of hook into wakeup_kswapd(),
> > kswapd(), or balance_pgdat(). Since the quicklists are maintained per
> > cpu, we would need to perform an smp_call_function_single() for other
> > cpus on this node. Is there some mechanism in place already to handle
> > anything similar to this? Is there a better way to accomplish this?
> > Can you offer any suggestions?
> >
>
> Suggest you hook into the existing set_shrinker() API.
>
> Then, in the shrinker callback, perform reclaim of the calling CPU's
> node's pages.
>
> Try to return the right numbers from the shrinker callback so that
> shrink_slab() will keep this cache balanced wrt all the other ones which it
> is managing.
I wedged in a shrinker which simply does an smp_call_function() to
invoke the cache shrinker. I did modify the shrinker function to return
the number of pages freed, but I am currently doing nothing with that
value, as using it would require a spinlock/atomic operation and I am
not ready to take that performance hit. The one issue I have is that we
lose information about which nodes to shake memory from and therefore
end up calling the function for every node in the system. This appears
very heavy-handed.
I put this kernel on a machine and ran it over the weekend with no
issues. Unfortunately, I do not have any test loads which cause much
flushing. Most of the tests result in adequate calls to tlb_finish_mmu()
to keep the page tables in check. Over the weekend, there were only 4
times when the smp_call_function() returned any pages, and each of those
calls freed only 16 pages. I cannot give you percentages, because the
int counter I was using appears to have overflowed.
I am really not convinced this is a problem which needs to be fixed.
The code was written this way for i386, carried over to ia64, and has
operated in both environments without issue all that time. The callout
from cpu_idle() was essentially a freebie and provided a means for the
sysctl to take effect.
Thanks
Robin Holt
* Re: ia64 needs to shake memory from quicklists when there is memory pressure.
2005-03-14 16:40 ` Robin Holt
@ 2005-03-14 21:39 ` Andrew Morton
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2005-03-14 21:39 UTC (permalink / raw)
To: Robin Holt; +Cc: tony.luck, linux-ia64, linux-mm
Robin Holt <holt@sgi.com> wrote:
>
> > > The "ideal" would be to have a node aware slab cache. Since that
> > > is probably a long time coming, I was wondering if there would be
> > > any possibility of getting some sort of hook into wakeup_kswapd(),
> > > kswapd(), or balance_pgdat(). Since the quicklists are maintained per
> > > cpu, we would need to perform an smp_call_function_single() for other
> > > cpus on this node. Is there some mechanism in place already to handle
> > > anything similar to this? Is there a better way to accomplish this?
> > > Can you offer any suggestions?
> > >
> >
> > Suggest you hook into the existing set_shrinker() API.
> >
> > Then, in the shrinker callback, perform reclaim of the calling CPU's
> > node's pages.
> >
> > Try to return the right numbers from the shrinker callback so that
> > shrink_slab() will keep this cache balanced wrt all the other ones which it
> > is managing.
>
> I wedged a shrinker in which simply does a smp_call_function() to invoke
> the cache shrinker. I did modify the shrinker function to return the
> number of pages freed, but am currently doing nothing with it as this
> will require a spinlock/atomic operation and am not ready to take that
> performance hit. The one issue I have is we lose information about
> which nodes to shake memory from and therefore end up calling the function
> for every node in the system. This appears very heavy handed.
As I said, "in the shrinker callback, perform reclaim of the calling CPU's
node's pages.". kswapd is already node-affine, as are callers of
try_to_free_pages().