linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH][1/2] return values shrink_dcache_memory etc
@ 2002-07-20 19:40 Rik van Riel
  2002-07-20 20:11 ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2002-07-20 19:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-mm, Ed Tomlinson

Hi,

this patch, against current 2.5.27, builds on the patch that let
kmem_cache_shrink return the number of pages freed. This value
is used as the return value for shrink_dcache_memory and friends.

This is useful not just for more accurate OOM detection, but also as
a preparation for putting these reclaimable slab pages on the LRU list.
This change was originally done by Ed Tomlinson.

please apply,
thank you,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

 fs/dcache.c |    3 +--
 fs/dquot.c  |    3 +--
 fs/inode.c  |    3 +--
 mm/vmscan.c |    6 +++---
 4 files changed, 6 insertions(+), 9 deletions(-)

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.660   -> 1.661
#	         fs/dcache.c	1.29    -> 1.30
#	          fs/dquot.c	1.43    -> 1.44
#	         mm/vmscan.c	1.85    -> 1.86
#	          fs/inode.c	1.66    -> 1.67
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/07/19	riel@imladris.surriel.com	1.661
# use the return values from shrink_dcache_memory, shrink_icache_memory
# and shrink_dqcache_memory (thanks to Ed Tomlinson)
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c	Fri Jul 19 18:22:35 2002
+++ b/fs/dcache.c	Fri Jul 19 18:22:35 2002
@@ -603,8 +603,7 @@
 	count = dentry_stat.nr_unused / priority;

 	prune_dcache(count);
-	kmem_cache_shrink(dentry_cache);
-	return 0;
+	return kmem_cache_shrink(dentry_cache);
 }

 #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
diff -Nru a/fs/dquot.c b/fs/dquot.c
--- a/fs/dquot.c	Fri Jul 19 18:22:35 2002
+++ b/fs/dquot.c	Fri Jul 19 18:22:35 2002
@@ -498,8 +498,7 @@
 	count = dqstats.free_dquots / priority;
 	prune_dqcache(count);
 	unlock_kernel();
-	kmem_cache_shrink(dquot_cachep);
-	return 0;
+	return kmem_cache_shrink(dquot_cachep);
 }

 /*
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Fri Jul 19 18:22:35 2002
+++ b/fs/inode.c	Fri Jul 19 18:22:35 2002
@@ -431,8 +431,7 @@
 	count = inodes_stat.nr_unused / priority;

 	prune_icache(count);
-	kmem_cache_shrink(inode_cachep);
-	return 0;
+	return kmem_cache_shrink(inode_cachep);
 }

 /*
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	Fri Jul 19 18:22:35 2002
+++ b/mm/vmscan.c	Fri Jul 19 18:22:35 2002
@@ -389,12 +389,12 @@

 	wakeup_bdflush();

-	shrink_dcache_memory(priority, gfp_mask);
+	nr_pages += shrink_dcache_memory(priority, gfp_mask);

 	/* After shrinking the dcache, get rid of unused inodes too .. */
-	shrink_icache_memory(1, gfp_mask);
+	nr_pages += shrink_icache_memory(1, gfp_mask);
 #ifdef CONFIG_QUOTA
-	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
+	nr_pages += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
 #endif

 	return nr_pages;


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-20 19:40 [PATCH][1/2] return values shrink_dcache_memory etc Rik van Riel
@ 2002-07-20 20:11 ` Linus Torvalds
  2002-07-20 20:41   ` Rik van Riel
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2002-07-20 20:11 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, Ed Tomlinson


On Sat, 20 Jul 2002, Rik van Riel wrote:
>
> this patch, against current 2.5.27, builds on the patch that let
> kmem_cache_shrink return the number of pages freed. This value
> is used as the return value for shrink_dcache_memory and friends.

I disagree with the whole approach of having shrink_cache() return the
number of pages free.

The number is meaningless, since it has nothing to do with the actual
memory zones that are under pressure (right now, the memory zone is almost
always ZONE_NORMAL, which is correct, but that's just pure luck rather
than anything fundamental).

I'd be much more interested in the "put the cache pages on the dirty list,
and have memory pressure push them out in LRU order" approach. Somebody
already had preliminary patches.

That gets _rid_ of dcache_shrink() and friends, instead of making them
return meaningless numbers.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-20 20:11 ` Linus Torvalds
@ 2002-07-20 20:41   ` Rik van Riel
  2002-07-20 20:53     ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2002-07-20 20:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-mm, Ed Tomlinson

On Sat, 20 Jul 2002, Linus Torvalds wrote:
> On Sat, 20 Jul 2002, Rik van Riel wrote:
> >
> > this patch, against current 2.5.27, builds on the patch that let
> > kmem_cache_shrink return the number of pages freed. This value
> > is used as the return value for shrink_dcache_memory and friends.

> I'd be much more interested in the "put the cache pages on the dirty list,
> and have memory pressure push them out in LRU order" approach. Somebody
> already had preliminary patches.
>
> That gets _rid_ of dcache_shrink() and friends, instead of making them
> return meaningless numbers.

OK, I'll try to forward-port Ed's code to do that from 2.4 to 2.5
this weekend...

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-20 20:41   ` Rik van Riel
@ 2002-07-20 20:53     ` Linus Torvalds
  2002-07-20 21:42       ` Rik van Riel
  2002-07-22  5:04       ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2002-07-20 20:53 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, Ed Tomlinson


On Sat, 20 Jul 2002, Rik van Riel wrote:
>
> OK, I'll try to forward-port Ed's code to do that from 2.4 to 2.5
> this weekend...

Side note: while I absolutely think that is the right thing to do, that's
also the much more "interesting" change. As a result, I'd be happier if it
went through channels (ie probably Andrew) and had some wider testing
first at least in the form of a CFT on linux-kernel.

[ Or has it already been in 2.4.x in any major tree? (In which case my
  testing argument is lessened to some degree and it's mainly just to
  verify that the forward-port works). ]

Thanks,
		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-20 20:53     ` Linus Torvalds
@ 2002-07-20 21:42       ` Rik van Riel
  2002-07-22  5:04       ` Andrew Morton
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2002-07-20 21:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-mm, Ed Tomlinson

On Sat, 20 Jul 2002, Linus Torvalds wrote:
> On Sat, 20 Jul 2002, Rik van Riel wrote:
> >
> > OK, I'll try to forward-port Ed's code to do that from 2.4 to 2.5
> > this weekend...
>
> Side note: while I absolutely think that is the right thing to do, that's
> also the much more "interesting" change. As a result, I'd be happier if it
> went through channels (ie probably Andrew) and had some wider testing
> first at least in the form of a CFT on linux-kernel.

Absolutely. This idea has been implemented but hasn't yet
received wider testing.

I guess I'll try to get this patch on LWN with a large
CFT attached ;)))

> [ Or has it already been in 2.4.x in any major tree?

Not yet. I'll pass it on through Andrew...

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-20 20:53     ` Linus Torvalds
  2002-07-20 21:42       ` Rik van Riel
@ 2002-07-22  5:04       ` Andrew Morton
  2002-07-22  5:16         ` William Lee Irwin III
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2002-07-22  5:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rik van Riel, linux-kernel, linux-mm, Ed Tomlinson

Linus Torvalds wrote:
> 
> On Sat, 20 Jul 2002, Rik van Riel wrote:
> >
> > OK, I'll try to forward-port Ed's code to do that from 2.4 to 2.5
> > this weekend...
> 
> Side note: while I absolutely think that is the right thing to do, that's
> also the much more "interesting" change. As a result, I'd be happier if it
> went through channels (ie probably Andrew) and had some wider testing
> first at least in the form of a CFT on linux-kernel.
> 

I'd suggest that we avoid putting any additional changes into
the VM until we have solutions available for:

2: Make it work with pte-highmem  (Bill Irwin is signed up for this)

4: Move the pte_chains into highmem too (Bill, I guess)

6: maybe GC the pte_chain backing pages. (Seems unavoidable.  Rik?)


Especially pte_chains in highmem.  Failure to fix this well
is a showstopper for rmap on large ia32 machines, which makes
it a showstopper full stop.

If we can get something in place which works acceptably on Martin
Bligh's machines, and we can see that the gains of rmap (whatever
they are ;)) are worth the as-yet uncoded pains then let's move on.
But until then, adding new stuff to the VM just makes a `patch -R'
harder to do.

-
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  5:04       ` Andrew Morton
@ 2002-07-22  5:16         ` William Lee Irwin III
  2002-07-22  5:38           ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: William Lee Irwin III @ 2002-07-22  5:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Rik van Riel, linux-kernel, linux-mm, Ed Tomlinson

On Sun, Jul 21, 2002 at 10:04:29PM -0700, Andrew Morton wrote:
> I'd suggest that we avoid putting any additional changes into
> the VM until we have solutions available for:
> 2: Make it work with pte-highmem  (Bill Irwin is signed up for this)
> 4: Move the pte_chains into highmem too (Bill, I guess)
> 6: maybe GC the pte_chain backing pages. (Seems unavoidable.  Rik?)
> Especially pte_chains in highmem.  Failure to fix this well
> is a showstopper for rmap on large ia32 machines, which makes
> it a showstopper full stop.

I'll send you an update of my solution for (6), the initial version of
which was posted earlier today, in a separate post.

highpte_chain will do (2) and (4) simultaneously when it's debugged.


On Sun, Jul 21, 2002 at 10:04:29PM -0700, Andrew Morton wrote:
> If we can get something in place which works acceptably on Martin
> Bligh's machines, and we can see that the gains of rmap (whatever
> they are ;)) are worth the as-yet uncoded pains then let's move on.
> But until then, adding new stuff to the VM just makes a `patch -R'
> harder to do.

I have the same kinds of machines and have already been testing with
precisely the many tasks workloads he's concerned about for the sake of
correctness, and efficiency is also a concern here. highpte_chain is
already so high up on my priority queue that all other work is halted.


Cheers,
Bill
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  5:16         ` William Lee Irwin III
@ 2002-07-22  5:38           ` Andrew Morton
  2002-07-22  6:06             ` Martin J. Bligh
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2002-07-22  5:38 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Linus Torvalds, Rik van Riel, linux-kernel, linux-mm, Ed Tomlinson

William Lee Irwin III wrote:
> 
> On Sun, Jul 21, 2002 at 10:04:29PM -0700, Andrew Morton wrote:
> > I'd suggest that we avoid putting any additional changes into
> > the VM until we have solutions available for:
> > 2: Make it work with pte-highmem  (Bill Irwin is signed up for this)
> > 4: Move the pte_chains into highmem too (Bill, I guess)
> > 6: maybe GC the pte_chain backing pages. (Seems unavoidable.  Rik?)
> > Especially pte_chains in highmem.  Failure to fix this well
> > is a showstopper for rmap on large ia32 machines, which makes
> > it a showstopper full stop.
> 
> I'll send you an update of my solution for (6), the initial version of
> which was posted earlier today, in a separate post.

Thanks, Bill.  Yup, I'm playing with pte_chain_mempool at present.

> highpte_chain will do (2) and (4) simultaneously when it's debugged.
> 
> On Sun, Jul 21, 2002 at 10:04:29PM -0700, Andrew Morton wrote:
> > If we can get something in place which works acceptably on Martin
> > Bligh's machines, and we can see that the gains of rmap (whatever
> > they are ;)) are worth the as-yet uncoded pains then let's move on.
> > But until then, adding new stuff to the VM just makes a `patch -R'
> > harder to do.
> 
> I have the same kinds of machines and have already been testing with
> precisely the many tasks workloads he's concerned about for the sake of
> correctness, and efficiency is also a concern here. highpte_chain is
> already so high up on my priority queue that all other work is halted.

OK.  But we're adding non-trivial amounts of new code simply
to get the reverse mapping working as robustly as the virtual
scan.  And we'll always have rmap's additional storage requirements.

At some point we need to make a decision as to whether it's all
worth it.  Right now we do not even have the information on the
pluses side to do this.  That's worrisome.

-
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  5:38           ` Andrew Morton
@ 2002-07-22  6:06             ` Martin J. Bligh
  2002-07-22  6:46               ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Martin J. Bligh @ 2002-07-22  6:06 UTC (permalink / raw)
  To: Andrew Morton, William Lee Irwin III
  Cc: Linus Torvalds, Rik van Riel, linux-kernel, linux-mm, Ed Tomlinson

>> > If we can get something in place which works acceptably on Martin
>> > Bligh's machines, and we can see that the gains of rmap (whatever
>> > they are ;)) are worth the as-yet uncoded pains then let's move on.
>> > But until then, adding new stuff to the VM just makes a `patch -R'
>> > harder to do.
>> 
>> I have the same kinds of machines and have already been testing with
>> precisely the many tasks workloads he's concerned about for the sake of
>> correctness, and efficiency is also a concern here. highpte_chain is
>> already so high up on my priority queue that all other work is halted.
> 
> OK.  But we're adding non-trivial amounts of new code simply
> to get the reverse mapping working as robustly as the virtual
> scan.  And we'll always have rmap's additional storage requirements.
> 
> At some point we need to make a decision as to whether it's all
> worth it.  Right now we do not even have the information on the
> pluses side to do this.  That's worrisome.

These large NUMA machines should actually be rmap's glory day in the
sun. Per-node kswapd, being able to free mem pressure on one node
easily (without cross-node bouncing), breakup of the lru list into 
smaller chunks, etc. These actually fix some of the biggest problems
that we have right now and are hard to solve in other ways.

The large rmap overheads we still have to kill seem to me to be the
memory usage and the fork overhead. There's also a certain amount of
overhead to managing any more data structures, of course. I think we
know how to kill most of it. I don't think adding highpte_chain is
the correct thing to do ... seems like adding insult to injury. I'd
rather see us drive a silver stake through the problem's heart and
kill it properly ...

M.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  6:06             ` Martin J. Bligh
@ 2002-07-22  6:46               ` Andrew Morton
  2002-07-22  7:20                 ` Martin J. Bligh
  2002-07-22 13:34                 ` Rik van Riel
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2002-07-22  6:46 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: William Lee Irwin III, Linus Torvalds, Rik van Riel,
	linux-kernel, linux-mm, Ed Tomlinson

"Martin J. Bligh" wrote:
> 
> >> > If we can get something in place which works acceptably on Martin
> >> > Bligh's machines, and we can see that the gains of rmap (whatever
> >> > they are ;)) are worth the as-yet uncoded pains then let's move on.
> >> > But until then, adding new stuff to the VM just makes a `patch -R'
> >> > harder to do.
> >>
> >> I have the same kinds of machines and have already been testing with
> >> precisely the many tasks workloads he's concerned about for the sake of
> >> correctness, and efficiency is also a concern here. highpte_chain is
> >> already so high up on my priority queue that all other work is halted.
> >
> > OK.  But we're adding non-trivial amounts of new code simply
> > to get the reverse mapping working as robustly as the virtual
> > scan.  And we'll always have rmap's additional storage requirements.
> >
> > At some point we need to make a decision as to whether it's all
> > worth it.  Right now we do not even have the information on the
> > pluses side to do this.  That's worrisome.
> 
> These large NUMA machines should actually be rmap's glory day in the
> sun.

"should be".  Sigh.  Be nice to see an "is" one day ;)

> Per-node kswapd, being able to free mem pressure on one node
> easily (without cross-node bouncing), breakup of the lru list into
> smaller chunks, etc. These actually fix some of the biggest problems
> that we have right now and are hard to solve in other ways.
> 
> The large rmap overheads we still have to kill seem to me to be the
> memory usage and the fork overhead. There's also a certain amount of
> overhead to managing any more data structures, of course. I think we
> know how to kill most of it. I don't think adding highpte_chain is
> the correct thing to do ... seems like adding insult to injury. I'd
> rather see us drive a silver stake through the problem's heart and
> kill it properly ...

Well that would be nice.  And by extension, pte-highmem gets a stake
as well.

Do you think that large pages alone would be enough to allow us
to leave pte_chains (and page tables?) in ZONE_NORMAL, or would
shared pagetables also be needed?

Was it purely Oracle which drove pte-highmem, or do you think
that page table and pte_chain consumption could be a problem
on applications which can't/won't use large pages?

-
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  6:46               ` Andrew Morton
@ 2002-07-22  7:20                 ` Martin J. Bligh
  2002-07-22 14:00                   ` Rik van Riel
  2002-07-22 13:34                 ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Martin J. Bligh @ 2002-07-22  7:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: William Lee Irwin III, Linus Torvalds, Rik van Riel,
	linux-kernel, linux-mm, Ed Tomlinson, bcrl

> Well that would be nice.  And by extension, pte-highmem gets a stake
> as well.
> 
> Do you think that large pages alone would be enough to allow us
> to leave pte_chains (and page tables?) in ZONE_NORMAL, or would
> shared pagetables also be needed?
> 
> Was it purely Oracle which drove pte-highmem, or do you think

I don't see you can get into pathalogical crap without heavy
sharing of large amounts of data .... without sharing, you're at
a fixed percentage of phys mem - with sharing, I can have more
PTEs needed that I have phys mem. So anything that shares heavily
is the worst problem, and databases seemt to be the worst for this ...

If phys ram is significantly greater than virtual address space, 
you can, of course, still kill yourself quite happily, particularly
as ZONE_NORMAL is full of struct page's etc at this point. But I am
under the impression we can make pte_highmem much nicer by putting
everything into UKVA, and thus just mapping our own processes mappings
rather than everyone in the universe ... giving some sort of solution
to the problem. pte_chains don't do that nicely.

> that page table and pte_chain consumption could be a problem
> on applications which can't/won't use large pages?

Bill and Ben (desperately tempting to bring up English children's
programs at this point ...) were describing some sort of 32Kb
windowing Oracle uses which would render large pages useless
without some funky windowing we apparently used to do for PTX.
Ben ... could you describe this 32Kb stuff?

M.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  6:46               ` Andrew Morton
  2002-07-22  7:20                 ` Martin J. Bligh
@ 2002-07-22 13:34                 ` Rik van Riel
  2002-07-22 13:44                   ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2002-07-22 13:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin J. Bligh, William Lee Irwin III, Linus Torvalds,
	linux-kernel, linux-mm, Ed Tomlinson

On Sun, 21 Jul 2002, Andrew Morton wrote:
> "Martin J. Bligh" wrote:

> > These large NUMA machines should actually be rmap's glory day in the
> > sun.
>
> "should be".  Sigh.  Be nice to see an "is" one day ;)

You asked for a "minimal rmap" patch and you got it. ;)

Bill and I actually have code for many of the things listed
but we haven't submitted it yet exactly because everybody
wanted the code merged in small, manageable chunks.

> Do you think that large pages alone would be enough to allow us
> to leave pte_chains (and page tables?) in ZONE_NORMAL, or would
> shared pagetables also be needed?

Large pages should reduce the page table overhead by a factor
of 1024 (or 512 for PAE) and have the same alignment restrictions
that shared page tables have.

OTOH, shared page tables would allow us to map in chunks smaller
than 4MB ... but at what seems like a pretty horrible locking and
accounting complexity, unless somebody comes up with a smart trick.

Apart from both of these we'll also need code to garbage collect
empty page tables so users can't clog up memory by mmaping a page
every 4 MB ;)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22 13:34                 ` Rik van Riel
@ 2002-07-22 13:44                   ` Rik van Riel
  0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2002-07-22 13:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin J. Bligh, William Lee Irwin III, Linus Torvalds,
	linux-kernel, linux-mm, Ed Tomlinson

On Mon, 22 Jul 2002, Rik van Riel wrote:

> Apart from both of these we'll also need code to garbage collect
> empty page tables so users can't clog up memory by mmaping a page
> every 4 MB ;)

Btw, I've started work on this code already.

Putting the dcache/icache pages on the LRU list in the way
Linus wanted is definately a lower priority thing for me at
this point, especially considering the fact that Ed Tomlinson's
way of having these pages on the LRU seems to work just fine ;)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][1/2] return values shrink_dcache_memory etc
  2002-07-22  7:20                 ` Martin J. Bligh
@ 2002-07-22 14:00                   ` Rik van Riel
  0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2002-07-22 14:00 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, William Lee Irwin III, Linus Torvalds,
	linux-kernel, linux-mm, Ed Tomlinson, bcrl

On Mon, 22 Jul 2002, Martin J. Bligh wrote:

> > Was it purely Oracle which drove pte-highmem, or do you think
>
> I don't see you can get into pathalogical crap without heavy
> sharing of large amounts of data .... without sharing, you're at
> a fixed percentage of phys mem - with sharing, I can have more
> PTEs needed that I have phys mem.

... for which pte_highmem wouldn't fix the problem, either.

>From what I can see we really want/need 2 complementary
solutions to fix this problem:

1) large pages and/or shared page tables to reduce page
   table overhead, which is a real "solution"

2) page table garbage collection, to reduce the amount
   of (resident?) page tables when the shit hits the fan;
   this is an "emergency" thing to have and we wouldn't
   want to use it continously, but it might be important
   to keep the box alive

Since I've heard that (1) is already in use by some people
I've started working on (2) the moment Linus asked me to put
the dentry/icache pages in the LRU ;)))

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2002-07-22 14:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-20 19:40 [PATCH][1/2] return values shrink_dcache_memory etc Rik van Riel
2002-07-20 20:11 ` Linus Torvalds
2002-07-20 20:41   ` Rik van Riel
2002-07-20 20:53     ` Linus Torvalds
2002-07-20 21:42       ` Rik van Riel
2002-07-22  5:04       ` Andrew Morton
2002-07-22  5:16         ` William Lee Irwin III
2002-07-22  5:38           ` Andrew Morton
2002-07-22  6:06             ` Martin J. Bligh
2002-07-22  6:46               ` Andrew Morton
2002-07-22  7:20                 ` Martin J. Bligh
2002-07-22 14:00                   ` Rik van Riel
2002-07-22 13:34                 ` Rik van Riel
2002-07-22 13:44                   ` Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox