Re: Subtle MM bug

* Re: Subtle MM bug
       [not found] <200101080602.WAA02132@pizda.ninka.net>
@ 2001-01-08  6:42 ` Linus Torvalds
  2001-01-08 13:11   ` Marcelo Tosatti
                     ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Linus Torvalds @ 2001-01-08  6:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: Rik van Riel, Marcelo Tosatti, linux-mm

[ MM people Cc'd, because while I have a plan, I don't have enough time to
  actually put that plan in action. And maybe somebody can shoot down my
  brilliant plan. ]

On Sun, 7 Jan 2001, David S. Miller wrote:
> 
> BTW, this reminds me.  Now that you keep track of the "all mm's" list
> thingy, you can also keep track of "nr_mms" in the system and do that
> little:
> 
> 	for (i = 0; i < (nr_mms >> priority); i++)
> 		pagetable_scan();
> 
> thing you were talking about last week.

This is the whole reason for making that list in the first place. 

Even more subtle: see the comment in kernel/fork.c about keeping the list
of mm's in order. What I _really_ want to do is something like

void swap_out(void)
{
	int i;

	for (i = 0; i < (nr_mms >> priority); i++) {
		struct list_head *p;
		struct mm_struct *mm;

		spin_lock(&mmlist_lock);
		p = init_mm.mmlist.next;
		if (p != &init_mm.mmlist) {
			mm = list_entry(p, struct mm_struct, mmlist);

			/* Move it to the back of the queue */
			list_del(p);
			__list_add(p, init_mm.mmlist.prev, &init_mm.mmlist);

			/* Pin the mm so it can't go away once we drop the lock */
			atomic_inc(&mm->mm_users);
			spin_unlock(&mmlist_lock);

			swap_out_mm(mm);
			mmput(mm);
			continue;
		}
		/* empty mm-list - shouldn't really happen except during bootup */
		spin_unlock(&mmlist_lock);
		break;
	}
}

and just get rid of all the logic to try to "find the best mm". It's bogus
anyway: we should get perfectly fair access patterns by just doing
everything in round-robin, and each "swap_out_mm(mm)" would just try to
walk some fixed percentage of the RSS size (say, something like

	count = (mm->rss >> 4)

and be done with it.
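
As a rough illustration of the round-robin walk described above, here is a
minimal userspace sketch. It is not kernel code: `toy_mm`, `toy_scan_quota()`
and `toy_swap_out()` are made-up names, an array with a cursor stands in for
the mm list, and "scanning" is reduced to adding up page counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mm_struct. */
struct toy_mm {
	unsigned long rss;	/* resident pages */
	unsigned long scanned;	/* running total of pages walked */
};

/* Fixed fraction of the RSS to walk per visit: 1/16th (rss >> 4). */
unsigned long toy_scan_quota(const struct toy_mm *mm)
{
	return mm->rss >> 4;
}

/*
 * Visit (nr_mms >> priority) address spaces in round-robin order.
 * *cursor plays the role of the head of the mm list; advancing it is
 * the array equivalent of moving the visited mm to the back of the queue.
 * Returns the total number of pages walked in this call.
 */
unsigned long toy_swap_out(struct toy_mm *mms, size_t nr_mms,
			   size_t *cursor, unsigned int priority)
{
	unsigned long walked = 0;
	size_t i, visits;

	assert(priority < 8 * sizeof(size_t));
	visits = nr_mms >> priority;

	for (i = 0; i < visits; i++) {
		struct toy_mm *mm = &mms[*cursor];
		unsigned long quota = toy_scan_quota(mm);

		mm->scanned += quota;
		walked += quota;
		*cursor = (*cursor + 1) % nr_mms;
	}
	return walked;
}
```

Because the cursor persists between calls, every mm gets visited in turn no
matter how few are handled per call, and a large RSS simply means a
proportionally larger walk on each visit.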

Then, with something like the above, we just try to make sure that we scan
the whole virtual memory space every once in a while. Make the "every once
in a while" be some simple heuristic like "try to keep the active list to
less than 50% of all memory". So "try_to_free_memory()" would just start
off with something like

	/*
	 * Too many active pages? That implies that we don't have enough
	 * of a working set for page_launder() to do a good job. Start by
	 * walking the VM space..
	 */
	if (nr_active_pages > (total_pages >> 1))
		swap_out();

	/*
	 * This is where we actually free memory
	 */
	page_launder(..);

and we'd be all done. (And that "max 50% of all pages should be active"
number was taken out of my ass. AND the above will work really badly if
there is no swap-space, so it needs tweaking - think of it not as a hard
algorithm, but more as a "this is where I think we need to go").
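
A minimal userspace sketch of the "more than half of all pages are active"
heuristic, assuming the intended test is nr_active_pages > total_pages / 2.
The `toy_` names and the stats struct are hypothetical and exist only to make
the control flow observable. Whether nr_active_pages is the right counter to
test is questioned later in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping so the control flow can be observed. */
struct toy_reclaim_stats {
	unsigned int vm_scans;	/* times the page tables were walked */
	unsigned int launders;	/* times the list-based reclaim ran */
};

/* True when the active list holds more than half of all pages. */
bool toy_too_many_active(unsigned long nr_active_pages,
			 unsigned long total_pages)
{
	assert(nr_active_pages <= total_pages);
	return nr_active_pages > (total_pages >> 1);
}

/* Walk the VM first only when too much memory is active, then reclaim. */
void toy_try_to_free_memory(unsigned long nr_active_pages,
			    unsigned long total_pages,
			    struct toy_reclaim_stats *stats)
{
	if (toy_too_many_active(nr_active_pages, total_pages))
		stats->vm_scans++;	/* stands in for swap_out() */

	stats->launders++;		/* stands in for page_launder() etc. */
}
```

The threshold only decides whether the page-table walk happens before the
normal reclaim path; it does not force the active list below 50% by itself.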

Advantage: it automatically does the right thing: if the reason for the
memory pressure is that we have lots of pages mapped, it will scan the VM
lists. If the reason is that we just have tons of pages cached, it won't
even bother to age the page tables.

Right now we have this cockamamy scheme to try to balance off the lists
against each other, and then at fairly random points we'll get to
"swap_out()" if we haven't found anything nice on the other lists. That's
just not the way to get nice MM behaviour.

I'll bet you $5 USD that the above approach will (a) work fairly and
(b) give much smoother behavior with a much more understandable swap-out
policy.

Of course, I've been wrong before. But I'd like somebody to take a look.

Anybody?

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: Subtle MM bug
  2001-01-08  6:42 ` Subtle MM bug Linus Torvalds
@ 2001-01-08 13:11   ` Marcelo Tosatti
  2001-01-08 16:42     ` Rik van Riel
  2001-01-08 17:43     ` Linus Torvalds
  2001-01-08 13:57   ` Stephen C. Tweedie
  2001-01-08 16:45   ` Rik van Riel
  2 siblings, 2 replies; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-08 13:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, Rik van Riel, linux-mm

On Sun, 7 Jan 2001, Linus Torvalds wrote:

> and just get rid of all the logic to try to "find the best mm". It's bogus
> anyway: we should get perfectly fair access patterns by just doing
> everything in round-robin, and each "swap_out_mm(mm)" would just try to
> walk some fixed percentage of the RSS size (say, something like
> 
> 	count = (mm->rss >> 4)
> 
> and be done with it.

I have the impression that a fixed percentage of the RSS will be a problem
when you have a memory hog (or hogs) running.


* Re: Subtle MM bug
  2001-01-08  6:42 ` Subtle MM bug Linus Torvalds
  2001-01-08 13:11   ` Marcelo Tosatti
@ 2001-01-08 13:57   ` Stephen C. Tweedie
  2001-01-08 17:29     ` Linus Torvalds
  2001-01-08 16:45   ` Rik van Riel
  2 siblings, 1 reply; 38+ messages in thread
From: Stephen C. Tweedie @ 2001-01-08 13:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, Rik van Riel, Marcelo Tosatti, linux-mm

Hi,

On Sun, Jan 07, 2001 at 10:42:11PM -0800, Linus Torvalds wrote:
> 
> and just get rid of all the logic to try to "find the best mm". It's bogus
> anyway: we should get perfectly fair access patterns by just doing
> everything in round-robin

Definitely.

> Then, with something like the above, we just try to make sure that we scan
> the whole virtual memory space every once in a while. Make the "every once
> in a while" be some simple heuristic like "try to keep the active list to
> less than 50% of all memory".

... which will produce an enormous storm of soft page faults for
workloads involving mmapping large amounts of data or where we have
a lot of space devoted to anonymous pages, such as static
computational workloads.

The idea of an inactive list target is sound, but it needs to be based
on memory pressure: we don't need anything like 50% if we aren't under
any pressure, so compute-bound workloads with large data sets can
achieve stability.

--Stephen

* Re: Subtle MM bug
  2001-01-08 13:11   ` Marcelo Tosatti
@ 2001-01-08 16:42     ` Rik van Riel
  2001-01-08 17:43     ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2001-01-08 16:42 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, David S. Miller, linux-mm

On Mon, 8 Jan 2001, Marcelo Tosatti wrote:
> On Sun, 7 Jan 2001, Linus Torvalds wrote:
> 
> > and just get rid of all the logic to try to "find the best mm". It's bogus
> > anyway: we should get perfectly fair access patterns by just doing
> > everything in round-robin, and each "swap_out_mm(mm)" would just try to
> > walk some fixed percentage of the RSS size (say, something like
> > 
> > 	count = (mm->rss >> 4)
> > 
> > and be done with it.
> 
> I have the impression that a fixed percentage of the RSS will be
> a problem when you have a memory hog (or hogs) running.

My RSS ulimit enforcing patches solve this problem in a
very simple way.

If a process is exceeding its RSS limit, we scan ALL pages
from the process. Otherwise, we scan the normal percentage.

Furthermore, I have put a default soft RSS limit of half
of physical memory in the system. This means that when you
have one big runaway process, kswapd will be more aggressive
against that process than against others. The fact that it
is a soft limit, OTOH, means that the process can use all
the available memory if there is no memory pressure in the
system...
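
A minimal sketch of the scan-amount rule described above, in userspace C. The
function names are hypothetical, and the "normal percentage" is assumed here
to be the rss >> 4 suggested earlier in the thread; the actual patches may use
a different fraction.

```c
#include <assert.h>

/* Default soft RSS limit: half of physical memory (in pages). */
unsigned long toy_default_soft_rss_limit(unsigned long physical_pages)
{
	return physical_pages / 2;
}

/*
 * Number of pages to scan in one visit to a process.
 * Over the limit: scan ALL of its pages.
 * Otherwise: scan the normal fraction (assumed to be 1/16th here).
 */
unsigned long toy_rss_scan_count(unsigned long rss, unsigned long rss_limit)
{
	/* A zero limit would flag every process; treat it as a caller bug. */
	assert(rss_limit > 0);

	if (rss > rss_limit)
		return rss;
	return rss >> 4;
}
```

Since the limit only changes how hard the scanner works on a process, it
stays "soft": with no memory pressure there is no scanning, and the process
keeps whatever it has.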

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/

* Re: Subtle MM bug
  2001-01-08  6:42 ` Subtle MM bug Linus Torvalds
  2001-01-08 13:11   ` Marcelo Tosatti
  2001-01-08 13:57   ` Stephen C. Tweedie
@ 2001-01-08 16:45   ` Rik van Riel
  2001-01-08 17:50     ` Linus Torvalds
  2 siblings, 1 reply; 38+ messages in thread
From: Rik van Riel @ 2001-01-08 16:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, Marcelo Tosatti, linux-mm

On Sun, 7 Jan 2001, Linus Torvalds wrote:

> 	/*
> 	 * Too many active pages? That implies that we don't have enough
> 	 * of a working set for page_launder() to do a good job. Start by
> 	 * walking the VM space..
> 	 */
> 	if (nr_active_pages > (total_pages >> 1))
> 		swap_out();
> 
> 	/*
> 	 * This is where we actually free memory
> 	 */
> 	page_launder(..);

Ahhh, but this is NOT the balancing problem we're trying to
pin down in 2.4...

The (possible) problem is in the balancing between swap_out()
and refill_inactive_scan().

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/

* Re: Subtle MM bug
  2001-01-08 13:57   ` Stephen C. Tweedie
@ 2001-01-08 17:29     ` Linus Torvalds
  2001-01-08 18:10       ` Stephen C. Tweedie
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2001-01-08 17:29 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: David S. Miller, Rik van Riel, Marcelo Tosatti, linux-mm

On Mon, 8 Jan 2001, Stephen C. Tweedie wrote:
> 
> > Then, with something like the above, we just try to make sure that we scan
> > the whole virtual memory space every once in a while. Make the "every once
> > in a while" be some simple heuristic like "try to keep the active list to
> > less than 50% of all memory".
> 
> ... which will produce an enormous storm of soft page faults for
> workloads involving mmapping large amounts of data or where we have
> a lot of space devoted to anonymous pages, such as static
> computational workloads.

I don't think you'll find that in practice. 

It would obviously trigger only on low-memory code _anyway_ (we don't even
get into "try_to_free_pages()" unless there is memory pressure), so I
think you're _completely_ off the mark here.

Remember: the thing doesn't require that < 50% of memory is in the page
tables. It only says: if 50% or more of memory is in the page tables, we
will always scan the page tables first when we try to find free pages.

If you have a well-behaving application that doesn't even have memory
pressure, but fills up >50% of memory in its VM, nothing will actually
happen in the steady state. It can have 99% of available memory, and not a
single soft page fault.

But think about what happens if you now start up another application? And
think about what SHOULD happen. The 50% rule is perfectly fine: if we're
starting to swap, we're better off taking soft page faults that give us a
better LRU than letting the MM scrub the same pages over and over because
it effectively only sees a subset of the total pages (with the mapped
pages being "invisible").

The fact is, that we absolutely _have_ to do the VM scan in order for the
inactive lists to be at all representative of the state of affairs. If we
just rely on page_launder() and refill_inactive() as the #1 way to get
free pages, we will never consider anything but the pages that are already
on the lists.

Stephen: have you tried the behaviour of a working set that is dirty in
the VM's and slightly larger than available ram? Not pretty. We do
_really_ well on many loads, but this one we do badly on. And from what
I've been able to see so far, it's because we're just too damn good at
waiting on page_launder() and doing refill_inactive_scan().

There's another advantage to the 50% rule: if we are under memory
pressure, and somebody is dirtying pages in its VM (which is otherwise an
"invisible" event to the kernel), the 50% rule is much more likely to mean
that we actually _see_ the dirtying, and can slow it down.

		Linus

* Re: Subtle MM bug
  2001-01-08 13:11   ` Marcelo Tosatti
  2001-01-08 16:42     ` Rik van Riel
@ 2001-01-08 17:43     ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2001-01-08 17:43 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: David S. Miller, Rik van Riel, linux-mm

On Mon, 8 Jan 2001, Marcelo Tosatti wrote:
> 
> On Sun, 7 Jan 2001, Linus Torvalds wrote:
> 
> > and just get rid of all the logic to try to "find the best mm". It's bogus
> > anyway: we should get perfectly fair access patterns by just doing
> > everything in round-robin, and each "swap_out_mm(mm)" would just try to
> > walk some fixed percentage of the RSS size (say, something like
> > 
> > 	count = (mm->rss >> 4)
> > 
> > and be done with it.
> 
> I have the impression that a fixed percentage of the RSS will be a problem
> when you have a memory hog (or hogs) running.

Nothing but testing can prove it, but I don't think that's really an
issue.

Remember: we're not actually swapping stuff out any more in VM scanning.
We're just saying "we're low on memory, let's evict the page tables so
that we _could_ swap stuff out if necessary". We're going to have to evict
_something_, and walking the page tables really gives us a lot better
knowledge of WHAT to evict.

The cost of scanning the VM is (a) the cost of scanning itself and (b) the
cost of soft-faults and CPU TLB invalidate cross-calls for the scanning.
Both of which might be noticeable - but I have this fairly strong feeling
that neither of them is big enough to offset the cost of paging out the
wrong page. Which we definitely do now - I've got some simple
test-programs that have a VM footprint that is not _that_ much more than
the available memory, and they _really_ show problems.

(The "lots of dirty pages" case is not the common case under most loads,
so the fact that 2.4.0 has some performance problems with it was not a
show-stopper for me - during my testing with low memory most loads were
very nice indeed).

		Linus

* Re: Subtle MM bug
  2001-01-08 16:45   ` Rik van Riel
@ 2001-01-08 17:50     ` Linus Torvalds
  2001-01-08 18:21       ` Rik van Riel
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2001-01-08 17:50 UTC (permalink / raw)
  To: Rik van Riel; +Cc: David S. Miller, Marcelo Tosatti, linux-mm


On Mon, 8 Jan 2001, Rik van Riel wrote:

> On Sun, 7 Jan 2001, Linus Torvalds wrote:
> 
> > 	/*
> > 	 * Too many active pages? That implies that we don't have enough
> > 	 * of a working set for page_launder() to do a good job. Start by
> > 	 * walking the VM space..
> > 	 */
> > 	if (nr_active_pages > (total_pages >> 1))
> > 		swap_out();
> > 
> > 	/*
> > 	 * This is where we actually free memory
> > 	 */
> > 	page_launder(..);
> 
> Ahhh, but this is NOT the balancing problem we're trying to
> pin down in 2.4...
> 
> The (possible) problem is in the balancing between swap_out()
> and refill_inactive_scan().

That _is_ the problem the above will fix. Don't read "page_launder()"
there: it's more meant to be "this is the old code that does
page_launder() etc.."

Trust me. Try my code. It will work.

		Linus

* Re: Subtle MM bug
  2001-01-08 17:29     ` Linus Torvalds
@ 2001-01-08 18:10       ` Stephen C. Tweedie
  2001-01-08 21:52         ` Marcelo Tosatti
  0 siblings, 1 reply; 38+ messages in thread
From: Stephen C. Tweedie @ 2001-01-08 18:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel,
	Marcelo Tosatti, linux-mm

On Mon, Jan 08, 2001 at 09:29:15AM -0800, Linus Torvalds wrote:
> On Mon, 8 Jan 2001, Stephen C. Tweedie wrote:

> If you have a well-behaving application that doesn't even have memory
> pressure, but fills up >50% of memory in its VM, nothing will actually
> happen in the steady state. It can have 99% of available memory, and not a
> single soft page fault.

Agreed, but that's not how I read your statement about scanning the VM
regularly.  The problem happens if you are working happily with enough
free memory and you suddenly need a large amount of allocation: having
some relatively up-to-date page age information may give you a _much_
better idea of what to page out.

Rik was going to experiment with this --- Rik, do you have any hard
numbers for the benefit of maintaining a background page aging task?

> But think about what happens if you now start up another application? And
> think about what SHOULD happen. The 50% rule is perfectly fine: 

Right, I interpreted your 50% as a steady-state limit.

> Stephen: have you tried the behaviour of a working set that is dirty in
> the VM's and slightly larger than available ram? Not pretty. 

Yes, and this is something that Marcelo's swap clustering code ought
to be ideal for.

> _really_ well on many loads, but this one we do badly on. And from what
> I've been able to see so far, it's because we're just too damn good at
> waiting on page_launder() and doing refill_inactive_scan().

do_try_to_free_pages() is trying to

	/*
	 * If needed, we move pages from the active list
	 * to the inactive list. We also "eat" pages from
	 * the inode and dentry cache whenever we do this.
	 */
	if (free_shortage() || inactive_shortage()) {
		shrink_dcache_memory(6, gfp_mask);
		shrink_icache_memory(6, gfp_mask);
		ret += refill_inactive(gfp_mask, user);
	} else {

So we're refilling the inactive list regardless of its current size
whenever free_shortage() is true.  In the situation you describe,
there's no point refilling the inactive list too far beyond the
ability of the swapper to launder it, regardless of whether
free_shortage() is set.

refill_inactive contains exactly the opposite logic: it breaks out if

		/*
		 * If we either have enough free memory, or if
		 * page_launder() will be able to make enough
		 * free memory, then stop.
		 */
		if (!inactive_shortage() || !free_shortage())
			goto done;

but that still means that we're doing unnecessary inactive list
refilling whenever free_shortage() is true: this test only occurs
after we've tried at least one swap_out().  We're calling
refill_inactive if either condition is true, but we're staying inside
it only if both conditions are true.

Shouldn't we really just be making the refill_inactive() here depend
on inactive_shortage() alone, not free_shortage()?  By refilling the
inactive list too aggressively we actually end up discarding aging
information which might be of use to us.
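
The mismatch between the two tests can be made concrete with a small
userspace model. This is only a sketch: the `toy_` functions are hypothetical
and reduce free_shortage() and inactive_shortage() to two booleans, so that
the entry condition, the stay-inside condition, and the proposed entry
condition can be compared case by case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Entry test as quoted from do_try_to_free_pages(): either shortage. */
bool toy_enters_refill(bool free_short, bool inactive_short)
{
	return free_short || inactive_short;
}

/*
 * refill_inactive() leaves via "goto done" when
 * (!inactive_shortage() || !free_shortage()), so it keeps going
 * only while both shortages hold.
 */
bool toy_stays_in_refill(bool free_short, bool inactive_short)
{
	return !(!inactive_short || !free_short);
}

/* Proposed entry test: depend on the inactive shortage alone. */
bool toy_enters_refill_proposed(bool free_short, bool inactive_short)
{
	(void)free_short;
	return inactive_short;
}

/*
 * Count the (free, inactive) combinations in which refilling is entered
 * even though the inactive list is not short.
 */
int toy_needless_refills(bool (*enters)(bool, bool))
{
	int free_short, inactive_short, n = 0;

	assert(enters != NULL);
	for (free_short = 0; free_short <= 1; free_short++)
		for (inactive_short = 0; inactive_short <= 1; inactive_short++)
			if (enters(free_short, inactive_short) && !inactive_short)
				n++;
	return n;
}
```

With the current entry test there is exactly one such case (free memory is
short but the inactive list is not); with the proposed test there is none.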

Rik, any thoughts?  This looks as if it's destroying any hope of
maintaining the intended inactive_shortage() targets.

--Stephen

* Re: Subtle MM bug
  2001-01-08 17:50     ` Linus Torvalds
@ 2001-01-08 18:21       ` Rik van Riel
  2001-01-08 18:38         ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Rik van Riel @ 2001-01-08 18:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, Marcelo Tosatti, linux-mm

On Mon, 8 Jan 2001, Linus Torvalds wrote:
> On Mon, 8 Jan 2001, Rik van Riel wrote:
> > On Sun, 7 Jan 2001, Linus Torvalds wrote:
> > 
> > > 	/*
> > > 	 * Too many active pages? That implies that we don't have enough
> > > 	 * of a working set for page_launder() to do a good job. Start by
> > > 	 * walking the VM space..
> > > 	 */
> > > 	if (nr_active_pages > (total_pages >> 1))
> > > 		swap_out();

> That _is_ the problem the above will fix. Don't read
> "page_launder()" there: it's more meant to be "this is the old
> code that does page_launder() etc.."
> 
> Trust me. Try my code. It will work.

Except for the small detail that pages inside the processes
are often not on the active list  ;)

But I agree with your idea that we really should make sure
we have enough pages available to choose from when swapping
stuff out.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/

* Re: Subtle MM bug
  2001-01-08 18:21       ` Rik van Riel
@ 2001-01-08 18:38         ` Linus Torvalds
  0 siblings, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2001-01-08 18:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: David S. Miller, Marcelo Tosatti, linux-mm

On Mon, 8 Jan 2001, Rik van Riel wrote:
> 
> > That _is_ the problem the above will fix. Don't read
> > "page_launder()" there: it's more meant to be "this is the old
> > code that does page_launder() etc.."
> > 
> > Trust me. Try my code. It will work.
> 
> Except for the small detail that pages inside the processes
> are often not on the active list  ;)

Yes, you're right - we don't have a good counter to test right now.		

That's actually fairly nasty. We can't even use the "reverse" test,
because while we can make it do something like

	if (nr_inactive + nr_inactive_dirty < X %)

that won't pick up on things like the dentry and inode caches, so that
would be wrong too. 

We would really need to count the number of mapped anonymous pages to get
this right. Damn. That makes it harder than I thought.

(Hmm.. Increment counter in "do_anonymous_page()" and "do_wp_page()".
Decrement in "add_to_swap_cache()". Decrement in "free_pte()" for the
!page->mapping case. Test. Find the places I forgot. Maybe it's not that
bad, after all).
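
A userspace sketch of the bookkeeping outlined above. Everything here is
hypothetical: the counter and the `toy_` hooks are illustrative names, the
comments only say which kernel path each hook would be called from, and a
plain variable is used where real kernel code would need an atomic type or a
lock. Using the counter for the 50% test is an assumption about how it would
be consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter of mapped anonymous pages. */
static unsigned long toy_nr_mapped_anon;

/* Would be called from do_anonymous_page() and do_wp_page(). */
void toy_anon_page_mapped(void)
{
	toy_nr_mapped_anon++;
}

/* Would be called from add_to_swap_cache(). */
void toy_anon_page_to_swap_cache(void)
{
	assert(toy_nr_mapped_anon > 0);
	toy_nr_mapped_anon--;
}

/* Would be called from free_pte() for the !page->mapping case. */
void toy_anon_page_unmapped(void)
{
	assert(toy_nr_mapped_anon > 0);
	toy_nr_mapped_anon--;
}

unsigned long toy_mapped_anon_pages(void)
{
	return toy_nr_mapped_anon;
}

/* Assumed use: walk the page tables first when over half of memory. */
bool toy_vm_scan_wanted(unsigned long total_pages)
{
	return toy_nr_mapped_anon > (total_pages >> 1);
}
```

The assertions on the decrement paths are the userspace equivalent of "find
the places I forgot": an unbalanced hook shows up as an underflow.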

		Linus

* Re: Subtle MM bug
  2001-01-08 18:10       ` Stephen C. Tweedie
@ 2001-01-08 21:52         ` Marcelo Tosatti
  2001-01-09  0:28           ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-08 21:52 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Linus Torvalds, David S. Miller, Rik van Riel, linux-mm

On Mon, 8 Jan 2001, Stephen C. Tweedie wrote:

> > _really_ well on many loads, but this one we do badly on. And from what
> > I've been able to see so far, it's because we're just too damn good at
> > waiting on page_launder() and doing refill_inactive_scan().
> 
> do_try_to_free_pages() is trying to
> 
> 	/*
> 	 * If needed, we move pages from the active list
> 	 * to the inactive list. We also "eat" pages from
> 	 * the inode and dentry cache whenever we do this.
> 	 */
> 	if (free_shortage() || inactive_shortage()) {
> 		shrink_dcache_memory(6, gfp_mask);
> 		shrink_icache_memory(6, gfp_mask);
> 		ret += refill_inactive(gfp_mask, user);
> 	} else {
> 
> So we're refilling the inactive list regardless of its current size
> whenever free_shortage() is true.  In the situation you describe,
> there's no point refilling the inactive list too far beyond the
> ability of the swapper to launder it, regardless of whether
> free_shortage() is set.

Agreed.

After some argument, Rik and I agreed on doing a per-zone inactive shortage
check in inactive_shortage().

This allows us to check _only_ for inactive_shortage() before calling
refill_inactive().
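
To show why a per-zone check matters, here is a small userspace model of the
per-zone part of inactive_shortage() from the patch in this message. The
`toy_zone` struct and function names are hypothetical; only the arithmetic
(pages_high minus inactive dirty, inactive clean and free pages, counted only
when positive) follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the zone_t fields the patch reads. */
struct toy_zone {
	long pages_high;
	long inactive_dirty_pages;
	long inactive_clean_pages;
	long free_pages;
};

/* Shortage of one zone; a surplus counts as zero, never as negative. */
long toy_zone_shortage(const struct toy_zone *z)
{
	long shortage = z->pages_high;

	shortage -= z->inactive_dirty_pages;
	shortage -= z->inactive_clean_pages;
	shortage -= z->free_pages;
	return shortage > 0 ? shortage : 0;
}

/* Sum of the per-zone shortages over all zones. */
long toy_per_zone_inactive_shortage(const struct toy_zone *zones,
				    size_t nr_zones)
{
	long total = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);
	for (i = 0; i < nr_zones; i++)
		total += toy_zone_shortage(&zones[i]);
	return total;
}
```

Because each zone is clamped at zero before summing, a zone with plenty of
inactive pages cannot hide the shortage of a starved one, which a single
global total would do.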

> 
> refill_inactive contains exactly the opposite logic: it breaks out if
> 
> 		/*
> 		 * If we either have enough free memory, or if
> 		 * page_launder() will be able to make enough
> 		 * free memory, then stop.
> 		 */
> 		if (!inactive_shortage() || !free_shortage())
> 			goto done;
> 
> but that still means that we're doing unnecessary inactive list
> refilling whenever free_shortage() is true: this test only occurs
> after we've tried at least one swap_out().  We're calling
> refill_inactive if either condition is true, but we're staying inside
> it only if both conditions are true.
> 
> Shouldn't we really just be making the refill_inactive() here depend
> on inactive_shortage() alone, not free_shortage()?  By refilling the
> inactive list too aggressively we actually end up discarding aging
> information which might be of use to us.

Yes.

I've removed the free_shortage() check from refill_inactive() in the patch.

Comments are welcome.


--- linux.orig/mm/vmscan.c	Thu Jan  4 02:45:26 2001
+++ linux/mm/vmscan.c	Mon Jan  8 20:43:59 2001
@@ -808,6 +808,9 @@
 int inactive_shortage(void)
 {
 	int shortage = 0;
+	pg_data_t *pgdat = pgdat_list;
+
+	/* Is the inactive dirty list too small? */
 
 	shortage += freepages.high;
 	shortage += inactive_target;
@@ -818,7 +821,27 @@
 	if (shortage > 0)
 		return shortage;
 
-	return 0;
+	/* If not, do we have enough per-zone pages on the inactive list? */
+
+	shortage = 0;
+
+	do {
+		int i;
+		for(i = 0; i < MAX_NR_ZONES; i++) {
+			int zone_shortage;
+			zone_t *zone = pgdat->node_zones+ i;
+
+			zone_shortage = zone->pages_high;
+			zone_shortage -= zone->inactive_dirty_pages;
+			zone_shortage -= zone->inactive_clean_pages;
+			zone_shortage -= zone->free_pages;
+			if (zone_shortage > 0)
+				shortage += zone_shortage;
+		}
+		pgdat = pgdat->node_next;
+	} while (pgdat);
+
+	return shortage;
 }
 
 /*
@@ -861,12 +884,13 @@
 		}
 
 		/*
-		 * don't be too light against the d/i cache since
-	   	 * refill_inactive() almost never fail when there's
-	   	 * really plenty of memory free. 
+		 * Only free memory from the i/d caches if we
+		 * are under low memory.
 		 */
-		shrink_dcache_memory(priority, gfp_mask);
-		shrink_icache_memory(priority, gfp_mask);
+		if(free_shortage()) {
+			shrink_dcache_memory(priority, gfp_mask);
+			shrink_icache_memory(priority, gfp_mask);
+		}
 
 		/*
 		 * Then, try to page stuff out..
@@ -878,11 +902,10 @@
 		}
 
 		/*
-		 * If we either have enough free memory, or if
-		 * page_launder() will be able to make enough
+		 * If page_launder() will be able to make enough
 		 * free memory, then stop.
 		 */
-		if (!inactive_shortage() || !free_shortage())
+		if (!inactive_shortage())
 			goto done;
 
 		/*
@@ -922,14 +945,20 @@
 
 	/*
 	 * If needed, we move pages from the active list
-	 * to the inactive list. We also "eat" pages from
-	 * the inode and dentry cache whenever we do this.
+	 * to the inactive list.
+	 */
+	if (inactive_shortage())
+		ret += refill_inactive(gfp_mask, user);
+
+	/* 	
+	 * Delete pages from the inode and dentry cache 
+	 * if memory is low. 
 	 */
-	if (free_shortage() || inactive_shortage()) {
+	if (free_shortage()) {
 		shrink_dcache_memory(6, gfp_mask);
 		shrink_icache_memory(6, gfp_mask);
-		ret += refill_inactive(gfp_mask, user);
-	} else {
+	} else { 
+
 		/*
 		 * Reclaim unused slab cache memory.
 		 */



* Re: Subtle MM bug
  2001-01-09  0:28           ` Linus Torvalds
@ 2001-01-08 23:49             ` Marcelo Tosatti
  2001-01-09  3:12               ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-08 23:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Mon, 8 Jan 2001, Linus Torvalds wrote:

> 
> On Mon, 8 Jan 2001, Marcelo Tosatti wrote:
> > 
> > I've removed the free_shortage() check from refill_inactive() in the patch.
> > 
> > Comments are welcome.
> 
> One comment: why does refill_inactive() do the shrink_dcache_memory() at
> all? Why not just remove that?
> 
> do_try_to_free_pages() will do that, and that's where it makes more sense
> (shrinking the dcache/icache has absolutely nothing to do with the
> inactive list).

Right. kmem_cache_reap() should not be there either.

> Also, we should probably remove the "made_progress" and "count--" from the
> swap_out() case, as swap_out() hasn't actually caused pages to be free'd
> in a long time.. 

Indeed. 

Are you lazy enough to ask me to regenerate the patch, or can you do it
yourself? :) 

* Re: Subtle MM bug
  2001-01-08 21:52         ` Marcelo Tosatti
@ 2001-01-09  0:28           ` Linus Torvalds
  2001-01-08 23:49             ` Marcelo Tosatti
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2001-01-09  0:28 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Mon, 8 Jan 2001, Marcelo Tosatti wrote:
> 
> I've removed the free_shortage() check from refill_inactive() in the patch.
> 
> Comments are welcome.

One comment: why does refill_inactive() do the shrink_dcache_memory() at
all? Why not just remove that?

do_try_to_free_pages() will do that, and that's where it makes more sense
(shrinking the dcache/icache has absolutely nothing to do with the
inactive list).

Historical code?

Also, we should probably remove the "made_progress" and "count--" from the
swap_out() case, as swap_out() hasn't actually caused pages to be free'd
in a long time.. 

		Linus

* Re: Subtle MM bug
  2001-01-08 23:49             ` Marcelo Tosatti
@ 2001-01-09  3:12               ` Linus Torvalds
  2001-01-09 20:33                 ` Marcelo Tosatti
  2001-01-17  4:54                 ` Rik van Riel
  0 siblings, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2001-01-09  3:12 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Mon, 8 Jan 2001, Marcelo Tosatti wrote:
> 
> Are you lazy enough to ask me to regenerate the patch, or can you do it
> yourself? :) 

Try out 2.4.1-pre1 in testing.

It does three things: 

 - gets rid of the complex "best mm" logic and replaces it with the
   round-robin thing as discussed. I have this suspicion that we
   eventually want to make this based on fault rates etc in an effort to
   more aggressively control big RSS processes, but I also suspect that
   this is tied in to the RSS limiting patches, so this will simmer
   for a while.

 - it cleans up the unnecessary dcache/icache shrink that is already done
   more properly elsewhere.

 - it cleans up and simplifies the MM "priority" thing. In fact, right now
   only one priority is ever used, and I suspect strongly that all the
   "made_progress" logic was really there because that's how we want to do
   it (and just having one priority made "made_progress" unnecessary).

(It also has some non-VM patches, of course, but for this discussion the
VM ones are the only interesting ones).

As far as I can tell, the non-priority version is every bit as good as the
one that counts down priorities, and if nobody can argue against it I'll
just remove the priority argument altogether at some point. Right now it
still exists, it just doesn't change.

That kmem_cache_reap() thing still looks completely bogus, but I didn't
touch it. It looks _so_ bogus that there must be some reason for doing it
that ass-backwards way. Why should anybody do a kmem_cache_reap()
when we're _not_ short of free pages? That code just makes me very
confused, so I'm not touching it.

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-09  3:12               ` Linus Torvalds
@ 2001-01-09 20:33                 ` Marcelo Tosatti
  2001-01-09 22:44                   ` Linus Torvalds
  2001-01-17  4:54                 ` Rik van Riel
  1 sibling, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-09 20:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Mon, 8 Jan 2001, Linus Torvalds wrote:

> Try out 2.4.1-pre1 in testing.

The "while (!inactive_shortage())" should be "while (inactive_shortage())"
as Benjamin noted on lk.
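
To see why the inverted test matters, here is a small stand-alone sketch
(plain user-space C, not kernel code) of a do/while refill loop. The names
shortage, refill_step() and run_refill() are hypothetical and exist only for
this illustration; they stand in for inactive_shortage() and the loop body
of refill_inactive().

```c
#include <stdbool.h>

/* Hypothetical model: pages still missing from the inactive list. */
static int shortage;

static bool inactive_shortage_model(void)
{
	return shortage > 0;
}

/* One pass of the loop body: pretend each pass moves one page. */
static void refill_step(void)
{
	if (shortage > 0)
		shortage--;
}

/*
 * Run the loop with either the buggy or the fixed exit condition and
 * return the number of passes made.
 */
int run_refill(int initial_shortage, bool buggy)
{
	int passes = 0;

	shortage = initial_shortage;
	do {
		refill_step();
		passes++;
		if (passes >= 1000)	/* safety net for this model only */
			break;
	} while (buggy ? !inactive_shortage_model()
		       : inactive_shortage_model());

	return passes;
}
```

In this model the buggy test stops after a single pass while a shortage
still exists, and keeps spinning (until the safety net) once the shortage
is gone. The fixed test runs until the shortage reaches zero.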

The second problem is that background scanning is being done
unconditionally, and it should not. You end up getting all pages with the
same age if the system is idle. Look at this example (2.4.1-pre1):

MemTotal:       900148 kB
MemFree:        145060 kB
Cached:         725624 kB
Active:           3972 kB
Inact_dirty:    722940 kB
Inact_clean:         0 kB
Inact_target:      188 kB

> That kmem_cache_reap() thing still looks completely bogus, but I didn't
> touch it. It looks _so_ bogus that there must be some reason for doing it
> that ass-backwards way. Why should anybody do a kmem_cache_reap()
> when we're _not_ short of free pages? That code just makes me very
> confused, so I'm not touching it.

This patch removes kmem_cache_reap() from refill_inactive() and moves it
to inside the free_shortage() check in do_try_to_free_pages().

It also fixes the "while (!inactive_shortage())" mistake.

Comments?

diff -Nur linux.orig/include/linux/fs.h linux/include/linux/fs.h
--- linux.orig/include/linux/fs.h	Tue Jan  9 19:32:51 2001
+++ linux/include/linux/fs.h	Tue Jan  9 20:07:32 2001
@@ -985,7 +985,7 @@
 
 extern int fs_may_remount_ro(struct super_block *);
 
-extern int try_to_free_buffers(struct page *, int);
+extern void try_to_free_buffers(struct page *, int);
 extern void refile_buffer(struct buffer_head * buf);
 
 #define BUF_CLEAN	0
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Tue Jan  9 19:32:51 2001
+++ linux/include/linux/swap.h	Tue Jan  9 20:07:38 2001
@@ -108,7 +108,7 @@
 extern int free_shortage(void);
 extern int inactive_shortage(void);
 extern void wakeup_kswapd(int);
-extern int try_to_free_pages(unsigned int gfp_mask);
+extern void try_to_free_pages(unsigned int gfp_mask);
 
 /* linux/mm/page_io.c */
 extern void rw_swap_page(int, struct page *, int);
diff -Nur linux.orig/mm/vmscan.c linux/mm/vmscan.c
--- linux.orig/mm/vmscan.c	Tue Jan  9 19:35:41 2001
+++ linux/mm/vmscan.c	Tue Jan  9 20:06:01 2001
@@ -825,9 +825,6 @@
 		count = (1 << page_cluster);
 	start_count = count;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
 	priority = 6;
 	do {
 		if (current->need_resched) {
@@ -842,16 +839,14 @@
 
 		/* If refill_inactive_scan failed, try to page stuff out.. */
 		swap_out(priority, gfp_mask);
-	} while (!inactive_shortage());
+	} while (inactive_shortage());
 
 done:
 	return (count < start_count);
 }
 
-static int do_try_to_free_pages(unsigned int gfp_mask, int user)
+static void do_try_to_free_pages(unsigned int gfp_mask, int user)
 {
-	int ret = 0;
-
 	/*
 	 * If we're low on free pages, move pages from the
 	 * inactive_dirty list to the inactive_clean list.
@@ -862,32 +857,24 @@
 	 */
 	if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() +
 			nr_inactive_clean_pages())
-		ret += page_launder(gfp_mask, user);
+		page_launder(gfp_mask, user);
 
 	/*
 	 * If needed, we move pages from the active list
 	 * to the inactive list.
 	 */
 	if (inactive_shortage())
-		ret += refill_inactive(gfp_mask, user);
+		refill_inactive(gfp_mask, user);
 
 	/* 	
-	 * Delete pages from the inode and dentry cache 
-	 * if memory is low. 
+	 * Delete pages from the inode and dentry cache and
+	 * reclaim unused slab cache if memory is low.
 	 */
 	if (free_shortage()) {
 		shrink_dcache_memory(6, gfp_mask);
 		shrink_icache_memory(6, gfp_mask);
-	} else { 
-
-		/*
-		 * Reclaim unused slab cache memory.
-		 */
 		kmem_cache_reap(gfp_mask);
-		ret = 1;
 	}
-
-	return ret;
 }
 
 DECLARE_WAIT_QUEUE_HEAD(kswapd_wait);
@@ -1029,17 +1016,13 @@
  * memory but are unable to sleep on kswapd because
  * they might be holding some IO locks ...
  */
-int try_to_free_pages(unsigned int gfp_mask)
+void try_to_free_pages(unsigned int gfp_mask)
 {
-	int ret = 1;
-
 	if (gfp_mask & __GFP_WAIT) {
 		current->flags |= PF_MEMALLOC;
-		ret = do_try_to_free_pages(gfp_mask, 1);
+		do_try_to_free_pages(gfp_mask, 1);
 		current->flags &= ~PF_MEMALLOC;
 	}
-
-	return ret;
 }
 
 DECLARE_WAIT_QUEUE_HEAD(kreclaimd_wait);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-09 22:44                   ` Linus Torvalds
@ 2001-01-09 21:33                     ` Marcelo Tosatti
  2001-01-09 22:11                       ` Yet another bogus piece of do_try_to_free_pages() Marcelo Tosatti
  2001-01-09 23:58                       ` Subtle MM bug Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-09 21:33 UTC (permalink / raw)
  To: Stephen C. Tweedie, Linus Torvalds
  Cc: David S. Miller, Rik van Riel, linux-mm

On Tue, 9 Jan 2001, Linus Torvalds wrote:

> 
> 
> On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> > 
> > The "while (!inactive_shortage())" should be "while (inactive_shortage())"
> > as Benjamin noted on lk.
> 
> Yes. Also, it does need something to make sure that it doesn't end up
> being an endless loop. 

Ok, I'll send another patch which fixes this later today.

> > The second problem is that background scanning is being done
> > unconditionally, and it should not. You end up getting all pages with the
> > same age if the system is idle. Look at this example (2.4.1-pre1):
> 
> I agree. However, I think that we do want to do some background scanning
> to push out dirty pages in the background, kind of like bdflush. It just
> shouldn't age the pages (and thus not move them to the inactive list).

Actually it must age the pages, but aging should not be unconditional. 

Stephen has some thoughts on this. Stephen? 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Yet another bogus piece of do_try_to_free_pages()
  2001-01-09 21:33                     ` Marcelo Tosatti
@ 2001-01-09 22:11                       ` Marcelo Tosatti
  2001-01-10  0:06                         ` Linus Torvalds
  2001-01-09 23:58                       ` Subtle MM bug Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-09 22:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

 
Hi,

Look at this piece of code from kswapd: 

                /* If needed, try to free some memory. */
                if (inactive_shortage() || free_shortage()) {
                        int wait = 0;
                        /* Do we need to do some synchronous flushing? */
                        if (waitqueue_active(&kswapd_done))
                                wait = 1;
                        do_try_to_free_pages(GFP_KSWAPD, wait);
                }

The problem is that do_try_to_free_pages uses the "wait" argument when
calling page_launder() (where the parameter is used to indicate if we want
to do sync or async IO) _and_ when calling refill_inactive(), where this
parameter is used to indicate if it's being called from a normal process or
from kswapd:

 * OTOH, if we're a user process (and not kswapd), we
 * really care about latency. In that case we don't try
 * to free too many pages.
 */
static int refill_inactive(unsigned int gfp_mask, int user)
{
        int priority, count, start_count;

        count = inactive_shortage() + free_shortage();
        if (user)
                count = (1 << page_cluster);
        start_count = count;


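This is probably quite nasty in practice (low memory conditions): when
there are waiters on kswapd, "wait" is 1, refill_inactive() takes that as
"user" and only goes after (1 << page_cluster) pages, exactly when we want
to free more memory.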
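
One way to picture the problem is to split the single argument into two
independent flags. The sketch below is a user-space model, not a proposed
patch; struct reclaim_ctl, refill_target() and the numbers passed to it are
assumptions made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical replacement for the overloaded "wait"/"user" int. */
struct reclaim_ctl {
	bool sync_io;	/* page_launder(): may we wait for IO? */
	bool from_user;	/* refill_inactive(): latency-sensitive caller? */
};

/*
 * Model of the count calculation at the top of refill_inactive():
 * a user process goes after one cluster only, kswapd goes after the
 * whole shortage.
 */
int refill_target(const struct reclaim_ctl *ctl, int shortage,
		  int page_cluster)
{
	if (ctl->from_user)
		return 1 << page_cluster;
	return shortage;
}
```

With from_user kept separate, kswapd could ask for synchronous IO and
still go after the full shortage.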

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-09 23:58                       ` Subtle MM bug Linus Torvalds
@ 2001-01-09 22:21                         ` Marcelo Tosatti
  2001-01-10  0:23                           ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-09 22:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Tue, 9 Jan 2001, Linus Torvalds wrote:

> 
> On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> > 
> > > > The second problem is that background scanning is being done
> > > > unconditionally, and it should not. You end up getting all pages with the
> > > > same age if the system is idle. Look at this example (2.4.1-pre1):
> > > 
> > > I agree. However, I think that we do want to do some background scanning
> > > to push out dirty pages in the background, kind of like bdflush. It just
> > > shouldn't age the pages (and thus not move them to the inactive list).
> > 
> > Actually it must age the pages, but aging should not be unconditional. 
> 
> No, I'm saying that "the background scanning" should not do the page
> aging.

If you age pages only when there is memory pressure/low memory, you'll
have less knowledge about which pages were unused/used pages over time.

> Obviously "refill_inactive()" needs to do the page aging. I'm just not at
> all convinced that "background scanning" == "refill_inactive()". 

This is the background scanning I refer to (in kswapd):

                /*
                 * Do some (very minimal) background scanning. This
                 * will scan all pages on the active list once
                 * every minute. This clears old referenced bits
                 * and moves unused pages to the inactive list.
                 */
                refill_inactive_scan(6, 0);




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-09 20:33                 ` Marcelo Tosatti
@ 2001-01-09 22:44                   ` Linus Torvalds
  2001-01-09 21:33                     ` Marcelo Tosatti
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2001-01-09 22:44 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> 
> The "while (!inactive_shortage())" should be "while (inactive_shortage())"
> as Benjamin noted on lk.

Yes. Also, it does need something to make sure that it doesn't end up
being an endless loop. 

Now, the oom_killer() thing should make sure it's not endless, but the
fact is that kswapd() (who calls the oom-killer) also calls the very same
do_try_to_free_pages(), so we really do have to make sure that it doesn't
loop forever trying to find a page. 

The priority countdown used to handle this, and while I disagree with the
_other_ uses of the priority (it used to make the freeing action
"chunkier" by walking bigger pieces of the VM or the active lists), I
think we need to rename "priority" to "maxtry", and use that to give up
gracefully when we truly do run out of memory.

(I _suspect_ that the oom killer would be invoked before this happens in
practice, and refill_inactive_scan() would find _something_ to make
slight progress on all the time, but the fact is that we shouldn't have
those kinds of assumptions in the VM code).

This would make the return value (that you removed in this patch) still a
valid thing. So I don't think it should go away.
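
A minimal sketch of the "maxtry" idea, assuming a hypothetical callback
interface: refill_bounded(), scan_fn and scan_budget() are illustration
names and do not exist in the kernel. The loop gives up after a fixed number
of passes and still reports whether anything was achieved, which is why the
return value stays meaningful.

```c
#include <stdbool.h>

/* Hypothetical callback: try to make progress, return pages moved. */
typedef int (*scan_fn)(void *ctx);

/*
 * Keep scanning until `target` pages were moved or `maxtry` passes
 * were used up. Returns true if any progress was made at all, so the
 * caller can tell "gave up empty-handed" from "did something".
 */
bool refill_bounded(scan_fn scan, void *ctx, int target, int maxtry)
{
	int moved = 0;

	while (moved < target && maxtry-- > 0)
		moved += scan(ctx);

	return moved > 0;
}

/* Example callback: moves one page per pass while the budget lasts. */
int scan_budget(void *ctx)
{
	int *budget = ctx;

	if (*budget <= 0)
		return 0;
	(*budget)--;
	return 1;
}
```

With an empty budget the loop ends after maxtry passes and returns false
instead of spinning forever.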

> The second problem is that background scanning is being done
> unconditionally, and it should not. You end up getting all pages with the
> same age if the system is idle. Look at this example (2.4.1-pre1):

I agree. However, I think that we do want to do some background scanning
to push out dirty pages in the background, kind of like bdflush. It just
shouldn't age the pages (and thus not move them to the inactive list).

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-09 21:33                     ` Marcelo Tosatti
  2001-01-09 22:11                       ` Yet another bogus piece of do_try_to_free_pages() Marcelo Tosatti
@ 2001-01-09 23:58                       ` Linus Torvalds
  2001-01-09 22:21                         ` Marcelo Tosatti
  1 sibling, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2001-01-09 23:58 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm


On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> 
> > > The second problem is that background scanning is being done
> > > unconditionally, and it should not. You end up getting all pages with the
> > > same age if the system is idle. Look at this example (2.4.1-pre1):
> > 
> > I agree. However, I think that we do want to do some background scanning
> > to push out dirty pages in the background, kind of like bdflush. It just
> > shouldn't age the pages (and thus not move them to the inactive list).
> 
> Actually it must age the pages, but aging should not be unconditional. 

No, I'm saying that "the background scanning" should not do the page
aging.

Obviously "refill_inactive()" needs to do the page aging. I'm just not at
all convinced that "background scanning" == "refill_inactive()". 

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-09 22:11                       ` Yet another bogus piece of do_try_to_free_pages() Marcelo Tosatti
@ 2001-01-10  0:06                         ` Linus Torvalds
  2001-01-10  6:39                           ` Marcelo Tosatti
  2001-01-17  6:52                           ` Rik van Riel
  0 siblings, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2001-01-10  0:06 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-mm

On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> 
> The problem is that do_try_to_free_pages uses the "wait" argument when
> calling page_launder() (where the parameter is used to indicate if we want
> to do sync or async IO) _and_ when calling refill_inactive(), where this
> parameter is used to indicate if it's being called from a normal process or
> from kswapd:

Yes. Bogus.

I suspect that the proper fix is something more along the lines of what we
did to bdflush: get rid of the notion of waiting synchronously from
bdflush, and instead do the work yourself. 

Doing the same to kswapd would imply getting rid of that kswapd_wait
thing, and instead of having people wait on it, they would do
"page_launder(gfp_mask, 1)" themselves (and we _do_ want them to wait,
because that ends up being rate-limiting especially on the applications
that do a lot of memory allocation - which are the applications that end
up being the problem in the first place).
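
Roughly, the idea could be modelled like this in user space. This is a
sketch under assumptions, not the kernel allocator: struct pool,
launder_model() and alloc_page_model() are hypothetical, and
launder_model() merely stands in for a synchronous
page_launder(gfp_mask, 1) call.

```c
/* Hypothetical user-space model of a memory pool. */
struct pool {
	int free_pages;
	int inactive_dirty;
	int launder_calls;	/* how often allocators had to do the work */
};

/* Stand-in for page_launder(gfp_mask, 1): clean up to `batch` pages. */
static int launder_model(struct pool *p, int batch)
{
	int cleaned = p->inactive_dirty < batch ? p->inactive_dirty : batch;

	p->inactive_dirty -= cleaned;
	p->free_pages += cleaned;
	p->launder_calls++;
	return cleaned;
}

/*
 * Allocate one page. When the pool is empty, the allocating task
 * launders pages itself rather than sleeping until kswapd is done,
 * so heavy allocators pay for their own reclaim.
 * Returns 0 on success, -1 if nothing could be freed.
 */
int alloc_page_model(struct pool *p, int batch)
{
	if (p->free_pages == 0 && launder_model(p, batch) == 0)
		return -1;
	p->free_pages--;
	return 0;
}
```

The launder_calls counter shows the rate-limiting effect: the more a task
allocates, the more often it ends up doing the laundering work itself.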

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-10  0:23                           ` Linus Torvalds
@ 2001-01-10  0:12                             ` Marcelo Tosatti
  2001-01-10 11:29                               ` Stephen C. Tweedie
  2001-01-11  3:30                             ` Marcelo Tosatti
  1 sibling, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-10  0:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Tue, 9 Jan 2001, Linus Torvalds wrote:

> Hmm.. Fair enough. However, if you don't have VM pressure, you're also not
> going to look at the page tables, so you are not going to get any use
> information from them, either.

Are you sure that potentially unmapping pte's and swapping out their pages
in the background scanning is ok? 

I mean, what kind of swap behaviour will we have if we do it?

> The aging should really be done at roughly the same rate as the "mark
> active", wouldn't you say? If you mark things active without aging, pages
> end up all being marked as "new". And if you age without marking things
> active, they all end up being "old". Neither is good. What you really want
> to have is aging that happens at the same rate as reference marking.
> So one "conditional aging" algorithm might just be something as simple as
> 
>  - every time you mark something referenced, you increment a counter
>  - every time you want to age something, you check whether the counter is
>    positive first (and decrement it if you age something)

Seems to be a nice solution.

I'll send you the previously promised patch and then I'll send the
background scanning one as soon as we (or I?) figure out the previous
question about background pte scanning.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-09 22:21                         ` Marcelo Tosatti
@ 2001-01-10  0:23                           ` Linus Torvalds
  2001-01-10  0:12                             ` Marcelo Tosatti
  2001-01-11  3:30                             ` Marcelo Tosatti
  0 siblings, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2001-01-10  0:23 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm


On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> > 
> > No, I'm saying that "the background scanning" should not do the page
> > aging.
> 
> If you age pages only when there is memory pressure/low memory, you'll
> have less knowledge about which pages were unused/used pages over time.

Hmm.. Fair enough. However, if you don't have VM pressure, you're also not
going to look at the page tables, so you are not going to get any use
information from them, either. 

The aging should really be done at roughly the same rate as the "mark
active", wouldn't you say? If you mark things active without aging, pages
end up all being marked as "new". And if you age without marking things
active, they all end up being "old". Neither is good. What you really want
to have is aging that happens at the same rate as reference marking.

So one "conditional aging" algorithm might just be something as simple as

 - every time you mark something referenced, you increment a counter
 - every time you want to age something, you check whether the counter is
   positive first (and decrement it if you age something)
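
The two rules above can be sketched in a few lines of C. The names
age_credit, note_reference() and may_age_page() are hypothetical; the sketch
ignores locking and is not taken from any kernel tree.

```c
/* Hypothetical counter: references seen but not yet matched by aging. */
static int age_credit;

/* Call whenever a page is marked referenced. */
void note_reference(void)
{
	age_credit++;
}

/* Returns 1 if aging one page is allowed now, consuming one credit. */
int may_age_page(void)
{
	if (age_credit <= 0)
		return 0;
	age_credit--;
	return 1;
}
```

On an idle system nothing calls note_reference(), so may_age_page() keeps
returning 0 and the existing page ages are left alone.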

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-10  0:06                         ` Linus Torvalds
@ 2001-01-10  6:39                           ` Marcelo Tosatti
  2001-01-10 22:19                             ` Roger Larsson
  2001-01-11  0:11                             ` Zlatko Calusic
  2001-01-17  6:52                           ` Rik van Riel
  1 sibling, 2 replies; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-10  6:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

On Tue, 9 Jan 2001, Linus Torvalds wrote:

> I suspect that the proper fix is something more along the lines of what we
> did to bdflush: get rid of the notion of waiting synchronously from
> bdflush, and instead do the work yourself. 

Agreed. 

Without blocking on sync IO, kswapd can keep aging pages and moving
them to the inactive lists. 

The following patch changes some stuff we've discussed before (the
kmem_cache_reap and maxtry thingies) and it also removes the kswapd
sleeping scheme.

I haven't tested it yet, though I'll do it tomorrow.

diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Wed Jan 10 02:17:59 2001
+++ linux/include/linux/swap.h	Wed Jan 10 05:52:02 2001
@@ -107,7 +107,7 @@
 extern int page_launder(int, int);
 extern int free_shortage(void);
 extern int inactive_shortage(void);
-extern void wakeup_kswapd(int);
+extern void wakeup_kswapd(void);
 extern int try_to_free_pages(unsigned int gfp_mask);
 
 /* linux/mm/page_io.c */
diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c	Wed Jan 10 02:17:59 2001
+++ linux/mm/filemap.c	Wed Jan 10 05:54:56 2001
@@ -306,7 +306,7 @@
 	 */
 	age_page_up(page);
 	if (inactive_shortage() > inactive_target / 2 && free_shortage())
-			wakeup_kswapd(0);
+			wakeup_kswapd();
 not_found:
 	return page;
 }
diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux.orig/mm/page_alloc.c	Wed Jan 10 02:17:59 2001
+++ linux/mm/page_alloc.c	Wed Jan 10 06:04:05 2001
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/pagemap.h>
 #include <linux/bootmem.h>
+#include <linux/slab.h>
 
 int nr_swap_pages;
 int nr_active_pages;
@@ -303,7 +304,7 @@
 	 * an inactive page shortage, wake up kswapd.
 	 */
 	if (inactive_shortage() > inactive_target / 2 && free_shortage())
-		wakeup_kswapd(0);
+		wakeup_kswapd();
 	/*
 	 * If we are about to get low on free pages and cleaning
 	 * the inactive_dirty pages would fix the situation,
@@ -379,7 +380,7 @@
 	 * - if we don't have __GFP_IO set, kswapd may be
 	 *   able to free some memory we can't free ourselves
 	 */
-	wakeup_kswapd(0);
+	wakeup_kswapd();
 	if (gfp_mask & __GFP_WAIT) {
 		__set_current_state(TASK_RUNNING);
 		current->policy |= SCHED_YIELD;
@@ -404,7 +405,7 @@
 	 * - we're doing a higher-order allocation
 	 * 	--> move pages to the free list until we succeed
 	 * - we're /really/ tight on memory
-	 * 	--> wait on the kswapd waitqueue until memory is freed
+	 * 	--> try to free pages ourselves with page_launder
 	 */
 	if (!(current->flags & PF_MEMALLOC)) {
 		/*
@@ -443,36 +444,23 @@
 		/*
 		 * When we arrive here, we are really tight on memory.
 		 *
-		 * We wake up kswapd and sleep until kswapd wakes us
-		 * up again. After that we loop back to the start.
-		 *
-		 * We have to do this because something else might eat
-		 * the memory kswapd frees for us and we need to be
-		 * reliable. Note that we don't loop back for higher
-		 * order allocations since it is possible that kswapd
-		 * simply cannot free a large enough contiguous area
-		 * of memory *ever*.
-		 */
-		if ((gfp_mask & (__GFP_WAIT|__GFP_IO)) == (__GFP_WAIT|__GFP_IO)) {
-			wakeup_kswapd(1);
-			memory_pressure++;
-			if (!order)
-				goto try_again;
-		/*
-		 * If __GFP_IO isn't set, we can't wait on kswapd because
-		 * kswapd just might need some IO locks /we/ are holding ...
-		 *
-		 * SUBTLE: The scheduling point above makes sure that
-		 * kswapd does get the chance to free memory we can't
-		 * free ourselves...
+		 * We try to free pages ourselves by:
+		 * 	- shrinking the i/d caches.
+		 * 	- reclaiming unused memory from the slab caches.
+		 * 	- swapping/syncing pages to disk (done by page_launder)
+		 * 	- moving clean pages from the inactive dirty list to
+		 * 	  the inactive clean list. (done by page_launder)
 		 */
-		} else if (gfp_mask & __GFP_WAIT) {
-			try_to_free_pages(gfp_mask);
-			memory_pressure++;
+		if (gfp_mask & __GFP_WAIT) {
+			shrink_icache_memory(6, gfp_mask);
+			shrink_dcache_memory(6, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+
+			page_launder(gfp_mask, 1);
+
 			if (!order)
 				goto try_again;
 		}
-
 	}
 
 	/*
diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/mm/slab.c linux/mm/slab.c
--- linux.orig/mm/slab.c	Wed Jan 10 02:17:59 2001
+++ linux/mm/slab.c	Wed Jan 10 06:01:27 2001
@@ -1702,7 +1702,7 @@
  * kmem_cache_reap - Reclaim memory from caches.
  * @gfp_mask: the type of memory required.
  *
- * Called from try_to_free_page().
+ * Called from do_try_to_free_pages() and __alloc_pages()
  */
 void kmem_cache_reap (int gfp_mask)
 {
diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/mm/vmscan.c linux/mm/vmscan.c
--- linux.orig/mm/vmscan.c	Wed Jan 10 02:17:59 2001
+++ linux/mm/vmscan.c	Wed Jan 10 05:57:45 2001
@@ -156,20 +156,6 @@
 	return 0;
 }
 
-/*
- * A new implementation of swap_out().  We do not swap complete processes,
- * but only a small number of blocks, before we continue with the next
- * process.  The number of blocks actually swapped is determined on the
- * number of page faults, that this process actually had in the last time,
- * so we won't swap heavily used processes all the time ...
- *
- * Note: the priority argument is a hint on much CPU to waste with the
- *       swap block search, not a hint, of how much blocks to swap with
- *       each process.
- *
- * (C) 1993 Kai Petzke, wpp@marie.physik.tu-berlin.de
- */
-
 static inline int swap_out_pmd(struct mm_struct * mm, struct vm_area_struct * vma, pmd_t *dir, unsigned long address, unsigned long end)
 {
 	pte_t * pte;
@@ -818,17 +804,14 @@
  */
 static int refill_inactive(unsigned int gfp_mask, int user)
 {
-	int priority, count, start_count;
+	int priority, count, start_count, maxtry;
 
 	count = inactive_shortage() + free_shortage();
 	if (user)
 		count = (1 << page_cluster);
 	start_count = count;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	priority = 6;
+	maxtry = priority = 6;
 	do {
 		if (current->need_resched) {
 			__set_current_state(TASK_RUNNING);
@@ -842,7 +825,10 @@
 
 		/* If refill_inactive_scan failed, try to page stuff out.. */
 		swap_out(priority, gfp_mask);
-	} while (!inactive_shortage());
+
+		if(--maxtry <= 0)
+			return 0;
+	} while (inactive_shortage());
 
 done:
 	return (count < start_count);
@@ -872,20 +858,14 @@
 		ret += refill_inactive(gfp_mask, user);
 
 	/* 	
-	 * Delete pages from the inode and dentry cache 
-	 * if memory is low. 
+	 * Delete pages from the inode and dentry caches and 
+	 * reclaim unused slab cache if memory is low.
 	 */
 	if (free_shortage()) {
 		shrink_dcache_memory(6, gfp_mask);
 		shrink_icache_memory(6, gfp_mask);
-	} else { 
-
-		/*
-		 * Reclaim unused slab cache memory.
-		 */
 		kmem_cache_reap(gfp_mask);
-		ret = 1;
-	}
+	} 
 
 	return ret;
 }
@@ -938,13 +918,8 @@
 		static int recalc = 0;
 
 		/* If needed, try to free some memory. */
-		if (inactive_shortage() || free_shortage()) {
-			int wait = 0;
-			/* Do we need to do some synchronous flushing? */
-			if (waitqueue_active(&kswapd_done))
-				wait = 1;
-			do_try_to_free_pages(GFP_KSWAPD, wait);
-		}
+		if (inactive_shortage() || free_shortage()) 
+			do_try_to_free_pages(GFP_KSWAPD, 0);
 
 		/*
 		 * Do some (very minimal) background scanning. This
@@ -960,11 +935,6 @@
 			recalculate_vm_stats();
 		}
 
-		/*
-		 * Wake up everybody waiting for free memory
-		 * and unplug the disk queue.
-		 */
-		wake_up_all(&kswapd_done);
 		run_task_queue(&tq_disk);
 
 		/* 
@@ -995,33 +965,10 @@
 	}
 }
 
-void wakeup_kswapd(int block)
+void wakeup_kswapd(void)
 {
-	DECLARE_WAITQUEUE(wait, current);
-
-	if (current == kswapd_task)
-		return;
-
-	if (!block) {
-		if (waitqueue_active(&kswapd_wait))
-			wake_up(&kswapd_wait);
-		return;
-	}
-
-	/*
-	 * Kswapd could wake us up before we get a chance
-	 * to sleep, so we have to be very careful here to
-	 * prevent SMP races...
-	 */
-	__set_current_state(TASK_UNINTERRUPTIBLE);
-	add_wait_queue(&kswapd_done, &wait);
-
-	if (waitqueue_active(&kswapd_wait))
-		wake_up(&kswapd_wait);
-	schedule();
-
-	remove_wait_queue(&kswapd_done, &wait);
-	__set_current_state(TASK_RUNNING);
+	if (current != kswapd_task)
+		wake_up_process(kswapd_task);
 }
 
 /*
@@ -1046,7 +993,7 @@
 /*
  * Kreclaimd will move pages from the inactive_clean list to the
  * free list, in order to keep atomic allocations possible under
- * all circumstances. Even when kswapd is blocked on IO.
+ * all circumstances.
  */
 int kreclaimd(void *unused)
 {

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Subtle MM bug
  2001-01-10  0:12                             ` Marcelo Tosatti
@ 2001-01-10 11:29                               ` Stephen C. Tweedie
  0 siblings, 0 replies; 38+ messages in thread
From: Stephen C. Tweedie @ 2001-01-10 11:29 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Linus Torvalds, Stephen C. Tweedie, David S. Miller,
	Rik van Riel, linux-mm

Hi,

On Tue, Jan 09, 2001 at 10:12:45PM -0200, Marcelo Tosatti wrote:
> On Tue, 9 Jan 2001, Linus Torvalds wrote:
> 
> > Hmm.. Fair enough. However, if you don't have VM pressure, you're also not
> > going to look at the page tables, so you are not going to get any use
> > information from them, either.
> 
> Are you sure that potentially unmapping pte's and swapping out their pages
> in the background scanning is ok? 

Why not?  We're only going to be aging things slowly in the absence of
memory pressure, and if a page hasn't been used between two
widely-separated passes then inactivating the page isn't likely to
have much impact: it's only a soft-fault to get it back.

> > The aging should really be done at roughly the same rate as the "mark
> > active", wouldn't you say? If you mark things active without aging, pages
> > end up all being marked as "new". And if you age without marking things
> > active, they all end up being "old". Neither is good. What you really want
> > to have is aging that happens at the same rate as reference marking.
> > So one "conditional aging" algorithm might just be something as simple as
> > 
> >  - every time you mark something referenced, you increment a counter
> >  - every time you want to age something, you check whether the counter is
> >    positive first (and decrement it if you age something)
> 
> Seems to be a nice solution.

This is _exactly_ what I proposed to Rik last time we talked about
it, and it seems to be the right balance between maintaining uptodate
information when data is being accessed, and maintaining old state
when it isn't.  You need to decay the counter appropriately, though.
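
A minimal user-space sketch of the counter scheme quoted above, not kernel
code. Every name here (struct demo_page, demo_aging_credit,
demo_mark_referenced, demo_age_page) and both constants are hypothetical, and
halving the age on the way down is an assumption made for illustration. The
decay of the counter mentioned above is deliberately left out.

```c
#include <assert.h>

/* Hypothetical stand-in for per-page state; not the kernel's struct page. */
struct demo_page {
	int age;
	int referenced;
};

#define DEMO_AGE_ADV 3	/* illustrative step for aging up */
#define DEMO_AGE_MAX 64	/* illustrative ceiling */

/* One credit is earned per reference mark; each credit permits one age-down. */
static long demo_aging_credit;

static void demo_mark_referenced(struct demo_page *page)
{
	assert(page);
	page->referenced = 1;
	demo_aging_credit++;
}

/*
 * Returns 1 if the page was aged (up or down), 0 if aging down was
 * skipped because no reference marking has happened to pay for it.
 */
static int demo_age_page(struct demo_page *page)
{
	assert(page);
	if (page->referenced) {
		page->referenced = 0;
		page->age += DEMO_AGE_ADV;
		if (page->age > DEMO_AGE_MAX)
			page->age = DEMO_AGE_MAX;
		return 1;
	}
	if (demo_aging_credit <= 0)
		return 0;	/* nothing was referenced: keep the old state */
	demo_aging_credit--;
	page->age /= 2;		/* assumed decay rule for this sketch */
	return 1;
}
```

With no reference marking at all, demo_age_page() leaves the ages untouched,
which is the "maintaining old state when data isn't being accessed" half of
the balance.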

--Stephen

* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-10  6:39                           ` Marcelo Tosatti
@ 2001-01-10 22:19                             ` Roger Larsson
  2001-01-11  0:11                             ` Zlatko Calusic
  1 sibling, 0 replies; 38+ messages in thread
From: Roger Larsson @ 2001-01-10 22:19 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-mm

On Wednesday 10 January 2001 07:39, Marcelo Tosatti wrote:
> On Tue, 9 Jan 2001, Linus Torvalds wrote:
> > I suspect that the proper fix is something more along the lines of what
> > we did to bdflush: get rid of the notion of waiting synchronously from
> > bdflush, and instead do the work yourself.
>
> Agreed.
>
> Without blocking on sync IO, kswapd can keep aging pages and moving
> them to the inactive lists.
>
> The following patch changes some stuff we've discussed before (the
> kmem_cache_reap and maxtry thingies) and it also removes the kswapd
> sleeping scheme.
>
> I haven't tested it yet, though I'll do it tomorrow.
>

I have it running...
It gave me the highest dbench 16 result I have seen [I recently began
running against a faster disk...]

On my PPro 180 with 96 MB RAM [best of 3]:
write, copy, read and diff use plain bash commands with 150 or 300 MB of data
[streaming]
only one run of dbench (it takes too much time)
[CLIENTS goes via a symbolic link to the other disk - not perfect but...]

kernel		write	copy	read	diff	dbench
2.4.0		10.6	10.9	14.1	8.3	10.2
2.4.1-pre1+neg	10.1	10.9	14.0	8.2	10.0
2.4.1-pre1+this	11.5	10.6	14.4	8.2	10.8

as a comparison
2.2.18		10.6	 9.7	12.8	7.2	 7.7

The only really strange thing that is common to all the 2.4 kernels is
konqueror's brk usage resulting in SIGSEGV. Reported earlier to linux-kernel.

select(16, [3 4 6 7 9 10 12 13 14 15], NULL, NULL, {0, 0}) = 2 (in [7 13], 
left {0, 0})
read(13, "     4_ a_", 10)              = 10
read(13, "\0\0\0\0", 4)                 = 4
read(7, "\2\1\0\2.\1\0\0", 8)           = 8
read(7, "\1\0\0\0", 4)                  = 4
read(7, "\0\0\0\17konqueror-3415\0\0\0\0\vkonqueror"..., 302) = 302
brk(0x84f8000)                          = 0x84f8000
brk(0x84fd000)                          = 0x84fd000
brk(0x8502000)                          = 0x8502000
brk(0x8507000)                          = 0x8507000
brk(0x850c000)                          = 0x850c000
brk(0x8511000)                          = 0x8511000
brk(0x8516000)                          = 0x8516000
brk(0x851b000)                          = 0x851b000
brk(0x8520000)                          = 0x8520000
[...]
brk(0xd02d000)                          = 0xd02d000
brk(0xd02f000)                          = 0xd02f000
brk(0xd031000)                          = 0xd02f000
brk(0xd031000)                          = 0xd02f000
brk(0xd031000)                          = 0xd02f000
brk(0xd031000)                          = 0xd02f000
brk(0xd031000)                          = 0xd02f000
brk(0xd031000)                          = 0xd02f000
--- SIGSEGV (Segmentation fault) ---
--- SIGSEGV (Segmentation fault) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++  


-- 
Home page:
  http://www.norran.net/nra02596/

* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-10  6:39                           ` Marcelo Tosatti
  2001-01-10 22:19                             ` Roger Larsson
@ 2001-01-11  0:11                             ` Zlatko Calusic
  2001-01-17  6:58                               ` Rik van Riel
  1 sibling, 1 reply; 38+ messages in thread
From: Zlatko Calusic @ 2001-01-11  0:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, linux-mm

Marcelo Tosatti <marcelo@conectiva.com.br> writes:

> On Tue, 9 Jan 2001, Linus Torvalds wrote:
> 
> > I suspect that the proper fix is something more along the lines of what we
> > did to bdflush: get rid of the notion of waiting synchronously from
> > bdflush, and instead do the work yourself. 
> 
> Agreed. 
> 
> Without blocking on sync IO, kswapd can keep aging pages and moving
> them to the inactive lists. 
> 
> The following patch changes some stuff we've discussed before (the
> kmem_cache_reap and maxtry thingies) and it also removes the kswapd
> sleeping scheme.
> 
> I haven't tested it yet, though I'll do it tomorrow.
> 

I have tested it for you and results are great. On some tests I got
20% to 30% better results which is amazing. I'll do some more tests
but I would vote for this to get in immediately. Yes, it's *so* good.

Great work Marcelo!
-- 
Zlatko

* Re: Subtle MM bug
  2001-01-10  0:23                           ` Linus Torvalds
  2001-01-10  0:12                             ` Marcelo Tosatti
@ 2001-01-11  3:30                             ` Marcelo Tosatti
  2001-01-11  9:42                               ` Stephen C. Tweedie
  1 sibling, 1 reply; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-11  3:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, David S. Miller, Rik van Riel, linux-mm

On Tue, 9 Jan 2001, Linus Torvalds wrote:

> So one "conditional aging" algorithm might just be something as simple as

I've done a very simple conditional aging patch (I don't think writing new
functions to scan the active list and the pte's is necessary).

kswapd is not perfectly obeying the counter: if the counter reaches 0 during
a swap_out() call that was started while the counter was > 0, that call
still runs to completion.

But since swap_out() only scans a small part of an mm, I don't think
the "non perfect" scanning is a big issue.
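
A rough user-space sketch of why the overshoot stays small, assuming each pass
covers a fixed batch of pages and the counter is checked only before a pass
starts. The names demo_scan_pass and demo_background_scan and the batch
parameter are hypothetical; this is not the code from the patch below.

```c
#include <assert.h>

/*
 * One pass ages exactly `batch` pages. The credit counter is decremented
 * while it is positive, but the pass does not stop when it hits zero.
 * Returns the number of pages aged.
 */
static int demo_scan_pass(long *credit, int batch)
{
	int aged = 0;

	assert(credit);
	assert(batch > 0);
	while (aged < batch) {
		if (*credit > 0)
			(*credit)--;
		aged++;
	}
	return aged;
}

/*
 * Background loop: a new pass is started only while credit remains.
 * Returns the total number of pages aged.
 */
static long demo_background_scan(long credit, int batch)
{
	long total = 0;

	while (credit > 0)
		total += demo_scan_pass(&credit, batch);
	return total;
}
```

Under these assumptions the number of pages aged beyond the credit is always
less than one batch: with a credit of 10 and a batch of 4, three passes run
and 12 pages are aged.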

Comments? 


diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Thu Jan 11 00:27:46 2001
+++ linux/include/linux/swap.h	Thu Jan 11 02:45:04 2001
@@ -101,6 +101,8 @@
 extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
+extern int bg_page_aging;
+
 extern struct page * reclaim_page(zone_t *);
 extern wait_queue_head_t kswapd_wait;
 extern wait_queue_head_t kreclaimd_wait;
diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/mm/swap.c linux/mm/swap.c
--- linux.orig/mm/swap.c	Thu Jan 11 00:27:45 2001
+++ linux/mm/swap.c	Thu Jan 11 02:12:01 2001
@@ -214,6 +214,8 @@
 	/* Make sure the page gets a fair chance at staying active. */
 	if (page->age < PAGE_AGE_START)
 		page->age = PAGE_AGE_START;
+
+	bg_page_aging++;
 }
 
 void activate_page(struct page * page)
diff --exclude-from=/home/marcelo/exclude -Nur linux.orig/mm/vmscan.c linux/mm/vmscan.c
--- linux.orig/mm/vmscan.c	Thu Jan 11 00:27:45 2001
+++ linux/mm/vmscan.c	Thu Jan 11 02:53:40 2001
@@ -24,6 +24,8 @@
 
 #include <asm/pgalloc.h>
 
+int bg_page_aging = 0;
+
 /*
  * The swap-out functions return 1 if they successfully
  * threw something out, and we got a free page. It returns
@@ -60,9 +62,12 @@
 		age_page_up(page);
 		goto out_failed;
 	}
-	if (!onlist)
+	if (!onlist) {
 		/* The page is still mapped, so it can't be freeable... */
+		if(bg_page_aging)
+			bg_page_aging--;
 		age_page_down_ageonly(page);
+	}
 
 	/*
 	 * If the page is in active use by us, or if the page
@@ -650,11 +655,12 @@
  * This function will scan a portion of the active list to find
  * unused pages, those pages will then be moved to the inactive list.
  */
-int refill_inactive_scan(unsigned int priority, int oneshot)
+int refill_inactive_scan(unsigned int priority, int background)
 {
 	struct list_head * page_lru;
 	struct page * page;
-	int maxscan, page_active = 0;
+	int maxscan, page_active;
+	int deactivate = 1;
 	int ret = 0;
 
 	/* Take the lock while messing with the list... */
@@ -674,8 +680,21 @@
 		/* Do aging on the pages. */
 		if (PageTestandClearReferenced(page)) {
 			age_page_up_nolock(page);
-			page_active = 1;
-		} else {
+		} else if (deactivate) {
+
+			/* 
+			 * We're aging down a page. 
+			 * Decrement the counter if it has not reached zero
+			 * yet. If it has reached zero and we are doing a
+			 * background scan, stop deactivating pages.
+			 */
+			if (bg_page_aging)
+				bg_page_aging--;
+			else if (background) {
+				deactivate = 0;	
+				continue;
+			}
+
 			age_page_down_ageonly(page);
 			/*
 			 * Since we don't hold a reference on the page
@@ -691,8 +710,6 @@
 						(page->buffers ? 2 : 1)) {
 				deactivate_page_nolock(page);
 				page_active = 0;
-			} else {
-				page_active = 1;
 			}
 		}
 		/*
@@ -705,7 +722,8 @@
 			list_add(page_lru, &active_list);
 		} else {
 			ret = 1;
-			if (oneshot)
+			/* Stop scanning if we're not doing background scan */
+			if (!background)
 				break;
 		}
 	}
@@ -818,7 +836,7 @@
 			schedule();
 		}
 
-		while (refill_inactive_scan(priority, 1)) {
+		while (refill_inactive_scan(priority, 0)) {
 			if (--count <= 0)
 				goto done;
 		}
@@ -921,13 +939,19 @@
 		if (inactive_shortage() || free_shortage()) 
 			do_try_to_free_pages(GFP_KSWAPD, 0);
 
+
+		/* Do some (very minimal) background scanning. */
+
 		/*
-		 * Do some (very minimal) background scanning. This
-		 * will scan all pages on the active list once
+		 * This will scan all pages on the active list once
 		 * every minute. This clears old referenced bits
 		 * and moves unused pages to the inactive list.
 		 */
-		refill_inactive_scan(6, 0);
+		refill_inactive_scan(6, 1);
+	
+		/* This will scan the pte's. */
+		if(bg_page_aging)
+			swap_out(6, 0);
 
 		/* Once a second, recalculate some VM stats. */
 		if (time_after(jiffies, recalc + HZ)) {


* Re: Subtle MM bug
  2001-01-11  3:30                             ` Marcelo Tosatti
@ 2001-01-11  9:42                               ` Stephen C. Tweedie
  2001-01-11 15:24                                 ` Marcelo Tosatti
  0 siblings, 1 reply; 38+ messages in thread
From: Stephen C. Tweedie @ 2001-01-11  9:42 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Linus Torvalds, Stephen C. Tweedie, David S. Miller,
	Rik van Riel, linux-mm

Hi,

On Thu, Jan 11, 2001 at 01:30:18AM -0200, Marcelo Tosatti wrote:
> 
> On Tue, 9 Jan 2001, Linus Torvalds wrote:
> 
> > So one "conditional aging" algorithm might just be something as simple as
> 
> I've done a very easy conditional aging patch (I don't think doing new
> functions to scan the active list and the pte's is necessary)

You still need to decay the bg_page_aging counter a little somewhere,
otherwise if you've been running a long-lived workload which keeps
most of memory recently activated, you'll build up such a large
counter that going idle will still age everything to zero.

This might be as simple as clamping the value of the counter to some
arbitrary maximum value such as num_physpages.
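
A small user-space sketch of the clamp, assuming the counter earns one credit
per activation and each remaining credit later permits one age-down. The
helper names demo_credit_add and demo_idle_agings are hypothetical, and the
limit stands in for num_physpages.

```c
#include <assert.h>

/* Add one credit, saturating at `limit` (the stand-in for num_physpages). */
static unsigned long demo_credit_add(unsigned long credit, unsigned long limit)
{
	assert(limit > 0);
	return credit < limit ? credit + 1 : credit;
}

/*
 * Simulate a busy phase of `activations` page activations and return the
 * credit left over, i.e. how many age-down steps an idle system could
 * still perform afterwards.
 */
static unsigned long demo_idle_agings(unsigned long activations,
				      unsigned long limit)
{
	unsigned long credit = 0;
	unsigned long i;

	for (i = 0; i < activations; i++)
		credit = demo_credit_add(credit, limit);
	return credit;
}
```

With a limit of 32768 (an illustrative figure: 128 MB of 4 KB pages), a
million activations leave at most 32768 credits, so going idle can cost
roughly one age-down per physical page rather than an unbounded number.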

Cheers,
 Stephen

* Re: Subtle MM bug
  2001-01-11  9:42                               ` Stephen C. Tweedie
@ 2001-01-11 15:24                                 ` Marcelo Tosatti
  0 siblings, 0 replies; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-11 15:24 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Linus Torvalds, David S. Miller, Rik van Riel, linux-mm


On Thu, 11 Jan 2001, Stephen C. Tweedie wrote:

> This might be as simple as clamping the value of the counter to some
> arbitrary maximum value such as num_physpages.

Ok, I've taken this suggestion and used it to limit the counter.

I've also changed some of Linus's changes to swap_out() in pre2 (related to
page aging).

I've noted quite nice performance improvements with the pte scanning
(which moves the dirty pte bits to the pages) on dbench: 7Mb/sec to
9.5Mb/sec. (128MB, 48 threads)

The pte scanning will be a big win for databases with heavy IO, I suppose.

The following patch is against 2.4.1pre2.

Comments?

diff -Nur --exclude-from=exclude linux.orig/mm/swap.c linux/mm/swap.c
--- linux.orig/mm/swap.c	Thu Jan 11 11:13:37 2001
+++ linux/mm/swap.c	Thu Jan 11 14:38:09 2001
@@ -200,17 +200,22 @@
 {
 	if (PageInactiveDirty(page)) {
 		del_page_from_inactive_dirty_list(page);
-		add_page_to_active_list(page);
 	} else if (PageInactiveClean(page)) {
 		del_page_from_inactive_clean_list(page);
-		add_page_to_active_list(page);
 	} else {
 		/*
 		 * The page was not on any list, so we take care
 		 * not to do anything.
 		 */
+		goto inc_age;
 	}
 
+	add_page_to_active_list(page);
+	
+	if(bg_page_aging < num_physpages)
+		bg_page_aging++;
+
+inc_age:
 	/* Make sure the page gets a fair chance at staying active. */
 	if (page->age < PAGE_AGE_START)
 		page->age = PAGE_AGE_START;
diff -Nur --exclude-from=exclude linux.orig/mm/vmscan.c linux/mm/vmscan.c
--- linux.orig/mm/vmscan.c	Thu Jan 11 11:13:37 2001
+++ linux/mm/vmscan.c	Thu Jan 11 14:52:04 2001
@@ -24,17 +24,8 @@
 
 #include <asm/pgalloc.h>
 
-/*
- * The swap-out functions return 1 if they successfully
- * threw something out, and we got a free page. It returns
- * zero if it couldn't do anything, and any other value
- * indicates it decreased rss, but the page was shared.
- *
- * NOTE! If it sleeps, it *must* return 1 to make sure we
- * don't continue with the swap-out. Otherwise we may be
- * using a process that no longer actually exists (it might
- * have died while we slept).
- */
+int bg_page_aging = 0;
+
 static void try_to_swap_out(struct mm_struct * mm, struct vm_area_struct* vma, unsigned long address, pte_t * page_table, struct page *page)
 {
 	pte_t pte;
@@ -42,12 +33,18 @@
 
 	/* Don't look at this pte if it's been accessed recently. */
 	if (ptep_test_and_clear_young(page_table)) {
-		page->age += PAGE_AGE_ADV;
-		if (page->age > PAGE_AGE_MAX)
-			page->age = PAGE_AGE_MAX;
+		age_page_up(page);
 		return;
+	} else {
+		age_page_down_ageonly(page);
+		if (bg_page_aging)
+			bg_page_aging--;
 	}
 
+	/* Unmap only old pages */
+	if (page->age > 0)
+		return;
+
 	if (TryLockPage(page))
 		return;
 
@@ -268,7 +265,7 @@
 	return nr < SWAP_MIN ? SWAP_MIN : nr;
 }
 
-static int swap_out(unsigned int priority, int gfp_mask)
+static int swap_out(unsigned int priority, int background)
 {
 	int counter;
 	int retval = 0;
@@ -300,6 +297,13 @@
 		/* Walk about 6% of the address space each time */
 		retval |= swap_out_mm(mm, swap_amount(mm));
 		mmput(mm);
+		/* 
+		 *  In the case of background aging, stop
+		 *  the scan when we aged the necessary amount
+		 *  of pages.
+		 */
+		if (background && !bg_page_aging)
+			break;
 	} while (--counter >= 0);
 	return retval;
 
@@ -630,22 +634,24 @@
 /**
  * refill_inactive_scan - scan the active list and find pages to deactivate
  * @priority: the priority at which to scan
- * @oneshot: exit after deactivating one page
+ * @background: slightly different behaviour for background scanning
  *
  * This function will scan a portion of the active list to find
  * unused pages, those pages will then be moved to the inactive list.
  */
-int refill_inactive_scan(unsigned int priority, int oneshot)
+int refill_inactive_scan(unsigned int priority, int background)
 {
 	struct list_head * page_lru;
 	struct page * page;
-	int maxscan, page_active = 0;
+	int maxscan;
 	int ret = 0;
+	int deactivate = 1;
 
 	/* Take the lock while messing with the list... */
 	spin_lock(&pagemap_lru_lock);
 	maxscan = nr_active_pages >> priority;
 	while (maxscan-- > 0 && (page_lru = active_list.prev) != &active_list) {
+		int page_active = 0;
 		page = list_entry(page_lru, struct page, lru);
 
 		/* Wrong page on list?! (list corruption, should not happen) */
@@ -660,9 +666,19 @@
 		if (PageTestandClearReferenced(page)) {
 			age_page_up_nolock(page);
 			page_active = 1;
-		} else {
+		} else if (deactivate) {
 			age_page_down_ageonly(page);
 			/*
+			 * We're aging down a page. Decrement the counter if it
+ 			 * has not reached zero yet. If it reached zero, and we 			 * are doing background scan, stop deactivating pages.
+			 */
+			if (bg_page_aging)
+				bg_page_aging--;
+			else if (background) {
+				deactivate = 0;
+				continue;	
+			}
+			/*
 			 * Since we don't hold a reference on the page
 			 * ourselves, we have to do our test a bit more
 			 * strict then deactivate_page(). This is needed
@@ -676,21 +692,20 @@
 						(page->buffers ? 2 : 1)) {
 				deactivate_page_nolock(page);
 				page_active = 0;
-			} else {
-				page_active = 1;
 			}
 		}
 		/*
 		 * If the page is still on the active list, move it
 		 * to the other end of the list. Otherwise it was
-		 * deactivated by age_page_down and we exit successfully.
+		 * deactivated by deactivate_page_nolock and we exit 
+		 * successfully.
 		 */
 		if (page_active || PageActive(page)) {
 			list_del(page_lru);
 			list_add(page_lru, &active_list);
 		} else {
 			ret = 1;
-			if (oneshot)
+			if (!background)
 				break;
 		}
 	}
@@ -804,13 +819,13 @@
 			schedule();
 		}
 
-		while (refill_inactive_scan(DEF_PRIORITY, 1)) {
+		while (refill_inactive_scan(DEF_PRIORITY, 0)) {
 			if (--count <= 0)
 				goto done;
 		}
 
 		/* If refill_inactive_scan failed, try to page stuff out.. */
-		swap_out(DEF_PRIORITY, gfp_mask);
+		swap_out(DEF_PRIORITY, 0);
 
 		if (--maxtry <= 0)
 				return 0;
@@ -914,7 +929,11 @@
 		 * every minute. This clears old referenced bits
 		 * and moves unused pages to the inactive list.
 		 */
-		refill_inactive_scan(DEF_PRIORITY, 0);
+		refill_inactive_scan(DEF_PRIORITY, 1);
+
+		/* Walk the pte's and age them. */
+		if (bg_page_aging)
+			swap_out(DEF_PRIORITY, 1);
 
 		/* Once a second, recalculate some VM stats. */
 		if (time_after(jiffies, recalc + HZ)) {
diff -Nur --exclude-from=exclude linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Thu Jan 11 11:13:38 2001
+++ linux/include/linux/swap.h	Thu Jan 11 14:54:57 2001
@@ -101,6 +101,7 @@
 extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
+extern int bg_page_aging;
 extern struct page * reclaim_page(zone_t *);
 extern wait_queue_head_t kswapd_wait;
 extern wait_queue_head_t kreclaimd_wait;


* Re: Subtle MM bug
  2001-01-09  3:12               ` Linus Torvalds
  2001-01-09 20:33                 ` Marcelo Tosatti
@ 2001-01-17  4:54                 ` Rik van Riel
  1 sibling, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2001-01-17  4:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marcelo Tosatti, Stephen C. Tweedie, David S. Miller, linux-mm

On Mon, 8 Jan 2001, Linus Torvalds wrote:

>  - gets rid of the complex "best mm" logic and replaces it with the
>    round-robin thing as discussed.

This could help IO clustering as well, which should be good
whenever we want to swap the data back in ;)

>  - it cleans up and simplifies the MM "priority" thing. In fact, right now
>    only one priority is ever used,

Sounds great.

In the week that I've been offline I have been working on
page_launder and doing a few other improvements to the VM.

Once I get the time to clean everything up I think we can
take 2.4 to a slightly better performance level without
having to change anything big.

regards,

Rik (at linux.conf.au)
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-17  6:58                               ` Rik van Riel
@ 2001-01-17  6:07                                 ` Marcelo Tosatti
  2001-01-17 19:04                                 ` Zlatko Calusic
  1 sibling, 0 replies; 38+ messages in thread
From: Marcelo Tosatti @ 2001-01-17  6:07 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Zlatko Calusic, Linus Torvalds, linux-mm


On Wed, 17 Jan 2001, Rik van Riel wrote:

> On 11 Jan 2001, Zlatko Calusic wrote:
> 
> > I have tested it for you and results are great. On some tests I got
> > 20% to 30% better results which is amazing. I'll do some more tests
> > but I would vote for this to get in immediately. Yes, it's *so* good.
> 
> Don't be so rash.
> 
> The patch hasn't been tested very thoroughly, otherwise
> people would have noticed the problem that PF_MEMALLOC
> isn't set around the page freeing code, possibly leading
> to deadlocks, triple faults and other nasties.

Look at 2.4.1pre8.



* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-10  0:06                         ` Linus Torvalds
  2001-01-10  6:39                           ` Marcelo Tosatti
@ 2001-01-17  6:52                           ` Rik van Riel
  1 sibling, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2001-01-17  6:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-mm

On Tue, 9 Jan 2001, Linus Torvalds wrote:
> On Tue, 9 Jan 2001, Marcelo Tosatti wrote:
> >
> > The problem is that do_try_to_free_pages uses the "wait" argument when
> > calling page_launder() (where the parameter is used to indicate if we want
> > to do sync or async IO) _and_ when calling refill_inactive(), where this
> > parameter is used to indicate if it's being called from a normal process or
> > from kswapd:
>
> Yes. Bogus.
>
> I suspect that the proper fix is something more along the lines
> of what we did to bdflush: get rid of the notion of waiting
> synchronously from bdflush, and instead do the work yourself.

Agreed. I've been working on this a bit in the last week and
have achieved some interesting results.

The main thing I found is that it is *not* trivial to do this
because we can end up with multiple instances of eg. page_launder()
running at the same time and we will want to balance them against
each other in some way to prevent them from flushing too many pages
at once.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-11  0:11                             ` Zlatko Calusic
@ 2001-01-17  6:58                               ` Rik van Riel
  2001-01-17  6:07                                 ` Marcelo Tosatti
  2001-01-17 19:04                                 ` Zlatko Calusic
  0 siblings, 2 replies; 38+ messages in thread
From: Rik van Riel @ 2001-01-17  6:58 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: Marcelo Tosatti, Linus Torvalds, linux-mm

On 11 Jan 2001, Zlatko Calusic wrote:

> I have tested it for you and results are great. On some tests I got
> 20% to 30% better results which is amazing. I'll do some more tests
> but I would vote for this to get in immediately. Yes, it's *so* good.

Don't be so rash.

The patch hasn't been tested very thoroughly, otherwise
people would have noticed the problem that PF_MEMALLOC
isn't set around the page freeing code, possibly leading
to deadlocks, triple faults and other nasties.

(and yes, I'm sure there will be somebody able to trigger
this bug)

Remember that we - officially - still are in the 2.4 BUGFIX
period, it's time to be careful with the code now and we should
IMHO not randomly introduce new bugs in the name of performance.

Performance enhancements are perfectly fine, of course, but IMHO
not when they were posted 2 hours ago and haven't been
reviewed and stress-tested yet.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-17  6:58                               ` Rik van Riel
  2001-01-17  6:07                                 ` Marcelo Tosatti
@ 2001-01-17 19:04                                 ` Zlatko Calusic
  2001-01-17 19:22                                   ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Zlatko Calusic @ 2001-01-17 19:04 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marcelo Tosatti, Linus Torvalds, linux-mm

Rik van Riel <riel@conectiva.com.br> writes:

> On 11 Jan 2001, Zlatko Calusic wrote:
> 
> > I have tested it for you and results are great. On some tests I got
> > 20% to 30% better results which is amazing. I'll do some more tests
> > but I would vote for this to get in immediately. Yes, it's *so* good.
> 
> Don't be so rash.
> 
> The patch hasn't been tested very thoroughly, otherwise
> people would have noticed the problem that PF_MEMALLOC
> isn't set around the page freeing code, possibly leading
> to deadlocks, triple faults and other nasties.
>

Oh, believe me I tested that patch very thoroughly with lots of
utilities, and it worked very very well. I don't remember that it
fiddled anywhere with the PF_MEMALLOC flag.

But, anyway, it's in the kernel now so I can delete
/boot/vmlinuz-marcelo, which was my performance reference because it
was so good. :)

> (and yes, I'm sure there will be somebody able to trigger
> this bug)
> 
> Remember that we - officially - still are in the 2.4 BUGFIX
> period, it's time to be careful with the code now and we should
> IMHO not randomly introduce new bugs in the name of performance.
>

Yeah, right! And Linus has just included reiserfs in a prepatch.

> Performance enhancements are perfectly fine, of course, but IMHO
> not when they were posted 2 hours ago and haven't been
> reviewed and stress-tested yet.
> 

They have been tested well enough.
-- 
Zlatko

* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-17 19:04                                 ` Zlatko Calusic
@ 2001-01-17 19:22                                   ` Ingo Molnar
  2001-01-18  0:55                                     ` Rik van Riel
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2001-01-17 19:22 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: Rik van Riel, Marcelo Tosatti, Linus Torvalds, linux-mm

On 17 Jan 2001, Zlatko Calusic wrote:

> Oh, believe me I tested that patch very thoroughly with lots of
> utilities, and it worked very very well. I don't remember that it
> fiddled anywhere with the PF_MEMALLOC flag.

yep, same result here, Marcelo's patch is plain *wonderful*. Combined with
the block-IO changes, -pre8 is really behaving spectacularly under high
VM or pagecache load.

	Ingo


* Re: Yet another bogus piece of do_try_to_free_pages()
  2001-01-17 19:22                                   ` Ingo Molnar
@ 2001-01-18  0:55                                     ` Rik van Riel
  0 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2001-01-18  0:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Zlatko Calusic, Marcelo Tosatti, Linus Torvalds, linux-mm

On Wed, 17 Jan 2001, Ingo Molnar wrote:
> On 17 Jan 2001, Zlatko Calusic wrote:
>
> > Oh, believe me I tested that patch very thoroughly with lots of
> > utilities, and it worked very very well. I don't remember that it
> > fiddled anywhere with the PF_MEMALLOC flag.
>
> yep, same result here, Marcelo's patch is plain *wonderful*.
> Combined with the block-IO changes, -pre8 is really behaving
> spectacularly under high VM or pagecache load.

Oh, I'm not doubting that. I just got suspicious when Linus
got asked to put it in the kernel after Zlatko tested it for
a few hours ... and when I spotted a lack of flags|=PF_MEMALLOC
around the thing.

(but from what marcelo told me, it got fixed in -pre8)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


end of thread, other threads:[~2001-01-18  0:55 UTC | newest]

Thread overview: 38+ messages
     [not found] <200101080602.WAA02132@pizda.ninka.net>
2001-01-08  6:42 ` Subtle MM bug Linus Torvalds
2001-01-08 13:11   ` Marcelo Tosatti
2001-01-08 16:42     ` Rik van Riel
2001-01-08 17:43     ` Linus Torvalds
2001-01-08 13:57   ` Stephen C. Tweedie
2001-01-08 17:29     ` Linus Torvalds
2001-01-08 18:10       ` Stephen C. Tweedie
2001-01-08 21:52         ` Marcelo Tosatti
2001-01-09  0:28           ` Linus Torvalds
2001-01-08 23:49             ` Marcelo Tosatti
2001-01-09  3:12               ` Linus Torvalds
2001-01-09 20:33                 ` Marcelo Tosatti
2001-01-09 22:44                   ` Linus Torvalds
2001-01-09 21:33                     ` Marcelo Tosatti
2001-01-09 22:11                       ` Yet another bogus piece of do_try_to_free_pages() Marcelo Tosatti
2001-01-10  0:06                         ` Linus Torvalds
2001-01-10  6:39                           ` Marcelo Tosatti
2001-01-10 22:19                             ` Roger Larsson
2001-01-11  0:11                             ` Zlatko Calusic
2001-01-17  6:58                               ` Rik van Riel
2001-01-17  6:07                                 ` Marcelo Tosatti
2001-01-17 19:04                                 ` Zlatko Calusic
2001-01-17 19:22                                   ` Ingo Molnar
2001-01-18  0:55                                     ` Rik van Riel
2001-01-17  6:52                           ` Rik van Riel
2001-01-09 23:58                       ` Subtle MM bug Linus Torvalds
2001-01-09 22:21                         ` Marcelo Tosatti
2001-01-10  0:23                           ` Linus Torvalds
2001-01-10  0:12                             ` Marcelo Tosatti
2001-01-10 11:29                               ` Stephen C. Tweedie
2001-01-11  3:30                             ` Marcelo Tosatti
2001-01-11  9:42                               ` Stephen C. Tweedie
2001-01-11 15:24                                 ` Marcelo Tosatti
2001-01-17  4:54                 ` Rik van Riel
2001-01-08 16:45   ` Rik van Riel
2001-01-08 17:50     ` Linus Torvalds
2001-01-08 18:21       ` Rik van Riel
2001-01-08 18:38         ` Linus Torvalds
