linux-mm.kvack.org archive mirror
* swapout selection change in pre1
@ 2001-01-13  3:28 Marcelo Tosatti
  2001-01-13  8:05 ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2001-01-13  3:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

Linus,

The swapout selection change in pre1 makes the kernel swapout behavior
unfair to tasks which are sharing the VM (vfork()).

I don't see any clean fix for that problem. Do you?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: swapout selection change in pre1
  2001-01-13  8:05 ` Linus Torvalds
@ 2001-01-13  7:41   ` Marcelo Tosatti
  2001-01-15  1:22   ` Ed Tomlinson
  1 sibling, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2001-01-13  7:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm


On Sat, 13 Jan 2001, Linus Torvalds wrote:

> It's the other way around: it used to be _extremely_ unfair towards
> threads, because threads would get swapped out _much_ more than
> non-threads. The new "count only nr of mm's" actually fixes a real problem
> in this area: a process with hundreds of threads would just get swapped
> out _way_ too quickly (it used to be counted as "hundreds of VM's", even
> though it's obviously just one VM, and should be swapped out as such).

The point is: should this VM with hundreds of threads be treated as a VM
with one thread?

With the old "per-task" selection scheme (before -prerelease), swap_cnt
used to keep us from scanning a VM too much (if swap_cnt reached zero, the
VM would not be scanned again until all other VMs had been scanned).
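A toy userspace model of that old behaviour (illustrative only; the names and the quota refresh are simplified, not the actual 2.2/2.4 code):

```c
#include <stddef.h>

/* Each mm gets a scan quota (swap_cnt).  The selector always picks
 * the mm with the largest remaining quota, so once a quota hits zero
 * that mm is left alone until every other mm is exhausted too and
 * all quotas are refreshed from RSS. */
struct mm {
    int rss;       /* resident pages */
    int swap_cnt;  /* remaining scan quota */
};

static struct mm *pick_victim(struct mm *mms, int n)
{
    struct mm *best = NULL;
    int i;

    for (i = 0; i < n; i++)
        if (mms[i].swap_cnt > 0 &&
            (!best || mms[i].swap_cnt > best->swap_cnt))
            best = &mms[i];

    if (!best)  /* everyone exhausted: refresh quotas from RSS */
        for (i = 0; i < n; i++) {
            mms[i].swap_cnt = mms[i].rss;
            if (!best || mms[i].swap_cnt > best->swap_cnt)
                best = &mms[i];
        }
    return best;
}
```

The caller decrements swap_cnt for every page it scans, which is what kept any single VM from being scanned over and over while others still had quota left.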




* Re: swapout selection change in pre1
  2001-01-13  3:28 swapout selection change in pre1 Marcelo Tosatti
@ 2001-01-13  8:05 ` Linus Torvalds
  2001-01-13  7:41   ` Marcelo Tosatti
  2001-01-15  1:22   ` Ed Tomlinson
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2001-01-13  8:05 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-mm


On Sat, 13 Jan 2001, Marcelo Tosatti wrote:
> 
> The swapout selection change in pre1 makes the kernel swapout behavior
> unfair to tasks which are sharing the VM (vfork()).
> 
> I don't see any clean fix for that problem. Do you?

What?

It's the other way around: it used to be _extremely_ unfair towards
threads, because threads would get swapped out _much_ more than
non-threads. The new "count only nr of mm's" actually fixes a real problem
in this area: a process with hundreds of threads would just get swapped
out _way_ too quickly (it used to be counted as "hundreds of VM's", even
though it's obviously just one VM, and should be swapped out as such).

		Linus


* Re: swapout selection change in pre1
  2001-01-13  8:05 ` Linus Torvalds
  2001-01-13  7:41   ` Marcelo Tosatti
@ 2001-01-15  1:22   ` Ed Tomlinson
  2001-01-15  2:48     ` Linus Torvalds
  2001-01-17  7:19     ` Rik van Riel
  1 sibling, 2 replies; 16+ messages in thread
From: Ed Tomlinson @ 2001-01-15  1:22 UTC (permalink / raw)
  To: Linus Torvalds, Marcelo Tosatti; +Cc: linux-mm

On Saturday 13 January 2001 03:05, Linus Torvalds wrote:
> On Sat, 13 Jan 2001, Marcelo Tosatti wrote:
> > The swapout selection change in pre1 makes the kernel swapout
> > behavior unfair to tasks which are sharing the VM (vfork()).
> >
> > I don't see any clean fix for that problem. Do you?
>
> What?
>
> It's the other way around: it used to be _extremely_ unfair towards
> threads, because threads would get swapped out _much_ more than
> non-threads. The new "count only nr of mm's" actually fixes a real problem
> in this area: a process with hundreds of threads would just get swapped
> out _way_ too quickly (it used to be counted as "hundreds of VM's", even
> though it's obviously just one VM, and should be swapped out as such).

I think it's gone too far in the other direction now.  Running a heavily 
threaded Java program (35 threads, RSS of 44M) on a 128M KIII-400 with CPU 
usage of 4-10%, the rest of the system is getting paged out very quickly and 
X feels sluggish.  While we may not want to treat each thread as if it were a 
process, I think we need more than one scan per group of threads sharing 
memory.

Ideas?
Ed Tomlinson

* Re: swapout selection change in pre1
  2001-01-15  1:22   ` Ed Tomlinson
@ 2001-01-15  2:48     ` Linus Torvalds
  2001-01-15  9:24       ` Jamie Lokier
  2001-01-17  7:19     ` Rik van Riel
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2001-01-15  2:48 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: Marcelo Tosatti, linux-mm


On Sun, 14 Jan 2001, Ed Tomlinson wrote:
> 
> I think it's gone too far in the other direction now.  Running a heavily 
> threaded Java program (35 threads, RSS of 44M) on a 128M KIII-400 with CPU 
> usage of 4-10%, the rest of the system is getting paged out very quickly and 
> X feels sluggish.  While we may not want to treat each thread as if it were a 
> process, I think we need more than one scan per group of threads sharing 
> memory.

No, what we _really_ want is to penalize processes that have high
page-fault ratios: it indicates that they have a big working set, which in
turn is the absolute best way to find a memory hog in low-memory
conditions.

This is why I think the page allocation should end up having
"swap_out(self)" in the memory allocation path - because it will quite
naturally mean that processes with high page fault ratios are also going
to be the ones paged out more aggressively.

And this is what you should get if you do "try_to_free_pages()" in
page_alloc() instead of the current "page_launder()".
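A minimal userspace sketch of that shape (all names hypothetical; this is not the real allocation path): when free memory falls below a watermark, the allocating task unmaps its own pages first, so heavy faulters pay for their own pressure:

```c
/* Hypothetical names throughout; a sketch of the idea only. */
struct task { int mapped_pages; };

static int free_pages = 2;     /* pages currently free */
#define FREE_LOW 3             /* watermark below which we reclaim */

/* Unmap one of our own pages -- "swap_out(self)". */
static int swap_out_self(struct task *t)
{
    if (t->mapped_pages == 0)
        return 0;
    t->mapped_pages--;
    free_pages++;
    return 1;
}

static int alloc_page(struct task *t)
{
    while (free_pages <= FREE_LOW)  /* memory tight? */
        if (!swap_out_self(t))      /* shrink our own VM first */
            break;
    if (free_pages == 0)
        return 0;                   /* allocation fails */
    free_pages--;
    return 1;
}
```

A task that never faults never pays; a task faulting (and hence allocating) heavily is exactly the one that keeps landing in the reclaim loop.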

		Linus


* Re: swapout selection change in pre1
  2001-01-15  9:24       ` Jamie Lokier
@ 2001-01-15  8:16         ` Marcelo Tosatti
  2001-01-15 18:24         ` Linus Torvalds
  1 sibling, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2001-01-15  8:16 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linus Torvalds, Ed Tomlinson, linux-mm

On Mon, 15 Jan 2001, Jamie Lokier wrote:

> Freeing pages aggressively from a process that's paging lots will make
> that process page more, meaning more aggressive freeing etc. etc.

First, we are not necessarily freeing pages from the process. We're just
unmapping the pages and putting them on the inactive lists so they can
actually be written to swap later when they become relatively old (because
the process did not fault the page back in).

Also, the process which is trying to free pages by itself will almost
certainly do IO (to sync dirty pages), which keeps it from screwing up
the system. 


* Re: swapout selection change in pre1
  2001-01-15  2:48     ` Linus Torvalds
@ 2001-01-15  9:24       ` Jamie Lokier
  2001-01-15  8:16         ` Marcelo Tosatti
  2001-01-15 18:24         ` Linus Torvalds
  0 siblings, 2 replies; 16+ messages in thread
From: Jamie Lokier @ 2001-01-15  9:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm

Linus Torvalds wrote:
> > While we may not want to treat each thread as if it was a 
> > process, I think we need more than one scan per group of threads sharing 
> > memory.  

> No, what we _really_ want is to penalize processes that have high
> page-fault ratios: it indicates that they have a big working set, which in
> turn is the absolute best way to find a memory hog in low-memory
> conditions.

Freeing pages aggressively from a process that's paging lots will make
that process page more, meaning more aggressive freeing etc. etc.
Either it works and reduces overall paging fairly (great), or it spirals
out of control, which will be obvious, or it simply stabilizes at many
different rates, which is undesirable but not so obvious in testing.

Perhaps the fair thing would be to not give a group of 35 threads 35
times as much CPU as someone else's single process.  The shared VM would
then fault much as if there were a single process doing user space
threading (or Python continuations or...).  That may still mean a larger
working set than a typical normal process, or it may not.  But at least
fault rate based paging heuristics wouldn't be skewed by unfair
allocation of CPU time.

-- Jamie

* Re: swapout selection change in pre1
  2001-01-15  9:24       ` Jamie Lokier
  2001-01-15  8:16         ` Marcelo Tosatti
@ 2001-01-15 18:24         ` Linus Torvalds
  2001-01-15 18:40           ` Jamie Lokier
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2001-01-15 18:24 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm


On Mon, 15 Jan 2001, Jamie Lokier wrote:
> 
> Freeing pages aggressively from a process that's paging lots will make
> that process page more, meaning more aggressive freeing etc. etc.
> Either it works and reduces overall paging fairly (great), or it spirals
> out of control, which will be obvious, or it simply stabilizes at many
> different rates, which is undesirable but not so obvious in testing.

I doubt that it gets to any of the bad cases.

See - when the VM layer frees pages from a virtual mapping, it doesn't
throw them away. The pages are still there, and there won't be any "spiral
of death". If the faulter faults them in quickly, a soft-fault will happen
without any new memory allocation, and you won't see any more vmscanning.
It doesn't get "worse", if the working set actually fits in memory.

So the only case that actually triggers a "meltdown" is when the working
set does _not_ fit in memory, in which case not only will the pages be
unmapped, but they'll also get freed aggressively by the page_launder()
logic. At that point, the big process will actually end up waiting for the
pages, and will end up penalizing itself, which is exactly what we want. 

So it should never "spiral out of control", simply because of the fact
that if we fit in memory it has no other impact than initially doing more
soft page faults when it tries to find the right balancing point. It only
really kicks in for real when people are continually trying to free
memory: which is only true when we really have a working set bigger than
available memory, and which is exactly the case where we _want_ to
penalize the people who seem to be the worst offenders.

So I doubt you get any "subtle cases".

Note that this ties in to the thread issue too: if you have a single VM
and 50 threads that all fault in, that single VM _will_ be penalized. Not
because it has 50 threads (like the old code did), but because it has a
very active paging behaviour.

Which again is exactly what we want: we don't want to penalize threads per
se, because threads are often used for user interfaces etc and can often
be largely dormant. What we really want to penalize is bad VM behaviour,
and that's exactly the information we get from heavy page faulting.

NOTE! I'm not saying that tuning isn't necessary. Of course it is. And I
suspect that we actually want to add a page allocation flag (__GFP_VM)
that says that "this allocation is for growing our VM", and perhaps make
the VM shrinking conditional on that - so that the VM shrinking really
only kicks in for the big VM offenders, not for people who just read files
into the page cache.

So yes, we'll have VM tuning, the same as 2.2.x had and probably still
has. But I think our algorithms are a lot more "fundamentally stable" than
they were before. Which is not to say that the tuning is obvious - I just
claim that we will probably have a lot better time doing it, and that we
have more tools in our tool-chest.

			Linus


* Re: swapout selection change in pre1
  2001-01-15 18:24         ` Linus Torvalds
@ 2001-01-15 18:40           ` Jamie Lokier
  2001-01-15 18:55             ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Jamie Lokier @ 2001-01-15 18:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm

Linus Torvalds wrote:
> See - when the VM layer frees pages from a virtual mapping, it doesn't
> throw them away. The pages are still there, and there won't be any "spiral
> of death". If the faulter faults them in quickly, a soft-fault will happen
> without any new memory allocation, and you won't see any more vmscanning.
> It doesn't get "worse", if the working set actually fits in memory.

Ok, as long as the aggressive scanning is only increased by hard faults.

> So the only case that actually triggers a "meltdown" is when the working
> set does _not_ fit in memory, in which case not only will the pages be
> unmapped, but they'll also get freed aggressively by the page_launder()
> logic. At that point, the big process will actually end up waiting for the
> pages, and will end up penalizing itself, which is exactly what we want.
> 
> So it should never "spiral out of control", simply because of the fact
> that if we fit in memory it has no other impact than initially doing more
> soft page faults when it tries to find the right balancing point. It only
> really kicks in for real when people are continually trying to free
> memory: which is only true when we really have a working set bigger than
> available memory, and which is exactly the case where we _want_ to
> penalize the people who seem to be the worst offenders.
> 
> So I doubt you get any "subtle cases".

Suppose you have two processes with the same size working set.  Process
A is almost entirely paged out, so everything it does triggers a hard
fault.  This causes A to be aggressively vmscanned, which ensures that
most of A's working set pages aren't mapped, and therefore can be paged
out.

Process B is almost entirely paged in and doesn't fault very much.  It
is not being aggressively vmscanned.  When it does take a hard fault,
there is a good chance that the subsequent few pages it wants are still
mapped.

So process A is heavily hard faulting, process B is not, and the
aggressive vmscanning of process A conspires to keep it that way.

Like the TCP unfairness problem, where one stream captures the link and
other streams cannot get a fair share.

I am waving my hands a bit but no more than Linus I think :)

Btw, reverse page mapping resolves this and makes it very simple: no
vmscanning (*), so no hand waving heuristic.  I agree that every scheme
except Dave's for reverse mapping has appeared rather too heavy.  I
don't know if anyone remembers the one I suggested a few months ago,
based on Dave's.  I believe it addresses the problems Dave noted with
anonymous pages etc.  Must find the time etc.

(*) You might vmscan for efficiency sake anyway, but it needn't affect
paging decisions.

> Note that this ties in to the thread issue too: if you have a single VM
> and 50 threads that all fault in, that single VM _will_ be penalized. Not
> because it has 50 threads (like the old code did), but because it has a
> very active paging behaviour.
> 
> Which again is exactly what we want: we don't want to penalize threads per
> se, because threads are often used for user interfaces etc and can often
> be largely dormant. What we really want to penalize is bad VM behaviour,
> and that's exactly the information we get from heavy page faulting.

Certainly, it's most desirable to simply treat VMs as just VMs.

What _may_ be a factor is that thread VMs get an unfair share of the
processor.  Probably they should not, but right now they do.  And this
unfair share certainly skews the scanning and paging statistics.  I'm
not sure if any counterbalance is needed.

-- Jamie

* Re: swapout selection change in pre1
  2001-01-15 18:40           ` Jamie Lokier
@ 2001-01-15 18:55             ` Linus Torvalds
  2001-01-15 21:44               ` Jamie Lokier
  2001-01-17 23:40               ` Rik van Riel
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2001-01-15 18:55 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm


On Mon, 15 Jan 2001, Jamie Lokier wrote:
> 
> Btw, reverse page mapping resolves this and makes it very simple: no
> vmscanning (*), so no hand waving heuristic.

Ehh.. Try to actually _implement_ reverse mapping, and THEN say that.

Reverse mapping is basically not simple at all. For each page table entry,
you need a

	struct reverse_map {
		/* actual pte pointer is implied by location,
		   if you implement this cleverly, but still
		   needed, of course */
		struct reverse_map *prev, *next;
		struct vm_area_struct *vma;
	};

thing to be efficient (and yes, you _do_ need the VMA, it's needed for
TLB invalidation when you remove the page table entry: you can't just
silently remove it).

This basically means that your page tables just grew by a factor of 4
(from one word to 1+3 words).

In addition to that, your reverse mapping thing is going to suck raw eggs:
yes, it's easy to remove a mapping (assuming you have the above kind of
thing), but you won't actually see the "accessed" bit until you get to
this point, so you won't really be able to do aging until _after_ you have
done all the work - at which point you may find that you didn't want to
remove it after all.

Finally, your cache footprint is going to suck. The advantage of scanning
the page tables is that it's a nice cache-friendly linear search. The
reverse mapping is going to be quite horrible - not only are the data
structures now four times larger, but they are jumping all over the place.

Trust me: I encourage everybody to try reverse mappings, but the only
reason people _think_ they are a good idea is that they didn't implement
them. It's damn easy to say "oh, if we only could do X, this problem would
go away", without understanding that "X" itself is a major pain in the
ass.

			Linus


* Re: swapout selection change in pre1
  2001-01-15 18:55             ` Linus Torvalds
@ 2001-01-15 21:44               ` Jamie Lokier
  2001-01-15 21:57                 ` Linus Torvalds
  2001-01-17 23:40               ` Rik van Riel
  1 sibling, 1 reply; 16+ messages in thread
From: Jamie Lokier @ 2001-01-15 21:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm

Linus Torvalds wrote:
> Ehh.. Try to actually _implement_ reverse mapping, and THEN say that.
> 
> Reverse mapping is basically not simple at all. For each page table entry,
> you need a
> 
> 	struct reverse_map {
> 		/* actual pte pointer is implied by location,
> 		   if you implement this cleverly, but still
> 		   needed, of course */
> 		struct reverse_map *prev, *next;
> 		struct vm_struct *vma;
> 	};

No, that's the point, you _don't_ need a structure per page table entry.

We have the page cache, and VMAs naturally divide the space into regions
where you can scan the list of VMAs per page in the page cache.

Anonymous pages, including private modified pages, require a bit of
structure on top of VMAs but not much.  Dave Miller basically got the
idea and provided the code.  You yourself alluded to this a few years
back.  Dave's code has a few difficulties but they are fixable.  I've
already explained how.

> thing to be efficient (and yes, you _do_ need the VMA, it's needed for
> TLB invalidation when you remove the page table entry: you can't just
> silently remove it).
> 
> This basically means that your page tables just grew by a factor of 4
> (from one word to 1+3 words).

Read my lips (*): the page tables are not mirrored.  The reverse mapping
is implicit, not explicit.  It takes virtually no space, and is still fast.

(*) By copying the Linus expression, I am expecting to be roasted now :)

> In addition to that, your reverse mapping thing is going to suck raw eggs:
> yes, it's easy to remove a mapping (assuming you have the above kind of
> thing), but you won't actually see the "accessed" bit until you get to
> this point, so you won't really be able to do aging until _after_ you have
> done all the work - at which point you may find that you didn't want to
> remove it after all.

Of course you can scan the physical pages directly.  For each physical
page, look at the "accessed" bit of all ptes pointing to that page.  If
any are set, the page is considered accessed.

I'm not saying it's a good idea to scan physical pages directly, but you
can certainly do it and you will get page aging.
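A minimal model of that aggregation (illustrative, not the kernel's actual aging code):

```c
/* A physical page counts as referenced if any pte mapping it has the
 * accessed bit set; testing also clears the bits for the next pass. */
#define PTE_ACCESSED 0x20UL   /* illustrative bit position */

struct pte { unsigned long flags; };

static int page_referenced(struct pte **maps, int n)
{
    int referenced = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (maps[i]->flags & PTE_ACCESSED)
            referenced = 1;
        maps[i]->flags &= ~PTE_ACCESSED;  /* clear for next pass */
    }
    return referenced;
}
```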

> Finally, your cache footprint is going to suck. The advantage of scanning
> the page tables is that it's a nice cache-friendly linear search. The
> reverse mapping is going to be quite horrible - not only are the data
> structures now four times larger, but they are jumping all over the
> place.

These two reasons are why vmscanning is still very good.

Physical scanning and vmscanning are really quite similar.  The
statistics may come out a little in favour of physical scanning, simply
because after finding an available page it's really available _right
now_.  Whereas with vmscanning you've got to free a page at different
times in different VMs, and hope for the coincidence that the page count
reaches zero before any of the VMs faults it back.

(If you're really desperate, you can even free a very active physical
page, and there is the possibility of moving unlocked pages, in order to
defragment.)

> Trust me: I encourage everybody to try reverse mappings, but the only
> reason people _think_ they are a good idea is that they didn't implement
> them. It's damn easy to say "oh, if we only could do X, this problem would
> go away", without understanding that "X" itself is a major pain in the
> ass.

I agree.  It would be good to implement the bulky (but easy) style of
reverse mapping just to see if Rik et al. can get better paging
behaviour out of it.  If they can't, we abandon the experiment.  If they
can, then we can think about an implicit representation that doesn't use
any memory but would require bigger changes.

-- Jamie

* Re: swapout selection change in pre1
  2001-01-15 21:44               ` Jamie Lokier
@ 2001-01-15 21:57                 ` Linus Torvalds
  2001-01-15 22:36                   ` Jamie Lokier
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2001-01-15 21:57 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm


On Mon, 15 Jan 2001, Jamie Lokier wrote:
> 
> No, that's the point, you _don't_ need a structure per page table entry.

Ok. In that case, we already have all the infrastructure. It's just too
slow to use as a generic replacement for scanning the VM.

It's just fairly slow to look things up that way. That's going to be
especially true if you have _lots_ of people mapping that vma - you'd have
to look them all up, even if only one or two actually have the page in
question mapped.

(The alternative, of course, is to add a new "struct list_head" to the
"struct page" structure, and make that be the anchor for all VMA's that
have this page actually inserted. That would be pretty efficient, but I'd
hate wasting the memory, ugh. We could be clever and share a list for
multiple pages, ho humm..)

I still don't think it's actually worth it, but hey, I still say that if
you find a good use for it, go right ahead..

		Linus


* Re: swapout selection change in pre1
  2001-01-15 21:57                 ` Linus Torvalds
@ 2001-01-15 22:36                   ` Jamie Lokier
  0 siblings, 0 replies; 16+ messages in thread
From: Jamie Lokier @ 2001-01-15 22:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm

Linus Torvalds wrote:
> It's just fairly slow to look things up that way. That's going to be
> especially true if you have _lots_ of people mapping that vma - you'd have
> to look them all up, even if only one or two actually have the page in
> question mapped.
>
> (The alternative, of course, is to add a new "struct list_head" to the
> "struct page" structure, and make that be the anchor for all VMA's that
> have this page actually inserted. That would be pretty efficient, but I'd
> hate wasting the memory, ugh. We could be clever and share a list for
> multiple pages, ho humm..)

I don't see how you can anchor "all VMAs that have this page actually
inserted".  That's a list per page.  Where do all the links live
(without using tons of memory)?

But anyway, as long as you can arrange that a page is hooked into a list
of regions, using region splitting, where on average at least X% of the
regions have the page mapped, that should be ok.

-- Jamie

* Re: swapout selection change in pre1
  2001-01-15  1:22   ` Ed Tomlinson
  2001-01-15  2:48     ` Linus Torvalds
@ 2001-01-17  7:19     ` Rik van Riel
  1 sibling, 0 replies; 16+ messages in thread
From: Rik van Riel @ 2001-01-17  7:19 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: Linus Torvalds, Marcelo Tosatti, linux-mm

On Sun, 14 Jan 2001, Ed Tomlinson wrote:

> I think it's gone too far in the other direction now.  Running a
> heavily threaded Java program (35 threads, RSS of 44M) on a 128M
> KIII-400 with CPU usage of 4-10%, the rest of the system is
> getting paged out very quickly and X feels sluggish.  While we
> may not want to treat each thread as if it were a process, I
> think we need more than one scan per group of threads sharing
> memory.
>
> Ideas?

Bullshit.

The old MM selection code used mm->swap_cnt to give
exactly the same result, only scanning through a larger
list.

The change that could affect this is the one where we
immediately unmap a page from a process if it isn't used, so
refill_inactive_scan() has better chances.

I have something (ugly?) for this in my patch on
http://www.surriel.com/patches/ ... I'll clean it up and
send it.

(damn, a week without internet is horrible ... lots of
duplicated/different/... work, some of it wasted)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: swapout selection change in pre1
  2001-01-15 18:55             ` Linus Torvalds
  2001-01-15 21:44               ` Jamie Lokier
@ 2001-01-17 23:40               ` Rik van Riel
  2001-01-18 15:38                 ` Roman Zippel
  1 sibling, 1 reply; 16+ messages in thread
From: Rik van Riel @ 2001-01-17 23:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jamie Lokier, Ed Tomlinson, Marcelo Tosatti, linux-mm

On Mon, 15 Jan 2001, Linus Torvalds wrote:
> On Mon, 15 Jan 2001, Jamie Lokier wrote:
> >
> > Btw, reverse page mapping resolves this and makes it very simple: no
> > vmscanning (*), so no hand waving heuristic.
>
> Ehh.. Try to actually _implement_ reverse mapping, and THEN say that.
>
> Reverse mapping is basically not simple at all. For each page table entry,
> you need a
>
> 	struct reverse_map {
> 		/* actual pte pointer is implied by location,
> 		   if you implement this cleverly, but still
> 		   needed, of course */
> 		struct reverse_map *prev, *next;
> 		struct vm_area_struct *vma;
> 	};

Actually, you need only 2 pointers per page.

struct reverse_map {
	pte_t * pte;
	struct reverse_map * next;
};

To find the vma and mm, we will want to use the ->mapping
and ->index in the struct page of the page table page to
indicate which mm_struct this page table is part of and which
offset this page table has in the mm_struct.

The only thing where this structure will be weak is when
you have many processes mapping the same page and blowing
away this single mapping (eg. on exec after fork, not vfork).

For large (many processes) systems it may be worth it to have
the *prev pointer as well. For small systems we can do without
it and reduce overhead.

Whether this extra memory use is offset by the fact that we can
get page replacement balancing right and make page scanning CPU
use more predictable, I don't know ... but I want to find out for 2.5 ;)
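As a sketch of how the mm and address come back out of that layout (a userspace model with made-up sizes; the real version would reuse struct page fields as described above): the chain entry needs only the pte pointer, and everything else is derived from the descriptor of the page-table page containing it:

```c
#define PTRS_PER_PTE 4        /* made-up size, for illustration */
typedef unsigned long pte_t;

struct mm_struct { int id; };

/* Plays the role of the struct page of the page *holding* the ptes,
 * with ->mapping and ->index reused as described above. */
struct pt_page {
    struct mm_struct *mm;     /* which address space */
    unsigned long index;      /* which pte block within it */
    pte_t ptes[PTRS_PER_PTE];
};

/* Recover the virtual address (here in page units) from a bare pte
 * pointer plus its containing page-table page -- nothing per-pte
 * beyond the pointer itself is stored. */
static unsigned long pte_to_vaddr(struct pt_page *pt, pte_t *pte)
{
    unsigned long slot = (unsigned long)(pte - pt->ptes);
    return pt->index * PTRS_PER_PTE + slot;
}
```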

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: swapout selection change in pre1
  2001-01-17 23:40               ` Rik van Riel
@ 2001-01-18 15:38                 ` Roman Zippel
  0 siblings, 0 replies; 16+ messages in thread
From: Roman Zippel @ 2001-01-18 15:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Jamie Lokier, Ed Tomlinson, Marcelo Tosatti, linux-mm

Hi,

Rik van Riel wrote:

> > Reverse mapping is basically not simple at all. For each page table entry,
> > you need a
> >
> >       struct reverse_map {
> >               /* actual pte pointer is implied by location,
> >                  if you implement this cleverly, but still
> >                  needed, of course */
> >               struct reverse_map *prev, *next;
> >               struct vm_area_struct *vma;
> >       };
> 
> Actually, you need only 2 pointers per page.
> 
> struct reverse_map {
>         pte_t * pte;
>         struct reverse_map * next;
> };

To keep memory usage low and still be reasonably fast, we could
restrict the size of a vma to two mmu levels and cache a pointer to the
pmd table in the vma, so there is less to look up in the page table. It
would also speed up normal mapping/unmapping of entries for
architectures with more than 2 mmu levels. Generic mm code would then
mostly only have to deal with two mmu levels and could e.g. call "pmd =
pmd_alloc_vma(vma, address);" instead of "pgd = pgd_offset(mm, address);
pmd = pmd_alloc(pgd, address);". No idea if this is fast enough for
balancing, but it would simplify other parts. :-)
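As a toy illustration of the level that gets skipped (hypothetical names and sizes, not real kernel interfaces):

```c
#define PTRS 4   /* entries per level; illustrative value only */
typedef unsigned long pte_t;

struct pmd_table { pte_t ptes[PTRS]; };
struct pgd_table { struct pmd_table *pmds[PTRS]; };

/* A vma restricted to one pmd's span can cache that pmd directly. */
struct vma { struct pmd_table *cached_pmd; };

/* Full walk: pgd lookup, then pmd lookup. */
static pte_t *walk_full(struct pgd_table *pgd, unsigned long addr)
{
    struct pmd_table *pmd = pgd->pmds[(addr / PTRS) % PTRS];
    return &pmd->ptes[addr % PTRS];
}

/* With the cached pmd, the pgd level disappears from the hot path. */
static pte_t *walk_vma(struct vma *vma, unsigned long addr)
{
    return &vma->cached_pmd->ptes[addr % PTRS];
}
```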

bye, Roman


Thread overview: 16+ messages
2001-01-13  3:28 swapout selection change in pre1 Marcelo Tosatti
2001-01-13  8:05 ` Linus Torvalds
2001-01-13  7:41   ` Marcelo Tosatti
2001-01-15  1:22   ` Ed Tomlinson
2001-01-15  2:48     ` Linus Torvalds
2001-01-15  9:24       ` Jamie Lokier
2001-01-15  8:16         ` Marcelo Tosatti
2001-01-15 18:24         ` Linus Torvalds
2001-01-15 18:40           ` Jamie Lokier
2001-01-15 18:55             ` Linus Torvalds
2001-01-15 21:44               ` Jamie Lokier
2001-01-15 21:57                 ` Linus Torvalds
2001-01-15 22:36                   ` Jamie Lokier
2001-01-17 23:40               ` Rik van Riel
2001-01-18 15:38                 ` Roman Zippel
2001-01-17  7:19     ` Rik van Riel
