* Question: memory management and QoS
@ 2000-08-24 10:13 Jan Astalos
2000-08-28 7:47 ` Andrey Savochkin
0 siblings, 1 reply; 20+ messages in thread
From: Jan Astalos @ 2000-08-24 10:13 UTC (permalink / raw)
To: linux-mm
Hello,
I have a question about whether the Linux memory management subsystem
can provide Quality of Service (QoS) guarantees. I'm asking this in the
context of the possible use of Linux clusters in computational grids
(http://www.gridforum.org/). There is still a lot of computing power
(mostly unused) in workstations...
One of the most important issues (IMO) is QoS, especially how the OS
can guarantee availability of resources. Since Linux is the
top-ranking OS in high-performance clusters, there will obviously be a
need to implement QoS in it.
So, why am I writing to this list? Over the last couple of days I have
been experimenting with the Linux MM subsystem to find out whether
(and how) Linux could assure a user exclusive access to some amount of
memory. Of course I searched the archives. So far I have found only
the beancounter patch, which is designed for limiting memory usage.
That is not quite what I am looking for; rather, users should have
their memory reserved...
If I missed something, please send me pointers.
I have some (rough) ideas about how this could work, and I would be
happy if you sent me your opinions.
Concept of personal swapfiles:
- each user would have his own swapfile (its size would depend on his
memory needs and disk quota; he would be able to resize it)
- the system swapfile would be shared between daemons and the superuser
- each active user would have some amount of physical pages allocated
(according to the selected policy)
The benefits (among others):
- there would be no system-wide OOM (only per-user OOM)
- a user would be able to check his available memory
- no limits on VM address space
- there could be multiple policies for sharing physical memory among
users (and the system)
Drawbacks:
<please fill>
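For what it's worth, the accounting this concept implies can be sketched in a few lines (a toy model only; `UserVM` and the page counts are made up, not kernel code):

```python
# Toy model of the proposed per-user swap accounting: each user owns a
# fixed VM budget (his share of RAM plus his personal swapfile), so
# running out is a per-user OOM that other users never see.  All names
# and numbers are illustrative.

class UserVM:
    def __init__(self, ram_pages, swap_pages):
        self.budget = ram_pages + swap_pages  # total VM this user owns
        self.used = 0

    def alloc(self, pages):
        """Return True on success; False is *this user's* OOM only."""
        if self.used + pages > self.budget:
            return False
        self.used += pages
        return True

    def available(self):
        # The user can always check his own remaining headroom.
        return self.budget - self.used

alice = UserVM(ram_pages=256, swap_pages=1024)
assert alice.alloc(1000)
assert alice.available() == 280
assert not alice.alloc(500)  # per-user OOM; the system is unaffected
```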
Thanks in advance for your comments,
Jan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: Question: memory management and QoS
2000-08-24 10:13 Question: memory management and QoS Jan Astalos
@ 2000-08-28 7:47 ` Andrey Savochkin
2000-08-28 9:28 ` Jan Astalos
0 siblings, 1 reply; 20+ messages in thread
From: Andrey Savochkin @ 2000-08-28 7:47 UTC (permalink / raw)
To: Jan Astalos; +Cc: linux-mm, Yuri Pudgorodsky
Hello,
On Thu, Aug 24, 2000 at 12:13:28PM +0200, Jan Astalos wrote:
[snip]
>
> So, why am I writing to this list? Over the last couple of days I have
> been experimenting with the Linux MM subsystem to find out whether
> (and how) Linux could assure a user exclusive access to some amount of
> memory. Of course I searched the archives. So far I have found only
> the beancounter patch, which is designed for limiting memory usage.
> That is not quite what I am looking for; rather, users should have
> their memory reserved...
>
> If I missed something, please send me pointers.
Well, the main goal of the memory management part of the user
beancounter patch is exactly QoS. It allows controlling how resources
are shared between accounting subjects and specifying the minimal
amount of resources guaranteed to be available to them. These minimal
amounts are the guaranteed level of service; the remaining resources
are provided on a best-effort basis, more or less fairly. The
resources in question are the total amount of memory, and in-core
memory (as opposed to swap).
The code implementing this kind of QoS has been in the user
beancounter patch since version IV-0006. See
ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/user_beancounter/
ftp://ftp.swusa.com/pub/Linux/people/saw/kernel/user_beancounter/
The current code is dirty and incomplete, so questions (and comments)
are welcome.
The patch also contains some upper limits on virtual address space,
but they don't play any significant role, being clearly neither a QoS
nor a DoS-protection mechanism.
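The policy described above (allocations below the guaranteed minimum always succeed; anything above is served best-effort from globally unpromised memory) can be illustrated with a toy model. `Beancounter` and `Pool` are made-up names for this sketch, not the actual patch code:

```python
# Toy sketch of guarantee-plus-best-effort accounting.  Charges below a
# subject's guaranteed minimum always succeed; charges above it succeed
# only if globally unpromised memory is left.  Illustrative names only.

class Beancounter:
    def __init__(self, guarantee):
        self.guarantee = guarantee  # minimum promised to this subject
        self.held = 0               # what it currently holds

class Pool:
    def __init__(self, total, counters):
        self.total = total
        self.counters = counters

    def spare(self):
        # Memory neither promised to anyone nor held above a promise:
        # each subject effectively reserves max(guarantee, held).
        promised = sum(max(c.guarantee, c.held) for c in self.counters)
        return self.total - promised

    def charge(self, bc, amount):
        within = max(0, bc.guarantee - bc.held)  # guaranteed portion
        extra = amount - within                  # best-effort portion
        if extra > 0 and extra > self.spare():
            return False                         # best-effort part denied
        bc.held += amount
        return True

a, b = Beancounter(40), Beancounter(40)
pool = Pool(100, [a, b])
assert pool.charge(a, 50)      # 40 guaranteed + 10 best-effort
assert not pool.charge(b, 60)  # needs 20 best-effort, only 10 spare
assert pool.charge(b, 45)      # 40 guaranteed + 5 best-effort fits
```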
>
> I have some (rough) ideas about how this could work, and I would be
> happy if you sent me your opinions.
>
> Concept of personal swapfiles:
[snip]
I don't think that personal swapfiles are an efficient approach to
achieving QoS. Most of the space will be reserved for exceptional
cases and thus wasted, as Yuri has mentioned. A shared swap space that
allows exceeding the guaranteed amount (when the memory isn't really
in use) is a much more efficient use of the space. If the system has
some spare memory, users exceeding their limits may still use it (but,
certainly, only if some of them, not all, exceed the limits).
Moreover, if some users don't consume all the memory guaranteed to
them, others may temporarily use it.
Best regards
Andrey V. Savochkin
* Re: Question: memory management and QoS
2000-08-28 7:47 ` Andrey Savochkin
@ 2000-08-28 9:28 ` Jan Astalos
2000-08-28 11:30 ` Andrey Savochkin
2000-08-28 17:25 ` Rik van Riel
0 siblings, 2 replies; 20+ messages in thread
From: Jan Astalos @ 2000-08-28 9:28 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: linux-mm, Yuri Pudgorodsky
Andrey Savochkin wrote:
>
> Hello,
Hi.
>
> I don't think that personal swapfiles are an efficient approach to
> achieving QoS. Most of the space will be reserved for exceptional
> cases and thus wasted, as Yuri has mentioned. A shared swap space that
> allows exceeding the guaranteed amount (when the memory isn't really
> in use) is a much more efficient use of the space. If the system has
> some spare memory, users exceeding their limits may still use it (but,
> certainly, only if some of them, not all, exceed the limits).
> Moreover, if some users don't consume all the memory guaranteed to
> them, others may temporarily use it.
I think I explained my points clearly enough in my second reply to
Yuri, so I won't repeat them here.
I still claim that per-user swapfiles will:
- be _much_ more efficient in terms of wasted disk space (saving
money), because they will teach users to use their memory resources
efficiently (if a user wastes space inside his own disk quota, that is
his own problem)
- provide QoS on VM allocation to users (guaranteeing the amount of VM
available to each user)
- be able to improve the _per-user_ performance of the system
(localizing performance problems to the users that caused them and
reducing disk seek times)
- shift the OOM problem from the system to the user.
Please don't repeat Yuri's argument about unswappable kernel objects
and locked memory. Users should be able to lock only memory inside
their own allocation, and kernel objects should be accounted to this
kind of memory too. Whether it is easy or hard to implement really
does not matter for the design. Anyway, there could still be a pool of
memory allocated for anonymous objects...
I think that your beancounter is a big step towards good QoS in Linux
MM, but I'm a bit confused when I hear "...users exceeding their
limits". What is a limit good for if it can be exceeded? Could you
rethink the term?
Can you describe how a VM shortage can be avoided with the
beancounter? Other than as I described in my first reply to Yuri
(point A).
Regards,
Jan
* Re: Question: memory management and QoS
2000-08-28 9:28 ` Jan Astalos
@ 2000-08-28 11:30 ` Andrey Savochkin
2000-08-28 12:38 ` Jan Astalos
2000-08-28 17:25 ` Rik van Riel
1 sibling, 1 reply; 20+ messages in thread
From: Andrey Savochkin @ 2000-08-28 11:30 UTC (permalink / raw)
To: Jan Astalos; +Cc: linux-mm, Yuri Pudgorodsky
On Mon, Aug 28, 2000 at 11:28:15AM +0200, Jan Astalos wrote:
> Andrey Savochkin wrote:
> > [snip]
>
> I think I explained my points clearly enough in my second reply to
> Yuri, so I won't repeat them here.
>
> I still claim that per-user swapfiles will:
> - be _much_ more efficient in terms of wasted disk space (saving
> money), because they will teach users to use their memory resources
> efficiently (if a user wastes space inside his own disk quota, that is
> his own problem)
> - provide QoS on VM allocation to users (guaranteeing the amount of VM
> available to each user)
> - be able to improve the _per-user_ performance of the system
> (localizing performance problems to the users that caused them and
> reducing disk seek times)
> - shift the OOM problem from the system to the user.
Ok, tell me: if user A has a swapfile of 10MB and doesn't use it, is
user B allowed to use it in the meantime?
If the answer is no, it's a waste of space, as I said.
If the answer is yes, I don't buy your argument of better clustering
and less fragmentation.
From my point of view, the real topics are
1. memory QoS, which starts with controlled sharing of in-core memory
between users and, then, sharing of swap space, with the swap storage
organization (per-user or global) being a second-order question,
because separate storages can easily be "emulated" by quotas, and vice
versa;
2. swap-out clustering.
Speaking about clustering, the current code already keeps this aspect
in mind. It may be more or less efficient, but that's a separate topic.
> I think that your beancounter is a big step towards good QoS in Linux
> MM, but I'm a bit confused when I hear "...users exceeding their
> limits". What is a limit good for if it can be exceeded? Could you
> rethink the term?
Well, I usually call them "thresholds" rather than "limits".
Users are guaranteed some quality of service below these thresholds,
i.e. their allocations succeed, their processes aren't killed because
of OOM, and their pages aren't swapped out.
Above the thresholds, resources are given and requests are served on a
best-effort basis.
> Can you describe how a VM shortage can be avoided with the
> beancounter?
I don't want to avoid VM shortage.
The goal is to introduce different levels of service and allow
administrators to manage them.
Users obeying their "contracts" (staying below the thresholds set for
them) have some guarantees. The guarantees are real if the
administrator ensures that the sum of the guaranteed amounts of
resources is not greater than what's available.
Users disobeying their "contracts" may face negative effects, with
chances depending on the amount of unused resources and the degree of
their violation.
A VM shortage is possible (and avoiding it entirely is very
inefficient). The goal is to make its consequences controllable, to
guarantee that certain processes will never suffer from it, etc.
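The administrator-side condition above reduces to a one-line admission check. `can_grant` is a hypothetical helper for illustration, not code from the patch:

```python
# The admin-side condition in one line: a guarantee is only real if the
# sum of all guarantees never exceeds the resource that actually exists.
# Hypothetical helper, not code from the beancounter patch.

def can_grant(existing_guarantees, new_guarantee, available):
    """Admission check before promising a new guaranteed amount."""
    return sum(existing_guarantees) + new_guarantee <= available

# 512 MB of VM, three users already guaranteed 128 MB each:
assert can_grant([128, 128, 128], 100, 512)      # 484 <= 512: OK
assert not can_grant([128, 128, 128], 200, 512)  # would overpromise
```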
Regards
Andrey V. Savochkin
* Re: Question: memory management and QoS
2000-08-28 11:30 ` Andrey Savochkin
@ 2000-08-28 12:38 ` Jan Astalos
0 siblings, 0 replies; 20+ messages in thread
From: Jan Astalos @ 2000-08-28 12:38 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: linux-mm, Yuri Pudgorodsky
Andrey Savochkin wrote:
>
> On Mon, Aug 28, 2000 at 11:28:15AM +0200, Jan Astalos wrote:
> > [snip]
>
> Ok, tell me: if user A has a swapfile of 10MB and doesn't use it, is
> user B allowed to use it in the meantime?
> If the answer is no, it's a waste of space, as I said.
As a user, would you waste 10MB of your 20MB quota? I don't think so...
> If the answer is yes, I don't buy your argument of better clustering
> and less fragmentation.
Do you really need to ask questions like that? It's obvious that the
swapfiles would be protected from access by other users. That's why
they would be _personal_.
>
> From my point of view, the real topics are
> 1. memory QoS, which starts with controlled sharing of in-core memory
> between users and, then, sharing of swap space, with the swap storage
> organization (per-user or global) being a second-order question,
> because separate storages can easily be "emulated" by quotas, and vice
> versa;
How will quotas give you per-user clustered pages? And if the quotas
change, who will maintain them, the sysadmin? Look at how much of the
cost of system maintenance is the cost of system administration. Are
you still convinced that quotas on VM are a good idea?
> 2. swap-out clustering.
> Speaking about clustering, the current code already keeps this aspect
> in mind. It may be more or less efficient, but that's a separate topic.
OK. You are aiming at the management of physical memory. It _is_
important. I am aiming at VM QoS guarantees. From my point of view
this is important too... As I said, MM of physical memory is the core
of QoS. But without VM QoS there wouldn't be _any_ memory QoS at all.
>
> > I think that your beancounter is a big step towards good QoS in
> > Linux MM, but I'm a bit confused when I hear "...users exceeding
> > their limits". What is a limit good for if it can be exceeded? Could
> > you rethink the term?
>
> Well, I usually call them "thresholds" rather than "limits".
> Users are guaranteed some quality of service below these thresholds,
> i.e. their allocations succeed, their processes aren't killed because
> of OOM, and their pages aren't swapped out.
> Above the thresholds, resources are given and requests are served on a
> best-effort basis.
>
> > Can you describe how a VM shortage can be avoided with the
> > beancounter?
>
> I don't want to avoid VM shortage.
> The goal is to introduce different levels of service and allow
> administrators to manage them.
Excellent. If you make it flexible enough that adding a new MM policy
is straightforward, you'll have my thanks...
> Users obeying their "contracts" (staying below the thresholds set for
> them) have some guarantees. The guarantees are real if the
> administrator ensures that the sum of the guaranteed amounts of
> resources is not greater than what's available.
> Users disobeying their "contracts" may face negative effects, with
> chances depending on the amount of unused resources and the degree of
> their violation.
>
> A VM shortage is possible (and avoiding it entirely is very
> inefficient). The goal is to make its consequences controllable, to
> guarantee that certain processes will never suffer from it, etc.
I can't resist :-). So you have effectively transformed the
_problem_of_VM_shortage_ into _someone_else's_problem_, putting it
completely on the shoulders of sysadmins. Why do I still have the
impression that this is not the right way? Hmm, maybe because of the
cost of system administration...
Can you still not see how easily personal swapfiles would solve it?
Regards,
Jan
* Re: Question: memory management and QoS
2000-08-28 9:28 ` Jan Astalos
2000-08-28 11:30 ` Andrey Savochkin
@ 2000-08-28 17:25 ` Rik van Riel
2000-08-30 7:38 ` Jan Astalos
1 sibling, 1 reply; 20+ messages in thread
From: Rik van Riel @ 2000-08-28 17:25 UTC (permalink / raw)
To: Jan Astalos; +Cc: Andrey Savochkin, linux-mm, Yuri Pudgorodsky
On Mon, 28 Aug 2000, Jan Astalos wrote:
> I still claim that per-user swapfiles will:
> - be _much_ more efficient in terms of wasted disk space (saving
> money), because they will teach users to use their memory resources
> efficiently (if a user wastes space inside his own disk quota, that is
> his own problem)
> - provide QoS on VM allocation to users (guaranteeing the amount of VM
> available to each user)
> - be able to improve the _per-user_ performance of the system
> (localizing performance problems to the users that caused them and
> reducing disk seek times)
> - shift the OOM problem from the system to the user.
Do you have any reasons for this, or are you just asserting
them as if they were fact? ;)
I think we can achieve the same thing, with higher over-all
system performance, if we simply give each user a VM quota
and do the bookkeeping on a central swap area.
The reasons for this are multiple:
1) having one swap partition will reduce disk seeks
(no matter how you put it, disk seeks are a _system_
thing, not a per user thing)
2) not all users are logged in at the same time, so you
can do a minimal form of overcommitting here (if you want)
3) you can easily give users _2_ VM quotas, a guaranteed one
and a maximum one ... if a user goes over the guaranteed
quota, processes can be killed in OOM situations
(this allows each user to make their own choices wrt.
overcommitment)
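A rough sketch of the two-quota scheme (hypothetical names; nothing like this is in the current kernel): allocations are allowed up to the maximum quota, while the OOM killer considers only users above their guarantee:

```python
# Sketch of the two-quota idea: a guaranteed quota and a hard maximum.
# Allocations succeed up to the maximum, but in an OOM situation only
# users above their guaranteed quota are eligible victims.  All names
# here are hypothetical.

def may_alloc(used, pages, maximum):
    """Allow any allocation that stays within the hard maximum quota."""
    return used + pages <= maximum

def oom_candidates(usage, guaranteed):
    """Users within their guarantee are never killed on OOM."""
    return [u for u, used in usage.items() if used > guaranteed[u]]

guaranteed = {"alice": 100, "bob": 100}
usage = {"alice": 80, "bob": 150}
assert may_alloc(usage["bob"], 20, maximum=200)      # still under his max
assert oom_candidates(usage, guaranteed) == ["bob"]  # alice is safe
```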
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: Question: memory management and QoS
2000-08-28 17:25 ` Rik van Riel
@ 2000-08-30 7:38 ` Jan Astalos
2000-08-30 16:53 ` Rik van Riel
0 siblings, 1 reply; 20+ messages in thread
From: Jan Astalos @ 2000-08-30 7:38 UTC (permalink / raw)
To: Rik van Riel, Andrey Savochkin, Yuri Pudgorodsky, Linux MM mailing list
Rik van Riel wrote:
>
> On Mon, 28 Aug 2000, Jan Astalos wrote:
>
> > [snip]
>
> Do you have any reasons for this, or are you just asserting
> them as if they were fact? ;)
>
> I think we can achieve the same thing, with higher over-all
> system performance, if we simply give each user a VM quota
> and do the bookkeeping on a central swap area.
Sorry, as a user I wouldn't care a bit about overall system
performance... I would care only about whether I can get the service I
have paid for.
>
> The reasons for this are multiple:
> 1) having one swap partition will reduce disk seeks
> (no matter how you put it, disk seeks are a _system_
> thing, not a per user thing)
I would be happy if you said "we can guarantee that the pages of one
process will be swapped to a compact swap area". Reading the code, I
didn't get that impression...
I haven't tested it (I always thought it was obvious), but storing a
bunch of pages to, and getting another bunch from, the same cylinder
is _much_ faster than getting pages scattered over a large disk area
(maybe in a different order), forcing the disk heads to jump like mad.
If you have tested it and can send me your results, please do. I may
be wrong, nobody's perfect. :-)
Technology changes; maybe disks have changed too...
> 2) not all users are logged in at the same time, so you
> can do a minimal form of overcommitting here (if you want)
Overcommitting of what? Virtual memory?
> 3) you can easily give users _2_ VM quotas, a guaranteed one
> and a maximum one ... if a user goes over the guaranteed
> quota, processes can be killed in OOM situations
> (this allows each user to make their own choices wrt.
> overcommitment)
What if a user "suddenly" realizes that his guaranteed quota is not
sufficient? Checkpoint? Immediately contact the sysadmin?
Killing is bad (in general). Killing a process just for stepping
outside its guaranteed quota may waste all the resources consumed by
the killed process (including CPU time). Not to mention that it's
quite inconvenient for users. (Thrashing, stealing, killing... I
wonder what the most appropriate name for MM is ;)
What I'd like to have are per-user guarantees for:
- performance: no one will use physical memory allocated to me,
no matter whether I'm drinking coffee or not.
Waiting for the system to swap in/out pages that I have
paid for is an absolutely unacceptable performance
drop. (I may not be drinking coffee; my app may
just be waiting for input from its other part,
located anywhere else.)
- reliability: the system will prevent me from consuming more
resources than I have. In particular, the amount of
requested VM can change over time.
Only if I ask will the system overcommit my VM. Then I should do
checkpointing (if I'm able to, and can accept the performance drop).
Overcommitting physical memory up to the VM guarantee is obviously
desirable.
Only if I allow others to use my unused memory will they be allowed
to. This can be motivated by different charging policies.
No page stealing (just lending and reclaiming).
Yes, this can be done with one big swapfile. But I tried to imagine
how such a system would be administered, and what the scalability of
that administration would be. I got a sysadmin nightmare...
- users should be charged for the guaranteed amount of VM;
otherwise they would state maximal requirements -> wasted resources
- the requirements could vary quite widely (from user to user and also
over time)
- the sysadmin (or, more likely, his MM agent) will have to schedule
users onto resources according to their (changing) requirements, and
should take care not to guarantee more than he actually can. As I
said, a user may know his VM needs only when he decides to run his app
with particular arguments.
Consider the maintenance costs in the case of per-user swapfiles.
Maybe they waste disk space when the user is not logged in, but that's
the user's own disk space; he can do anything he likes with it, right?
(QoS.) (It could be used for storing some persistent IPC objects...)
Maybe I'm the only one with this view of memory QoS, but that can be
settled only by putting both solutions in front of
users/sysadmins/managers and seeing what happens...
I'm far from claiming that per-user swapfiles will be appropriate for
all situations, so don't read this as if I were claiming that anyone
with a different opinion is stupid. Everyone tends to bind his
solution to the concrete problem he has to solve, and seeing how my
solution fits the requirements of other people is always helpful. I'm
always willing to learn new things...
I will certainly implement my approach (in fact the most important
part is more or less done by Andrey; thanks), if for nothing else then
just to see how it performs. Look at it as a different approach to
swap clustering. My main intention is to get a list of the potential
implementation pitfalls that may arise. Thanks.
Regards,
Jan
* Re: Question: memory management and QoS
2000-08-30 7:38 ` Jan Astalos
@ 2000-08-30 16:53 ` Rik van Riel
2000-08-31 1:48 ` Andrey Savochkin
2000-08-31 11:49 ` Jan Astalos
0 siblings, 2 replies; 20+ messages in thread
From: Rik van Riel @ 2000-08-30 16:53 UTC (permalink / raw)
To: Jan Astalos; +Cc: Andrey Savochkin, Yuri Pudgorodsky, Linux MM mailing list
On Wed, 30 Aug 2000, Jan Astalos wrote:
> Rik van Riel wrote:
> > I think we can achieve the same thing, with higher over-all
> > system performance, if we simply give each user a VM quota
> > and do the bookkeeping on a central swap area.
>
> Sorry, as a user I wouldn't care a bit about overall system
> performance... I would care only about whether I can get the service I
> have paid for.
*sigh*
It's not always /you/ who is out drinking coffee. The other
users will be drinking coffee too some of the time, and when
they are out drinking coffee you'll have the extra performance
they're not using at that moment.
Oh wait ... you didn't pay for it so you don't want it. ;)
> > The reasons for this are multiple:
> > 1) having one swap partition will reduce disk seeks
> > (no matter how you put it, disk seeks are a _system_
> > thing, not a per user thing)
>
> I would be happy if you said "we can guarantee that the pages of one
> process will be swapped to a compact swap area". Reading the code, I
> didn't get that impression...
That's because it's not implemented yet. But I definitely plan to have
better swap clustering for Linux 2.5.
> I haven't tested it (I always thought it was obvious), but storing a
> bunch of pages to, and getting another bunch from, the same cylinder
> is _much_ faster than getting pages scattered over a large disk area
> (maybe in a different order), forcing the disk heads to jump like mad.
It is. Unfortunately you won't be able to swap one thing in
without swapping OUT something else. This is why you really
really want to have the swap for all users in the same place
on disk.
> > 2) not all users are logged in at the same time, so you
> can do a minimal form of overcommitting here (if you want)
>
> Overcommitting of what? Virtual memory?
Yes. If you sell resources to 10,000 users, there's usually no need to
have 10,000 times the maximum per-user quota of every system resource.
Instead, you sell each user a guaranteed resource with the possibility
to go up to a certain maximum. That way you can give your users a
higher quality of service at much lower pricing, only with a 99.999%
guarantee instead of 100%.
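The arithmetic behind this, with made-up numbers:

```python
# Back-of-the-envelope numbers for the overcommit argument (all figures
# made up for illustration): sell 10,000 users a small guarantee with a
# larger best-effort ceiling, and provision for the guarantees plus
# burst headroom instead of for everyone's maximum at once.

users = 10_000
guaranteed_mb = 10    # hard per-user guarantee
maximum_mb = 100      # best-effort per-user ceiling
headroom_mb = 50_000  # shared capacity for bursts above the guarantees

full_provisioning = users * maximum_mb  # needed for a 100% promise
overcommitted = users * guaranteed_mb + headroom_mb

assert full_provisioning == 1_000_000  # ~1 TB of swap
assert overcommitted == 150_000        # ~150 GB, a fraction of the cost
# Above-guarantee requests fail only when simultaneous bursts exhaust
# the headroom -- rare, hence "99.999% instead of 100%".
```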
> > 3) you can easily give users _2_ VM quotas, a guaranteed one
> > and a maximum one ... if a user goes over the guaranteed
> > quota, processes can be killed in OOM situations
> > (this allows each user to make their own choices wrt.
> > overcommitment)
>
> What if a user "suddenly" realizes that his guaranteed quota is not
> sufficient? Checkpoint? Immediately contact the sysadmin?
That's up to the user. In general the system resources between
the guaranteed and the maximum quota should be fairly reliable
(say, 99.99%). If the user really needs better than that, (s)he
should buy more guaranteed quota...
> Killing is bad (in general). Killing a process just for stepping
> outside its guaranteed quota may waste all the resources consumed by
> the killed process (including CPU time). Not to mention that it's
> quite inconvenient for users.
Of course it's inconvenient, but it should be far less inconvenient
than paying 10 times more for their quota on the system because they
want everything to be guaranteed.
The difference between 99.99% and 99.999% usually isn't worth a
10-fold increase in price for most things.
> What I'd like to have are per-user guarantees for:
> - performance: no one will use physical memory allocated to me,
So for /your/ system, you set the guaranteed and the maximum quota
to the same value and have your users pay the 10-fold extra in
price. If you can get any customers with that pricing, of course...
> Waiting for the system to swap in/out pages that I have
> paid for is an absolutely unacceptable performance
> drop. (I may not be drinking coffee; my app may
> just be waiting for input from its other part,
> located anywhere else.)
"Its other part" may benefit a lot from being able to use some
extra physical memory by having something idle swapped out.
> - reliability: the system will prevent me from consuming more
> resources than I have. In particular, the amount of
> requested VM can change over time.
That's an administrative decision. IMHO it would be a big mistake
to hardcode this in the OS.
Also, you should remember that overall system performance is an upper
limit on the sum of per-user performance. Without good overall
performance, you cannot support a lot of users or good performance per
user. This makes it worthwhile to keep overall system performance in
mind even when you're doing QoS things...
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: Question: memory management and QoS
2000-08-30 16:53 ` Rik van Riel
@ 2000-08-31 1:48 ` Andrey Savochkin
2000-08-31 11:49 ` Jan Astalos
1 sibling, 0 replies; 20+ messages in thread
From: Andrey Savochkin @ 2000-08-31 1:48 UTC (permalink / raw)
To: Rik van Riel, Jan Astalos; +Cc: Yuri Pudgorodsky, Linux MM mailing list
On Wed, Aug 30, 2000 at 01:53:09PM -0300, Rik van Riel wrote:
>
> Yes. If you sell resources to 10,000 users, there's usually no need
> to have 10,000 times the maximum per-user quota of every system
> resource.
>
> Instead, you sell each user a guaranteed resource with the possibility
> to go up to a certain maximum. That way you can give your users a
> higher quality of service at much lower pricing, only with a 99.999%
> guarantee instead of 100%.
That's exactly what I was speaking about and what I've been
implementing. Rik, thanks for saying it for me :-)
Andrey
* Re: Question: memory management and QoS
2000-08-30 16:53 ` Rik van Riel
2000-08-31 1:48 ` Andrey Savochkin
@ 2000-08-31 11:49 ` Jan Astalos
1 sibling, 0 replies; 20+ messages in thread
From: Jan Astalos @ 2000-08-31 11:49 UTC (permalink / raw)
To: Rik van Riel, Yuri Pudgorodsky, Andrey Savochkin, Linux MM mailing list
Rik van Riel wrote:
>
> On Wed, 30 Aug 2000, Jan Astalos wrote:
> > Rik van Riel wrote:
>
> > > I think we can achieve the same thing, with higher over-all
> > > system performance, if we simply give each user a VM quota
> > > and do the bookkeeping on a central swap area.
> >
> > Sorry, As a user I wouldn't care a bit about overall system
> > performance... I would care only if I can get the service I
> > have paid for.
>
> *sigh*
>
> It's not always /you/ who is out drinking coffee. The other
> users will be drinking coffee too some of the time, and when
> they are out drinking coffee you'll have the extra performance
> they're not using at that moment.
>
> Oh wait ... you didn't pay for it so you don't want it. ;)
If the extra performance means a loss of reliability, then the answer
is no. Also, if I have to schedule 1000+ pieces of a distributed
app and cannot count on the extra performance, the answer is
no as well.
But back to your coffee-drinking example. I think that swapping
out idle programs could lead to serious problems with interactive
performance. OK, not every program needs it, but IMO it would be
much more convenient for the user to have some screensaver (or better,
a moneysaver) that would suspend all his processes and return
physical memory (temporarily) back to the system. Or to allow the
user to switch (use my free memory / all my memory / don't use my memory).
>
> > > The reasons for this are multiple:
> > > 1) having one swap partition will reduce disk seeks
> > > (no matter how you put it, disk seeks are a _system_
> > > thing, not a per user thing)
> >
> > I would be happy if you said that "we can guarantee that pages
> > of one process will be swapped to a compact swap area". Reading
> > the code I didn't get that impression...
>
> That's because it's not implemented yet. But I definitely plan
> to have better swap clustering for Linux 2.5.
Can you share some ideas about how it will look?
>
> > I didn't test it (I always thought it was obvious), but
> > storing a bunch of pages to and getting another bunch from the
> > same cylinder is _much_ faster than getting pages scattered
> > over a large disk area (maybe in a different order), forcing the
> > disk heads to jump like mad.
>
> It is. Unfortunately you won't be able to swap one thing in
> without swapping OUT something else. This is why you really
> really want to have the swap for all users in the same place
> on disk.
Why should a user who needs to swap something in have to swap out
pages of another user? Why not take the LRU pages of
that user and fetch a bunch of readahead pages from his (relatively)
small swap area? As Yuri has pointed out, it's not that new an idea...
>
> > > 2) not all users are logged in at the same time, so you
> > > can do a minimal form of overcomitting here (if you want)
> >
> > Overcommitting of what ? Virtual memory ?
>
> Yes. If you sell resources to 10.000 users, there's usually no
> need to have 10.000 times the maximum per-user quota for every
> system resource.
>
> Instead, you sell each user a guaranteed resource with the
> possibility to go up to a certain maximum. That way you can give
> your users a higher quality of service for much lower pricing,
> only with 99.999% guarantee instead of 100%.
You're right, I was shortsighted... ;).
> > Killing is bad (in general). Killing a process just for stepping
> > outside its guaranteed quota may waste all the resources consumed
> > by the killed process (including CPU time). Not to mention that it's
> > quite inconvenient for users.
>
> Of course it's inconvenient, but it should be far less inconvenient
> than paying 10 times more for their quota on the system because they
> want everything to be guaranteed.
>
> The difference between 99.99% and 99.999% usually isn't worth a
> 10-fold increase in price for most things.
Sometimes it is. If you had 1000 pieces, the probability that the whole
app would fail is about 1%, which is still quite high if you consider the
resources consumed and money spent. And if just adding some extra
swap area (inside otherwise unused disk quota) would make the
guarantee rock solid, it is certainly worth it.
(Welcome to grid computing ;).
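The 1% figure above is easy to verify: with a 99.999% guarantee per piece, a 1000-piece app fails whenever any single piece does. A quick arithmetic check:

```python
# Failure probability of a 1000-piece distributed app when each piece
# independently meets a 99.999% availability guarantee.
p_piece_ok = 0.99999
pieces = 1000

p_app_fails = 1 - p_piece_ok ** pieces
print(f"{p_app_fails:.4%}")   # roughly 1%, as stated above
```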
>
> > - reliability: system would prevent me from consuming more resources
> > than I have. Especially the amount of requested
> > VM can change over time.
>
> That's an administrative decision. IMHO it would be a big mistake
> to hardcode this in the OS.
How about allowing the user to choose whether he wants to be limited by
a VM guarantee or not...
If you add some per_user_clustering to swap, it will certainly
perform better than per-user swap files. But administration costs
would still be equally high. Not to mention that if a user needs reliable
service and the system is not able to increase his guarantee (lack of
resources), the user has to wait -> lower interactive performance.
Which will lead to a batch queuing system.
Even with a per-user clustered swap area, personal swapfiles could still
absorb small variations in users' VM needs. The argument that they will waste
disk space I can't accept. A user can still mmap a file to extend his
VM. From this angle, personal swapfiles would be just an extra
feature provided by the system for non-developers...
Regards,
Jan
* Re: Question: memory management and QoS
2000-08-30 9:01 ` Jan Astalos
@ 2000-08-30 11:42 ` Marco Colombo
0 siblings, 0 replies; 20+ messages in thread
From: Marco Colombo @ 2000-08-30 11:42 UTC (permalink / raw)
To: Jan Astalos; +Cc: Linux MM mailing list
On Wed, 30 Aug 2000, Jan Astalos wrote:
> Andrey Savochkin wrote:
> [snip]
>
> > > As a user, I won't bear _any_ overcommits at all. Once the service is paid for, I expect
> > > a guaranteed level of quality. In the case of VM, all the memory I paid for.
> > > For all of my processes.
> >
> > It means that you pay orders of magnitude more for it.
>
> If I got it right, you are speaking about disk space. About a sum of disk quotas
> "orders of magnitude" higher than the actually available disk space, right?
> You will sell users more disk space than you have, for the price of your
> actual space (and you'll hope that they won't use the whole disk).
>
> But you must provide the disk space when users need it (QoS), so in a disk shortage
> you'll need to buy another disk. Will you then send them an additional bill?
Well, IMHO it's a matter of numbers. If you're going to use 1/2 of the
system resources, you can afford the cost of a whole dedicated system.
No quotas, no users, no problems.
If you're going to use 1/1000 of them (so we're speaking of a huge system),
you may consider that, on average, not all users will be using their
resources, and it makes a lot of sense to overcommit.
If all citizens in a big town want to use their phones at the same time,
most of them won't get service. But it almost never happens.
The bigger the numbers involved, the safer it is to overcommit. It allows the
service provider to lower costs a lot. I think no one is selling a
phone line that is guaranteed to *always* work (it works only for p-o-p links,
i.e. leased lines). It would cost too much and, in practice, give no real
advantage...
That's the whole idea behind time(and resource)-sharing systems...
Otherwise, that 1/1000 of the huge system will cost *much* more than
a personal workstation with better performance. And you'll also see
other users, paying 1/100 of what you're paying, get almost the same
service (the system almost never fails to fulfill their requests).
And you're not selling "more disk space than you have". You're selling
10MB and the user gets up to 10MB. Shortage *almost* never happens,
so you *almost* always provide the service...
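The "big numbers" argument above can be sketched numerically (all figures are invented for illustration): model each user as independently active with some probability, and ask how often total demand exceeds a capacity provisioned well below the sold total.

```python
from math import comb

def p_shortage(n_users, p_active, capacity):
    """Probability that more than `capacity` of `n_users` are active
    at once, treating users as independent with activity p_active
    (a simple binomial tail; real usage is of course correlated)."""
    return sum(comb(n_users, k) * p_active**k * (1 - p_active)**(n_users - k)
               for k in range(capacity + 1, n_users + 1))

# 1000 users sold full quotas, capacity provisioned for only 150
# simultaneous users (15% of what was nominally sold):
print(p_shortage(1000, 0.10, 150))   # vanishingly small: shortage *almost* never happens

# With only 10 users and proportionally scaled capacity, the risk
# is orders of magnitude higher:
print(p_shortage(10, 0.10, 2))       # several percent
```

This is exactly the phone-exchange effect: the larger the population, the tighter demand concentrates around its mean, so the safer it is to overcommit.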
.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo@ESI.it
* Re: Question: memory management and QoS
2000-08-28 13:10 ` Andrey Savochkin
@ 2000-08-30 9:01 ` Jan Astalos
2000-08-30 11:42 ` Marco Colombo
0 siblings, 1 reply; 20+ messages in thread
From: Jan Astalos @ 2000-08-30 9:01 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Yuri Pudgorodsky, Linux MM mailing list
Andrey Savochkin wrote:
[snip]
> > As a user, I won't bear _any_ overcommits at all. Once service is paid, I expect
> > guarantied level of quality. In the case of VM, all the memory I paid for.
> > For all of my processes.
>
> It means that you pay orders of magnitude more for it.
If I got it right, you are speaking about disk space. About a sum of disk quotas
"orders of magnitude" higher than the actually available disk space, right?
You will sell users more disk space than you have, for the price of your
actual space (and you'll hope that they won't use the whole disk).
But you must provide the disk space when users need it (QoS), so in a disk shortage
you'll need to buy another disk. Will you then send them an additional bill?
>
> > Do you mean "pages shared between processes of a particular user"? Where's the problem?
> > If you mean "pages provided by a user to another user", I still don't see the problem...
> >
> > If you mean anonymous pages not owned by any user, I'm really interested in why this should
> > be allowed (letting some trash pollute system resources; is it common practice?).
>
> Well, you're speaking about private pages only.
No.
> I speak about all memory resources, in-core and swap, and all kinds of
> memory, shared and private, file mapped and anonymous.
>
I don't think it's a problem to associate private memory (or a private file map)
with a user. Shared memory should have its owner and permissions. Otherwise I don't know
what the permissions would be good for.
Mapped files I didn't consider at all. I thought that they have private swap space
(the file). So it's not a problem of personal swapfiles. It's a problem of accounting
of physical memory (as I said, I know that this part of MM is much more complicated,
and I'm not going to write the whole MM myself from scratch :). I hope that beancounter
will be discussed more once 2.5 forks and all the physical memory accounting
problems are touched then.
So from the point of view of implementing personal swapfiles, it is important to select
the right swapfile for swapin/swapout, to handle the cases when a page changes owner,
and of course swapon/swapoff. Anything else is in the layer of physical memory
management.
Can you be more concrete about which (swappable) memory objects it is a problem
to find an owner for, and why? It would help me a lot (maybe in private e-mail). Thanks.
Regards,
Jan
* Re: Question: memory management and QoS
2000-08-28 12:10 ` Jan Astalos
2000-08-28 13:10 ` Andrey Savochkin
@ 2000-08-28 17:40 ` Rik van Riel
1 sibling, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2000-08-28 17:40 UTC (permalink / raw)
To: Jan Astalos; +Cc: Andrey Savochkin, Yuri Pudgorodsky, Linux MM mailing list
On Mon, 28 Aug 2000, Jan Astalos wrote:
> I won't repeat it again. With personal swapfiles _all_ users
> would be guaranteed the amount of virtual memory provided
> by _themselves_.
This is STUPID.
Suppose that one user has a 10MB swapfile and a 32MB physical
memory quota (quite reasonable or even low nowadays).
Now suppose that user is away from the console (drinking coffee)
and has 20MB of IDLE processes sitting around.
In the meantime, another user is running something that could
really use a bit more physical memory, but it CANNOT get the
memory because the first (coffee-drinking) user doesn't have
the swap space available...
This is a ridiculously inefficient situation that should (and
can) be easily avoided by simply having per-user VM and RSS
_quotas_, but sharing one system-wide swap area.
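The shared-pool alternative can be sketched as plain bookkeeping (a toy model; the user names, sizes, and the single-victim simplification are all invented): per-user quotas on RAM residency, but one shared swap area, so an active user can push an idle user's pages out.

```python
# Toy model (units = MB) of per-user RSS quotas plus ONE shared swap
# pool: an active user reclaims RAM from an idle one, which strictly
# per-user swapfiles can forbid.

RAM, SHARED_SWAP = 64, 128        # physical RAM and shared swap, in MB

resident = {"idle_user": 20, "busy_user": 30}   # MB resident in RAM
swapped  = {"idle_user": 0,  "busy_user": 0}    # MB out in shared swap

def demand(user, mb, victim):
    """Grow `user`'s residency by `mb` MB, pushing the victim's idle
    pages out to the shared pool when RAM runs short.  (Single victim
    with enough idle pages assumed, for brevity.)"""
    short = max(0, sum(resident.values()) + mb - RAM)
    if short > SHARED_SWAP - sum(swapped.values()):
        raise MemoryError("shared swap exhausted")
    take = min(short, resident[victim])
    resident[victim] -= take
    swapped[victim] += take
    resident[user] += mb

# busy_user wants 30 MB more while idle_user is out drinking coffee:
demand("busy_user", 30, victim="idle_user")
print(resident, swapped)
# idle_user's pages went to the shared pool; busy_user got the RAM.
```

With per-user swapfiles, the same request would fail as soon as busy_user's own swapfile filled, no matter how much of idle_user's backing store sat unused.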
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: Question: memory management and QoS
2000-08-28 12:10 ` Jan Astalos
@ 2000-08-28 13:10 ` Andrey Savochkin
2000-08-30 9:01 ` Jan Astalos
2000-08-28 17:40 ` Rik van Riel
1 sibling, 1 reply; 20+ messages in thread
From: Andrey Savochkin @ 2000-08-28 13:10 UTC (permalink / raw)
To: Jan Astalos; +Cc: Yuri Pudgorodsky, Linux MM mailing list
On Mon, Aug 28, 2000 at 02:10:57PM +0200, Jan Astalos wrote:
> Andrey Savochkin wrote:
[snip]
> >
> > That's what the user beancounter patch is about.
> > Except that I'm not so strong in my judgements.
> > For example, I don't think that overcommits are evil. They are quite OK if
>
> Did you ever ask your users? Whether they like to see their apps (possibly running
> for quite a long time) killed (no matter whether with or without warning)?
Well, I was the person responsible for keeping servers running (HTTP, FTP, mail,
proxy, statistics and accounting, etc.).
Yes, I want some applications to be killed under certain conditions.
>
> > 1. the system can provide guarantee that certain processes can never be
> > killed because of OOM;
>
> Again. I wonder how beancounter would prevent overcommit of virtual memory if you don't
> set limits...
It doesn't prevent overcommit.
And it doesn't prevent out-of-memory situations.
It prevents processes that stay below a preconfigured threshold from facing
the negative consequences of overcommits, OOM or whatever else.
If OOM happens then _someone_ is over the threshold, and that very one will
face the consequences.
You propose exactly the same: the user whose swap file runs out is the only one who
faces problems. The difference is that with my code he likely avoids
problems if some other user hasn't consumed all his resources.
What you propose is to punish the user unconditionally, even if there are
some spare resources...
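The rule "whoever is over the threshold faces the consequences" amounts to victim selection like the following sketch (hypothetical structure and policy details; the real user-beancounter patch is more involved):

```python
def pick_oom_victim(users):
    """On OOM, someone must be over his guaranteed threshold
    (otherwise the guarantees were sold beyond physical capacity).
    Punish only users above their guarantee, preferring the
    biggest offender (one possible tie-break, chosen arbitrarily)."""
    offenders = [u for u in users if u["used"] > u["guarantee"]]
    assert offenders, "guarantees oversold beyond available memory"
    return max(offenders, key=lambda u: u["used"] - u["guarantee"])

users = [
    {"name": "alice", "used": 30, "guarantee": 64},   # within guarantee: safe
    {"name": "bob",   "used": 90, "guarantee": 64},   # 26 over
    {"name": "carol", "used": 70, "guarantee": 64},   # 6 over
]
print(pick_oom_victim(users)["name"])   # -> bob
```

Under this rule a process that stays inside its guarantee can never be selected, which is exactly the predictability being claimed.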
>
> > 2. the whole system reaction to OOM situation is well predictable.
> > It's a part of quality of service: some processes/groups of processes have
> > better service, some others only best effort.
>
> I won't repeat it again. With personal swapfiles _all_ users would be guaranteed
> the amount of virtual memory provided by _themselves_.
Yes, personal swapfiles solve this problem, too.
They are just a waste of resources, to be very frank.
[snip]
> As a user, I won't bear _any_ overcommits at all. Once the service is paid for, I expect
> a guaranteed level of quality. In the case of VM, all the memory I paid for.
> For all of my processes.
It means that you pay orders of magnitude more for it.
> Do you mean "pages shared between processes of a particular user"? Where's the problem?
> If you mean "pages provided by a user to another user", I still don't see the problem...
>
> If you mean anonymous pages not owned by any user, I'm really interested in why this should
> be allowed (letting some trash pollute system resources; is it common practice?).
Well, you're speaking about private pages only.
I speak about all memory resources, in-core and swap, and all kinds of
memory, shared and private, file mapped and anonymous.
Regards
Andrey V.
Savochkin
* Re: Question: memory management and QoS
2000-08-28 11:05 ` Andrey Savochkin
@ 2000-08-28 12:10 ` Jan Astalos
2000-08-28 13:10 ` Andrey Savochkin
2000-08-28 17:40 ` Rik van Riel
0 siblings, 2 replies; 20+ messages in thread
From: Jan Astalos @ 2000-08-28 12:10 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Yuri Pudgorodsky, Linux MM mailing list
Andrey Savochkin wrote:
>
> On Mon, Aug 28, 2000 at 10:36:53AM +0200, Jan Astalos wrote:
> [snip]
> > How about splitting memory QoS into:
> > - a guaranteed amount of physical memory
> > - a guaranteed amount of virtual memory
> >
> > The former is much more complicated and includes page replacement policies
> > along with fair sharing of physical memory (the true core of QoS).
> >
> > The latter should guarantee users the requested amount of VM. I.e. avoid this kind
> > of situation: successful malloc, a lot of work, killed in action due to OOM (
> > out of munition^H^H^H^H^H^H^H^Hmemory), RIP...
> > In the current state it's a problem of system administration. In my approach
> > it will become the user's problem. So the user would be able to satisfy his need for
> > VM himself, and the system would only take care of fair management of physical memory.
>
> That's what the user beancounter patch is about.
> Except that I'm not so strong in my judgements.
> For example, I don't think that overcommits are evil. They are quite OK if
Did you ever ask your users? Whether they like to see their apps (possibly running
for quite a long time) killed (no matter whether with or without warning)?
> 1. the system can provide guarantee that certain processes can never be
> killed because of OOM;
Again. I wonder how beancounter would prevent overcommit of virtual memory if you don't
set limits...
> 2. the whole system reaction to OOM situation is well predictable.
> It's a part of quality of service: some processes/groups of processes have
> better service, some others only best effort.
I won't repeat it again. With personal swapfiles _all_ users would be guaranteed
the amount of virtual memory provided by _themselves_.
>
> It's simply impossible to run Internet servers without overcommits.
Which kind of Internet server? A web server, or an e-mail server with 100+ active users...
It's questionable in which case QoS is more important. (Sorry for the flamebait.)
> I encourage you to take a look at
> ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/user_beancounter/MemoryManagement.html,
> especially the Overcommits section.
> I need real guarantees only for some processes, and I can bear overcommits
> and 0.01%/year chances of other processes being killed if it saves me the
> cost of 10 gigabytes of RAM (and the cost of a motherboard which supports this
> amount of memory).
As a user, I won't bear _any_ overcommits at all. Once the service is paid for, I expect
a guaranteed level of quality. In the case of VM, all the memory I paid for.
For all of my processes.
>
> [snip]
> >
> > > Userbeancounters are for that accounting. The problem is there are many different objects
> > > in play here, and sometimes it is not possible to associate them with a particular user.
> >
> > But that's not a design flaw, it's a problem of implementation.
>
> No.
> How do you propose to associate shared pages (or unmapped page cache) with a
> particular user?
>
Do you mean "pages shared between processes of a particular user"? Where's the problem?
If you mean "pages provided by a user to another user", I still don't see the problem...
If you mean anonymous pages not owned by any user, I'm really interested in why this should
be allowed (letting some trash pollute system resources; is it common practice?).
OK, this can be solved by allocating some amount of memory (along with a swapfile) to
an anonymous user.
This kind of page can be (and should be) avoided by communicating via shared files...
(Btw, the best argument I've seen so far. I'm really happy that we finally got to
real arguments about why personal swapfiles wouldn't work. The efficiency question can be
solved only by an implementation under heavy fire.)
Thank you for the suggestion...
Jan
* Re: Question: memory management and QoS
2000-08-28 8:36 ` Jan Astalos
@ 2000-08-28 11:05 ` Andrey Savochkin
2000-08-28 12:10 ` Jan Astalos
0 siblings, 1 reply; 20+ messages in thread
From: Andrey Savochkin @ 2000-08-28 11:05 UTC (permalink / raw)
To: Jan Astalos; +Cc: Yuri Pudgorodsky, Linux MM mailing list
On Mon, Aug 28, 2000 at 10:36:53AM +0200, Jan Astalos wrote:
[snip]
> How about splitting memory QoS into:
> - a guaranteed amount of physical memory
> - a guaranteed amount of virtual memory
>
> The former is much more complicated and includes page replacement policies
> along with fair sharing of physical memory (the true core of QoS).
>
> The latter should guarantee users the requested amount of VM. I.e. avoid this kind
> of situation: successful malloc, a lot of work, killed in action due to OOM (
> out of munition^H^H^H^H^H^H^H^Hmemory), RIP...
> In the current state it's a problem of system administration. In my approach
> it will become the user's problem. So the user would be able to satisfy his need for
> VM himself, and the system would only take care of fair management of physical memory.
That's what the user beancounter patch is about.
Except that I'm not so strong in my judgements.
For example, I don't think that overcommits are evil. They are quite OK if
1. the system can provide a guarantee that certain processes can never be
killed because of OOM;
2. the whole system's reaction to an OOM situation is well predictable.
It's a part of quality of service: some processes/groups of processes have
better service, some others only best effort.
It's simply impossible to run Internet servers without overcommits.
I encourage you to take a look at
ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/user_beancounter/MemoryManagement.html,
especially the Overcommits section.
I need real guarantees only for some processes, and I can bear overcommits
and 0.01%/year chances of other processes being killed if it saves me the
cost of 10 gigabytes of RAM (and the cost of a motherboard which supports this
amount of memory).
[snip]
>
> > Userbeancounters are for that accounting. The problem is there are many different objects
> > in play here, and sometimes it is not possible to associate them with a particular user.
>
> But that's not a design flaw, it's a problem of implementation.
No.
How do you propose to associate shared pages (or unmapped page cache) with a
particular user?
Regards
Andrey V.
Savochkin
* Re: Question: memory management and QoS
2000-08-25 20:17 ` Yuri Pudgorodsky
@ 2000-08-28 8:36 ` Jan Astalos
2000-08-28 11:05 ` Andrey Savochkin
0 siblings, 1 reply; 20+ messages in thread
From: Jan Astalos @ 2000-08-28 8:36 UTC (permalink / raw)
To: Yuri Pudgorodsky, Linux MM mailing list
Yuri Pudgorodsky wrote:
>
> Jan Astalos wrote:
>
> > > I suppose you missed some points or I do not understand your needs.
> > > For general computation (and for almost all other workloads),
> > > I think you do not need "reserved" memory - "reserved memory == wasted memory".
> >
> > Only if reserved == unused. If reserved means 'available when needed', it's
> > a completely different question. Other users can use it, but when the owner
> > reclaims it, the system will swap them out.
>
> Yes, I did not read your previous post that way. But think about reclaiming
> swap space: what does it mean? Copying pages from one swapfile to another?
Reclaiming swap space? What would that be good for? Pages would change swap
files only if they change owner (swapin, chown, swapout).
>
> If we want to reclaim used physical pages, we're speaking about "page replacement
> policy". It is not important where we store swapped pages: on a partition,
> a file, or multiple files. What is important is the algorithm used to
> choose which page to replace.
Incorrect. It is _very_ important where we store swapped pages. IMO it makes a big
difference whether we scatter the pages of a single process across large swap file(s)
or keep them inside a relatively small (contiguous) disk area.
I agree that an LRU policy can improve _overall_ performance. But it sacrifices
_per_user_ performance.
>
> Moreover, speaking about performance, I'm sure fragmenting swap space into multiple
> files is bad unless these files are on separate physical disks.
>
> What I wanted to say in the previous post too: you buy nothing interesting
> by using a swapfile per user. What you need to change to provide "fairness"
> to each user is the page replacement strategy.
How about splitting memory QoS into:
- a guaranteed amount of physical memory
- a guaranteed amount of virtual memory
The former is much more complicated and includes page replacement policies
along with fair sharing of physical memory (the true core of QoS).
The latter should guarantee users the requested amount of VM. I.e. avoid this kind
of situation: successful malloc, a lot of work, killed in action due to OOM (
out of munition^H^H^H^H^H^H^H^Hmemory), RIP...
In the current state it's a problem of system administration. In my approach
it will become the user's problem. So the user would be able to satisfy his need for
VM himself, and the system would only take care of fair management of physical memory.
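The "guaranteed virtual memory" half amounts to strict commit accounting against the user's own backing store (RAM share plus personal swapfile). A minimal sketch, with invented names and page counts; nothing here mirrors actual kernel code:

```python
class VMGuarantee:
    """Strict (no-overcommit) VM accounting for one user: an
    allocation succeeds only if every page is backed by the user's
    RAM allotment plus his personal swapfile, so a successful
    malloc can never later be killed for OOM."""

    def __init__(self, ram_share_pages, swapfile_pages):
        self.backing = ram_share_pages + swapfile_pages
        self.committed = 0

    def malloc_pages(self, n):
        if self.committed + n > self.backing:
            return False      # fail up front instead of OOM-killing later
        self.committed += n
        return True

u = VMGuarantee(ram_share_pages=8192, swapfile_pages=4096)
print(u.malloc_pages(10000))   # True: fully backed by RAM share + swapfile
print(u.malloc_pages(3000))    # False: would exceed the user's guarantee
```

The user can raise his own guarantee by growing his swapfile (within disk quota), without any administrator involvement; fair sharing of the physical pages themselves stays a separate, harder problem.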
>
> There are algorithms for local page replacement policy, implemented for years
> on mainframe OSes (OS/390, MVS; VMS also has it AFAIK). With such a design,
> if a process needs a page, the OS replaces a page from the process itself, or from
> another process of the same user. Of course there are many configurable
> parameters :-)
>
> Unixes in contrast traditionally implement global page replacement policies,
> using some kind of LRU strategy. This proves to be better for overall
> system performance with a large number of users, while a local policy allows you
> to fine-tune the system for specific tasks. Global page replacement however often
> suffers from page thrashing problems when several processes actively replace each
> other's pages. I'd like to see thrash-protection algorithms find their way into Linux.
Me too...
>
> Well, a user will set his swapfile to all available disk quota and start thrashing VM
> with " *p = 0; p += PAGE_SIZE ". With global LRU page replacement,
> we will end up with all RAM occupied by his pages, and unusable performance for everyone else.
>
> Or do you suggest user disk quota < RAM?
I think I didn't get your point here...
> Userbeancounters are for that accounting. The problem is there are many different objects
> in play here, and sometimes it is not possible to associate them with a particular user.
But that's not a design flaw, it's a problem of implementation.
>
> >
> > A: making a swap disk and setting per user VM memory limit to
> > size/maximal_number_of_simultaneous_users is TheRightWay (tm)
> > how to avoid memory pressure...
>
> No. Overcommitting memory is normal practice in the UNIX world. So you
> should not want to restrict users' VM space. However, you want to protect
> users with a low VM set (actually, few resident pages) from users thrashing
> a large VM area. Regardless of the swap space used.
If you want to implement QoS, you really should avoid overcommitting memory.
Otherwise your clever page replacement techniques will be absolutely useless.
>
> So,
>
> 1) Allocating a large VM area is OK, even overcommitted
> (and mmapping a large file is OK too).
> You should not want to restrict user VM space.
100% agreed. But there is a difference between mapped VM and actually
used VM. I'd like to avoid even mentioning limits on VM address
space (RLIMIT_AS). But it's necessary to limit the number of
used VM pages (in order to prevent OOM).
>
> 2) If there is plenty of RAM, active VM usage by a single
> user/process should use all available RAM (even 99.9% of
> overall system memory).
agreed.
>
> Stealing long-unused pages from other users is OK too.
> Stealing an active page from a user with low memory usage
> (lower than some configurable value) is not OK.
In other words, used (meaning physical) memory pages inside the area
allocated to a process should be protected from swapout. See? Not
a limit on RSS (in beancounter notation), because it's actually not
a limit. The limit will be determined by MM policy (see below).
>
> 3) When a low-memory-usage user wants a page, we should first try to
> take it from users with high (over-quota) memory usage.
>
Exactly. (quota == allocation). But not with LRU. Free memory should be
divided between users fairly. I mean, according to the ratio of their
allocations.
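Dividing free memory "according to the ratio of their allocations" is ordinary proportional sharing. A small sketch (invented user names and page counts; the remainder tie-break is arbitrary):

```python
def fair_shares(free_pages, allocations):
    """Divide free physical pages between active users in the ratio
    of their (paid-for) allocations, rather than by global LRU
    pressure.  Any integer remainder goes to the largest
    allocations first (an arbitrary tie-break for the sketch)."""
    total = sum(allocations.values())
    shares = {u: free_pages * a // total for u, a in allocations.items()}
    left = free_pages - sum(shares.values())
    for u in sorted(allocations, key=allocations.get, reverse=True)[:left]:
        shares[u] += 1
    return shares

# Three users whose allocations are in the ratio 1:3:6:
print(fair_shares(1000, {"a": 10, "b": 30, "c": 60}))
# -> {'a': 100, 'b': 300, 'c': 600}
```

A user below his share is never a swapout victim; pressure falls only on users above theirs, which is the per-user guarantee being argued for.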
>
> I suppose the system will lose performance with each logged-in user due
> to swap space fragmentation.
I disagree.
- users with low memory usage (inside their allocation) won't swap at all.
- users with high memory usage will swap into their own (relatively compact)
swap files. Therefore it would be possible to exchange a bunch of LRU
pages (of this user) with his swapped pages without even moving the disk heads.
>
> (A)
> + guarantees physically contiguous space
not at all... even with clustering. Reading the code, the pages of one process
can get scattered over the swap device. (Correct me if I missed something.)
> + less overhead
questionable. Finding a free swap page would be much easier. But that's not the
point. _per_user_ swapping performance depends on the location of that user's pages
on the swap device.
> - difficult to resize, usually resulting in an admin decision to waste more disk space
i.e. money... (which could otherwise be spent on physical memory ;-)
>
> (B)
> + may use less disk space for many usage patterns
> + easy to add/delete space
> - is less efficient due to the overhead of additional block bitmap operations
> (to convert fs blocks to pages), but I did not see any actual numbers;
I'm really interested in how large this overhead is compared to disk seek times
(with regard to increasing CPU speeds, readahead logic in disks and ever larger
swapfiles).
> - may be less efficient (even by an order of magnitude) if we
> take into account swap clustering: we lose the ability to cluster
> swap requests from different users.
Maybe, maybe not. I can't see right now how clustering pages from different
users can guarantee (we're still talking about QoS) _per_user_ performance...
>
> Both (A) and (B) need some QoS-aware "fairness" page replacement algorithms.
> But IMO (A) has more advantages :-) and using (B) has little to do with QoS.
You should ask users whether they accept a QoS that forces them to use
calloc(3) to ensure that they really get the requested amount of VM.
IMO, QoS should be transparent to users. If I pay for some service,
I expect some guaranteed quality, not to be forced to rewrite my code.
There are _many_ more users than developers...
>
> > And another question: How does the size of the swapfile (partition) affect
> > the performance of swapping (with regard to fragmentation)?
>
> Even for unfragmented files, there should be a difference.
> I'm interested in these numbers too :-)
>
By swap fragmentation I don't mean fragmentation of the swap device, but the
fragmentation of the swap area.
See the section about swap clustering in http://lwn.net/1999/0121/a/vmreview.html
Jan
PS: The memory management of the top-ranking OS of the 21st century is not a counting of beans...
Sorry, I could not help myself :-)
* Re: Question: memory management and QoS
2000-08-25 15:51 ` Jan Astalos
@ 2000-08-25 20:17 ` Yuri Pudgorodsky
2000-08-28 8:36 ` Jan Astalos
0 siblings, 1 reply; 20+ messages in thread
From: Yuri Pudgorodsky @ 2000-08-25 20:17 UTC (permalink / raw)
To: Jan Astalos; +Cc: Linux MM mailing list
Jan Astalos wrote:
> > I suppose you missed some points or I do not understand your needs.
> > For general computation (and for almost all other workloads),
> > I think you do not need "reserved" memory - "reserved memory == wasted memory".
>
> Only if reserved == unused. If reserved means 'available when needed', it's
> a completely different question. Other users can use it, but when the owner
> reclaims it, the system will swap their pages out.
Yes, I did not read your previous post that way. But think about reclaiming
swap space: what does it mean? Copying pages from one swapfile to another?
If we want to reclaim used physical pages, we're speaking about "page replacement
policy". It is not important where we store swapped-out pages: on a partition,
a file, or multiple files. What is important is the algorithm used to
choose which page to replace.
Moreover, speaking about performance, I'm sure fragmenting swap space into multiple
files is bad unless those files are on separate physical disks.
As I also wanted to say in my previous post, you buy nothing interesting
by using a swapfile per user. What you need to change to provide "fairness"
to each user is the page replacement strategy.
There are algorithms for local page replacement policy, implemented for years
on mainframe OSes (OS/390, MVS; VMS also has it AFAIK). With such a design,
if a process needs a page, the OS replaces a page from the process itself, or from
another process of the same user. Of course there are many configurable
parameters :-)
Unixes, in contrast, traditionally implement global page replacement policies,
using some kind of LRU strategy. This proves to be better for overall
system performance with a large number of users, while a local policy allows you
to fine-tune the system for specific tasks. Global page replacement, however, often
suffers from page thrashing problems when several processes actively replace each
other's pages. I'd like to see thrash-protection algorithms find their way into Linux.
Unfortunately I have not seen good papers comparing one approach with the other.
However, I hear OS/390 is able to run 100+ Linux kernels on top of its CP,
while Linux cannot :-)
> In a computational grid (multiple clusters and single machines with different
> computing power and resources) an application _must_ have its requested memory
> guaranteed, or some of the nodes will get into the state you described. I haven't
> heard anybody say that nodes would be allocated to an application exclusively. But I
> don't want to discuss whether QoS is important or not... (I don't have time to waste on a flamewar)
I suppose you want a "background" MM policy, to run memory-hungry applications.
With swap-out guarantees, you may want to divide all processes into (a) system processes,
(b) user A processes, (c) user B processes, ... and set up a reasonable minimal guaranteed RSS for each group.
For true QoS, I once found this paper interesting:
http://www.dcs.gla.ac.uk/~ian/papers/memh.ps
> > If, additionally, you want guaranteed low latency on data access (for example for a
> > real-time feed of audio/video/whatever samples), you may lock all process memory
> > to be resident in RAM: the mlock() and mlockall() interfaces come to mind.
>
> No, the user should have all of his used physical memory pages (inside his
> allocation) always resident.
Surely it is always better to have your page in RAM than on disk :-)
But for most systems this is too expensive... or there would be no VM systems at all.
> > What you actually suggest is an obscure and inefficient per-user limit
> > on VM usage (to the size of RAM + swapfile size).
>
> Really? If the user is able to set the size of his swapfile (according to
> his needs), or not use a swapfile at all, where are the limits (other than his
> disk quota set by the sysadmin)? And btw, I would think twice before saying it
> would be inefficient (see below).
Well, a user will set a swapfile to all his available disk quota and start
thrashing VM with " *p = 0; p += PAGE_SIZE ". With global LRU page replacement,
we will end up with all RAM occupied by his pages, and unusable performance for everyone else.
Or do you suggest user disk quota < RAM?
> My test of limiting VM space with the beancounter showed that mmapping files
> larger than the VM limit was impossible. IMO that's not the right way...
I also agree here. Limiting VM is not practical; there are better resources to limit.
For example, unswappable memory used for PTEs/PGDs.
> > Per-user OOM is again just a per-user VM / whatever resource limit.
> > System OOM can still be triggered in a number of not-so-trivial-to-fix ways:
> > - many small processes allocating unswappable kernel memory for
> > kernel objects (sockets, signals, locks, descriptors, ...);
> > - large fragmented network traffic from fast media.
>
> Do you have some numbers on how many of these objects it takes to hog 128MB
> of RAM? (I could use your numbers for setting limits...) Is there any reason
> not to account this kind of memory to the user as well?
User beancounters are for that accounting. The problem is that there are many different
objects in play here, and sometimes it is not possible to associate them with a particular user.
> > There is no point in reserving RAM or swap for possible future
> > allocations: this memory will become wasted memory if no such allocation
>
> Again. By reserved, I don't mean unused... My fault, I thought it was obvious.
>
> > occurs in the near future, and we cannot predict this situation.
> > Additionally, a memory reservation policy does not scale well, specifically
> > for systems with many idle users and a couple of active users, where the
> > active set of users changes often.
> >
> > What the beancounter patch http://www.asplinux.com.sg/install/ubpatch.shtml
> > tries to guarantee is a _minimal_ resident memory for a group of processes.
> > I.e., if a group of processes behaves "well" and does not overcome its limits,
> > its pages are protected from being swapped out due to the activity of other processes
>
> I don't claim that the beancounter is bad. On the level of physical memory, MM
> should work exactly this way. Another question is the level of virtual memory.
>
> > This should at least protect from swap-out attacks where one user thrashes
> > all memory and other users suffer from heavy swapping.
>
> Impossible with per-user swapfiles. If a process is outside its allocation
> and wants another page, and the system doesn't have any, the page for swap-out
> will be selected from the pages of its owner. So a user could thrash only his own processes.
>
> >
> > > Concept of personal swapfiles:
> > >
> > > The benefits (among others):
> > > - there wouldn't be system OOM (only per user OOM)
> >
> > there will be, see above
>
> :-)
>
> >
> > > - user would be able to check his available memory
> >
> > This buys nothing for users - users will be happy checking
> > their limits/guarantees, and the system will be happy
> > allocating *all* available memory to *any* user that needs it;
> > the beancounter / swap-out guarantee approach
> > provides quality of service for "well-behaved" objects.
>
> and without VM limits the system will let users hog all available VM...
> and with limits there will be no mmap for files larger than the
> VM limit. OK, some people can live with it...
> (Maybe it can be fixed in the beancounter.)
>
> I think that marking processes/users as '[well/bad]-behaved' is
> very unfortunate. There should be strictly defined rules, and the
> system just should not let users break them.
>
> >
> > > - no limits for VM address space
> >
> > ?
> >
> > Your VM is limited only by your hardware/software implementation
> > and hard disk space. All other limits (per-process,
> > per-user, per-system - the amount of disk space allocated
> > for swap) are actually administrative constraints.
>
> So you suggest that
>
> A: making a swap disk and setting the per-user VM limit to
> size/maximal_number_of_simultaneous_users is TheRightWay (tm)
> to avoid memory pressure...
No. Overcommitting memory is normal practice in the UNIX world, so you
should not want to restrict users' VM space. However, you want to protect
users with a small VM set (actually, few resident pages) from users thrashing
a large VM area, regardless of the swap space used.
So,
1) Allocating a large VM area is OK, even overcommitted
(and mmapping a large file is OK too).
You should not want to restrict user VM space.
2) If there is plenty of RAM, active usage of VM by a single
user/process should use all available RAM (even 99.9% of
overall system memory).
Stealing pages not used for a long time from other users is OK too.
Stealing an active page from a user with low memory usage
(lower than some configurable value) is not OK.
3) When a low-memory-usage user wants a page, we should first try to
take it from users with high (over-quota) memory usage.
> I suggest that
>
> B: the system swapfile would have only the necessary size and would
> never be touched by users (no system OOM). Only users that
> _need_ VM would have swapfiles. And possibly
>
> .login
> create_swapfile
> swapon swapfile
>
> .logout
> swapoff swapfile
> destroy_swapfile
>
> Q: Which approach is more inefficient and wastes more disk space?
I suppose the system will lose performance with each logged-in user due
to swap space fragmentation.
(A)
+ guarantees physically contiguous space
+ less overhead
- difficult to resize, usually resulting in an admin decision to waste more disk space
(B)
+ may use less disk space for many usage patterns
+ easy to add/delete space
- less efficient due to the overhead of additional block bitmap operations
(to convert fs blocks to pages), but I did not see any actual numbers;
- may be less efficient (even by an order of magnitude) if we
take into account swap clustering: we lose the ability to cluster
swap requests from different users.
Both (A) and (B) need some QoS-aware "fairness" page replacement algorithm.
But IMO (A) has more advantages :-) and using (B) has little to do with QoS.
> And another question: how does the size of the swapfile (partition) affect
> the performance of swapping (with regard to fragmentation)?
Even for unfragmented files, there should be a difference.
I'm interested in these numbers too :-)
* Re: Question: memory management and QoS
2000-08-25 13:22 Yuri Pudgorodsky
@ 2000-08-25 15:51 ` Jan Astalos
2000-08-25 20:17 ` Yuri Pudgorodsky
0 siblings, 1 reply; 20+ messages in thread
From: Jan Astalos @ 2000-08-25 15:51 UTC (permalink / raw)
To: linux-mm; +Cc: Yuri Pudgorodsky
Yuri Pudgorodsky wrote:
>
> Hello,
>
> I suppose you missed some points, or I do not understand your needs.
> For general computation (and for almost all other workloads),
> I think you do not need "reserved" memory - "reserved memory == wasted memory".
Only if reserved == unused. If reserved means 'available when needed', it's
a completely different question. Other users can use it, but when the owner
reclaims it, the system will swap their pages out.
>
> With a single memory-hungry computing hog per node in a cluster, you may be
> happy with the current Linux MM. As long as the working set of this process fits in RAM
> you'll get top performance, and the system will handle sporadic memory allocations
> by other processes more or less well. If the application's working set does not fit in RAM,
> you'll get a huge (1000+ times) performance drop and no OS algorithm will help
> you.
You are talking about a single-user dedicated cluster. I'm talking about multi-user
resource sharing... Designing an application that fits a cluster (and uses 100%
of its resources) is a non-trivial task, unless of course your app is embarrassingly
parallel. In a computational grid (multiple clusters and single machines with different
computing power and resources) an application _must_ have its requested memory guaranteed,
or some of the nodes will get into the state you described. I haven't heard anybody
say that nodes would be allocated to an application exclusively. But I don't want
to discuss whether QoS is important or not... (I don't have time to waste on a flamewar)
>
> If, additionally, you want guaranteed low latency on data access (for example for a
> real-time feed of audio/video/whatever samples), you may lock all process memory
> to be resident in RAM: the mlock() and mlockall() interfaces come to mind.
No, the user should have all of his used physical memory pages (inside his
allocation) always resident.
>
> Other memory-related performance gains lie in your application.
> You should really take into account the hierarchical memory structure, and make
> your application cache-friendly and swap-friendly. For some of my work,
> I found the cache simulator from http://www.cacheprof.org/ to be useful.
There's really no doubt that an application should be well designed. But is it? :-)
>
> QoS issues come into play if multiple process instances fight with each other
> for memory resources. Even so, per-user swapfiles sound like overkill to me,
> with many drawbacks and few benefits:
We'll see...
>
> What you actually suggest is an obscure and inefficient per-user limit
> on VM usage (to the size of RAM + swapfile size).
Really? If the user is able to set the size of his swapfile (according to
his needs), or not use a swapfile at all, where are the limits (other than his
disk quota set by the sysadmin)? And btw, I would think twice before saying it
would be inefficient (see below).
> A beancounter (or other counter) based implementation is both faster
> and more straightforward.
My test of limiting VM space with the beancounter showed that mmapping files
larger than the VM limit was impossible. IMO that's not the right way...
>
> Per-user OOM is again just a per-user VM / whatever resource limit.
> System OOM can still be triggered in a number of not-so-trivial-to-fix ways:
> - many small processes allocating unswappable kernel memory for
> kernel objects (sockets, signals, locks, descriptors, ...);
> - large fragmented network traffic from fast media.
Do you have some numbers on how many of these objects it takes to hog 128MB
of RAM? (I could use your numbers for setting limits...) Is there any reason
not to account this kind of memory to the user as well?
>
> There is no point in reserving RAM or swap for possible future
> allocations: this memory will become wasted memory if no such allocation
Again. By reserved, I don't mean unused... My fault, I thought it was obvious.
> occurs in the near future, and we cannot predict this situation.
> Additionally, a memory reservation policy does not scale well, specifically
> for systems with many idle users and a couple of active users, where the
> active set of users changes often.
>
> What the beancounter patch http://www.asplinux.com.sg/install/ubpatch.shtml
> tries to guarantee is a _minimal_ resident memory for a group of processes.
> I.e., if a group of processes behaves "well" and does not overcome its limits,
> its pages are protected from being swapped out due to the activity of other processes
I don't claim that the beancounter is bad. On the level of physical memory, MM
should work exactly this way. Another question is the level of virtual memory.
> This should at least protect from swap-out attacks where one user thrashes
> all memory and other users suffer from heavy swapping.
Impossible with per-user swapfiles. If a process is outside its allocation
and wants another page, and the system doesn't have any, the page for swap-out
will be selected from the pages of its owner. So a user could thrash only his own processes.
>
> > Concept of personal swapfiles:
> >
> > The benefits (among others):
> > - there wouldn't be system OOM (only per user OOM)
>
> there will be, see above
:-)
>
> > - user would be able to check his available memory
>
> This buys nothing for users - users will be happy checking
> their limits/guarantees, and the system will be happy
> allocating *all* available memory to *any* user that needs it;
> the beancounter / swap-out guarantee approach
> provides quality of service for "well-behaved" objects.
and without VM limits the system will let users hog all available VM...
and with limits there will be no mmap for files larger than the
VM limit. OK, some people can live with it...
(Maybe it can be fixed in the beancounter.)
I think that marking processes/users as '[well/bad]-behaved' is
very unfortunate. There should be strictly defined rules, and the
system just should not let users break them.
>
> > - no limits for VM address space
>
> ?
>
> Your VM is limited only by your hardware/software implementation
> and hard disk space. All other limits (per-process,
> per-user, per-system - the amount of disk space allocated
> for swap) are actually administrative constraints.
So you suggest that
A: making a swap disk and setting the per-user VM limit to
size/maximal_number_of_simultaneous_users is TheRightWay (tm)
to avoid memory pressure...
I suggest that
B: the system swapfile would have only the necessary size and would never
be touched by users (no system OOM). Only users that
_need_ VM would have swapfiles. And possibly
.login
create_swapfile
swapon swapfile
.logout
swapoff swapfile
destroy_swapfile
Q: Which approach is more inefficient and wastes more disk space?
And another question: how does the size of the swapfile (partition) affect
the performance of swapping (with regard to fragmentation)?
Does anyone have numbers?
Cheers,
Jan
* Re: Question: memory management and QoS
@ 2000-08-25 13:22 Yuri Pudgorodsky
2000-08-25 15:51 ` Jan Astalos
0 siblings, 1 reply; 20+ messages in thread
From: Yuri Pudgorodsky @ 2000-08-25 13:22 UTC (permalink / raw)
To: linux-mm; +Cc: astalos
Hello,
I suppose you missed some points, or I do not understand your needs.
For general computation (and for almost all other workloads),
I think you do not need "reserved" memory - "reserved memory == wasted memory".
With a single memory-hungry computing hog per node in a cluster, you may be
happy with the current Linux MM. As long as the working set of this process fits in RAM
you'll get top performance, and the system will handle sporadic memory allocations
by other processes more or less well. If the application's working set does not fit in RAM,
you'll get a huge (1000+ times) performance drop and no OS algorithm will help
you.
If, additionally, you want guaranteed low latency on data access (for example for a
real-time feed of audio/video/whatever samples), you may lock all process memory
to be resident in RAM: the mlock() and mlockall() interfaces come to mind.
Other memory-related performance gains lie in your application.
You should really take into account the hierarchical memory structure, and make
your application cache-friendly and swap-friendly. For some of my work,
I found the cache simulator from http://www.cacheprof.org/ to be useful.
QoS issues come into play if multiple process instances fight with each other
for memory resources. Even so, per-user swapfiles sound like overkill to me,
with many drawbacks and few benefits:
What you actually suggest is an obscure and inefficient per-user limit
on VM usage (to the size of RAM + swapfile size).
A beancounter (or other counter) based implementation is both faster
and more straightforward.
Per-user OOM is again just a per-user VM / whatever resource limit.
System OOM can still be triggered in a number of not-so-trivial-to-fix ways:
- many small processes allocating unswappable kernel memory for
kernel objects (sockets, signals, locks, descriptors, ...);
- large fragmented network traffic from fast media.
There is no point in reserving RAM or swap for possible future
allocations: this memory will become wasted memory if no such allocation
occurs in the near future, and we cannot predict this situation.
Additionally, a memory reservation policy does not scale well, specifically
for systems with many idle users and a couple of active users, where the
active set of users changes often.
What the beancounter patch http://www.asplinux.com.sg/install/ubpatch.shtml
tries to guarantee is a _minimal_ resident memory for a group of processes.
I.e., if a group of processes behaves "well" and does not overcome its limits,
its pages are protected from being swapped out due to the activity of other processes.
This should at least protect from swap-out attacks where one user thrashes
all memory and other users suffer from heavy swapping.
> Concept of personal swapfiles:
>
> The benefits (among others):
> - there wouldn't be system OOM (only per user OOM)
there will be, see above
> - user would be able to check his available memory
This buys nothing for users - users will be happy checking
their limits/guarantees, and the system will be happy
allocating *all* available memory to *any* user that needs it;
the beancounter / swap-out guarantee approach
provides quality of service for "well-behaved" objects.
> - no limits for VM address space
?
Your VM is limited only by your hardware/software implementation
and hard disk space. All other limits (per-process,
per-user, per-system - the amount of disk space allocated
for swap) are actually administrative constraints.
> - there could be more policies for sharing of physical memory
> by users (and system)
end of thread, other threads:[~2000-08-31 11:49 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-08-24 10:13 Question: memory management and QoS Jan Astalos
2000-08-28 7:47 ` Andrey Savochkin
2000-08-28 9:28 ` Jan Astalos
2000-08-28 11:30 ` Andrey Savochkin
2000-08-28 12:38 ` Jan Astalos
2000-08-28 17:25 ` Rik van Riel
2000-08-30 7:38 ` Jan Astalos
2000-08-30 16:53 ` Rik van Riel
2000-08-31 1:48 ` Andrey Savochkin
2000-08-31 11:49 ` Jan Astalos
2000-08-25 13:22 Yuri Pudgorodsky
2000-08-25 15:51 ` Jan Astalos
2000-08-25 20:17 ` Yuri Pudgorodsky
2000-08-28 8:36 ` Jan Astalos
2000-08-28 11:05 ` Andrey Savochkin
2000-08-28 12:10 ` Jan Astalos
2000-08-28 13:10 ` Andrey Savochkin
2000-08-30 9:01 ` Jan Astalos
2000-08-30 11:42 ` Marco Colombo
2000-08-28 17:40 ` Rik van Riel