linux-mm.kvack.org archive mirror
* swap, compress, discard: what's in the future?
@ 2014-01-07  2:31 Luigi Semenzato
  2014-01-07  3:01 ` Minchan Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Luigi Semenzato @ 2014-01-07  2:31 UTC (permalink / raw)
  To: linux-mm

I would like to know (and I apologize if there is an obvious answer)
if folks on this list have pointers to documents or discussions
regarding the long-term evolution of the Linux memory manager.  I
realize there is plenty of shorter-term stuff to worry about, but a
long-term vision would be helpful---even more so if there is some
agreement.

My super-simple view is that when memory reclaim is possible there is
a cost attached to it, and the goal is to minimize the cost.  The cost
for reclaiming a unit of memory of some kind is a function of various
parameters: the CPU cycles, the I/O bandwidth, and the latency, to
name the main components.  This function can change a lot depending on
the load and in practice it may have to be grossly approximated, but
the concept is valid IMO.

For instance, the cost of compressing and decompressing RAM is mainly
CPU cycles.  A user program (a browser, for instance :) may be caching
decompressed JPEGs into transcendent (discardable) memory, for quick
display.  In this case, almost certainly the decompressed JPEGs should
be discarded before memory is compressed, under the realistic
assumption that one JPEG decompression is cheaper than one LZO
compression/decompression.  But there may be situations in which a lot
more work has gone into creating the application cache, and then it
makes sense to compress/decompress it rather than discard it.  It may
be hard for the kernel to figure out how expensive it is to recreate
the application cache, so the application should tell it.

Of course, for a cache the cost needs to be multiplied by the
probability that the memory will be used again in the future.  A good
part of the Linux VM is dedicated to estimating that probability, for
some kinds of memory.  But I don't see simple hooks for describing
various costs such as the one I mentioned, and I wonder if this
paradigm makes sense in general, or if it is peculiar to Chrome OS.
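To make the arithmetic concrete, here is a small sketch of the cost model
described above; every weight and cycle count is an invented illustration,
not a measured kernel parameter:

```python
# Sketch of the reclaim-cost model described above; all numbers and
# weights here are made up purely for illustration.

def reclaim_cost(cpu_cycles, io_bytes, latency_us,
                 cycle_weight=1.0, io_weight=0.5, latency_weight=2.0):
    """Cost of reclaiming (and later restoring) one unit of memory."""
    return (cycle_weight * cpu_cycles +
            io_weight * io_bytes +
            latency_weight * latency_us)

def expected_cost(cost, reuse_probability):
    """The restore cost is only paid if the page is touched again."""
    return cost * reuse_probability

# Discarding a decoded JPEG: cheap to recreate (one JPEG decode), no I/O.
discard = expected_cost(reclaim_cost(cpu_cycles=3_000, io_bytes=0,
                                     latency_us=5), 0.9)
# Compressing the same page with LZO: compress + decompress cycles.
compress = expected_cost(reclaim_cost(cpu_cycles=8_000, io_bytes=0,
                                      latency_us=2), 0.9)

# Under these invented numbers, discarding wins, matching the JPEG
# argument above.
assert discard < compress
```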

Thanks!
... and Happy New Year

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: swap, compress, discard: what's in the future?
  2014-01-07  2:31 swap, compress, discard: what's in the future? Luigi Semenzato
@ 2014-01-07  3:01 ` Minchan Kim
  2014-01-07  6:33   ` Bob Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Minchan Kim @ 2014-01-07  3:01 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: linux-mm

Hello Luigi,

On Mon, Jan 06, 2014 at 06:31:29PM -0800, Luigi Semenzato wrote:
> I would like to know (and I apologize if there is an obvious answer)
> if folks on this list have pointers to documents or discussions
> regarding the long-term evolution of the Linux memory manager.  I
> realize there is plenty of shorter-term stuff to worry about, but a
> long-term vision would be helpful---even more so if there is some
> agreement.
> 
> My super-simple view is that when memory reclaim is possible there is
> a cost attached to it, and the goal is to minimize the cost.  The cost
> for reclaiming a unit of memory of some kind is a function of various
> parameters: the CPU cycles, the I/O bandwidth, and the latency, to
> name the main components.  This function can change a lot depending on
> the load and in practice it may have to be grossly approximated, but
> the concept is valid IMO.
> 
> For instance, the cost of compressing and decompressing RAM is mainly
> CPU cycles.  A user program (a browser, for instance :) may be caching
> decompressed JPEGs into transcendent (discardable) memory, for quick
> display.  In this case, almost certainly the decompressed JPEGs should
> be discarded before memory is compressed, under the realistic
> assumption that one JPEG decompression is cheaper than one LZO
> compression/decompression.  But there may be situations in which a lot
> more work has gone into creating the application cache, and then it
> makes sense to compress/decompress it rather than discard it.  It may
> be hard for the kernel to figure out how expensive it is to recreate
> the application cache, so the application should tell it.

Agreed. It's very hard for the kernel to figure that out, so the VM
should depend on hints from userspace. What you describe is exactly the
use case for the volatile range system call that I am proposing.

http://lwn.net/Articles/578761/
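(For context: the volatile range syscall was never merged in that form;
madvise(MADV_FREE), added later in Linux 4.5, provides a related
"cheap to recreate, kernel may discard" hint. A minimal userspace sketch
via Python's mmap wrapper, which needs Python 3.8+ on a Linux kernel that
exposes MADV_FREE, hence the guard:)

```python
import mmap

# Map an anonymous region and fill it, as an application cache might.
length = mmap.PAGESIZE * 16
buf = mmap.mmap(-1, length)
buf.write(b"x" * length)

# Tell the kernel the contents are cheap to recreate: under memory
# pressure it may reclaim these pages without writing them to swap.
# (MADV_FREE appeared in Linux 4.5; the 2014 vrange() proposal was an
# earlier, unmerged take on the same idea.)
if hasattr(mmap, "MADV_FREE"):
    buf.madvise(mmap.MADV_FREE)

# After the hint, a later read may see zero-filled pages if reclaim
# struck, so a real cache must detect loss and regenerate the data.
buf.close()
assert buf.closed
```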

> 
> Of course, for a cache the cost needs to be multiplied by the
> probability that the memory will be used again in the future.  A good
> part of the Linux VM is dedicated to estimating that probability, for
> some kinds of memory.  But I don't see simple hooks for describing
> various costs such as the one I mentioned, and I wonder if this
> paradigm makes sense in general, or if it is peculiar to Chrome OS.

Your statement makes sense to me, but unfortunately the current VM
doesn't consider everything you mentioned. It is based only on page
access recency, via approximate LRU logic plus some heuristics (e.g.,
mapped pages and VM_EXEC pages are treated as more precious).
The reason this is hard is simply the complexity and overhead of the
implementation. If someone has a nice idea for defining the parameters
and implementing them with small overhead, that would be very nice!


> 
> Thanks!
> ... and Happy New Year
> 

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: swap, compress, discard: what's in the future?
  2014-01-07  3:01 ` Minchan Kim
@ 2014-01-07  6:33   ` Bob Liu
  2014-01-07  7:13     ` Minchan Kim
  2014-01-07 13:45     ` Rik van Riel
  0 siblings, 2 replies; 7+ messages in thread
From: Bob Liu @ 2014-01-07  6:33 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Luigi Semenzato, Linux-MM, Rik van Riel

On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hello Luigi,
>
> On Mon, Jan 06, 2014 at 06:31:29PM -0800, Luigi Semenzato wrote:
>> I would like to know (and I apologize if there is an obvious answer)
>> if folks on this list have pointers to documents or discussions
>> regarding the long-term evolution of the Linux memory manager.  I
>> realize there is plenty of shorter-term stuff to worry about, but a
>> long-term vision would be helpful---even more so if there is some
>> agreement.
>>
>> My super-simple view is that when memory reclaim is possible there is
>> a cost attached to it, and the goal is to minimize the cost.  The cost
>> for reclaiming a unit of memory of some kind is a function of various
>> parameters: the CPU cycles, the I/O bandwidth, and the latency, to
>> name the main components.  This function can change a lot depending on
>> the load and in practice it may have to be grossly approximated, but
>> the concept is valid IMO.
>>
>> For instance, the cost of compressing and decompressing RAM is mainly
>> CPU cycles.  A user program (a browser, for instance :) may be caching
>> decompressed JPEGs into transcendent (discardable) memory, for quick
>> display.  In this case, almost certainly the decompressed JPEGs should
>> be discarded before memory is compressed, under the realistic
>> assumption that one JPEG decompression is cheaper than one LZO
>> compression/decompression.  But there may be situations in which a lot
>> more work has gone into creating the application cache, and then it
>> makes sense to compress/decompress it rather than discard it.  It may
>> be hard for the kernel to figure out how expensive it is to recreate
>> the application cache, so the application should tell it.
>
> Agreed. It's very hard for the kernel to figure that out, so the VM
> should depend on hints from userspace. What you describe is exactly the
> use case for the volatile range system call that I am proposing.
>
> http://lwn.net/Articles/578761/
>
>>
>> Of course, for a cache the cost needs to be multiplied by the
>> probability that the memory will be used again in the future.  A good
>> part of the Linux VM is dedicated to estimating that probability, for
>> some kinds of memory.  But I don't see simple hooks for describing
>> various costs such as the one I mentioned, and I wonder if this
>> paradigm makes sense in general, or if it is peculiar to Chrome OS.
>
> Your statement makes sense to me, but unfortunately the current VM
> doesn't consider everything you mentioned. It is based only on page
> access recency, via approximate LRU logic plus some heuristics (e.g.,
> mapped pages and VM_EXEC pages are treated as more precious).

It seems that the ARC page replacement algorithm in ZFS has good
performance and is more intelligent.
http://en.wikipedia.org/wiki/Adaptive_replacement_cache
Is there any historical reason why Linux didn't implement something
like ARC as its page cache replacement algorithm?
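(For readers unfamiliar with ARC, here is a deliberately simplified
sketch of its core idea: two resident lists plus two "ghost" lists of
recently evicted keys, whose hits adapt the recency/frequency split.
This is an illustration of the mechanism only, not the exact algorithm
from the ARC paper.)

```python
from collections import OrderedDict

class SimplifiedARC:
    """Reduced sketch of ARC: recent pages live in t1, frequent pages in
    t2, and evicted keys leave ghosts in b1/b2 whose hits steer p, the
    target size of t1."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0                   # adaptive target size of t1
        self.t1 = OrderedDict()      # resident, seen once recently
        self.t2 = OrderedDict()      # resident, seen at least twice
        self.b1 = OrderedDict()      # ghosts of pages evicted from t1
        self.b2 = OrderedDict()      # ghosts of pages evicted from t2

    def _replace(self, in_b2):
        # Evict from t1 or t2 depending on the adaptive target p.
        if self.t1 and (len(self.t1) > self.p or
                        (in_b2 and len(self.t1) == self.p)):
            k, _ = self.t1.popitem(last=False)
            self.b1[k] = None
        elif self.t2:
            k, _ = self.t2.popitem(last=False)
            self.b2[k] = None

    def access(self, key):
        """Returns True on a resident hit, False otherwise."""
        if key in self.t1:           # second hit: promote to frequent list
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:           # frequent hit: move to MRU end
            self.t2.move_to_end(key)
            return True
        if key in self.b1:           # ghost hit: recency was undervalued
            self.p = min(self.c, self.p +
                         max(1, len(self.b2) // max(1, len(self.b1))))
            del self.b1[key]
            self._replace(False)
            self.t2[key] = None
            return False
        if key in self.b2:           # ghost hit: frequency was undervalued
            self.p = max(0, self.p -
                         max(1, len(self.b1) // max(1, len(self.b2))))
            del self.b2[key]
            self._replace(True)
            self.t2[key] = None
            return False
        # Cold miss: insert into t1, evicting if the cache is full.
        if len(self.t1) + len(self.t2) >= self.c:
            self._replace(False)
        if len(self.b1) > self.c:
            self.b1.popitem(last=False)
        self.t1[key] = None
        return False

cache = SimplifiedARC(2)
cache.access("a"); cache.access("a")   # "a" promoted to the frequent list
assert "a" in cache.t2
```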

> The reason this is hard is simply the complexity and overhead of the
> implementation. If someone has a nice idea for defining the parameters
> and implementing them with small overhead, that would be very nice!
>

-- 
Regards,
--Bob


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: swap, compress, discard: what's in the future?
  2014-01-07  6:33   ` Bob Liu
@ 2014-01-07  7:13     ` Minchan Kim
  2014-01-07 13:45     ` Rik van Riel
  1 sibling, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2014-01-07  7:13 UTC (permalink / raw)
  To: Bob Liu; +Cc: Luigi Semenzato, Linux-MM, Rik van Riel

Hello Bob,

On Tue, Jan 07, 2014 at 02:33:11PM +0800, Bob Liu wrote:
> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim <minchan@kernel.org> wrote:
> > Hello Luigi,
> >
> > On Mon, Jan 06, 2014 at 06:31:29PM -0800, Luigi Semenzato wrote:
> >> I would like to know (and I apologize if there is an obvious answer)
> >> if folks on this list have pointers to documents or discussions
> >> regarding the long-term evolution of the Linux memory manager.  I
> >> realize there is plenty of shorter-term stuff to worry about, but a
> >> long-term vision would be helpful---even more so if there is some
> >> agreement.
> >>
> >> My super-simple view is that when memory reclaim is possible there is
> >> a cost attached to it, and the goal is to minimize the cost.  The cost
> >> for reclaiming a unit of memory of some kind is a function of various
> >> parameters: the CPU cycles, the I/O bandwidth, and the latency, to
> >> name the main components.  This function can change a lot depending on
> >> the load and in practice it may have to be grossly approximated, but
> >> the concept is valid IMO.
> >>
> >> For instance, the cost of compressing and decompressing RAM is mainly
> >> CPU cycles.  A user program (a browser, for instance :) may be caching
> >> decompressed JPEGs into transcendent (discardable) memory, for quick
> >> display.  In this case, almost certainly the decompressed JPEGs should
> >> be discarded before memory is compressed, under the realistic
> >> assumption that one JPEG decompression is cheaper than one LZO
> >> compression/decompression.  But there may be situations in which a lot
> >> more work has gone into creating the application cache, and then it
> >> makes sense to compress/decompress it rather than discard it.  It may
> >> be hard for the kernel to figure out how expensive it is to recreate
> >> the application cache, so the application should tell it.
> >
> > Agreed. It's very hard for the kernel to figure that out, so the VM
> > should depend on hints from userspace. What you describe is exactly the
> > use case for the volatile range system call that I am proposing.
> >
> > http://lwn.net/Articles/578761/
> >
> >>
> >> Of course, for a cache the cost needs to be multiplied by the
> >> probability that the memory will be used again in the future.  A good
> >> part of the Linux VM is dedicated to estimating that probability, for
> >> some kinds of memory.  But I don't see simple hooks for describing
> >> various costs such as the one I mentioned, and I wonder if this
> >> paradigm makes sense in general, or if it is peculiar to Chrome OS.
> >
> > Your statement makes sense to me, but unfortunately the current VM
> > doesn't consider everything you mentioned. It is based only on page
> > access recency, via approximate LRU logic plus some heuristics (e.g.,
> > mapped pages and VM_EXEC pages are treated as more precious).
> 
> It seems that the ARC page replacement algorithm in ZFS has good
> performance and is more intelligent.
> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
> Is there any historical reason why Linux didn't implement something
> like ARC as its page cache replacement algorithm?

I guess the biggest reason was the ARC patent?
Anyway, I think Rik and Peter looked at it at that time.

> 
> > The reason this is hard is simply the complexity and overhead of the
> > implementation. If someone has a nice idea for defining the parameters
> > and implementing them with small overhead, that would be very nice!
> >
> 
> -- 
> Regards,
> --Bob
> 

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: swap, compress, discard: what's in the future?
  2014-01-07  6:33   ` Bob Liu
  2014-01-07  7:13     ` Minchan Kim
@ 2014-01-07 13:45     ` Rik van Riel
  2014-01-09  8:18       ` Bob Liu
  1 sibling, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2014-01-07 13:45 UTC (permalink / raw)
  To: Bob Liu, Minchan Kim; +Cc: Luigi Semenzato, Linux-MM, hnaz

On 01/07/2014 01:33 AM, Bob Liu wrote:
> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim <minchan@kernel.org> wrote:

>> Your statement makes sense to me, but unfortunately the current VM
>> doesn't consider everything you mentioned. It is based only on page
>> access recency, via approximate LRU logic plus some heuristics (e.g.,
>> mapped pages and VM_EXEC pages are treated as more precious).
> 
> It seems that the ARC page replacement algorithm in ZFS has good
> performance and is more intelligent.
> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
> Is there any historical reason why Linux didn't implement something
> like ARC as its page cache replacement algorithm?

ARC by itself was quickly superseded by CLOCK-Pro, which
looks like it would be even better.

Johannes introduces an algorithm with similar properties
in his "thrash based page cache replacement" patch series.

However, algorithms like ARC and clockpro are best for
a cache that caches a large data set (much larger than
the cache size), and has to deal with large inter-reference
distances.

For anonymous memory, we are dealing with the opposite:
the total amount of anonymous memory is on the same
order of magnitude as the amount of RAM, and the
inter-reference distance will be smaller as a result.
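A toy way to see the distinction is to measure inter-reference (reuse)
distances over access traces; the traces below are invented to mimic the
two workloads being contrasted:

```python
def reuse_distances(trace):
    """For each repeated access, count the distinct pages touched since
    the previous access to the same page (inter-reference distance)."""
    last_seen = {}
    distances = []
    for i, page in enumerate(trace):
        if page in last_seen:
            distances.append(len(set(trace[last_seen[page] + 1:i])))
        last_seen[page] = i
    return distances

# A file-scan-like trace: a page comes back only after many others.
file_like = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
# An anonymous-memory-like trace: a small working set touched repeatedly.
anon_like = [1, 2, 1, 2, 1, 2]

# The file-like trace has much larger inter-reference distances, which
# is the case ghost-list algorithms like ARC/CLOCK-Pro are built for.
assert max(reuse_distances(file_like)) > max(reuse_distances(anon_like))
```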

-- 
All rights reversed


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: swap, compress, discard: what's in the future?
  2014-01-07 13:45     ` Rik van Riel
@ 2014-01-09  8:18       ` Bob Liu
  2014-01-09 16:41         ` Rik van Riel
  0 siblings, 1 reply; 7+ messages in thread
From: Bob Liu @ 2014-01-09  8:18 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Bob Liu, Minchan Kim, Luigi Semenzato, Linux-MM, hnaz


On 01/07/2014 09:45 PM, Rik van Riel wrote:
> On 01/07/2014 01:33 AM, Bob Liu wrote:
>> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim <minchan@kernel.org> wrote:
> 
>>> Your statement makes sense to me, but unfortunately the current VM
>>> doesn't consider everything you mentioned. It is based only on page
>>> access recency, via approximate LRU logic plus some heuristics (e.g.,
>>> mapped pages and VM_EXEC pages are treated as more precious).
>>
>> It seems that the ARC page replacement algorithm in ZFS has good
>> performance and is more intelligent.
>> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
>> Is there any historical reason why Linux didn't implement something
>> like ARC as its page cache replacement algorithm?
> 
> ARC by itself was quickly superseded by CLOCK-Pro, which
> looks like it would be even better.
> 
> Johannes introduces an algorithm with similar properties
> in his "thrash based page cache replacement" patch series.
> 

But it seems you and Peter already implemented CLOCK-Pro and CART page
cache replacement many years ago. Why weren't they merged at that
time?

I found some information from
http://linux-mm.org/AdvancedPageReplacement

Linux implementations:
Rahul Iyer's implementation of CART, RahulIyerCART

Rik van Riel's ClockProApproximation.

Rik van Riel's proposal for the tracking of NonResidentPages, which is
used by both his ClockProApproximation and by Peter Zijlstra's CART and
Clock-pro implementations.

Peter Zijlstra's CART PeterZCart

Peter Zijlstra's Clock-Pro PeterZClockPro2

Thanks,
-Bob

> However, algorithms like ARC and clockpro are best for
> a cache that caches a large data set (much larger than
> the cache size), and has to deal with large inter-reference
> distances.
> 
> For anonymous memory, we are dealing with the opposite:
> the total amount of anonymous memory is on the same
> order of magnitude as the amount of RAM, and the
> inter-reference distance will be smaller as a result.
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: swap, compress, discard: what's in the future?
  2014-01-09  8:18       ` Bob Liu
@ 2014-01-09 16:41         ` Rik van Riel
  0 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2014-01-09 16:41 UTC (permalink / raw)
  To: Bob Liu; +Cc: Bob Liu, Minchan Kim, Luigi Semenzato, Linux-MM, hnaz

On 01/09/2014 03:18 AM, Bob Liu wrote:
>
> On 01/07/2014 09:45 PM, Rik van Riel wrote:
>> On 01/07/2014 01:33 AM, Bob Liu wrote:
>>> On Tue, Jan 7, 2014 at 11:01 AM, Minchan Kim <minchan@kernel.org> wrote:
>>
>>>> Your statement makes sense to me, but unfortunately the current VM
>>>> doesn't consider everything you mentioned. It is based only on page
>>>> access recency, via approximate LRU logic plus some heuristics (e.g.,
>>>> mapped pages and VM_EXEC pages are treated as more precious).
>>>
>>> It seems that the ARC page replacement algorithm in ZFS has good
>>> performance and is more intelligent.
>>> http://en.wikipedia.org/wiki/Adaptive_replacement_cache
>>> Is there any historical reason why Linux didn't implement something
>>> like ARC as its page cache replacement algorithm?
>>
>> ARC by itself was quickly superseded by CLOCK-Pro, which
>> looks like it would be even better.
>>
>> Johannes introduces an algorithm with similar properties
>> in his "thrash based page cache replacement" patch series.
>>
>
> But it seems you and Peter already implemented CLOCK-Pro and CART page
> cache replacement many years ago. Why weren't they merged at that
> time?

Scalability concerns, lack of time, and the VM not being
ready to take the code.

The split LRU code makes it much more logical to merge a replacement
scheme suited to second-level caches, because anonymous memory now
sits in an LRU scheme that is better suited to its kind of usage.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-09 16:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-07  2:31 swap, compress, discard: what's in the future? Luigi Semenzato
2014-01-07  3:01 ` Minchan Kim
2014-01-07  6:33   ` Bob Liu
2014-01-07  7:13     ` Minchan Kim
2014-01-07 13:45     ` Rik van Riel
2014-01-09  8:18       ` Bob Liu
2014-01-09 16:41         ` Rik van Riel
