Another Clock-pro approx

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Another Clock-pro approx
@ 2005-10-13 18:26 Peter Zijlstra
  2005-10-14  6:44 ` Song Jiang
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2005-10-13 18:26 UTC (permalink / raw)
  To: riel, sjiang; +Cc: linux-mm

Hi,

I've been thinking on another clock-pro approximation.

Each page has 3 bits: hot/cold, test and referenced.
Say we have 3 lists: T1, T2 and T3.
and variable: s

T1 will have hot and cold pages, T2 will only have cold pages and T3
will have the non-resident pages.
c will be the total number of resident pages; |T1| + |T2| + |T3| = 2c.

T1-rotation:

h/c   test   ref          action
 0       0       0           T2-000
 0       0       1           T2-001
 0       1       0           T2-000
 0       1       1           T1-100
 1       0       0           T2-001
 1       0       1           T1-100
 1       1       0           <cannot happen>
 1       1       1           <cannot happen>

T2-rotation:

h/c   test   ref          action
 0       0       0           <remove page from list>
 0       0       1           T1-000
 0       1       0           T3-010
 0       1       1           T1-100

T3-rotation: frees up non-resident slots

So, on fault we rotate T2, unless empty then we start by rotating T1
until T2 contains at least 1 cold page.
If a T2 rotation creates a hot page, we rotate T1 to degrade a hot
page to a cold page in order to keep the cold page target m_c.
Every T1 rotation adds |T1| to s. While s > c, we subtract c from s and
turn T3 for each subtraction.

Compare to clock-pro:
  T1-rotation <-> Hand_hot
  T2-rotation <-> Hand_cold
  T3-rotation <-> Hand_test

The normal m_c adaption rules can be applied.

Zoned edition:
This can be done per zone by having:
T1_i, T2_i, T3_j, s, t, u_j
where _i is the zone index and _j the non-resident bucket index.

Then each T1_i turn will add |T1_i| to s, each c in s will increment t by 1.
On each non-resident bucket access we increment u_j until it equals t
and for each increment we rotate the bucket.

Kind regards,

Peter Zijlstra

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Another Clock-pro approx
  2005-10-13 18:26 Another Clock-pro approx Peter Zijlstra
@ 2005-10-14  6:44 ` Song Jiang
  2005-10-14  7:38   ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Song Jiang @ 2005-10-14  6:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: riel, linux-mm

Hi, Peter,

    I didn't see your explanation of the actions such as T2-000.
So I put my understanding at the below. 

While setting three lists, do you indicate that a page 
can have 3 set of pointers, each for the links in one list?

If we don't restrict ourselves to the current active and inactive
list data structure, can you point to us what major approximations
we have to make on the original clock-pro policy so that the policy 
becomes acceptable in the kernel? (I understand that zoned data
structure is one of the required adaptations.) The reason I'm asking 
the question is that I didn't see some major changes here in your
approxiamtion? Once I understand the kernel required adaptations, 
I can work with you for a clock-pro approximation.

I have a clock-pro simulator in C code and powerpoint presentation 
slides explaining clock-pro operations. If you are interested in
any of them, let me know.

Thanks.

   Song

 
 
On Thu, 2005-10-13 at 12:26, Peter Zijlstra wrote:
> Hi,
> 
> I've been thinking on another clock-pro approximation.
> 
> Each page has 3 bits: hot/cold, test and referenced.
> Say we have 3 lists: T1, T2 and T3.
> and variable: s
> 
> T1 will have hot and cold pages, T2 will only have cold pages and T3
> will have the non-resident pages.
> c will be the total number of resident pages; |T1| + |T2| + |T3| = 2c.
                                                Do you mean |T1 U T2 U
T3| = 2c ?

> 
> 
> T1-rotation:
> 
> h/c   test   ref          action
>  0         0       0                T2-000  the page leaves T1, T2, and T3     
>  0         0       1                T2-001   the page must be resident, do nothing          
>  0       1       0           T2-000           test bit is cleared. if it is non-resident, it leaves T3
>  0       1       1           T1-100           test bit is cleared
>  1       0       0           T2-001           turn into cold page
>  1       0       1           T1-100           ref bit is cleared
>  1       1       0           <cannot happen>
>  1       1       1           <cannot happen>
> 
> 
> T2-rotation:
> 
> h/c   test   ref          action
>  0         0       0           <remove page from list> the pages leaves T1, T2 and T3
>  0         0        1              T1-000  this page must be resident, free the page, the page leaves T1, T2, and T3   
>  0         1        0              T3-010  if the page is resident, free the page, place it in T3. Otherwise, do nothing 
>  0         1        1              T1-100   the page must be resident, move the page to T1, rotate T1 to a hot page into a cold one  
> 
> 
> T3-rotation: frees up non-resident slots
> 
> So, on fault we rotate T2, unless empty then we start by rotating T1
> until T2 contains at least 1 cold page.
> If a T2 rotation creates a hot page, we rotate T1 to degrade a hot
> page to a cold page in order to keep the cold page target m_c.
> Every T1 rotation adds |T1| to s. While s > c, we subtract c from s and
       Does |T1| mean the total size if T1? Can you explain why in more
detail?

> turn T3 for each subtraction.
> 
> Compare to clock-pro:
>   T1-rotation <-> Hand_hot
>   T2-rotation <-> Hand_cold
>   T3-rotation <-> Hand_test
> 
> The normal m_c adaption rules can be applied.
> 
> Zoned edition:
> This can be done per zone by having:
> T1_i, T2_i, T3_j, s, t, u_j
> where _i is the zone index and _j the non-resident bucket index.
> 
> Then each T1_i turn will add |T1_i| to s, each c in s will increment t by 1.
> On each non-resident bucket access we increment u_j until it equals t
> and for each increment we rotate the bucket.
> 
> 
> 
> Kind regards,
> 
> Peter Zijlstra

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Another Clock-pro approx
  2005-10-14  6:44 ` Song Jiang
@ 2005-10-14  7:38   ` Peter Zijlstra
  2005-10-15  3:35     ` Song Jiang
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2005-10-14  7:38 UTC (permalink / raw)
  To: Song Jiang; +Cc: riel, linux-mm

On Fri, 2005-10-14 at 00:44 -0600, Song Jiang wrote:
> Hi, Peter,
> 
>     I didn't see your explanation of the actions such as T2-000.
> So I put my understanding at the below. 

My bad, I keep forgetting to write half the stuff down, often the most
important pieces appear missing ;-(

As I told Rik yesterday on IRC, the main idea behind it was to make T1 +
T2 behave like 1 big SC-list and superimpose T3 thereon by coupling the
rotation speed. And thus creating 1 virtual wheel as clock-pro has. The
split in T1 and T2 acts as the hot hand and determines the length of the
test period.

|T1| + |T2| <= c
|T1| + |T2| + |T3| <= 2c

Resident:
T1 can contain pages marked as both hot and cold.
T2 can only contain cold pages.

Non-resident:
T3 can only contain non-resident cold pages in their test period.

The actions presented like Tn-abc are to be read as: place page on the
head of list Tn with the hot/cold bit set to a, the test bit set to b
and the referenced bit set to c.

> While setting three lists, do you indicate that a page 
> can have 3 set of pointers, each for the links in one list?

No, a page can only be on 1 list at a time.

> If we don't restrict ourselves to the current active and inactive
> list data structure, can you point to us what major approximations
> we have to make on the original clock-pro policy so that the policy 
> becomes acceptable in the kernel? (I understand that zoned data
> structure is one of the required adaptations.) The reason I'm asking 
> the question is that I didn't see some major changes here in your
> approxiamtion? Once I understand the kernel required adaptations, 
> I can work with you for a clock-pro approximation.

Yes the zoned data structure is the most difficult. It gives rise to the
fact that we have to split the resident from the non-resident page
management. Each zone will have a resident part and all zones will share
the non-resident part. This because a page leaving zone 1 could be
faulted back in zone 2.

Another is that the pageout operation is async, ie. we start writeback
against a page but until it is finished the page has to stay on the
resident lists but as soon as writeback finishes and we have not yet had
another reference we need to reclaim the page asap.

The last one, and I'm not quite sure on this one (Rik?), is that when we
insert a page we're sure it is going to be referenced right after the
context switch to userspace. Hence we effectively insert referenced
pages.

Rik, any things missing here?

> I have a clock-pro simulator in C code and powerpoint presentation 
> slides explaining clock-pro operations. If you are interested in
> any of them, let me know.
> 
> Thanks.
> 
>    Song
> 
>  
> 
> On Thu, 2005-10-13 at 12:26, Peter Zijlstra wrote:
> > Hi,
> > 
> > I've been thinking on another clock-pro approximation.
> > 
> > Each page has 3 bits: hot/cold, test and referenced.
> > Say we have 3 lists: T1, T2 and T3.
> > and variable: s
> > 
> > T1 will have hot and cold pages, T2 will only have cold pages and T3
> > will have the non-resident pages.
> > c will be the total number of resident pages; |T1| + |T2| + |T3| = 2c.
>                                                 Do you mean |T1 U T2 U
> T3| = 2c ?

See above.

New and improved actions:

> > T1-rotation:
> > 
> > h/c   test   ref       action
> >  0      0     0        T2-000   page is passed on to T2
> >  0      0     1        T2-001   page is passed on to T2
> >  0      1     0        T2-000   page to T2, clear test bit
> >  0      1     1        T1-100   make it hot
> >  1      0     0        T2-010   turn into cold page, start test period
> >  1      0     1        T1-100   keep it hot, clear ref bit
> >  1      1     0        <cannot happen>
> >  1      1     1        <cannot happen>
> > 
> > 
> > T2-rotation:
> > 
> > h/c   test   ref       action
> >  0      0     0        <remove page from list>
> >  0      0     1        T1-000  keep it cold, keep it resident.
> >  0      1     0        T3-010  make it non-resident
> >  0      1     1        T1-100  make it hot
> > 
> > 
> > T3-rotation: remove from non-resident list.

A fault will check the non-resident list and also remove the page if
found.

> > So, on fault we rotate T2, unless empty then we start by rotating T1
> > until T2 contains at least 1 cold page.
> > If a T2 rotation creates a hot page, we rotate T1 to degrade a hot
> > page to a cold page in order to keep the cold page target m_c.
> > Every T1 rotation adds |T1| to s. While s > c, we subtract c from s and

>        Does |T1| mean the total size if T1? 

Yes, |T1| is the size of T1. 

> Can you explain why in more
> detail?
> 

This because the hot hand pushes the test hand.

Hmmm, maybe it should not be every T1 rotation, but only rotations on
hot pages that get accounted. Because at that point the hot page with
the largest inter reference period will disappear and the test period
changes.

So the idea is to couple to rotation speed of T3 to T1 and scale the
respective sizes. 

> > turn T3 for each subtraction.
> > 
> > Compare to clock-pro:
> >   T1-rotation <-> Hand_hot
> >   T2-rotation <-> Hand_cold
> >   T3-rotation <-> Hand_test
> > 
> > The normal m_c adaption rules can be applied.
> > 
> > Zoned edition:
> > This can be done per zone by having:
> > T1_i, T2_i, T3_j, s, t, u_j
> > where _i is the zone index and _j the non-resident bucket index.
> > 
> > Then each T1_i turn will add |T1_i| to s, each c in s will increment t by 1.
> > On each non-resident bucket access we increment u_j until it equals t
> > and for each increment we rotate the bucket.
> > 

More thoughts, it should probably be: u_j = t/J. Where J is the total
number of buckets.

Kind regards,

Peter Zijlstra

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Another Clock-pro approx
  2005-10-14  7:38   ` Peter Zijlstra
@ 2005-10-15  3:35     ` Song Jiang
  2005-10-15  4:10       ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Song Jiang @ 2005-10-15  3:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: riel, linux-mm

Peter,

Two obvious issues in the actions are 
(1) test bits have never been set; 
(2) pages in T3 have never been promoted into T1/T2.

I might have to take the responsibility for all
the confusion in the understanding of why clock-pro
works. (e.g. clock-pro can deal with access-once, loop
accesses effectively).  I should have put a pseudo code
in the clock-pro paper.  However, even with a pseudo
code, the description of clock-pro would seem much more
complicated than the idea behind it actually is. To
get its idea without being bothered by all the details
involved in the clock-pro description, my suggestion
is that let's base our discussion/design on the original
LIRS algorithm.   
(see section 3 of the following paper
http://www.cs.wm.edu/hpcs/WWW/HTML/publications/papers/TR-05-11.pdf)

In the context of the LIRS algorithm, compare to clock-pro,

cold <--> HIR 
hot <--> LIR

HAND_hot --> bottom of the LIRS stack (= last resident hot
page). This is the most critical position in the stack,
which serves as a criterion of hot/cold.  A reference
to a page above it (if the stack is vertically viewed)
makes the page a hot one, otherwise, the page remains as
a cold one. This is also the place where a hot page is
demoted to a cold one.

HAND-cold --> last resident cold page (== bottom of stack
Q) where a victim page is going to be searched.

If we allow an unlimited number of non-resident stay in
the stack, there is no need for a HAND_test. Actually
HAND_hot serves the purpose.  If we have a limit on that
(say the memory size c), then the HAND-test points to the
last non-resident cold page or last hot page, whichever
is higher.

The idea behind LIRS is very simple: the last hot page is the
one least deserving to reside in memory (ignoring the
small number of resident cold pages here). If you have
a reference above it (in the stack), then you are more
deserving to get in (as a hot one), and the last hot one
gets out (as a cold one). Otherwise, no h/c states change.

A major difference between LIRS and Clock-pro is that
clock-pro delays the actions on references (except page
faults). Taking this difference into reading clock-pro 
description, you can easily unravel all the tricks played 
in it.

The conclusion is that it is essential for clock-pro
to allow the location comparison in a single list between
(non-resident) cold page and the last hot page, so that
we know which cold page should be promoted into
a hot page with a reference. If we use HAND_test, we must
make sure HAND_test stays above HAND_hot.

If we use separate lists for h/c pages (like active/inactive
lists), we lose the chance for the comparison, and major 
performance advantages of clock-pro are lost.  Actually 
CAR/CART have been shown in the aforementioned paper 
not be able to deal with loop access problem of LRU, 
and that is the reason.

It is fine to have a common non-resident page pool for 
the zoned memory, if we can also set their corresponding
place-holders in a single list for each zone.

Let me know if the single list suggestion is feasible
in the Linux kernel. 

Thanks. 

Song

On Fri, 2005-10-14 at 01:38, Peter Zijlstra wrote:
> On Fri, 2005-10-14 at 00:44 -0600, Song Jiang wrote:
> > Hi, Peter,
> > 
> >     I didn't see your explanation of the actions such as T2-000.
> > So I put my understanding at the below. 
> 
> My bad, I keep forgetting to write half the stuff down, often the most
> important pieces appear missing ;-(
> 
> As I told Rik yesterday on IRC, the main idea behind it was to make T1 +
> T2 behave like 1 big SC-list and superimpose T3 thereon by coupling the
> rotation speed. And thus creating 1 virtual wheel as clock-pro has. The
> split in T1 and T2 acts as the hot hand and determines the length of the
> test period.
> 
> |T1| + |T2| <= c
> |T1| + |T2| + |T3| <= 2c
> 
> Resident:
> T1 can contain pages marked as both hot and cold.
> T2 can only contain cold pages.
> 
> Non-resident:
> T3 can only contain non-resident cold pages in their test period.
> 
> The actions presented like Tn-abc are to be read as: place page on the
> head of list Tn with the hot/cold bit set to a, the test bit set to b
> and the referenced bit set to c.
> 
> > While setting three lists, do you indicate that a page 
> > can have 3 set of pointers, each for the links in one list?
> 
> No, a page can only be on 1 list at a time.
> 
> > If we don't restrict ourselves to the current active and inactive
> > list data structure, can you point to us what major approximations
> > we have to make on the original clock-pro policy so that the policy 
> > becomes acceptable in the kernel? (I understand that zoned data
> > structure is one of the required adaptations.) The reason I'm asking 
> > the question is that I didn't see some major changes here in your
> > approxiamtion? Once I understand the kernel required adaptations, 
> > I can work with you for a clock-pro approximation.
> 
> Yes the zoned data structure is the most difficult. It gives rise to the
> fact that we have to split the resident from the non-resident page
> management. Each zone will have a resident part and all zones will share
> the non-resident part. This because a page leaving zone 1 could be
> faulted back in zone 2.
> 
> Another is that the pageout operation is async, ie. we start writeback
> against a page but until it is finished the page has to stay on the
> resident lists but as soon as writeback finishes and we have not yet had
> another reference we need to reclaim the page asap.
> 
> The last one, and I'm not quite sure on this one (Rik?), is that when we
> insert a page we're sure it is going to be referenced right after the
> context switch to userspace. Hence we effectively insert referenced
> pages.
> 
> Rik, any things missing here?
> 
> > I have a clock-pro simulator in C code and powerpoint presentation 
> > slides explaining clock-pro operations. If you are interested in
> > any of them, let me know.
> > 
> > Thanks.
> > 
> >    Song
> > 
> >  
> > 
> > On Thu, 2005-10-13 at 12:26, Peter Zijlstra wrote:
> > > Hi,
> > > 
> > > I've been thinking on another clock-pro approximation.
> > > 
> > > Each page has 3 bits: hot/cold, test and referenced.
> > > Say we have 3 lists: T1, T2 and T3.
> > > and variable: s
> > > 
> > > T1 will have hot and cold pages, T2 will only have cold pages and T3
> > > will have the non-resident pages.
> > > c will be the total number of resident pages; |T1| + |T2| + |T3| = 2c.
> >                                                 Do you mean |T1 U T2 U
> > T3| = 2c ?
> 
> See above.
> 
> New and improved actions:
> 
> > > T1-rotation:
> > > 
> > > h/c   test   ref       action
> > >  0      0     0        T2-000   page is passed on to T2
> > >  0      0     1        T2-001   page is passed on to T2
> > >  0      1     0        T2-000   page to T2, clear test bit
> > >  0      1     1        T1-100   make it hot
> > >  1      0     0        T2-010   turn into cold page, start test period
> > >  1      0     1        T1-100   keep it hot, clear ref bit
> > >  1      1     0        <cannot happen>
> > >  1      1     1        <cannot happen>
> > > 
> > > 
> > > T2-rotation:
> > > 
> > > h/c   test   ref       action
> > >  0      0     0        <remove page from list>
> > >  0      0     1        T1-000  keep it cold, keep it resident.
> > >  0      1     0        T3-010  make it non-resident
> > >  0      1     1        T1-100  make it hot
> > > 
> > > 
> > > T3-rotation: remove from non-resident list.
> 
> A fault will check the non-resident list and also remove the page if
> found.
> 
> > > So, on fault we rotate T2, unless empty then we start by rotating T1
> > > until T2 contains at least 1 cold page.
> > > If a T2 rotation creates a hot page, we rotate T1 to degrade a hot
> > > page to a cold page in order to keep the cold page target m_c.
> > > Every T1 rotation adds |T1| to s. While s > c, we subtract c from s and
> 
> >        Does |T1| mean the total size if T1? 
> 
> Yes, |T1| is the size of T1. 
> 
> > Can you explain why in more
> > detail?
> > 
> 
> This because the hot hand pushes the test hand.
> 
> Hmmm, maybe it should not be every T1 rotation, but only rotations on
> hot pages that get accounted. Because at that point the hot page with
> the largest inter reference period will disappear and the test period
> changes.
> 
> So the idea is to couple to rotation speed of T3 to T1 and scale the
> respective sizes. 
> 
> > > turn T3 for each subtraction.
> > > 
> > > Compare to clock-pro:
> > >   T1-rotation <-> Hand_hot
> > >   T2-rotation <-> Hand_cold
> > >   T3-rotation <-> Hand_test
> > > 
> > > The normal m_c adaption rules can be applied.
> > > 
> > > Zoned edition:
> > > This can be done per zone by having:
> > > T1_i, T2_i, T3_j, s, t, u_j
> > > where _i is the zone index and _j the non-resident bucket index.
> > > 
> > > Then each T1_i turn will add |T1_i| to s, each c in s will increment t by 1.
> > > On each non-resident bucket access we increment u_j until it equals t
> > > and for each increment we rotate the bucket.
> > > 
> 
> More thoughts, it should probably be: u_j = t/J. Where J is the total
> number of buckets.
> 
> 
> Kind regards,
> 
> Peter Zijlstra
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Another Clock-pro approx
  2005-10-15  3:35     ` Song Jiang
@ 2005-10-15  4:10       ` Peter Zijlstra
  2005-10-16  5:34         ` Song Jiang
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2005-10-15  4:10 UTC (permalink / raw)
  To: Song Jiang; +Cc: riel, linux-mm

On Fri, 2005-10-14 at 21:35 -0600, Song Jiang wrote:
> Peter,
> 
> Two obvious issues in the actions are 
> (1) test bits have never been set; 
> (2) pages in T3 have never been promoted into T1/T2.

Drad, more missing information:
On fault the new page is searched for in T3, if present T1-100 otherwise
T1-010.

> I might have to take the responsibility for all
> the confusion in the understanding of why clock-pro
> works. (e.g. clock-pro can deal with access-once, loop
> accesses effectively).  I should have put a pseudo code
> in the clock-pro paper.  However, even with a pseudo
> code, the description of clock-pro would seem much more
> complicated than the idea behind it actually is. To
> get its idea without being bothered by all the details
> involved in the clock-pro description, my suggestion
> is that let's base our discussion/design on the original
> LIRS algorithm.   
> (see section 3 of the following paper
> http://www.cs.wm.edu/hpcs/WWW/HTML/publications/papers/TR-05-11.pdf)
> 
> In the context of the LIRS algorithm, compare to clock-pro,
> 
> cold <--> HIR 
> hot <--> LIR
> 
> HAND_hot --> bottom of the LIRS stack (= last resident hot
> page). This is the most critical position in the stack,
> which serves as a criterion of hot/cold.  A reference
> to a page above it (if the stack is vertically viewed)
> makes the page a hot one, otherwise, the page remains as
> a cold one. This is also the place where a hot page is
> demoted to a cold one.
> 
> 
> HAND-cold --> last resident cold page (== bottom of stack
> Q) where a victim page is going to be searched.

Do you imply here that HAND-cold pushes HAND-hot? If so, then it is
possible to write this 2-hand clock as 2 lists, where the top one will
always push its tail into the head of the botton, and the botton will
puth into the head of the top one on reference.
This is exactly what I have done.

> If we allow an unlimited number of non-resident stay in
> the stack, there is no need for a HAND_test. Actually
> HAND_hot serves the purpose.  If we have a limit on that
> (say the memory size c), then the HAND-test points to the
> last non-resident cold page or last hot page, whichever
> is higher.

Hmm, need to think on this.

> If we use separate lists for h/c pages (like active/inactive
> lists), we lose the chance for the comparison, and major 
> performance advantages of clock-pro are lost.  

Neither my proposal nor CART have h/c separarted lists. CARTs T1 list
can contain both hot and cold pages as can mine.

> Actually 
> CAR/CART have been shown in the aforementioned paper 
> not be able to deal with loop access problem of LRU, 
> and that is the reason.

I fixed that:
  http://programming.kicks-ass.net/kernel-patches/cart/cart-cart-r.patch
That solution seems to behave properly, although more rigourous testing
would be nice.

> It is fine to have a common non-resident page pool for 
> the zoned memory, if we can also set their corresponding
> place-holders in a single list for each zone.
> 
> Let me know if the single list suggestion is feasible
> in the Linux kernel. 

Single list should not be a problem. However I still have some trouble
with placing the page meta-data of non-resident pages on those lists.


> On Fri, 2005-10-14 at 01:38, Peter Zijlstra wrote:
> > On Fri, 2005-10-14 at 00:44 -0600, Song Jiang wrote:
> > > Hi, Peter,
> > > 
> > >     I didn't see your explanation of the actions such as T2-000.
> > > So I put my understanding at the below. 
> > 
> > My bad, I keep forgetting to write half the stuff down, often the most
> > important pieces appear missing ;-(
> > 
> > As I told Rik yesterday on IRC, the main idea behind it was to make T1 +
> > T2 behave like 1 big SC-list and superimpose T3 thereon by coupling the
> > rotation speed. And thus creating 1 virtual wheel as clock-pro has. The
> > split in T1 and T2 acts as the hot hand and determines the length of the
> > test period.
> > 
> > |T1| + |T2| <= c
> > |T1| + |T2| + |T3| <= 2c
> > 
> > Resident:
> > T1 can contain pages marked as both hot and cold.
> > T2 can only contain cold pages.
> > 
> > Non-resident:
> > T3 can only contain non-resident cold pages in their test period.
> > 
> > The actions presented like Tn-abc are to be read as: place page on the
> > head of list Tn with the hot/cold bit set to a, the test bit set to b
> > and the referenced bit set to c.
> > 
> > > While setting three lists, do you indicate that a page 
> > > can have 3 set of pointers, each for the links in one list?
> > 
> > No, a page can only be on 1 list at a time.
> > 
> > > If we don't restrict ourselves to the current active and inactive
> > > list data structure, can you point to us what major approximations
> > > we have to make on the original clock-pro policy so that the policy 
> > > becomes acceptable in the kernel? (I understand that zoned data
> > > structure is one of the required adaptations.) The reason I'm asking 
> > > the question is that I didn't see some major changes here in your
> > > approxiamtion? Once I understand the kernel required adaptations, 
> > > I can work with you for a clock-pro approximation.
> > 
> > Yes the zoned data structure is the most difficult. It gives rise to the
> > fact that we have to split the resident from the non-resident page
> > management. Each zone will have a resident part and all zones will share
> > the non-resident part. This because a page leaving zone 1 could be
> > faulted back in zone 2.
> > 
> > Another is that the pageout operation is async, ie. we start writeback
> > against a page but until it is finished the page has to stay on the
> > resident lists but as soon as writeback finishes and we have not yet had
> > another reference we need to reclaim the page asap.
> > 
> > The last one, and I'm not quite sure on this one (Rik?), is that when we
> > insert a page we're sure it is going to be referenced right after the
> > context switch to userspace. Hence we effectively insert referenced
> > pages.
> > 
> > Rik, any things missing here?
> > 
> > > I have a clock-pro simulator in C code and powerpoint presentation 
> > > slides explaining clock-pro operations. If you are interested in
> > > any of them, let me know.
> > > 
> > > Thanks.
> > > 
> > >    Song
> > > 
> > >  
> > > 
> > > On Thu, 2005-10-13 at 12:26, Peter Zijlstra wrote:
> > > > Hi,
> > > > 
> > > > I've been thinking on another clock-pro approximation.
> > > > 
> > > > Each page has 3 bits: hot/cold, test and referenced.
> > > > Say we have 3 lists: T1, T2 and T3.
> > > > and variable: s
> > > > 
> > > > T1 will have hot and cold pages, T2 will only have cold pages and T3
> > > > will have the non-resident pages.
> > > > c will be the total number of resident pages; |T1| + |T2| + |T3| = 2c.
> > >                                                 Do you mean |T1 U T2 U
> > > T3| = 2c ?
> > 
> > See above.
> > 
> > New and improved actions:
> > 
> > > > T1-rotation:
> > > > 
> > > > h/c   test   ref       action
> > > >  0      0     0        T2-000   page is passed on to T2
> > > >  0      0     1        T2-001   page is passed on to T2
> > > >  0      1     0        T2-000   page to T2, clear test bit
> > > >  0      1     1        T1-100   make it hot
> > > >  1      0     0        T2-010   turn into cold page, start test period
> > > >  1      0     1        T1-100   keep it hot, clear ref bit
> > > >  1      1     0        <cannot happen>
> > > >  1      1     1        <cannot happen>
> > > > 
> > > > 
> > > > T2-rotation:
> > > > 
> > > > h/c   test   ref       action
> > > >  0      0     0        <remove page from list>
> > > >  0      0     1        T1-000  keep it cold, keep it resident.
> > > >  0      1     0        T3-010  make it non-resident
> > > >  0      1     1        T1-100  make it hot
> > > > 
> > > > 
> > > > T3-rotation: remove from non-resident list.
> > 
> > A fault will check the non-resident list and also remove the page if
> > found.
> > 
> > > > So, on fault we rotate T2, unless empty then we start by rotating T1
> > > > until T2 contains at least 1 cold page.
> > > > If a T2 rotation creates a hot page, we rotate T1 to degrade a hot
> > > > page to a cold page in order to keep the cold page target m_c.
> > > > Every T1 rotation adds |T1| to s. While s > c, we subtract c from s and
> > 
> > >        Does |T1| mean the total size if T1? 
> > 
> > Yes, |T1| is the size of T1. 
> > 
> > > Can you explain why in more
> > > detail?
> > > 
> > 
> > This because the hot hand pushes the test hand.
> > 
> > Hmmm, maybe it should not be every T1 rotation, but only rotations on
> > hot pages that get accounted. Because at that point the hot page with
> > the largest inter reference period will disappear and the test period
> > changes.
> > 
> > So the idea is to couple to rotation speed of T3 to T1 and scale the
> > respective sizes. 
> > 
> > > > turn T3 for each subtraction.
> > > > 
> > > > Compare to clock-pro:
> > > >   T1-rotation <-> Hand_hot
> > > >   T2-rotation <-> Hand_cold
> > > >   T3-rotation <-> Hand_test
> > > > 
> > > > The normal m_c adaption rules can be applied.
> > > > 
> > > > Zoned edition:
> > > > This can be done per zone by having:
> > > > T1_i, T2_i, T3_j, s, t, u_j
> > > > where _i is the zone index and _j the non-resident bucket index.
> > > > 
> > > > Then each T1_i turn will add |T1_i| to s, each c in s will increment t by 1.
> > > > On each non-resident bucket access we increment u_j until it equals t
> > > > and for each increment we rotate the bucket.
> > > > 
> > 
> > More thoughts, it should probably be: u_j = t/J. Where J is the total
> > number of buckets.
> > 
> > 
> > Kind regards,
> > 
> > Peter Zijlstra
> > 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Another Clock-pro approx
  2005-10-15  4:10       ` Peter Zijlstra
@ 2005-10-16  5:34         ` Song Jiang
  2005-10-16  8:37           ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Song Jiang @ 2005-10-16  5:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: riel, linux-mm

On Fri, 2005-10-14 at 22:10, Peter Zijlstra wrote:
> On Fri, 2005-10-14 at 21:35 -0600, Song Jiang wrote:
> > Peter,
> > 
> > Two obvious issues in the actions are 
> > (1) test bits have never been set; 
> > (2) pages in T3 have never been promoted into T1/T2.
> 
> Drad, more missing information:
> On fault the new page is searched for in T3, if present T1-100 otherwise
> T1-010.
> 

I am interested in rethinking the mechanism of clock-pro under your 
3-digit code abstraction. 

I put my comments in pairs of brackets.

(Please confirm: all T1 and T2 pages are resident.)


T1-rotation:

h/c   test   ref          action
 0       0       0           T2-000 
 0       0       1           T2-001 

 0       1       0           T2-000 
 0       1       1           T1-100
 1       0       0           T2-001 [T1-010 because I don't see why 
 we can artificially give it a ref. But we can give it a second
chance to test its next reuse in T1]
                               
 1       0       1           T1-100
 1       1       0           <cannot happen>
 1       1       1           <cannot happen>


T2-rotation:

h/c   test   ref          action
 0       0       0           <remove page from list>
 0       0       1           T1-000  [T1-010 to test its next reuse]

 0       1       0           T3-010 
 0       1       1           T1-100


T3-rotation: 
    present:       T1-100
    not present:  T1-010


My obsevations: 

(1) The test bits of all T1 cold pages are 1 
(test bit is irrelevant to hot pages),
because once a T1 page with test bit 1 has its
test bit cleared, it either leaves T1 or
becomes a hot page.
So that in T1-rotation, the cases of 000, 001 
cannot happen.

(2)  The test bits of all T2 cold pages are 0,
because all T2 pages are from T1, and there are 
no T1 actions generating a test bit.
So that in T2-rotation, the cases of 010, 011 
cannot happen.

(3) The test bits of all T3 pages are designed
to be 1. (I will keep thinking how to really 
achieve the design goal).

Considering the above, there is no need to
explictly use a test bit.


> > 
> > 
> > HAND-cold --> last resident cold page (== bottom of stack
> > Q) where a victim page is going to be searched.
> 
> Do you imply here that HAND-cold pushes HAND-hot? If so, then it is

   HAND-hot pushes HAND-test, but not necessarily HAND-cold,
because a cold page that is turned from a hot page can stay 
below the HAND-hot in the LIRS stack.

> possible to write this 2-hand clock as 2 lists, where the top one will
> always push its tail into the head of the botton, and the botton will
> puth into the head of the top one on reference.
> This is exactly what I have done.

I have no issue with this.

> > If we use separate lists for h/c pages (like active/inactive
> > lists), we lose the chance for the comparison, and major 
> > performance advantages of clock-pro are lost.  
> 
> Neither my proposal nor CART have h/c separarted lists. CARTs T1 list
> can contain both hot and cold pages as can mine.
> 
My fault in expressing. I actually mean multiple lists,
not necessarily hot and cold pages have to be in seperate 
lists. To effectively set the test bits, I wonder we might 
have to put non-resident pages with resident pages
in the same list. But I will keep thinking how to 
make them seperate without compromising the role
of test period.   


> > Actually 
> > CAR/CART have been shown in the aforementioned paper 
> > not be able to deal with loop access problem of LRU, 
> > and that is the reason.
> 
> I fixed that:
>   http://programming.kicks-ass.net/kernel-patches/cart/cart-cart-r.patch

That is nice.


> > 
> > Let me know if the single list suggestion is feasible
> > in the Linux kernel. 
> 
> Single list should not be a problem. However I still have some trouble
> with placing the page meta-data of non-resident pages on those lists.
> 
    We should have a way out on this.

Song


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Another Clock-pro approx
  2005-10-16  5:34         ` Song Jiang
@ 2005-10-16  8:37           ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2005-10-16  8:37 UTC (permalink / raw)
  To: Song Jiang; +Cc: riel, linux-mm

On Sat, 2005-10-15 at 23:34 -0600, Song Jiang wrote:
> On Fri, 2005-10-14 at 22:10, Peter Zijlstra wrote:
> > On Fri, 2005-10-14 at 21:35 -0600, Song Jiang wrote:
> > > Peter,
> > > 
> > > Two obvious issues in the actions are 
> > > (1) test bits have never been set; 
> > > (2) pages in T3 have never been promoted into T1/T2.
> > 
> > Drad, more missing information:
> > On fault the new page is searched for in T3, if present T1-100 otherwise
> > T1-010.
> > 
> 
> I am interested in rethinking the mechanism of clock-pro under your 
> 3-digit code abstraction.

Great, thanks for the help and trust in my idea.

> I put my comments in pairs of brackets.
> 
> (Please confirm: all T1 and T2 pages are resident.)

Ack.

> 
> T1-rotation:
> 
> h/c   test   ref          action
>  0       0       0           T2-000

We could even remove this page here and be done with it.
Ok, that was before I reached the end of the email, with the current set
of actions this one is also a <cannot happen>. As would the next be.

>  
>  0       0       1           T2-001 
> 
>  0       1       0           T2-000 
>  0       1       1           T1-100
>  1       0       0           T2-001 [T1-010 because I don't see why 
>  we can artificially give it a ref. But we can give it a second
> chance to test its next reuse in T1]

Yes, my mistake, T1-010 was intended.

>  1       0       1           T1-100
>  1       1       0           <cannot happen>
>  1       1       1           <cannot happen>
> 
> 
> T2-rotation:
> 
> h/c   test   ref          action
>  0       0       0           <remove page from list>
>  0       0       1           T1-000  [T1-010 to test its next reuse]

Ah, reset test period. Good, good.

>  0       1       0           T3-010 
>  0       1       1           T1-100
> 
> 
> T3-rotation: 
>     present:       T1-100
>     not present:  T1-010

Almost, T3 is not actually rotated on fault time, it is searched for the
faulting page; then on presence the faulting page will become hot (and
removed from T3), otherwise cold-test. (Think of T3 as a
threaded-hash-table.)

The T3-rotation which is initiated from a T1-rotation (of a hot page)
will just remove the tail page on T3. 

This action is designed so that when the largest hot page is made cold
this change in test period is also reflected on the non-resident list.

> 
> My obsevations: 
> 
> (1) The test bits of all T1 cold pages are 1 
> (test bit is irrelevant to hot pages),
> because once a T1 page with test bit 1 has its
> test bit cleared, it either leaves T1 or
> becomes a hot page.
> So that in T1-rotation, the cases of 000, 001 
> cannot happen.
> 
> (2)  The test bits of all T2 cold pages are 0,
> because all T2 pages are from T1, and there are 
> no T1 actions generating a test bit.
> So that in T2-rotation, the cases of 010, 011 
> cannot happen.
> 
> (3) The test bits of all T3 pages are designed
> to be 1. (I will keep thinking how to really 
> achieve the design goal).
> 
> Considering the above, there is no need to
> explictly use a test bit.
> 

Hmm, nice. Good property.


> > > 
> > > 
> > > HAND-cold --> last resident cold page (== bottom of stack
> > > Q) where a victim page is going to be searched.
> > 
> > Do you imply here that HAND-cold pushes HAND-hot? If so, then it is
> 
>    HAND-hot pushes HAND-test, but not necessarily HAND-cold,
> because a cold page that is turned from a hot page can stay 
> below the HAND-hot in the LIRS stack.

Yes, I was afraid this might be the case. We'll have to see how good the
below approx works in that case.

> > possible to write this 2-hand clock as 2 lists, where the top one will
> > always push its tail into the head of the botton, and the botton will
> > puth into the head of the top one on reference.
> > This is exactly what I have done.
> 
> I have no issue with this.
> 
> > > If we use separate lists for h/c pages (like active/inactive
> > > lists), we lose the chance for the comparison, and major 
> > > performance advantages of clock-pro are lost.  
> > 
> > Neither my proposal nor CART have h/c separarted lists. CARTs T1 list
> > can contain both hot and cold pages as can mine.
> > 
> My fault in expressing. I actually mean multiple lists,
> not necessarily hot and cold pages have to be in seperate 
> lists. To effectively set the test bits, I wonder we might 
> have to put non-resident pages with resident pages
> in the same list. But I will keep thinking how to 
> make them seperate without compromising the role
> of test period.   

I hoped that the coupling between T1 and T3 might achieve this.

> > > 
> > > Let me know if the single list suggestion is feasible
> > > in the Linux kernel. 
> > 
> > Single list should not be a problem. However I still have some trouble
> > with placing the page meta-data of non-resident pages on those lists.
> > 
>     We should have a way out on this.

Lets see what happens.

I actually started coding up this thing, so we can do some tests. Like a
lot of ppl say, numbers talk ;-)

Kind regards,
Peter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-10-16  8:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-10-13 18:26 Another Clock-pro approx Peter Zijlstra
2005-10-14  6:44 ` Song Jiang
2005-10-14  7:38   ` Peter Zijlstra
2005-10-15  3:35     ` Song Jiang
2005-10-15  4:10       ` Peter Zijlstra
2005-10-16  5:34         ` Song Jiang
2005-10-16  8:37           ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox