linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	 Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	 Zhaoyang Huang <zhaoyang.huang@unisoc.com>,
	Roman Gushchin <guro@fb.com>,  Jeff Layton <jlayton@redhat.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	 "open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	 Pavel Tatashin <pasha.tatashin@soleen.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	geng.ren@unisoc.com
Subject: Re: [PATCH] mm:workingset use real time to judge activity of the file page
Date: Fri, 5 Apr 2019 11:13:32 +0800	[thread overview]
Message-ID: <CAGWkznF-LV2BBjcSCmyJzqmYUUvxfNiLbtN5V8xwt3+=uHgqnQ@mail.gmail.com> (raw)
In-Reply-To: <20190404071512.GE12864@dhcp22.suse.cz>

resend it via the right mailling list and rewrite the comments by ZY.

On Thu, Apr 4, 2019 at 3:15 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> [Fixup email for Pavel and add Johannes]
>
> On Thu 04-04-19 11:30:17, Zhaoyang Huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > In previous implementation, the number of refault pages is used
> > for judging the refault period of each page, which is not precised as
> > eviction of other files will be affect a lot on current cache.
> > We introduce the timestamp into the workingset's entry and refault ratio
> > to measure the file page's activity. It helps to decrease the affection
> > of other files(average refault ratio can reflect the view of whole system
> > 's memory).
> > The patch is tested on an Android system, which can be described as
> > comparing the launch time of an application between a huge memory
> > consumption. The result is launch time decrease 50% and the page fault
> > during the test decrease 80%.
> >
I don't understand what exactly you're saying here, can you please elaborate?

The reason it's using distances instead of absolute time is because
the ordering of the LRU is relative and not based on absolute time.

E.g. if a page is accessed every 500ms, it depends on all other pages
to determine whether this page is at the head or the tail of the LRU.

So when you refault, in order to determine the relative position of
the refaulted page in the LRU, you have to compare it to how fast that
LRU is moving. The absolute refault time, or the average time between
refaults, is not comparable to what's already in memory.

comment by ZY
For current implementation, it is hard to deal with the evaluation of
refault period under the scenario of huge dropping of file pages
within short time, which maybe caused by a high order allocation or
continues single page allocation in KSWAPD. On the contrary, such page
which having a big refault_distance will be deemed as INACTIVE
wrongly, which will be reclaimed earlier than it should be and lead to
page thrashing. So we introduce 'avg_refault_time' & 'refault_ratio'
to judge if the refault is a accumulated thing or caused by a tight
reclaiming. That is to say, a big refault_distance in a long time
would also be inactive as the result of comparing it with ideal
time(avg_refault_time: avg_refault_time = delta_lru_reclaimed_pages/
avg_refault_retio (refault_ratio = lru->inactive_ages / time).
> > Signed-off-by: Zhaoyang Huang <huangzhaoyang@gmail.com>
> > ---
> >  include/linux/mmzone.h |  2 ++
> >  mm/workingset.c        | 24 +++++++++++++++++-------
> >  2 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 32699b2..c38ba0a 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -240,6 +240,8 @@ struct lruvec {
> >       atomic_long_t                   inactive_age;
> >       /* Refaults at the time of last reclaim cycle */
> >       unsigned long                   refaults;
> > +     atomic_long_t                   refaults_ratio;
> > +     atomic_long_t                   prev_fault;
> >  #ifdef CONFIG_MEMCG
> >       struct pglist_data *pgdat;
> >  #endif
> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index 40ee02c..6361853 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -159,7 +159,7 @@
> >                        NODES_SHIFT +  \
> >                        MEM_CGROUP_ID_SHIFT)
> >  #define EVICTION_MASK        (~0UL >> EVICTION_SHIFT)
> > -
> > +#define EVICTION_JIFFIES (BITS_PER_LONG >> 3)
> >  /*
> >   * Eviction timestamps need to be able to cover the full range of
> >   * actionable refaults. However, bits are tight in the radix tree
> > @@ -175,18 +175,22 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
> >       eviction >>= bucket_order;
> >       eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
> >       eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
> > +     eviction = (eviction << EVICTION_JIFFIES) | (jiffies >> EVICTION_JIFFIES);
> >       eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
> >
> >       return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
> >  }
> >
> >  static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
> > -                       unsigned long *evictionp)
> > +                       unsigned long *evictionp, unsigned long *prev_jiffp)
> >  {
> >       unsigned long entry = (unsigned long)shadow;
> >       int memcgid, nid;
> > +     unsigned long prev_jiff;
> >
> >       entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
> > +     entry >>= EVICTION_JIFFIES;
> > +     prev_jiff = (entry & ((1UL << EVICTION_JIFFIES) - 1)) << EVICTION_JIFFIES;
> >       nid = entry & ((1UL << NODES_SHIFT) - 1);
> >       entry >>= NODES_SHIFT;
> >       memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
> > @@ -195,6 +199,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
> >       *memcgidp = memcgid;
> >       *pgdat = NODE_DATA(nid);
> >       *evictionp = entry << bucket_order;
> > +     *prev_jiffp = prev_jiff;
> >  }
> >
> >  /**
> > @@ -242,8 +247,12 @@ bool workingset_refault(void *shadow)
> >       unsigned long refault;
> >       struct pglist_data *pgdat;
> >       int memcgid;
> > +     unsigned long refault_ratio;
> > +     unsigned long prev_jiff;
> > +     unsigned long avg_refault_time;
> > +     unsigned long refault_time;
> >
> > -     unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
> > +     unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &prev_jiff);
> >
> >       rcu_read_lock();
> >       /*
> > @@ -288,10 +297,11 @@ bool workingset_refault(void *shadow)
> >        * list is not a problem.
> >        */
> >       refault_distance = (refault - eviction) & EVICTION_MASK;
> > -
> >       inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
> > -
> > -     if (refault_distance <= active_file) {
> > +     lruvec->refaults_ratio = atomic_long_read(&lruvec->inactive_age) / jiffies;
> > +     refault_time = jiffies - prev_jiff;
> > +     avg_refault_time = refault_distance / lruvec->refaults_ratio;
> > +     if (refault_time <= avg_refault_time) {
> >               inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
> >               rcu_read_unlock();
> >               return true;
> > @@ -521,7 +531,7 @@ static int __init workingset_init(void)
> >        * some more pages at runtime, so keep working with up to
> >        * double the initial memory by using totalram_pages as-is.
> >        */
> > -     timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
> > +     timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT - EVICTION_JIFFIES;
> >       max_order = fls_long(totalram_pages - 1);
> >       if (max_order > timestamp_bits)
> >               bucket_order = max_order - timestamp_bits;
> > --
> > 1.9.1
>
> --
> Michal Hocko
> SUSE Labs


  reply	other threads:[~2019-04-05  3:13 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-04  3:30 Zhaoyang Huang
2019-04-04  7:15 ` Michal Hocko
2019-04-05  3:13   ` Zhaoyang Huang [this message]
2019-04-04 16:39 ` Johannes Weiner
2019-04-04 23:23   ` Zhaoyang Huang
2019-04-05 19:34     ` Johannes Weiner
2019-04-05  3:24 ` Matthew Wilcox
  -- strict thread matches above, loose matches on Subject: below --
2019-04-04  2:01 Zhaoyang Huang
2019-04-07  0:43 ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGWkznF-LV2BBjcSCmyJzqmYUUvxfNiLbtN5V8xwt3+=uHgqnQ@mail.gmail.com' \
    --to=huangzhaoyang@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=geng.ren@unisoc.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jlayton@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mawilcox@microsoft.com \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=zhaoyang.huang@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox