linux-mm.kvack.org archive mirror
From: Yu Zhao <yuzhao@google.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, Alex Shi <alex.shi@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hillf Danton <hdanton@sina.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@suse.com>,
	Roman Gushchin <guro@fb.com>, Vlastimil Babka <vbabka@suse.cz>,
	Wei Yang <richard.weiyang@linux.alibaba.com>,
	Yang Shi <shy828301@gmail.com>,
	linux-kernel@vger.kernel.org, page-reclaim@google.com
Subject: Re: [PATCH v1 10/14] mm: multigenerational lru: core
Date: Mon, 15 Mar 2021 22:45:18 -0600	[thread overview]
Message-ID: <YFA33n+zQb8oomjJ@google.com> (raw)
In-Reply-To: <87im5rsvd8.fsf@yhuang6-desk1.ccr.corp.intel.com>

On Tue, Mar 16, 2021 at 10:08:51AM +0800, Huang, Ying wrote:
> Yu Zhao <yuzhao@google.com> writes:
> [snip]
> 
> > +/* Main function used by foreground, background and user-triggered aging. */
> > +static bool walk_mm_list(struct lruvec *lruvec, unsigned long next_seq,
> > +			 struct scan_control *sc, int swappiness)
> > +{
> > +	bool last;
> > +	struct mm_struct *mm = NULL;
> > +	int nid = lruvec_pgdat(lruvec)->node_id;
> > +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > +	struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
> > +
> > +	VM_BUG_ON(next_seq > READ_ONCE(lruvec->evictable.max_seq));
> > +
> > +	/*
> > +	 * For each walk of the mm list of a memcg, we decrement the priority
> > +	 * of its lruvec. For each walk of memcgs in kswapd, we increment the
> > +	 * priorities of all lruvecs.
> > +	 *
> > +	 * So if this lruvec has a higher priority (smaller value), it means
> > +	 * other concurrent reclaimers (global or memcg reclaim) have walked
> > +	 * its mm list. Skip it for this priority to balance the pressure on
> > +	 * all memcgs.
> > +	 */
> > +#ifdef CONFIG_MEMCG
> > +	if (!mem_cgroup_disabled() && !cgroup_reclaim(sc) &&
> > +	    sc->priority > atomic_read(&lruvec->evictable.priority))
> > +		return false;
> > +#endif
> > +
> > +	do {
> > +		last = get_next_mm(lruvec, next_seq, swappiness, &mm);
> > +		if (mm)
> > +			walk_mm(lruvec, mm, swappiness);
> > +
> > +		cond_resched();
> > +	} while (mm);
> 
> It appears that we need to scan the whole address space of multiple
> processes in this loop?
> 
> If so, I have some concerns about the duration of the function.  Do you
> have some number of the distribution of the duration of the function?
> And may be the number of mm_struct and the number of pages scanned.
> 
> In comparison, in the traditional LRU algorithm, for each round, only a
> small subset of the whole physical memory is scanned.

Reasonable concerns, and insightful too. We are sensitive to direct
reclaim latency, and we tuned another path carefully so that direct
reclaim virtually never hits this path :)

Some numbers from the cover letter first:
  In addition, direct reclaim latency is reduced by 22% at 99th
  percentile and the number of refaults is reduced by 7%. These metrics are
  important to phones and laptops as they are correlated to user
  experience.

And "another path" is the background aging in kswapd:
  age_active_anon()
    age_lru_gens()
      try_walk_mm_list()
        /* try to spread pages out across spread+1 generations */
        if (old_and_young[0] >= old_and_young[1] * spread &&
            min_nr_gens(max_seq, min_seq, swappiness) > max(spread, MIN_NR_GENS))
                return;

        walk_mm_list(lruvec, max_seq, sc, swappiness);

By default, spread = 2, which makes kswapd slightly more aggressive
than direct reclaim for our use cases. This can be entirely disabled
by setting spread to 0 for workloads that don't care about direct
reclaim latency, or spread can be set to larger values for workloads
that are more sensitive to it than ours.

It's worth noting that walk_mm_list() is multithreaded: reclaiming
threads can work on different mm_structs on the same list
concurrently. We do occasionally see this function in direct reclaim
on over-overcommitted systems, i.e., when kswapd CPU usage is at 100%.
Under the same conditions, we have seen the current page reclaim
livelock and trigger hardware watchdog timeouts (our hardware watchdog
is set to 2 hours) many times.


  reply	other threads:[~2021-03-16  4:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-16  2:08 Huang, Ying
2021-03-16  4:45 ` Yu Zhao [this message]
2021-03-16  6:52   ` Huang, Ying
2021-03-16  8:24     ` Yu Zhao
2021-03-16  8:53       ` Huang, Ying
2021-03-16 18:40         ` Yu Zhao
  -- strict thread matches above, loose matches on Subject: below --
2021-03-13  7:57 [PATCH v1 00/14] Multigenerational LRU Yu Zhao
2021-03-13  7:57 ` [PATCH v1 10/14] mm: multigenerational lru: core Yu Zhao
2021-03-15  2:02   ` Andi Kleen
2021-03-15  3:37     ` Yu Zhao