linux-mm.kvack.org archive mirror
From: "Henry Huang" <henry.hj@antgroup.com>
To: yuzhao@google.com
Cc: akpm@linux-foundation.org, "Henry Huang" <henry.hj@antgroup.com>,
	谈鉴锋 <henry.tjf@antgroup.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"朱辉(茶水)" <teawater@antgroup.com>
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
Date: Fri, 15 Dec 2023 20:44:17 +0800
Message-ID: <20231215124423.88878-1-henry.hj@antgroup.com>
In-Reply-To: <CAOUHufavCOqwkm4BJJzHY+RUOafFBLH7t0O+KRbw=ns-RdYwdA@mail.gmail.com>

On Fri, Dec 15, 2023 at 15:23 PM Yu Zhao <yuzhao@google.com> wrote:
>Regarding the change itself, it'd cause a slight regression to other
>use cases (details below).
>
> > @@ -3355,6 +3359,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >                 unsigned long pfn;
> >                 struct folio *folio;
> >                 pte_t ptent = ptep_get(pte + i);
> > +               bool is_pte_young;
> >
> >                 total++;
> >                 walk->mm_stats[MM_LEAF_TOTAL]++;
> > @@ -3363,16 +3368,20 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >                 if (pfn == -1)
> >                         continue;
> >
> > -               if (!pte_young(ptent)) {
> > -                       walk->mm_stats[MM_LEAF_OLD]++;
>
> Most overhead from page table scanning normally comes from
> get_pfn_folio() because it almost always causes a cache miss. This is
> like a pointer dereference, whereas scanning PTEs is like streaming an
> array (bad vs good cache performance).
>
> pte_young() is here to avoid an unnecessary cache miss from
> get_pfn_folio(). Also see the first comment in get_pfn_folio(). It
> should be easy to verify the regression -- FlameGraph from the
> memcached benchmark in the original commit message should do it.
>
> Would a tracepoint here work for you?
>
>
>
> > +               is_pte_young = !!pte_young(ptent);
> > +               folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, is_pte_young);
> > +               if (!folio) {
> > +                       if (!is_pte_young)
> > +                               walk->mm_stats[MM_LEAF_OLD]++;
> >                         continue;
> >                 }
> >
> > -               folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > -               if (!folio)
> > +               if (!folio_test_clear_young(folio) && !is_pte_young) {
> > +                       walk->mm_stats[MM_LEAF_OLD]++;
> >                         continue;
> > +               }
> >
> > -               if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> > +               if (is_pte_young && !ptep_test_and_clear_young(args->vma, addr, pte + i))
> >                         VM_WARN_ON_ONCE(true);
> >
> >                 young++;

Thanks for replying.

To avoid both of the following:
1. the conflict between page_idle/bitmap and the MGLRU scan
2. the performance regression in the MGLRU page-table scan if get_pfn_folio() is called for every PTE

We have a new idea:
1. Add a new interface under /sys/kernel/mm/page_idle that only sets the idle flag, without
walking the rmap or clearing the PTE young bit.
2. Use the MGLRU proactive scan to clear the page idle flag.

Workflow:
      t1                      t2
mark pages idle    MGLRU scan, then check pages' idle flags

This makes it easy to tell whether a page was accessed between t1 and t2.
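
The workflow can be sketched as a small userspace model. This is only an illustration, not kernel code: struct toy_page and the functions mark_idle(), touch(), scan() and was_accessed() are hypothetical names, and the model assumes every accessed bit is clear at t1.

```c
/* Userspace toy model of the t1/t2 workflow; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

struct toy_page {
	bool pte_young;	/* stands in for the PTE accessed bit */
	bool idle;	/* stands in for the page idle flag */
};

/* t1: set the idle flag only; the accessed bit is left untouched. */
static void mark_idle(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].idle = true;
}

/* An access between t1 and t2 sets the accessed bit, as hardware would. */
static void touch(struct toy_page *pages, size_t n, size_t i)
{
	assert(i < n);
	pages[i].pte_young = true;
}

/* t2: the scan skips old PTEs; for young ones it clears idle and young. */
static void scan(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].pte_young)
			continue;	/* old PTE: no page lookup, idle flag kept */
		pages[i].idle = false;
		pages[i].pte_young = false;
	}
}

/* After t2: a page whose idle flag was cleared was accessed in t1~t2. */
static bool was_accessed(const struct toy_page *pages, size_t n, size_t i)
{
	assert(i < n);
	return !pages[i].idle;
}
```

In this model, calling mark_idle(), then touch() on a few indices, then scan(), leaves the idle flag set only on the pages that were never touched, which is what the check at t2 relies on.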

The code changes would look roughly like this:

We clear the idle flag in get_pfn_folio(), and walk_pte_range() still follows the
original design.

static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
					struct pglist_data *pgdat, bool can_swap, bool clear_idle)
{
	struct folio *folio;

	/* try to avoid unnecessary memory loads */
	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
		return NULL;

	folio = pfn_folio(pfn);
+
+	if (clear_idle && folio_test_idle(folio))
+		folio_clear_idle(folio);
+
	if (folio_nid(folio) != pgdat->node_id)
		return NULL;

	if (folio_memcg_rcu(folio) != memcg)
		return NULL;

	/* file VMAs can contain anon pages from COW */
	if (!folio_is_file_lru(folio) && !can_swap)
		return NULL;

	return folio;
}

static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
			struct mm_walk *args)
{
...
	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
		unsigned long pfn;
		struct folio *folio;
		pte_t ptent = ptep_get(pte + i);

		total++;
		walk->mm_stats[MM_LEAF_TOTAL]++;

		pfn = get_pte_pfn(ptent, args->vma, addr);
		if (pfn == -1)
			continue;

		if (!pte_young(ptent)) {
			walk->mm_stats[MM_LEAF_OLD]++;
			continue;
		}

-		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
+		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, true);
		if (!folio)
			continue;
...
}

Does this look like a good idea?

-- 
2.43.0


