From: "Henry Huang" <henry.hj@antgroup.com>
To: yuzhao@google.com
Cc: akpm@linux-foundation.org, "Henry Huang" <henry.hj@antgroup.com>,
谈鉴锋 <henry.tjf@antgroup.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
"朱辉(茶水)" <teawater@antgroup.com>
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
Date: Fri, 15 Dec 2023 20:44:17 +0800
Message-ID: <20231215124423.88878-1-henry.hj@antgroup.com>
In-Reply-To: <CAOUHufavCOqwkm4BJJzHY+RUOafFBLH7t0O+KRbw=ns-RdYwdA@mail.gmail.com>
On Fri, Dec 15, 2023 at 15:23 PM Yu Zhao <yuzhao@google.com> wrote:
>Regarding the change itself, it'd cause a slight regression to other
>use cases (details below).
>
> > @@ -3355,6 +3359,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> > unsigned long pfn;
> > struct folio *folio;
> > pte_t ptent = ptep_get(pte + i);
> > + bool is_pte_young;
> >
> > total++;
> > walk->mm_stats[MM_LEAF_TOTAL]++;
> > @@ -3363,16 +3368,20 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> > if (pfn == -1)
> > continue;
> >
> > - if (!pte_young(ptent)) {
> > - walk->mm_stats[MM_LEAF_OLD]++;
>
> Most overhead from page table scanning normally comes from
> get_pfn_folio() because it almost always causes a cache miss. This is
> like a pointer dereference, whereas scanning PTEs is like streaming an
> array (bad vs good cache performance).
>
> pte_young() is here to avoid an unnecessary cache miss from
> get_pfn_folio(). Also see the first comment in get_pfn_folio(). It
> should be easy to verify the regression -- FlameGraph from the
> memcached benchmark in the original commit message should do it.
>
> Would a tracepoint here work for you?
>
>
>
> > + is_pte_young = !!pte_young(ptent);
> > + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, is_pte_young);
> > + if (!folio) {
> > + if (!is_pte_young)
> > + walk->mm_stats[MM_LEAF_OLD]++;
> > continue;
> > }
> >
> > - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > - if (!folio)
> > + if (!folio_test_clear_young(folio) && !is_pte_young) {
> > + walk->mm_stats[MM_LEAF_OLD]++;
> > continue;
> > + }
> >
> > - if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> > + if (is_pte_young && !ptep_test_and_clear_young(args->vma, addr, pte + i))
> > VM_WARN_ON_ONCE(true);
> >
> > young++;
Thanks for replying.

To avoid both of the problems below:
1. the conflict between page_idle/bitmap and the mglru scan
2. the performance regression in the mglru page-table scan if
   get_pfn_folio() is called for each PTE

we have a new idea:
1. Add a new API under /sys/kernel/mm/page_idle that only sets the idle
   flag, without an rmap walk and without clearing the PTE young bit.
2. Use the mglru proactive scan to clear the page idle flag.
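
For reference, the existing page_idle/bitmap file is an array of 8-byte
words in which the page at PFN #i maps to bit #i%64 of word #i/64. Below is
a minimal sketch of that arithmetic, assuming the new mark-only file would
keep the same layout (that is only an assumption, and the helper names are
made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of the 8-byte word that covers pfn. */
static uint64_t idle_word_offset(uint64_t pfn)
{
	return (pfn / 64) * sizeof(uint64_t);
}

/* Hypothetical helper: mask of the bit for pfn inside that word. */
static uint64_t idle_bit_mask(uint64_t pfn)
{
	return UINT64_C(1) << (pfn % 64);
}

/* Set the idle bit for pfn in an in-memory copy of the bitmap. */
static void idle_bitmap_mark(uint64_t *words, size_t nr_words, uint64_t pfn)
{
	assert(pfn / 64 < nr_words);	/* pfn must fall inside the buffer */
	words[pfn / 64] |= idle_bit_mask(pfn);
}
```

For example, PFN 1000 lands in the word at byte offset 120, bit 40.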

Workflow:

    t1                  t2
    mark pages idle     mglru scan, then check page idle flags

This makes it easy to tell whether a page was accessed between t1 and t2.
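
A rough userspace model of this workflow, as a sketch only: it is not
kernel code, struct toy_page and the toy_* functions are invented for
illustration, and it assumes the PTE young bits are already clear at t1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one mapped page; not a kernel structure. */
struct toy_page {
	bool pte_young;	/* accessed bit in the PTE */
	bool idle;	/* page idle flag */
};

/* t1: set the idle flag only; the PTE young bit is left untouched. */
static void toy_mark_idle(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].idle = true;
}

/* t2: model of the scan; only young PTEs reach the folio lookup. */
static void toy_scan(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].pte_young)
			continue;		/* old PTE: no folio lookup */
		pages[i].idle = false;		/* idle flag cleared in the lookup */
		pages[i].pte_young = false;	/* the scan clears the young bit */
	}
}

/* After t2: count the pages whose idle flag was cleared by the scan. */
static size_t toy_count_accessed(const struct toy_page *pages, size_t n)
{
	size_t accessed = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!pages[i].idle)
			accessed++;
	return accessed;
}
```

In this model, a page that is never touched between the two steps keeps its
idle flag, and the old-PTE fast path is unchanged.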

The code changes would look roughly like this: we clear the idle flag in
get_pfn_folio(), and walk_pte_range() keeps its original design.

 static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
-				   struct pglist_data *pgdat, bool can_swap)
+				   struct pglist_data *pgdat, bool can_swap, bool clear_idle)
 {
 	struct folio *folio;

 	/* try to avoid unnecessary memory loads */
 	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
 		return NULL;

 	folio = pfn_folio(pfn);
+
+	if (clear_idle && folio_test_idle(folio))
+		folio_clear_idle(folio);
+
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;

 	if (folio_memcg_rcu(folio) != memcg)
 		return NULL;

 	/* file VMAs can contain anon pages from COW */
 	if (!folio_is_file_lru(folio) && !can_swap)
 		return NULL;

 	return folio;
 }

 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
 	...
 	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
 		unsigned long pfn;
 		struct folio *folio;
 		pte_t ptent = ptep_get(pte + i);

 		total++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;

 		pfn = get_pte_pfn(ptent, args->vma, addr);
 		if (pfn == -1)
 			continue;

 		if (!pte_young(ptent)) {
 			walk->mm_stats[MM_LEAF_OLD]++;
 			continue;
 		}

-		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
+		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, true);
 		if (!folio)
 			continue;
 		...
 }

Does this look like a good idea?

--
2.43.0