From: David Hildenbrand <david@redhat.com>
To: Jinjiang Tu <tujinjiang@huawei.com>,
yangge1116@126.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, 21cnbao@gmail.com,
baolin.wang@linux.alibaba.com, aneesh.kumar@linux.ibm.com,
liuzixing@hygon.cn, Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to LRU batch
Date: Tue, 8 Apr 2025 12:04:42 +0200
Message-ID: <207a00a2-0895-4086-97ae-d31ead423cf8@redhat.com>
In-Reply-To: <5d0cb178-6436-d98b-3abf-3bcf8710eb6f@huawei.com>
On 08.04.25 10:47, Jinjiang Tu wrote:
>
> On 2025/4/1 22:33, David Hildenbrand wrote:
>> On 27.03.25 12:16, Jinjiang Tu wrote:
>>>
>>> On 2025/3/26 20:46, David Hildenbrand wrote:
>>>> On 26.03.25 13:42, Jinjiang Tu wrote:
>>>>> Hi,
>>>>>
>>>>
>>>> Hi!
>>>>
>>>>> We noticed a 12.3% performance regression in the LibMicro pwrite
>>>>> testcase due to commit 33dfe9204f29 ("mm/gup: clear the LRU flag of
>>>>> a page before adding to LRU batch").
>>>>>
>>>>> The testcase is executed as follows, and the file is a tmpfs file:
>>>>> pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>>>>
>>>> Do we know how much that reflects real workloads? (IOW, how much
>>>> should we care)
>>>
>>> No, it's hard to say.
>>>
>>>>
>>>>>
>>>>> This testcase writes 1KB (only one page) to the tmpfs file and
>>>>> repeats this step many times. The flame graph shows the performance
>>>>> regression comes from folio_mark_accessed() and
>>>>> workingset_activation().
>>>>>
>>>>> folio_mark_accessed() is called for the same page many times. Before
>>>>> this patch, each call would add the page to cpu_fbatches.activate.
>>>>> When the fbatch was full, it was drained and the page was promoted
>>>>> to the active list. After that, folio_mark_accessed() did nothing in
>>>>> later calls.
>>>>>
>>>>> But after this patch, the folio's LRU flag is cleared after it is
>>>>> added to cpu_fbatches.activate. From then on, folio_mark_accessed()
>>>>> never calls folio_activate() again because the page lacks the LRU
>>>>> flag, so the fbatch never fills up and the folio is never marked
>>>>> active. Later folio_mark_accessed() calls therefore always fall
>>>>> through to workingset_activation(), leading to the performance
>>>>> regression.
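>>>>>
>>>>> For reference, the branch we keep hitting looks roughly like this
>>>>> (a simplified sketch of mm/swap.c; details may differ by kernel
>>>>> version):
>>>>>
>>>>>     } else if (!folio_test_active(folio)) {
>>>>>         /*
>>>>>          * With the LRU flag set, queue the folio for activation via
>>>>>          * cpu_fbatches.activate; without it, fall back to searching
>>>>>          * the local lru_add batch for the folio.
>>>>>          */
>>>>>         if (folio_test_lru(folio))
>>>>>             folio_activate(folio);
>>>>>         else
>>>>>             __lru_cache_activate_folio(folio);
>>>>>         folio_clear_referenced(folio);
>>>>>         workingset_activation(folio);
>>>>>     }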
>>>>
>>>> Would there be a good place to drain the LRU to effectively get that
>>>> processed? (we can always try draining if the LRU flag is not set)
>>>
>>> Maybe we could drain or search the cpu_fbatches.activate of the local
>>> CPU in __lru_cache_activate_folio()? Draining the other fbatches is
>>> meaningless.
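>>>
>>> As a rough, untested sketch of what I mean (assuming
>>> cpu_fbatches.lock is already held, as in __lru_cache_activate_folio()):
>>>
>>>     /*
>>>      * Also scan the local activate batch, so that repeated
>>>      * folio_mark_accessed() calls can bail out early.
>>>      */
>>>     fbatch = this_cpu_ptr(&cpu_fbatches.activate);
>>>     for (i = folio_batch_count(fbatch) - 1; i >= 0; i--) {
>>>         if (fbatch->folios[i] == folio) {
>>>             folio_set_active(folio);
>>>             break;
>>>         }
>>>     }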
>>
>> So the current behavior is that folio_mark_accessed() will end up
>> calling folio_activate()
>> once, and then __lru_cache_activate_folio() until the LRU was drained
>> (which can
>> take a looong time).
>>
>> The old behavior was that folio_mark_accessed() would keep calling
>> folio_activate() until the LRU was drained, simply because it was full
>> of "all the same pages". Only *after* the LRU was drained would
>> folio_mark_accessed() actually not do anything (desired behavior).
>>
>> So the overhead comes primarily from __lru_cache_activate_folio()
>> repeatedly searching through the list, because the LRU gets drained
>> less frequently; and in this case it would never find the folio in
>> there anyway.
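>>
>> For anyone following along, __lru_cache_activate_folio() only scans
>> the local lru_add batch, roughly like this (simplified from mm/swap.c):
>>
>>     static void __lru_cache_activate_folio(struct folio *folio)
>>     {
>>         struct folio_batch *fbatch;
>>         int i;
>>
>>         local_lock(&cpu_fbatches.lock);
>>         fbatch = this_cpu_ptr(&cpu_fbatches.lru_add);
>>
>>         /* Scan the not-yet-drained lru_add batch from the tail. */
>>         for (i = folio_batch_count(fbatch) - 1; i >= 0; i--) {
>>             if (fbatch->folios[i] == folio) {
>>                 folio_set_active(folio);
>>                 break;
>>             }
>>         }
>>
>>         local_unlock(&cpu_fbatches.lock);
>>     }
>>
>> so a folio sitting in cpu_fbatches.activate is never found there.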
>>
>> So ... it used to be suboptimal before, now it's more suboptimal I
>> guess?! :)
>>
>> We'd need a way to better identify "this folio is already queued for
>> activation". Searching that list as well would be one option, but the
>> whole "search the list" approach is nasty.
>>
>> Maybe we can simply set the folio as active when staging it for
>> activation, after having
>> cleared the LRU flag? We already do that during folio_add.
>>
>> As the LRU flag was cleared, nobody should be messing with that folio
>> until the cache was
>> drained and the move was successful.
>>
>>
>> Pretty sure this doesn't work, but just to throw out an idea:
>>
>> From c26e1c0ceda6c818826e5b89dc7c7c9279138f80 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <david@redhat.com>
>> Date: Tue, 1 Apr 2025 16:31:56 +0200
>> Subject: [PATCH] test
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/swap.c | 21 ++++++++++++++++-----
>>  1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index fc8281ef42415..bbf9aa76db87f 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -175,6 +175,8 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>>  	folios_put(fbatch);
>>  }
>>
>> +static void lru_activate(struct lruvec *lruvec, struct folio *folio);
>> +
>>  static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>>  		struct folio *folio, move_fn_t move_fn,
>>  		bool on_lru, bool disable_irq)
>> @@ -191,6 +193,10 @@ static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>>  	else
>>  		local_lock(&cpu_fbatches.lock);
>>
>> +	/* We'll only perform the actual list move deferred. */
>> +	if (move_fn == lru_activate)
>> +		folio_set_active(folio);
>> +
>>  	if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || folio_test_large(folio) ||
>>  	    lru_cache_disabled())
>>  		folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
>> @@ -299,12 +305,14 @@ static void lru_activate(struct lruvec *lruvec, struct folio *folio)
>>  {
>>  	long nr_pages = folio_nr_pages(folio);
>>
>> -	if (folio_test_active(folio) || folio_test_unevictable(folio))
>> -		return;
>> -
>> +	/*
>> +	 * We set the folio active after clearing the LRU flag, and set the
>> +	 * LRU flag only after moving it to the right list.
>> +	 */
>> +	VM_WARN_ON_ONCE(!folio_test_active(folio));
>> +	VM_WARN_ON_ONCE(folio_test_unevictable(folio));
>>
>>  	lruvec_del_folio(lruvec, folio);
>> -	folio_set_active(folio);
>>  	lruvec_add_folio(lruvec, folio);
>>  	trace_mm_lru_activate(folio);
>>
>> @@ -342,7 +350,10 @@ void folio_activate(struct folio *folio)
>>  		return;
>>
>>  	lruvec = folio_lruvec_lock_irq(folio);
>> -	lru_activate(lruvec, folio);
>> +	if (!folio_test_unevictable(folio)) {
>> +		folio_set_active(folio);
>> +		lru_activate(lruvec, folio);
>> +	}
>>  	unlock_page_lruvec_irq(lruvec);
>>  	folio_set_lru(folio);
>>  }
>
> I tested with the patch, and the performance regression disappears.
>
> By the way, I find that folio_test_unevictable() is called in lru_deactivate(), lru_lazyfree(), etc.,
> and the unevictable flag is only set after the caller has cleared the LRU flag. IIUC, since
> commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch"),
> folios in an fbatch can't have the unevictable flag set, so is there any need left to check the
> unevictable flag in lru_deactivate(), lru_lazyfree(), etc.?
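>
> As a rough, untested sketch of that cleanup, mirroring what your patch
> above does for lru_activate() (assuming the invariant really holds):
>
>     static void lru_deactivate(struct lruvec *lruvec, struct folio *folio)
>     {
>         if (folio_test_active(folio))
>             return;
>         /*
>          * Folios lose the LRU flag before they enter the batch, so
>          * mlock cannot isolate them and mark them unevictable in the
>          * meantime.
>          */
>         VM_WARN_ON_ONCE(folio_test_unevictable(folio));
>
>         /* ... list move and statistics unchanged ... */
>     }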
I was asking myself the exact same question when crafting this patch.
Sounds like a cleanup worth investigating! :)
Do you have capacity to look into that, and to turn my proposal into a
proper patch? It might take me a bit until I get to it.
--
Cheers,
David / dhildenb