From: David Hildenbrand <david@redhat.com>
To: Jinjiang Tu <tujinjiang@huawei.com>,
yangge1116@126.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, 21cnbao@gmail.com,
baolin.wang@linux.alibaba.com, aneesh.kumar@linux.ibm.com,
liuzixing@hygon.cn, Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to LRU batch
Date: Tue, 8 Apr 2025 12:04:42 +0200
Message-ID: <207a00a2-0895-4086-97ae-d31ead423cf8@redhat.com>
In-Reply-To: <5d0cb178-6436-d98b-3abf-3bcf8710eb6f@huawei.com>
On 08.04.25 10:47, Jinjiang Tu wrote:
>
> On 2025/4/1 22:33, David Hildenbrand wrote:
>> On 27.03.25 12:16, Jinjiang Tu wrote:
>>>
>>> On 2025/3/26 20:46, David Hildenbrand wrote:
>>>> On 26.03.25 13:42, Jinjiang Tu wrote:
>>>>> Hi,
>>>>>
>>>>
>>>> Hi!
>>>>
>>>>> We noticed a 12.3% performance regression in the LibMicro pwrite
>>>>> testcase due to commit 33dfe9204f29 ("mm/gup: clear the LRU flag of
>>>>> a page before adding to LRU batch").
>>>>>
>>>>> The testcase is executed as follows, and the file is a tmpfs file:
>>>>> pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>>>>
>>>> Do we know how much that reflects real workloads? (IOW, how much
>>>> should we care)
>>>
>>> No, it's hard to say.
>>>
>>>>
>>>>>
>>>>> This testcase writes 1KB (only one page) to the tmpfs file and
>>>>> repeats this step many times. The flame graph shows the performance
>>>>> regression comes from folio_mark_accessed() and
>>>>> workingset_activation().
>>>>>
>>>>> folio_mark_accessed() is called for the same page many times. Before
>>>>> this patch, each call would add the page to cpu_fbatches.activate.
>>>>> When the fbatch was full, it was drained and the page was promoted
>>>>> to the active list. After that, folio_mark_accessed() did nothing in
>>>>> later calls.
>>>>>
>>>>> But after this patch, the folio's LRU flag is cleared after it is
>>>>> added to cpu_fbatches.activate. From then on, folio_mark_accessed()
>>>>> never calls folio_activate() again because the page lacks the LRU
>>>>> flag, so the fbatch never fills up and the folio is never marked
>>>>> active. Later folio_mark_accessed() calls therefore always fall
>>>>> through to workingset_activation(), leading to the performance
>>>>> regression.
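>>>>>
>>>>> For reference, the branch we keep hitting looks roughly like this
>>>>> (a simplified sketch of mm/swap.c; details may differ by kernel
>>>>> version):
>>>>>
>>>>>     } else if (!folio_test_active(folio)) {
>>>>>         /*
>>>>>          * With the LRU flag set, queue the folio for activation via
>>>>>          * cpu_fbatches.activate; without it, fall back to searching
>>>>>          * the local lru_add batch for the folio.
>>>>>          */
>>>>>         if (folio_test_lru(folio))
>>>>>             folio_activate(folio);
>>>>>         else
>>>>>             __lru_cache_activate_folio(folio);
>>>>>         folio_clear_referenced(folio);
>>>>>         workingset_activation(folio);
>>>>>     }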
>>>>
>>>> Would there be a good place to drain the LRU to effectively get that
>>>> processed? (we can always try draining if the LRU flag is not set)
>>>
>>> Maybe we could drain or search the cpu_fbatches.activate of the local
>>> CPU in __lru_cache_activate_folio()? Draining the other fbatches is
>>> meaningless.
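>>>
>>> As a rough, untested sketch of what I mean (assuming
>>> cpu_fbatches.lock is already held, as in __lru_cache_activate_folio()):
>>>
>>>     /*
>>>      * Also scan the local activate batch, so that repeated
>>>      * folio_mark_accessed() calls can bail out early.
>>>      */
>>>     fbatch = this_cpu_ptr(&cpu_fbatches.activate);
>>>     for (i = folio_batch_count(fbatch) - 1; i >= 0; i--) {
>>>         if (fbatch->folios[i] == folio) {
>>>             folio_set_active(folio);
>>>             break;
>>>         }
>>>     }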
>>
>> So the current behavior is that folio_mark_accessed() will end up
>> calling folio_activate()
>> once, and then __lru_cache_activate_folio() until the LRU was drained
>> (which can
>> take a looong time).
>>
>> The old behavior was that folio_mark_accessed() would keep calling
>> folio_activate() until the LRU was drained, simply because it was full
>> of "all the same pages". Only *after* the LRU was drained would
>> folio_mark_accessed() actually not do anything (desired behavior).
>>
>> So the overhead comes primarily from __lru_cache_activate_folio()
>> repeatedly searching through the list, because the LRU gets drained
>> less frequently; and in this case it would never find the folio in
>> there anyway.
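>>
>> For anyone following along, __lru_cache_activate_folio() only scans
>> the local lru_add batch, roughly like this (simplified from mm/swap.c):
>>
>>     static void __lru_cache_activate_folio(struct folio *folio)
>>     {
>>         struct folio_batch *fbatch;
>>         int i;
>>
>>         local_lock(&cpu_fbatches.lock);
>>         fbatch = this_cpu_ptr(&cpu_fbatches.lru_add);
>>
>>         /* Scan the not-yet-drained lru_add batch from the tail. */
>>         for (i = folio_batch_count(fbatch) - 1; i >= 0; i--) {
>>             if (fbatch->folios[i] == folio) {
>>                 folio_set_active(folio);
>>                 break;
>>             }
>>         }
>>
>>         local_unlock(&cpu_fbatches.lock);
>>     }
>>
>> so a folio sitting in cpu_fbatches.activate is never found there.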
>>
>> So ... it used to be suboptimal before, now it's more suboptimal I
>> guess?! :)
>>
>> We'd need a way to better identify "this folio is already queued for
>> activation". Searching that list as well would be one option, but the
>> whole "search the list" approach is nasty.
>>
>> Maybe we can simply set the folio as active when staging it for
>> activation, after having
>> cleared the LRU flag? We already do that during folio_add.
>>
>> As the LRU flag was cleared, nobody should be messing with that folio
>> until the cache was
>> drained and the move was successful.
>>
>>
>> Pretty sure this doesn't work, but just to throw out an idea:
>>
>> From c26e1c0ceda6c818826e5b89dc7c7c9279138f80 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <david@redhat.com>
>> Date: Tue, 1 Apr 2025 16:31:56 +0200
>> Subject: [PATCH] test
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/swap.c | 21 ++++++++++++++++-----
>>  1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index fc8281ef42415..bbf9aa76db87f 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -175,6 +175,8 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>>  	folios_put(fbatch);
>>  }
>>
>> +static void lru_activate(struct lruvec *lruvec, struct folio *folio);
>> +
>>  static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>>  		struct folio *folio, move_fn_t move_fn,
>>  		bool on_lru, bool disable_irq)
>> @@ -191,6 +193,10 @@ static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>>  	else
>>  		local_lock(&cpu_fbatches.lock);
>>
>> +	/* We'll only perform the actual list move deferred. */
>> +	if (move_fn == lru_activate)
>> +		folio_set_active(folio);
>> +
>>  	if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || folio_test_large(folio) ||
>>  	    lru_cache_disabled())
>>  		folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
>> @@ -299,12 +305,14 @@ static void lru_activate(struct lruvec *lruvec, struct folio *folio)
>>  {
>>  	long nr_pages = folio_nr_pages(folio);
>>
>> -	if (folio_test_active(folio) || folio_test_unevictable(folio))
>> -		return;
>> -
>> +	/*
>> +	 * We set the folio active after clearing the LRU flag, and set the
>> +	 * LRU flag only after moving it to the right list.
>> +	 */
>> +	VM_WARN_ON_ONCE(!folio_test_active(folio));
>> +	VM_WARN_ON_ONCE(folio_test_unevictable(folio));
>>
>>  	lruvec_del_folio(lruvec, folio);
>> -	folio_set_active(folio);
>>  	lruvec_add_folio(lruvec, folio);
>>  	trace_mm_lru_activate(folio);
>>
>> @@ -342,7 +350,10 @@ void folio_activate(struct folio *folio)
>>  		return;
>>
>>  	lruvec = folio_lruvec_lock_irq(folio);
>> -	lru_activate(lruvec, folio);
>> +	if (!folio_test_unevictable(folio)) {
>> +		folio_set_active(folio);
>> +		lru_activate(lruvec, folio);
>> +	}
>>  	unlock_page_lruvec_irq(lruvec);
>>  	folio_set_lru(folio);
>>  }
>
> I tested with the patch, and the performance regression disappears.
>
> By the way, I find that folio_test_unevictable() is called in lru_deactivate(), lru_lazyfree(), etc.,
> and the unevictable flag is only set after the caller has cleared the LRU flag. IIUC, since
> commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch"),
> folios in an fbatch can't have the unevictable flag set, so is there any need left to check the
> unevictable flag in lru_deactivate(), lru_lazyfree(), etc.?
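>
> As a rough, untested sketch of that cleanup, mirroring what your patch
> above does for lru_activate() (assuming the invariant really holds):
>
>     static void lru_deactivate(struct lruvec *lruvec, struct folio *folio)
>     {
>         if (folio_test_active(folio))
>             return;
>         /*
>          * Folios lose the LRU flag before they enter the batch, so
>          * mlock cannot isolate them and mark them unevictable in the
>          * meantime.
>          */
>         VM_WARN_ON_ONCE(folio_test_unevictable(folio));
>
>         /* ... list move and statistics unchanged ... */
>     }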
I was asking myself the exact same question when crafting this patch.
Sounds like a cleanup worth investigating! :)
Do you have capacity to look into that, and to turn my proposal into a
proper patch? It might take me a bit until I get to it.
--
Cheers,
David / dhildenb