From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: <linux-mm@kvack.org>, <akpm@linux-foundation.org>,
<willy@infradead.org>, <yuzhao@google.com>,
<ryan.roberts@arm.com>, <ying.huang@intel.com>
Subject: Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
Date: Sat, 29 Apr 2023 16:32:34 +0800
Message-ID: <a0cd4ae0-fcc3-51a9-38e4-a3968fdf134d@intel.com>
In-Reply-To: <20230428140236.czx5eii34z373jqq@box.shutemov.name>

Hi Kirill,

On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>> free_transhuge_page() acquires split queue lock then check
>>>> whether the THP was added to deferred list or not.
>>>>
>>>> It's safe to check whether the THP is in deferred list or not.
>>>> When code hit free_transhuge_page(), there is no one tries
>>>> to update the folio's _deferred_list.
>>>>
>>>> If folio is not in deferred_list, it's safe to check without
>>>> acquiring lock.
>>>>
>>>> If folio is in deferred_list, the other node in deferred_list
>>>> adding/deleteing doesn't impact the return value of
>>>> list_epmty(@folio->_deferred_list).
>>>
>>> Typo.
>>>
>>>>
>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>> see the 61% split_queue_lock contention:
>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]  [k] release_pages
>>>>    - 70.93% release_pages
>>>>       - 61.42% free_transhuge_page
>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>
>>>> With this patch applied, the split_queue_lock contention is less
>>>> than 1%.
>>>>
>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>> mm/huge_memory.c | 19 ++++++++++++++++---
>>>> 1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>  	unsigned long flags;
>>>>
>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>> +	/*
>>>> +	 * At this point, there is no one trying to queue the folio
>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>> +	 * being updated.
>>>> +	 *
>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>> +	 * acquiring the lock.
>>>> +	 *
>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>> +	 * acquiring the lock.
>>>> +	 */
>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>
>>> Recheck under lock?
>> In function deferred_split_scan(), there is following code block:
>> 	if (folio_try_get(folio)) {
>> 		list_move(&folio->_deferred_list, &list);
>> 	} else {
>> 		/* We lost race with folio_put() */
>> 		list_del_init(&folio->_deferred_list);
>> 		ds_queue->split_queue_len--;
>> 	}
>>
>> I am wondering what kind of "lost race with folio_put()" can be.
>>
>> My understanding is that it's not necessary to handle this case here
>> because free_transhuge_page() will handle it once folio get zero ref.
>> But I must miss something here. Thanks.
>
> free_transhuge_page() got when refcount is already zero. Both
> deferred_split_scan() and free_transhuge_page() can see the page with zero
> refcount. The check makes deferred_split_scan() to leave the page to the
> free_transhuge_page().
>
If deferred_split_scan() leaves the page to free_transhuge_page(), is it
necessary for deferred_split_scan() to do
	list_del_init(&folio->_deferred_list);
	ds_queue->split_queue_len--;
itself? Could these two lines be left to free_transhuge_page() as well? Thanks.
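
To make the question concrete, here is a rough userspace sketch of the
interaction as I understand it. It is not kernel code: struct split_queue,
model_lock(), model_unlock(), queue_init(), queue_add(), scan_side_drop()
and free_side_unqueue() are names made up for illustration, the list
helpers are simplified copies of the kernel ones, and the atomic_flag spin
lock only stands in for split_queue_lock. It assumes the free side does the
unlocked list_empty() check and then rechecks under the lock, as you
suggested.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel's list_head helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical model of the deferred split queue (illustration only). */
struct split_queue {
	atomic_flag lock;		/* stands in for split_queue_lock */
	struct list_head head;		/* stands in for split_queue */
	unsigned long len;		/* stands in for split_queue_len */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;			/* spin until the flag is released */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void queue_init(struct split_queue *q)
{
	atomic_flag_clear(&q->lock);
	INIT_LIST_HEAD(&q->head);
	q->len = 0;
}

static void queue_add(struct split_queue *q, struct list_head *entry)
{
	model_lock(&q->lock);
	list_add_tail(entry, &q->head);
	q->len++;
	model_unlock(&q->lock);
}

/* Scan side, caller holds q->lock: drop an entry whose ref could not be taken. */
static void scan_side_drop(struct split_queue *q, struct list_head *entry)
{
	list_del_init(entry);
	q->len--;
}

/*
 * Free side: unlocked peek first, then recheck under the lock so that an
 * entry already removed by the scan side is not unlinked and counted twice.
 * Returns true if this call removed the entry from the queue.
 */
static bool free_side_unqueue(struct split_queue *q, struct list_head *entry)
{
	bool removed = false;

	if (!list_empty(entry)) {		/* lockless check */
		model_lock(&q->lock);
		if (!list_empty(entry)) {	/* recheck under lock */
			list_del_init(entry);
			q->len--;
			removed = true;
		}
		model_unlock(&q->lock);
	}
	return removed;
}
```

In this model, if scan_side_drop() has already run for an entry,
free_side_unqueue() finds it empty and leaves the length alone. If the scan
side skipped the removal instead, the free side would do it, which is the
alternative I am asking about.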

Regards
Yin, Fengwei