linux-mm.kvack.org archive mirror
From: "Alex Zhu (Kernel)" <alexlzhu@meta.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "Alex Zhu (Kernel)" <alexlzhu@meta.com>,
	Ning Zhang <ningzhang@linux.alibaba.com>,
	Yu Zhao <yuzhao@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Kernel Team <Kernel-team@fb.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"riel@surriel.com" <riel@surriel.com>
Subject: Re: [PATCH v3 3/3] mm: THP low utilization shrinker
Date: Thu, 20 Oct 2022 17:04:02 +0000
Message-ID: <38163FDD-5249-4BF5-99B0-B50117A178E5@fb.com>
In-Reply-To: <87v8oencl5.fsf@yhuang6-desk2.ccr.corp.intel.com>



> On Oct 20, 2022, at 1:06 AM, Huang, Ying <ying.huang@intel.com> wrote:
> 
> "Alex Zhu (Kernel)" <alexlzhu@meta.com> writes:
> 
>>> On Oct 19, 2022, at 6:24 PM, Huang, Ying <ying.huang@intel.com> wrote:
>>> 
>>> "Alex Zhu (Kernel)" <alexlzhu@meta.com> writes:
>>> 
>>>>> On Oct 19, 2022, at 12:04 AM, Huang, Ying <ying.huang@intel.com> wrote:
>>>>> 
>>>>>> 
>>>>> <alexlzhu@fb.com> writes:
>>>>> 
> 
> [snip]
> 
>>>>>> diff --git a/mm/list_lru.c b/mm/list_lru.c
>>>>>> index a05e5bef3b40..8cc56a84b554 100644
>>>>>> --- a/mm/list_lru.c
>>>>>> +++ b/mm/list_lru.c
>>>>>> @@ -140,6 +140,32 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
>>>>>> }
>>>>>> EXPORT_SYMBOL_GPL(list_lru_add);
>>>>>> 
>>>>>> +bool list_lru_add_page(struct list_lru *lru, struct page *page, struct list_head *item)
>>>>>> +{
>>>>>> +	int nid = page_to_nid(page);
>>>>>> +	struct list_lru_node *nlru = &lru->node[nid];
>>>>>> +	struct list_lru_one *l;
>>>>>> +	struct mem_cgroup *memcg;
>>>>>> +	unsigned long flags;
>>>>>> +
>>>>>> +	spin_lock_irqsave(&nlru->lock, flags);
>>>>>> +	if (list_empty(item)) {
>>>>>> +		memcg = page_memcg(page);
>>>>>> +		l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
>>>>>> +		list_add_tail(item, &l->list);
>>>>>> +		/* Set shrinker bit if the first element was added */
>>>>>> +		if (!l->nr_items++)
>>>>>> +			set_shrinker_bit(memcg, nid,
>>>>>> +					 lru_shrinker_id(lru));
>>>>>> +		nlru->nr_items++;
>>>>>> +		spin_unlock_irqrestore(&nlru->lock, flags);
>>>>>> +		return true;
>>>>>> +	}
>>>>>> +	spin_unlock_irqrestore(&nlru->lock, flags);
>>>>>> +	return false;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(list_lru_add_page);
>>>>>> +
>>>>> 
>>>>> It appears that only 2 lines are different from list_lru_add().  Is it
>>>>> possible for us to share code?  For example, add another flag for
>>>>> page_memcg() case?
>>>> 
>>>> I believe there are 4 lines: the page_to_nid(page) and the spin_lock_irqsave/restore.
>>>> It was implemented this way because we need to take a page as a parameter and obtain the
>>>> node id from the page; the THP is not necessarily a slab object, as the list_lru_add/delete code assumes.
>>>> 
>>>> Also, there is a potential deadlock when split_huge_page is called from reclaim and when split_huge_page is called 
>>>> by the THP Shrinker, which is why we need irqsave/restore. 
>>> 
>>> Can you provide more details on this?  If it's necessary, I think it should be OK to use irqsave/restore.
>> 
>> There is an ABBA deadlock between reclaim and the THP shrinker where
>> one waits for the page lock and the other waits for the list_lru lock.
> 
> Per my understanding, when we hold list_lru lock, we only use
> trylock_page() and backoff if we fail to acquire the page lock.  So it
> appears that we are safe here?  If not, can you provide the detailed
> race condition information, for example actions on different CPUs?

The ABBA deadlock was the reason we added the trylock_page(). It is safe now; it was not safe in v3.
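
For reference, the backoff in the shrinker's scan callback looks roughly like this (a sketch only; the callback name and the use of page->lru for the list entry are illustrative, not the exact code):

#include <linux/list_lru.h>
#include <linux/pagemap.h>

/*
 * Reclaim locks the page first and then takes the list_lru lock; the
 * walk callback runs with the list_lru lock already held, so it must
 * only trylock the page and back off on failure to avoid the ABBA
 * deadlock described above.
 */
static enum lru_status shrink_thp_isolate(struct list_head *item,
		struct list_lru_one *list, spinlock_t *lock, void *cb_arg)
{
	/* which field of the THP holds the list entry is illustrative here */
	struct page *page = list_entry(item, struct page, lru);

	if (!trylock_page(page))
		return LRU_SKIP;	/* back off, retry on a later walk */

	/* pull it off the lru; the actual split happens after the walk */
	list_lru_isolate(list, item);
	unlock_page(page);
	return LRU_REMOVED;
}

The key point is that the page lock is only ever trylocked while the list_lru lock is held, so the lock order can never invert.
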
> 
>> There is another deadlock where free_transhuge_page interrupts the THP
>> shrinker while it is splitting THPs (and holds the list_lru lock) and then
>> tries to acquire the same list_lru lock (self-deadlock). Both deadlocks
>> occurred in our prod experiments.  I do believe irqsave/restore is necessary.
> 
> Got it.  We may use list_lru lock inside IRQ handler when free pages.  I
> think this information is good for code comments or patch description.

Sounds good! I’ll add some comments to make that clearer.
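
Something along these lines, above the lock acquisition in list_lru_add_page() (a rough draft; exact wording and placement still to be decided):

	/*
	 * free_transhuge_page() takes this lock when a THP is freed, and
	 * freeing can happen from interrupt context. If that interrupt
	 * arrives while the shrinker (or another add/delete path) already
	 * holds nlru->lock, taking the lock again would self-deadlock, so
	 * this lock must always be taken with interrupts disabled.
	 */
	spin_lock_irqsave(&nlru->lock, flags);
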
> 
>>> 
>>>> I thought this would be cleaner than attempting to share code with list_lru_add/delete. Only the shrinker makes use of this.
>>>> All other use cases assume slab objects.
>>> 
>>> Not only for slab objects now.  The original code works for slab and
>>> normal page.  Which is controlled by list_lru->memcg_aware.  You can add
>>> another flag for your new use case, or refactor it to use a function
>>> pointer.
>>> 
>>> Best Regards,
>>> Huang, Ying
>> 
>> I’ll take a look at this tomorrow and get back to you. Thanks for the suggestion! 
>>> 
>>>>> 
> 
> [snip]
> 
> Best Regards,
> Huang, Ying
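
Regarding the suggestion to share code between list_lru_add() and list_lru_add_page(): one possible shape would be a common helper along these lines (very rough and untested; the helper name and the irq_safe flag are mine, and the memcg lookup is simplified):

static bool __list_lru_add(struct list_lru *lru, struct list_head *item,
			   int nid, struct mem_cgroup *memcg, bool irq_safe)
{
	struct list_lru_node *nlru = &lru->node[nid];
	struct list_lru_one *l;
	unsigned long flags = 0;
	bool added = false;

	/* the THP path must disable interrupts, the kmem path does not */
	if (irq_safe)
		spin_lock_irqsave(&nlru->lock, flags);
	else
		spin_lock(&nlru->lock);

	if (list_empty(item)) {
		l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
		list_add_tail(item, &l->list);
		/* Set shrinker bit if the first element was added */
		if (!l->nr_items++)
			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
		nlru->nr_items++;
		added = true;
	}

	if (irq_safe)
		spin_unlock_irqrestore(&nlru->lock, flags);
	else
		spin_unlock(&nlru->lock);

	return added;
}

list_lru_add() would keep deriving nid/memcg from the kmem object as it does today and call this with irq_safe == false, while list_lru_add_page() would pass page_to_nid(page)/page_memcg(page) and irq_safe == true. I’ll look at whether this or the function-pointer variant works better when I rework the patch.
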



Thread overview: 20+ messages
2022-10-12 22:51 [PATCH v3 0/3] THP Shrinker alexlzhu
2022-10-12 22:51 ` [PATCH v3 1/3] mm: add thp_utilization metrics to debugfs alexlzhu
2022-10-13 11:35   ` Kirill A. Shutemov
2022-10-13 22:53     ` Alex Zhu (Kernel)
2022-10-18  4:28       ` Huang, Ying
2022-10-18  3:21   ` Huang, Ying
2022-10-12 22:51 ` [PATCH v3 2/3] mm: changes to split_huge_page() to free zero filled tail pages alexlzhu
2022-10-13 17:11   ` kernel test robot
2022-10-19  5:43   ` Huang, Ying
2022-10-12 22:51 ` [PATCH v3 3/3] mm: THP low utilization shrinker alexlzhu
2022-10-13 13:48   ` kernel test robot
2022-10-13 16:39   ` Johannes Weiner
2022-10-13 16:56     ` Johannes Weiner
2022-10-17 18:19       ` Alex Zhu (Kernel)
2022-10-19  7:04   ` Huang, Ying
2022-10-19 19:08     ` Alex Zhu (Kernel)
2022-10-20  1:24       ` Huang, Ying
2022-10-20  3:29         ` Alex Zhu (Kernel)
2022-10-20  8:06           ` Huang, Ying
2022-10-20 17:04             ` Alex Zhu (Kernel) [this message]
