linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Huang, Ying" <ying.huang@intel.com>
To: "Alex Zhu (Kernel)" <alexlzhu@meta.com>
Cc: Ning Zhang <ningzhang@linux.alibaba.com>,
	 Yu Zhao <yuzhao@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	 Kernel Team <Kernel-team@fb.com>,
	 "willy@infradead.org" <willy@infradead.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	 "riel@surriel.com" <riel@surriel.com>
Subject: Re: [PATCH v3 3/3] mm: THP low utilization shrinker
Date: Thu, 20 Oct 2022 16:06:30 +0800	[thread overview]
Message-ID: <87v8oencl5.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <2017B57F-DE06-4B78-A518-43985304EEF4@fb.com> (Alex Zhu's message of "Thu, 20 Oct 2022 03:29:51 +0000")

"Alex Zhu (Kernel)" <alexlzhu@meta.com> writes:

>> On Oct 19, 2022, at 6:24 PM, Huang, Ying <ying.huang@intel.com> wrote:
>> 
>> "Alex Zhu (Kernel)" <alexlzhu@meta.com> writes:
>> 
>>>> On Oct 19, 2022, at 12:04 AM, Huang, Ying <ying.huang@intel.com> wrote:
>>>> 
>>>>> 
>>>> <alexlzhu@fb.com> writes:
>>>>

[snip]

>>>>> diff --git a/mm/list_lru.c b/mm/list_lru.c
>>>>> index a05e5bef3b40..8cc56a84b554 100644
>>>>> --- a/mm/list_lru.c
>>>>> +++ b/mm/list_lru.c
>>>>> @@ -140,6 +140,32 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(list_lru_add);
>>>>> 
>>>>> +bool list_lru_add_page(struct list_lru *lru, struct page *page, struct list_head *item)
>>>>> +{
>>>>> +	int nid = page_to_nid(page);
>>>>> +	struct list_lru_node *nlru = &lru->node[nid];
>>>>> +	struct list_lru_one *l;
>>>>> +	struct mem_cgroup *memcg;
>>>>> +	unsigned long flags;
>>>>> +
>>>>> +	spin_lock_irqsave(&nlru->lock, flags);
>>>>> +	if (list_empty(item)) {
>>>>> +		memcg = page_memcg(page);
>>>>> +		l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
>>>>> +		list_add_tail(item, &l->list);
>>>>> +		/* Set shrinker bit if the first element was added */
>>>>> +		if (!l->nr_items++)
>>>>> +			set_shrinker_bit(memcg, nid,
>>>>> +					 lru_shrinker_id(lru));
>>>>> +		nlru->nr_items++;
>>>>> +		spin_unlock_irqrestore(&nlru->lock, flags);
>>>>> +		return true;
>>>>> +	}
>>>>> +	spin_unlock_irqrestore(&nlru->lock, flags);
>>>>> +	return false;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(list_lru_add_page);
>>>>> +
>>>> 
>>>> It appears that only 2 lines are different from list_lru_add().  Is it
>>>> possible for us to share code?  For example, add another flag for
>>>> page_memcg() case?
>>> 
>>> I believe there are 4 lines. The page_to_nid(page) and the spin_lock_irqsave/restore. 
>>> It was implemented this way as we found we needed to take a page as a parameter and obtain the node id from
>>> the page. This is because the THP is not necessarily a slab object, as the list_lru_add/delete code assumes. 
>>> 
>>> Also, there is a potential deadlock when split_huge_page is called from reclaim and when split_huge_page is called 
>>> by the THP Shrinker, which is why we need irqsave/restore. 
>> 
>> Can you provide more details on this?  If it's necessary, I think it should be OK to use irqsave/restore.
>
> There is an ABBA deadlock between reclaim and the THP shrinker where
> one waits for the page lock and the other waits for the list_lru lock.

Per my understanding, when we hold list_lru lock, we only use
trylock_page() and backoff if we fail to acquire the page lock.  So it
appears that we are safe here?  If not, can you provide the detailed
race condition information, for example actions on different CPUs?

> There is another deadlock where free_transhuge_page interrupts the THP
> shrinker while it is splitting THPs (has the list_lru lock) and then
> acquires the same list_lru lock (self deadlock). These happened in our
> prod experiments.  I do believe irqsave/restore is necessary.

Got it.  We may use list_lru lock inside IRQ handler when free pages.  I
think this information is good for code comments or patch description.

>> 
>>> I though this would be cleaner than attempting to shared code with list_lru_add/delete. Only the shrinker makes use of this. 
>>> All other use cases assume slab objects.
>> 
>> Not only for slab objects now.  The original code works for slab and
>> normal page.  Which is controlled by list_lru->memcg_aware.  You can add
>> another flag for your new use case, or refactor it to use a function
>> pointer.
>> 
>> Best Regards,
>> Huang, Ying
>
> I’ll take a look at this tomorrow and get back to you. Thanks for the suggestion! 
>> 
>>>> 

[snip]

Best Regards,
Huang, Ying


  reply	other threads:[~2022-10-20  8:07 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12 22:51 [PATCH v3 0/3] THP Shrinker alexlzhu
2022-10-12 22:51 ` [PATCH v3 1/3] mm: add thp_utilization metrics to debugfs alexlzhu
2022-10-13 11:35   ` Kirill A. Shutemov
2022-10-13 22:53     ` Alex Zhu (Kernel)
2022-10-18  4:28       ` Huang, Ying
2022-10-18  3:21   ` Huang, Ying
2022-10-12 22:51 ` [PATCH v3 2/3] mm: changes to split_huge_page() to free zero filled tail pages alexlzhu
2022-10-13 17:11   ` kernel test robot
2022-10-19  5:43   ` Huang, Ying
2022-10-12 22:51 ` [PATCH v3 3/3] mm: THP low utilization shrinker alexlzhu
2022-10-13 13:48   ` kernel test robot
2022-10-13 16:39   ` Johannes Weiner
2022-10-13 16:56     ` Johannes Weiner
2022-10-17 18:19       ` Alex Zhu (Kernel)
2022-10-19  7:04   ` Huang, Ying
2022-10-19 19:08     ` Alex Zhu (Kernel)
2022-10-20  1:24       ` Huang, Ying
2022-10-20  3:29         ` Alex Zhu (Kernel)
2022-10-20  8:06           ` Huang, Ying [this message]
2022-10-20 17:04             ` Alex Zhu (Kernel)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v8oencl5.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=Kernel-team@fb.com \
    --cc=alexlzhu@meta.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=ningzhang@linux.alibaba.com \
    --cc=riel@surriel.com \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox