From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3043C4332F for ; Thu, 20 Oct 2022 08:07:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3D8BA8E0001; Thu, 20 Oct 2022 04:07:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 388346B007B; Thu, 20 Oct 2022 04:07:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 250488E0001; Thu, 20 Oct 2022 04:07:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 154476B0071 for ; Thu, 20 Oct 2022 04:07:21 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id C92E3C05E0 for ; Thu, 20 Oct 2022 08:07:20 +0000 (UTC) X-FDA: 80040597840.30.DF78A13 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf28.hostedemail.com (Postfix) with ESMTP id 8B82DC0039 for ; Thu, 20 Oct 2022 08:07:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666253239; x=1697789239; h=from:to:cc:subject:references:date:in-reply-to: message-id:mime-version:content-transfer-encoding; bh=UG7hL7f3fdWI3+0js12wWx9alsZjRqxibaUYME9dfDI=; b=HKfI3yMRwLKrCEU0SJ7CZHm/KxnsqtzcqPqyFkUqwL1i1UNsvXXCph0E dxDNDNQLODAzCd1i9PdZb92w5YCMpC96JNp4f99XAOgA9tEzMZWNFbkth Vxhr3UhjHQyFVsUPuWcKKE/d6jiNv7L6EdoVzin/NMdyQu+xyXdIUUhjA lMXFlmDqojqIgvSfjV9HvMuD6aUR9ZVq3LgUCRr6XffwQBWHKsn3PLzfE A5qi8tseM678FH9SHnBODy2gHpqaqOciLn8u6mXbAmYWGs0uEPTkSZKFY R3IgyssiATUc+Da7moWaxJB+cFcw8LXSDLz+sUVEp1VZs0iIlSGWg6huu g==; X-IronPort-AV: E=McAfee;i="6500,9779,10505"; a="287042130" X-IronPort-AV: E=Sophos;i="5.95,198,1661842800"; d="scan'208";a="287042130" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2022 01:07:18 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10505"; a="607527019" X-IronPort-AV: E=Sophos;i="5.95,198,1661842800"; d="scan'208";a="607527019" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2022 01:07:15 -0700 From: "Huang, Ying" To: "Alex Zhu (Kernel)" Cc: Ning Zhang , Yu Zhao , "linux-mm@kvack.org" , Kernel Team , "willy@infradead.org" , "hannes@cmpxchg.org" , "riel@surriel.com" Subject: Re: [PATCH v3 3/3] mm: THP low utilization shrinker References: <87zgdsnvka.fsf@yhuang6-desk2.ccr.corp.intel.com> <5D3EC059-F7E6-4943-AD16-CBE73FCA0357@fb.com> <875ygfnv7r.fsf@yhuang6-desk2.ccr.corp.intel.com> <2017B57F-DE06-4B78-A518-43985304EEF4@fb.com> Date: Thu, 20 Oct 2022 16:06:30 +0800 In-Reply-To: <2017B57F-DE06-4B78-A518-43985304EEF4@fb.com> (Alex Zhu's message of "Thu, 20 Oct 2022 03:29:51 +0000") Message-ID: <87v8oencl5.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=HKfI3yMR; spf=pass (imf28.hostedemail.com: domain of ying.huang@intel.com designates 192.55.52.151 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666253240; a=rsa-sha256; cv=none; b=RLcwKMgCczGoNGnGrTPBQ14InubyEy9OWbAgPczT096FDW7ek035BubbV+OtoMPfZjQTuk UicMglV2BxiAYy4LzbFvAmRXxq8NVkw8Oxt7Dk40KqOTdSJnGLBq4Ue2IFfDkjsBpbO88I r3wuv/yE42Yly4e1oOYL9PiFazG6T04= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666253240; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/DmR3W69k+ytadl9mTnJxomuJnUnP4lbqPCHO8li9/o=; b=Y0tkPJPSUqhTc0nm+/XJjlN5pbaA9jHH5NHJudrhWDn5e7cdkGMXWPkVNQwltxJ8Z6wJq9 24kfx8YitRPWnAkp2kR49/v9J+eRGjLom5sw+E0jA7M9afGuU5vTIUMpZdA1ugzy0u5WRx BZIUdEvtb2ZPTjB23pJWNIWpNsVdSEE= Authentication-Results: imf28.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=HKfI3yMR; spf=pass (imf28.hostedemail.com: domain of ying.huang@intel.com designates 192.55.52.151 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Stat-Signature: rosegys7iyr4j85sx11gb7ojyacys3x1 X-Rspamd-Queue-Id: 8B82DC0039 X-Rspamd-Server: rspam02 X-Rspam-User: X-HE-Tag: 1666253239-266299 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: "Alex Zhu (Kernel)" writes: >> On Oct 19, 2022, at 6:24 PM, Huang, Ying wrote: >>=20 >> "Alex Zhu (Kernel)" writes: >>=20 >>>> On Oct 19, 2022, at 12:04 AM, Huang, Ying wrote: >>>>=20 >>>>>=20 >>>> writes: >>>> [snip] >>>>> diff --git a/mm/list_lru.c b/mm/list_lru.c >>>>> index a05e5bef3b40..8cc56a84b554 100644 >>>>> --- a/mm/list_lru.c >>>>> +++ b/mm/list_lru.c >>>>> @@ -140,6 +140,32 @@ bool list_lru_add(struct list_lru *lru, struct l= ist_head *item) >>>>> } >>>>> EXPORT_SYMBOL_GPL(list_lru_add); >>>>>=20 >>>>> +bool list_lru_add_page(struct list_lru *lru, struct page *page, stru= ct list_head *item) >>>>> +{ >>>>> + int nid =3D page_to_nid(page); >>>>> + struct list_lru_node *nlru =3D &lru->node[nid]; >>>>> + struct list_lru_one *l; >>>>> + struct mem_cgroup *memcg; >>>>> + unsigned long flags; >>>>> + >>>>> + spin_lock_irqsave(&nlru->lock, flags); >>>>> + if (list_empty(item)) { >>>>> + memcg =3D page_memcg(page); >>>>> + l =3D list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); >>>>> + list_add_tail(item, &l->list); >>>>> + /* Set shrinker bit if the first element was added */ >>>>> + if (!l->nr_items++) >>>>> + set_shrinker_bit(memcg, nid, >>>>> + lru_shrinker_id(lru)); >>>>> + nlru->nr_items++; >>>>> + spin_unlock_irqrestore(&nlru->lock, flags); >>>>> + return true; >>>>> + } >>>>> + spin_unlock_irqrestore(&nlru->lock, flags); >>>>> + return false; >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(list_lru_add_page); >>>>> + >>>>=20 >>>> It appears that only 2 lines are different from list_lru_add(). Is it >>>> possible for us to share code? For example, add another flag for >>>> page_memcg() case? >>>=20 >>> I believe there are 4 lines. The page_to_nid(page) and the spin_lock_ir= qsave/restore.=20 >>> It was implemented this way as we found we needed to take a page as a p= arameter and obtain the node id from >>> the page. This is because the THP is not necessarily a slab object, as = the list_lru_add/delete code assumes.=20 >>>=20 >>> Also, there is a potential deadlock when split_huge_page is called from= reclaim and when split_huge_page is called=20 >>> by the THP Shrinker, which is why we need irqsave/restore.=20 >>=20 >> Can you provide more details on this? If it's necessary, I think it sho= uld be OK to use irqsave/restore. > > There is an ABBA deadlock between reclaim and the THP shrinker where > one waits for the page lock and the other waits for the list_lru lock. Per my understanding, when we hold list_lru lock, we only use trylock_page() and backoff if we fail to acquire the page lock. So it appears that we are safe here? If not, can you provide the detailed race condition information, for example actions on different CPUs? > There is another deadlock where free_transhuge_page interrupts the THP > shrinker while it is splitting THPs (has the list_lru lock) and then > acquires the same list_lru lock (self deadlock). These happened in our > prod experiments. I do believe irqsave/restore is necessary. Got it. We may use list_lru lock inside IRQ handler when free pages. I think this information is good for code comments or patch description. >>=20 >>> I though this would be cleaner than attempting to shared code with list= _lru_add/delete. Only the shrinker makes use of this.=20 >>> All other use cases assume slab objects. >>=20 >> Not only for slab objects now. The original code works for slab and >> normal page. Which is controlled by list_lru->memcg_aware. You can add >> another flag for your new use case, or refactor it to use a function >> pointer. >>=20 >> Best Regards, >> Huang, Ying > > I=E2=80=99ll take a look at this tomorrow and get back to you. Thanks for= the suggestion!=20 >>=20 >>>>=20 [snip] Best Regards, Huang, Ying