From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.5 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE37FC433E1 for ; Sun, 19 Jul 2020 04:46:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 88CB72073A for ; Sun, 19 Jul 2020 04:46:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88CB72073A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0666F8D0001; Sun, 19 Jul 2020 00:46:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 016786B0005; Sun, 19 Jul 2020 00:46:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E6F8E8D0001; Sun, 19 Jul 2020 00:46:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0048.hostedemail.com [216.40.44.48]) by kanga.kvack.org (Postfix) with ESMTP id D1A226B0003 for ; Sun, 19 Jul 2020 00:46:08 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 25E3426824 for ; Sun, 19 Jul 2020 04:46:08 +0000 (UTC) X-FDA: 77053588416.30.chin05_5f0673426f19 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id E322E180B8406 for ; Sun, 19 Jul 2020 04:46:07 +0000 (UTC) X-HE-Tag: chin05_5f0673426f19 X-Filterd-Recvd-Size: 10236 Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Sun, 19 Jul 2020 04:46:06 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R111e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04397;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0U36e0fH_1595133957; Received: from IT-FVFX43SYHV2H.local(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0U36e0fH_1595133957) by smtp.aliyun-inc.com(127.0.0.1); Sun, 19 Jul 2020 12:45:58 +0800 Subject: Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU To: Alexander Duyck Cc: Andrew Morton , Mel Gorman , Tejun Heo , Hugh Dickins , Konstantin Khlebnikov , Daniel Jordan , Yang Shi , Matthew Wilcox , Johannes Weiner , kbuild test robot , linux-mm , LKML , cgroups@vger.kernel.org, Shakeel Butt , Joonsoo Kim , Wei Yang , "Kirill A. Shutemov" , Michal Hocko , Vladimir Davydov References: <1594429136-20002-1-git-send-email-alex.shi@linux.alibaba.com> <1594429136-20002-14-git-send-email-alex.shi@linux.alibaba.com> <072b39ac-b95a-94f1-67a2-3293d4550ff8@linux.alibaba.com> From: Alex Shi Message-ID: <530c222e-dfd4-f78e-e9d4-315fad6f816a@linux.alibaba.com> Date: Sun, 19 Jul 2020 12:45:54 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 X-Rspamd-Queue-Id: E322E180B8406 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: =E5=9C=A8 2020/7/18 =E4=B8=8A=E5=8D=882:26, Alexander Duyck =E5=86=99=E9=81= =93: > On Fri, Jul 17, 2020 at 12:46 AM Alex Shi = wrote: >> >> >> >> =E5=9C=A8 2020/7/17 =E4=B8=8A=E5=8D=885:12, Alexander Duyck =E5=86=99=E9= =81=93: >>> On Fri, Jul 10, 2020 at 5:59 PM Alex Shi = wrote: >>>> >>>> Combine PageLRU check and ClearPageLRU into a function by new >>>> introduced func TestClearPageLRU. This function will be used as page >>>> isolation precondition to prevent other isolations some where else. >>>> Then there are may non PageLRU page on lru list, need to remove BUG >>>> checking accordingly. >>>> >>>> Hugh Dickins pointed that __page_cache_release and release_pages >>>> has no need to do atomic clear bit since no user on the page at that >>>> moment. and no need get_page() before lru bit clear in isolate_lru_p= age, >>>> since it '(1) Must be called with an elevated refcount on the page'. >>>> >>>> As Andrew Morton mentioned this change would dirty cacheline for pag= e >>>> isn't on LRU. But the lost would be acceptable with Rong Chen >>>> report: >>>> https://lkml.org/lkml/2020/3/4/173 >>>> >> >> ... >> >>>> diff --git a/mm/swap.c b/mm/swap.c >>>> index f645965fde0e..5092fe9c8c47 100644 >>>> --- a/mm/swap.c >>>> +++ b/mm/swap.c >>>> @@ -83,10 +83,9 @@ static void __page_cache_release(struct page *pag= e) >>>> struct lruvec *lruvec; >>>> unsigned long flags; >>>> >>>> + __ClearPageLRU(page); >>>> spin_lock_irqsave(&pgdat->lru_lock, flags); >>>> lruvec =3D mem_cgroup_page_lruvec(page, pgdat); >>>> - VM_BUG_ON_PAGE(!PageLRU(page), page); >>>> - __ClearPageLRU(page); >>>> del_page_from_lru_list(page, lruvec, page_off_lru(pa= ge)); >>>> spin_unlock_irqrestore(&pgdat->lru_lock, flags); >>>> } >>> >>> So this piece doesn't make much sense to me. Why not use >>> TestClearPageLRU(page) here? Just a few lines above you are testing >>> for PageLRU(page) and it seems like if you are going to go for an >>> atomic test/clear and then remove the page from the LRU list you >>> should be using it here as well otherwise it seems like you could run >>> into a potential collision since you are testing here without clearin= g >>> the bit. >>> >> >> Hi Alex, >> >> Thanks a lot for comments! >> >> In this func's call path __page_cache_release, the page is unlikely be >> ClearPageLRU, since this page isn't used by anyone, and going to be fr= eed. >> just __ClearPageLRU would be safe, and could save a non lru page flags= disturb. >=20 > So if I understand what you are saying correctly you are indicating > that this page should likely not have the LRU flag set and that we > just transitioned it from 1 -> 0 so there should be nobody else > accessing it correct? >=20 > It might be useful to document this somewhere. Essentially what we are > doing then is breaking this up into the following cases. >=20 > 1. Setting the LRU bit requires holding the LRU lock > 2. Clearing the LRU bit requires either: > a. Use of atomic operations if page count is 1 or more > b. Non-atomic operations can be used if we cleared last referen= ce count >=20 > Is my understanding on this correct? So we have essentially two > scenarios, one for the get_page_unless_zero case, and another with the > put_page_testzero. the summary isn't incorrect.=20 The the points for me are: 1, Generally, the lru bit indicated if the page on lru list, just in some= temporary moment(isolating), the page may have no bit set when it's on lru list. t= hat imply the page must be on lru list when the lru bit is set. 2, have to remove lru bit before delete it from lru list. >=20 >>>> @@ -878,9 +877,8 @@ void release_pages(struct page **pages, int nr) >>>> spin_lock_irqsave(&locked_pgdat->lru= _lock, flags); >>>> } >>>> >>>> - lruvec =3D mem_cgroup_page_lruvec(page, lock= ed_pgdat); >>>> - VM_BUG_ON_PAGE(!PageLRU(page), page); >>>> __ClearPageLRU(page); >>>> + lruvec =3D mem_cgroup_page_lruvec(page, lock= ed_pgdat); >>>> del_page_from_lru_list(page, lruvec, page_of= f_lru(page)); >>>> } >>>> >>> >>> Same here. You are just moving the flag clearing, but you didn't >>> combine it with the test. It seems like if you are expecting this to >>> be treated as an atomic operation. It should be a relatively low cost >>> to do since you already should own the cacheline as a result of >>> calling put_page_testzero so I am not sure why you are not combining >>> the two. >> >> before the ClearPageLRU, there is a put_page_testzero(), that means no= one using >> this page, and isolate_lru_page can not run on this page the in func c= hecking. >> VM_BUG_ON_PAGE(!page_count(page), page); >> So it would be safe here. >=20 > Okay, so this is another 2b case as defined above then. >=20 >>> >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>> index c1c4259b4de5..18986fefd49b 100644 >>>> --- a/mm/vmscan.c >>>> +++ b/mm/vmscan.c >>>> @@ -1548,16 +1548,16 @@ int __isolate_lru_page(struct page *page, is= olate_mode_t mode) >>>> { >>>> int ret =3D -EINVAL; >>>> >>>> - /* Only take pages on the LRU. */ >>>> - if (!PageLRU(page)) >>>> - return ret; >>>> - >>>> /* Compaction should not handle unevictable pages but CMA ca= n do so */ >>>> if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) >>>> return ret; >>>> >>>> ret =3D -EBUSY; >>>> >>>> + /* Only take pages on the LRU. */ >>>> + if (!PageLRU(page)) >>>> + return ret; >>>> + >>>> /* >>>> * To minimise LRU disruption, the caller can indicate that = it only >>>> * wants to isolate pages it will be able to operate on with= out >>>> @@ -1671,8 +1671,6 @@ static unsigned long isolate_lru_pages(unsigne= d long nr_to_scan, >>>> page =3D lru_to_page(src); >>>> prefetchw_prev_lru_page(page, src, flags); >>>> >>>> - VM_BUG_ON_PAGE(!PageLRU(page), page); >>>> - >>>> nr_pages =3D compound_nr(page); >>>> total_scan +=3D nr_pages; >>>> >>> >>> So effectively the changes here are making it so that a !PageLRU page >>> will cycle to the start of the LRU list. Now if I understand correctl= y >>> we are guaranteed that if the flag is not set it cannot be set while >>> we are holding the lru_lock, however it can be cleared while we are >>> holding the lock, correct? Thus that is why isolate_lru_pages has to >>> call TestClearPageLRU after the earlier check in __isolate_lru_page. >> >> Right. >> >>> >>> It might make it more readable to pull in the later patch that >>> modifies isolate_lru_pages that has it using TestClearPageLRU. >> As to this change, It has to do in this patch, since any TestClearPage= LRU may >> cause lru bit miss in the lru list, so the precondication check has to >> removed here. >=20 > So I think some of my cognitive dissonance is from the fact that you > really are doing two different things here. You aren't really > implementing the full TestClearPageLRU until patch 15. So this patch > is doing part of 2a and 2b, and then patch 15 is following up and > completing the 2a cases. I still think it might make more sense to > pull out the pieces related to 2b and move them into a patch before > this with documentation explaining that there should be no competition > for the LRU flag because the page has transitioned to a reference > count of zero. Then take the remaining bits and combine them with > patch 15 since the description for the two is pretty similar. >=20 Good suggestion, I consider this. Thanks