From: Vlastimil Babka
To: Alex Shi, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org,
 hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
 willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com,
 iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name,
 alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com,
 vdavydov.dev@gmail.com, shy828301@gmail.com
Cc: Michal Hocko
Subject: Re: [PATCH v21 14/19] mm/lru: introduce TestClearPageLRU
Date: Wed, 11 Nov 2020 14:36:16 +0100
In-Reply-To: <1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com>
References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com>
 <1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com>

On 11/5/20 9:55 AM, Alex Shi wrote:
> Currently lru_lock still guards both lru list and page's lru bit, that's
> ok. but if we want to use specific lruvec lock on the page, we need to
> pin down the page's lruvec/memcg during locking. Just taking lruvec
> lock first may be undermined by the page's memcg charge/migration. To
> fix this problem, we will clear the lru bit out of locking and use
> it as pin down action to block the page isolation in memcg changing.
> 
> So now a standard steps of page isolation is following:
> 	1, get_page(); 		#pin the page avoid to be free
> 	2, TestClearPageLRU(); 	#block other isolation like memcg change
> 	3, spin_lock on lru_lock; #serialize lru list access
> 	4, delete page from lru list;
> 
> This patch start with the first part: TestClearPageLRU, which combines
> PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This
> function will be used as page isolation precondition to prevent other
> isolations some where else. Then there are may !PageLRU page on lru
> list, need to remove BUG() checking accordingly.

As there now may be !PageLRU pages on lru list, we need to ...

> 
> There 2 rules for lru bit now:
> 1, the lru bit still indicate if a page on lru list, just in some
>    temporary moment(isolating), the page may have no lru bit when
>    it's on lru list. but the page still must be on lru list when the
>    lru bit set.
> 2, have to remove lru bit before delete it from lru list.

2. we have to remove the lru bit before deleting page from lru list

> 
> As Andrew Morton mentioned this change would dirty cacheline for page
> isn't on LRU. But the lost would be acceptable in Rong Chen
> report:
> https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

AFAIK these places generally expect PageLRU to be true, and if it's false, it's
because of a race, so that effect should be negligible?

> Suggested-by: Johannes Weiner
> Signed-off-by: Alex Shi
> Acked-by: Hugh Dickins
> Acked-by: Johannes Weiner
> Cc: Hugh Dickins
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Vladimir Davydov
> Cc: Andrew Morton
> Cc: linux-kernel@vger.kernel.org
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---

...

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1542,7 +1542,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>   */
>  int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  {
> -	int ret = -EINVAL;
> +	int ret = -EBUSY;
>  
>  	/* Only take pages on the LRU. */
>  	if (!PageLRU(page))
> @@ -1552,8 +1552,6 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>  		return ret;
>  
> -	ret = -EBUSY;

I'm not sure why this change is here, looks unrelated to the patch? Oh I see,
you want to prevent the BUG() in isolate_lru_pages().

But due to that, the PageUnevictable check was also affected unintentionally.
But I don't think it's that important to BUG() when we run into PageUnevictable
unexpectedly, so that's probably ok.

But with that, we can just make __isolate_lru_page() a bool function and remove
the ugly switch in isolate_lru_pages()?
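
Something along these lines, maybe - just a rough, untested sketch of what I
have in mind; the ISOLATE_ASYNC_MIGRATE/ISOLATE_UNMAPPED checks stay as they
are and only return false instead of ret, so I've elided them here:

static bool __isolate_lru_page(struct page *page, isolate_mode_t mode)
{
	/* Only take pages on the LRU. */
	if (!PageLRU(page))
		return false;

	/* Compaction should not handle unevictable pages but CMA can do so */
	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
		return false;

	/*
	 * ... the existing ISOLATE_ASYNC_MIGRATE / ISOLATE_UNMAPPED checks,
	 * each returning false instead of ret ...
	 */

	if (unlikely(!get_page_unless_zero(page)))
		return false;

	if (!TestClearPageLRU(page)) {
		/* Another isolation won the race for the lru bit. */
		put_page(page);
		return false;
	}

	return true;
}

and then the switch in isolate_lru_pages() becomes a plain if/else (again
only a sketch):

		if (__isolate_lru_page(page, mode)) {
			nr_taken += nr_pages;
			nr_zone_taken[page_zonenum(page)] += nr_pages;
			list_move(&page->lru, dst);
		} else {
			/* It is being freed or isolated elsewhere. */
			list_move(&page->lru, src);
		}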
> -
>  	/*
>  	 * To minimise LRU disruption, the caller can indicate that it only
>  	 * wants to isolate pages it will be able to operate on without
> @@ -1600,8 +1598,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  	 * sure the page is not being freed elsewhere -- the
>  	 * page release code relies on it.
>  	 */
> -	ClearPageLRU(page);
> -	ret = 0;
> +	if (TestClearPageLRU(page))
> +		ret = 0;
> +	else
> +		put_page(page);
>  	}
>  
>  	return ret;
> @@ -1667,8 +1667,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  		page = lru_to_page(src);
>  		prefetchw_prev_lru_page(page, src, flags);
>  
> -		VM_BUG_ON_PAGE(!PageLRU(page), page);
> -
>  		nr_pages = compound_nr(page);
>  		total_scan += nr_pages;
>  
> @@ -1765,21 +1763,18 @@ int isolate_lru_page(struct page *page)
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
>  
> -	if (PageLRU(page)) {
> +	if (TestClearPageLRU(page)) {
>  		pg_data_t *pgdat = page_pgdat(page);
>  		struct lruvec *lruvec;
>  
> -		spin_lock_irq(&pgdat->lru_lock);
> +		get_page(page);
>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
> -		if (PageLRU(page)) {
> -			int lru = page_lru(page);
> -			get_page(page);
> -			ClearPageLRU(page);
> -			del_page_from_lru_list(page, lruvec, lru);
> -			ret = 0;
> -		}
> +		spin_lock_irq(&pgdat->lru_lock);
> +		del_page_from_lru_list(page, lruvec, page_lru(page));
>  		spin_unlock_irq(&pgdat->lru_lock);
> +		ret = 0;
>  	}
> +
>  	return ret;
>  }
>  
> @@ -4293,6 +4288,10 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  		nr_pages = thp_nr_pages(page);
>  		pgscanned += nr_pages;
>  
> +		/* block memcg migration during page moving between lru */
> +		if (!TestClearPageLRU(page))
> +			continue;
> +
>  		if (pagepgdat != pgdat) {
>  			if (pgdat)
>  				spin_unlock_irq(&pgdat->lru_lock);
> @@ -4301,10 +4300,7 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  		}
>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>  
> -		if (!PageLRU(page) || !PageUnevictable(page))
> -			continue;
> -
> -		if (page_evictable(page)) {
> +		if (page_evictable(page) && PageUnevictable(page)) {

Doing PageUnevictable() test first should be cheaper?

>  			enum lru_list lru = page_lru_base_type(page);
>  
>  			VM_BUG_ON_PAGE(PageActive(page), page);
> @@ -4313,12 +4309,15 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  			add_page_to_lru_list(page, lruvec, lru);
>  			pgrescued += nr_pages;
>  		}
> +		SetPageLRU(page);
>  	}
>  
>  	if (pgdat) {
>  		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
>  		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
>  		spin_unlock_irq(&pgdat->lru_lock);
> +	} else if (pgscanned) {
> +		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
> 
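
As an aside, just to check that I'm reading the new locking protocol from the
changelog correctly, the caller side of an isolation would now look roughly
like the helper below (example_isolate_one() is a made-up name purely for
illustration, not anything in this patch):

/* Illustration only: the four changelog steps, in order. */
static int example_isolate_one(struct page *page)
{
	pg_data_t *pgdat = page_pgdat(page);
	struct lruvec *lruvec;

	get_page(page);				/* 1) pin so the page can't be freed */
	if (!TestClearPageLRU(page)) {		/* 2) lru bit owned by another isolation */
		put_page(page);
		return -EBUSY;
	}
	lruvec = mem_cgroup_page_lruvec(page, pgdat);
	spin_lock_irq(&pgdat->lru_lock);	/* 3) serialize lru list access */
	del_page_from_lru_list(page, lruvec, page_lru(page));	/* 4) off the list */
	spin_unlock_irq(&pgdat->lru_lock);

	return 0;
}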