Subject: Re: [PATCH] mm: account lazily freed anon pages in NR_FILE_PAGES
To: Yafang Shao, akpm@linux-foundation.org, mhocko@suse.com, minchan@kernel.org, hannes@cmpxchg.org
Cc: linux-mm@kvack.org
References: <20201105131012.82457-1-laoar.shao@gmail.com>
From: Vlastimil Babka
Message-ID: <4c0a7ea6-4817-2dae-7473-8d0fe6110a45@suse.cz>
Date: Thu, 5 Nov 2020 16:18:16 +0100
In-Reply-To: <20201105131012.82457-1-laoar.shao@gmail.com>

On 11/5/20 2:10 PM, Yafang Shao wrote:
> We use the memory utilization (Used / Total) to monitor memory
> pressure. If it is too high, it means the system may go OOM sooner or
> later when swap is off, and then we make adjustments on that system.

Hmm, I would say that any system looking just at memory utilization
(Used / Total) and not at the file LRU size is flawed. There's a reason
MemAvailable exists, and it does count the file LRU sizes.

> However, this method has been broken since MADV_FREE was introduced,
> because these lazily freed anonymous pages can be reclaimed under
> memory pressure while they are still accounted in NR_ANON_MAPPED.
> 
> Furthermore, since commit f7ad2a6cb9f7 ("mm: move MADV_FREE pages into
> LRU_INACTIVE_FILE list"), these lazily freed anonymous pages are moved
> from the anon LRU list to the file LRU list. That means
> (Inactive(file) + Active(file)) may be much larger than Cached in
> /proc/meminfo, which confuses our users.

Yeah, the counters are tricky for multiple reasons, as Michal said...

> So we'd better account the lazily freed anonymous pages in
> NR_FILE_PAGES as well.
> 
> Signed-off-by: Yafang Shao
> Cc: Minchan Kim
> Cc: Johannes Weiner
> Cc: Michal Hocko
> ---
>  mm/memcontrol.c | 11 +++++++++--
>  mm/rmap.c       | 26 ++++++++++++++++++--------
>  mm/swap.c       |  2 ++
>  mm/vmscan.c     |  2 ++
>  4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3dcbf24d2227..217a6f10fa8d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5659,8 +5659,15 @@ static int mem_cgroup_move_account(struct page *page,
>  
>  	if (PageAnon(page)) {
>  		if (page_mapped(page)) {
> -			__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> -			__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> +			if (!PageSwapBacked(page) && !PageSwapCache(page) &&
> +			    !PageUnevictable(page)) {
> +				__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
> +				__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
> +			} else {
> +				__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> +				__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> +			}
> +
>  			if (PageTransHuge(page)) {
>  				__mod_lruvec_state(from_vec, NR_ANON_THPS,
>  						   -nr_pages);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1b84945d655c..690ca7ff2392 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1312,8 +1312,13 @@ static void page_remove_anon_compound_rmap(struct page *page)
>  	if (unlikely(PageMlocked(page)))
>  		clear_page_mlock(page);
>  
> -	if (nr)
> -		__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> +	if (nr) {
> +		if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> +		    !PageSwapCache(page) && !PageUnevictable(page))
> +			__mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
> +		else
> +			__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> +	}
>  }
>  
>  /**
> @@ -1341,12 +1346,17 @@ void page_remove_rmap(struct page *page, bool compound)
>  	if (!atomic_add_negative(-1, &page->_mapcount))
>  		goto out;
>  
> -	/*
> -	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> -	 * these counters are not modified in interrupt context, and
> -	 * pte lock(a spinlock) is held, which implies preemption disabled.
> -	 */
> -	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
> +	if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
> +		__dec_lruvec_page_state(page, NR_FILE_PAGES);
> +	} else {
> +		/*
> +		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +		 * these counters are not modified in interrupt context, and
> +		 * pte lock(a spinlock) is held, which implies preemption disabled.
> +		 */
> +		__dec_lruvec_page_state(page, NR_ANON_MAPPED);
> +	}
>  
>  	if (unlikely(PageMlocked(page)))
>  		clear_page_mlock(page);
> diff --git a/mm/swap.c b/mm/swap.c
> index 47a47681c86b..340c5276a0f3 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -601,6 +601,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  
>  		del_page_from_lru_list(page, lruvec,
>  				       LRU_INACTIVE_ANON + active);
> +		__mod_lruvec_state(lruvec, NR_ANON_MAPPED, -nr_pages);
>  		ClearPageActive(page);
>  		ClearPageReferenced(page);
>  		/*
> @@ -610,6 +611,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  		 */
>  		ClearPageSwapBacked(page);
>  		add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);
> +		__mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_pages);
>  
>  		__count_vm_events(PGLAZYFREE, nr_pages);
>  		__count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1b8f0e059767..4821124c70f7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1428,6 +1428,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  				goto keep_locked;
>  			}
>  
> +			mod_lruvec_page_state(page, NR_ANON_MAPPED, nr_pages);
> +			mod_lruvec_page_state(page, NR_FILE_PAGES, -nr_pages);
>  			count_vm_event(PGLAZYFREED);
>  			count_memcg_page_event(page, PGLAZYFREED);
>  		} else if (!mapping || !__remove_mapping(mapping, page, true,
> 