Date: Mon, 7 Oct 2019 16:28:05 +0200
From: Michal Hocko
To: Honglei Wang
Cc: linux-mm@kvack.org, vdavydov.dev@gmail.com, hannes@cmpxchg.org
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup based on lru_zone_size
Message-ID: <20191007142805.GM2381@dhcp22.suse.cz>
References: <20190905071034.16822-1-honglei.wang@oracle.com>
In-Reply-To: <20190905071034.16822-1-honglei.wang@oracle.com>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Thu 05-09-19 15:10:34, Honglei Wang wrote:
> lruvec_lru_size() currently uses lruvec_page_state_local() to get the
> lru_size. That is based on lruvec_stat_local.count[] of
> mem_cgroup_per_node. This counter is updated in batches: no charge is
> recorded until the number of incoming pages reaches
> MEMCG_CHARGE_BATCH, which is currently defined as 32.
>
> The LTP testcase madvise09 [1] fails because small blocks of memory
> are not charged. It creates a new memcg and sets up 32 MADV_FREE
> pages, then forks a child which introduces memory pressure in the
> memcg. The MADV_FREE pages are expected to be released under that
> pressure, but 32 does not exceed MEMCG_CHARGE_BATCH, so these pages
> are not reflected in lruvec_stat_local.count[] until more pages come
> in to satisfy the batching. As a result the MADV_FREE pages cannot be
> freed under memory pressure, which conflicts with the definition of
> MADV_FREE.

The test case is simply wrong. The caching and the batch size are an
internal implementation detail.
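To illustrate the batching behaviour being discussed, here is a
simplified model (not the actual memcg code; the struct and function
names below are made up for the sketch): per-cpu deltas are only
folded into the shared counter once their magnitude exceeds the batch
size, so small residues stay invisible to readers of the shared value.

```c
#include <stdlib.h>

#define MEMCG_CHARGE_BATCH 32	/* value cited in the thread */

/* Hypothetical simplified model of a batched counter. */
struct batched_counter {
	long global;	/* what lruvec_page_state_local() would see */
	long cpu_cache;	/* pending per-cpu delta, not yet flushed */
};

static void counter_add(struct batched_counter *c, long delta)
{
	long x = c->cpu_cache + delta;

	/* Flush to the shared counter only past the batch threshold. */
	if (labs(x) > MEMCG_CHARGE_BATCH) {
		c->global += x;
		x = 0;
	}
	c->cpu_cache = x;
}
```

With a batch size of 32, thirty-two single-page updates never exceed
the threshold, so nothing is flushed and the pages remain invisible to
the shared counter: that is exactly the window the test trips over.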
Moreover, MADV_FREE is a _hint_, so all you can say is that those
pages will get freed at some point in time; you cannot make any
assumptions about when that moment happens.

> Getting lru_size based on lru_zone_size of mem_cgroup_per_node, which
> is not updated in batches, makes it a bit more accurate in similar
> scenarios.

What does that mean? It would be more helpful to describe the code
path which will use this more precise value and what the effect of
that is.

As I said on the previous version, I do not object to the patch,
because a more precise lruvec_lru_size sounds like a nice thing, as
long as we are not paying a high price for it. Just look at the global
case for mem_cgroup_disabled(): it uses node_page_state, and that one
is using per-cpu accounting with regular refreshing of the global
value, IIRC.

> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
>
> Signed-off-by: Honglei Wang
> ---
>  mm/vmscan.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c77d1e3761a7..c28672460868 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> -	unsigned long lru_size;
> +	unsigned long lru_size = 0;
>  	int zid;
>  
> -	if (!mem_cgroup_disabled())
> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
> -	else
> +	if (!mem_cgroup_disabled()) {
> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
> +	} else
>  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>  
>  	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> -- 
> 2.17.0

-- 
Michal Hocko
SUSE Labs