From: "Huang, Ying"
To: Shakeel Butt
Cc: Tejun Heo, Andrew Morton, Linux MM, LKML, Mel Gorman, Johannes Weiner, Vladimir Davydov, Michal Hocko, Joonsoo Kim
Subject: Re: [PATCH] vmscan: retry without cache trim mode if nothing scanned
References: <20210311004449.1170308-1-ying.huang@intel.com>
Date: Thu, 11 Mar 2021 16:52:47 +0800
In-Reply-To: (Shakeel Butt's message of "Wed, 10 Mar 2021 16:57:49 -0800")
Message-ID: <87v99yvzq8.fsf@yhuang-dev.intel.com>

Hi, Shakeel,

Shakeel Butt writes:

> On Wed, Mar 10, 2021 at 4:47 PM Huang, Ying wrote:
>>
>> From: Huang Ying
>>
>> In shrink_node(), to determine whether to enable cache trim mode, the
>> LRU size is obtained via lruvec_page_state(). That reads the value
>> from a per-CPU counter (mem_cgroup_per_node->lruvec_stat[]). The
>> error of the per-CPU counter, which comes from CPU-local batched
>> counting and from the descendant memory cgroups, may cause some
>> issues. We ran into this in the 0-Day performance test.
>>
>> 0-Day uses a RAM file system as the root file system, so the number
>> of reclaimable file pages is very small. In the swap testing, the
>> inactive file LRU list soon becomes almost empty, but the size of
>> the inactive file LRU list read from the per-CPU counter may stay at
>> a much larger value (say, 33, 50, etc.). This enables cache trim
>> mode, although nothing can actually be scanned. The following
>> pattern repeats for a long time in the test:
>>
>>   priority  inactive_file_size  cache_trim_mode
>>   12        33                  0
>>   11        33                  0
>>   ...
>>   6         33                  0
>>   5         33                  1
>>   ...
>>   1         33                  1
>>
>> That is, cache_trim_mode is wrongly enabled once the scan priority
>> decreases to 5, and the situation is not recovered from for a long
>> time.
>>
>> It's hard to get a more accurate size of the inactive file list
>> without much more overhead, and it's hard to estimate the error of
>> the per-CPU counter too, because there may be many descendant memory
>> cgroups. But if nothing can be scanned with cache trim mode enabled,
>> then after the actual scanning we know it was wrong to enable it, so
>> we can retry with cache trim mode disabled. This patch implements
>> that policy.
>
> Instead of playing with the already complicated heuristics, we should
> improve the accuracy of the lruvec stats. Johannes already fixed the
> memcg stats using the rstat infrastructure, and Tejun has suggestions
> on how to use the rstat infrastructure efficiently for lruvec stats:
> https://lore.kernel.org/linux-mm/YCFgr300eRiEZwpL@slm.duckdns.org/

Thanks for the information! It would be better if we could improve the
accuracy of the lruvec stats without much overhead, but that may not be
an easy task. If my understanding is correct, what Tejun suggested is
to add a fast read interface to rstat for use in hot paths, whose
accuracy would be similar to that of the traditional per-CPU counter.
But if we can regularly update the lruvec rstat with something like
vmstat_update(), that should be OK for the issue described in this
patch.

A few sketches of the logic under discussion follow below my
signature, for reference.

Best Regards,
Huang, Ying
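
The "error of the per-CPU counter" mentioned above comes from
batching. Below is a toy model (a hypothetical simplification; not the
kernel's actual struct, field, or function names) of a batched per-CPU
counter in the style of mem_cgroup_per_node->lruvec_stat[]: each CPU
accumulates a local delta and folds it into the shared total only once
it crosses a threshold, so a reader of the total can be stale by up to
threshold * nr_cpus, and summing over many descendant cgroups grows
the error further. That is how an almost-empty inactive file list can
still read as 33 pages.

	#define NR_CPUS    64	/* illustrative value */
	#define THRESHOLD  32	/* fold into the total past this delta */

	struct pcpu_counter {
		long total;		/* what readers see */
		long local[NR_CPUS];	/* per-CPU deltas, folded lazily */
	};

	static void pcpu_counter_add(struct pcpu_counter *c, int cpu, long delta)
	{
		c->local[cpu] += delta;
		if (c->local[cpu] > THRESHOLD || c->local[cpu] < -THRESHOLD) {
			/* in the kernel this would be an atomic add */
			c->total += c->local[cpu];
			c->local[cpu] = 0;
		}
	}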
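
For reference, the cache trim mode decision that the patch description
refers to looks approximately like this in shrink_node() (paraphrased
from mm/vmscan.c of that era; details vary across kernel versions):

	/*
	 * If there are plenty of inactive file pages that are not
	 * thrashing, reclaim those first before touching anonymous
	 * pages.  The size read here comes from the per-CPU counter,
	 * which is where the error enters.
	 */
	file = lruvec_page_state(target_lruvec, NR_INACTIVE_FILE);
	if (file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE))
		sc->cache_trim_mode = 1;
	else
		sc->cache_trim_mode = 0;

With a stale size of 33, 33 >> 6 == 0 but 33 >> 5 == 1, which is
exactly why cache_trim_mode flips on when the scan priority reaches 5
in the pattern above.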
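
And a minimal sketch of the retry policy the patch proposes (a
simplification using a hypothetical cache_trim_failed flag, not the
literal diff; the real shrink_node() already has an "again:" loop and
more state to manage):

	bool cache_trim_failed = false;
	unsigned long nr_scanned_before;

again:
	/* ... cache trim mode decision as above, then: ... */
	if (cache_trim_failed)
		sc->cache_trim_mode = 0;

	nr_scanned_before = sc->nr_scanned;
	shrink_node_memcgs(pgdat, sc);

	/*
	 * Cache trim mode was on, yet nothing could be scanned: the
	 * per-CPU LRU size must have misled us.  Disable cache trim
	 * mode and retry; the flag limits this to a single retry.
	 */
	if (sc->cache_trim_mode && !cache_trim_failed &&
	    sc->nr_scanned == nr_scanned_before) {
		cache_trim_failed = true;
		goto again;
	}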