Date: Tue, 7 Apr 2020 12:25:15 +0200
From: Jan Kara
To: NeilBrown
Cc: Trond Myklebust, "Anna.Schumaker@Netapp.com", Andrew Morton, Jan Kara,
	Michal Hocko, linux-mm@kvack.org, linux-nfs@vger.kernel.org, LKML
Subject: Re: [PATCH 2/2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead.
Message-ID: <20200407102515.GB9482@quack2.suse.cz>
In-Reply-To: <878sj8w55y.fsf@notabene.neil.brown.name>
References: <878sj8w55y.fsf@notabene.neil.brown.name>

On Tue 07-04-20 09:44:25, NeilBrown wrote:
>
> After an NFS page has been written it is considered "unstable" until a
> COMMIT request succeeds.  If the COMMIT fails, the page will be
> re-written.
>
> These "unstable" pages are currently accounted as "reclaimable", either
> in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
> 'reclaimable' count.  This might have made sense when sending the COMMIT
> required a separate action by the VFS/MM (e.g. releasepage() used to
> send a COMMIT).
> However, now that all writes generated by ->writepages() will
> automatically be followed by a COMMIT (since commit 919e3bd9a875
> ("NFS: Ensure we commit after writeback is complete")), it makes more
> sense to treat them as writeback pages.
>
> So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
> NR_WRITEBACK and WB_WRITEBACK.
>
> A particular effect of this change is that when
> wb_check_background_flush() calls wb_over_bg_threshold(), the latter
> will report 'true' a lot less often as the 'unstable' pages are no
> longer considered 'dirty' (as there is nothing that writeback can do
> about them anyway).
>
> Currently wb_check_background_flush() will trigger writeback to NFS even
> when there are relatively few dirty pages (if there are lots of unstable
> pages); this can result in small writes going to the server (tens of
> kilobytes rather than a megabyte), which hurts throughput.
> With this patch, there are fewer writes which are each larger on average.
>
> Where the NR_UNSTABLE_NFS count was included in statistics
> virtual-files, the entry is retained, but the value is hard-coded as
> zero.  Static trace points which record this counter no longer report
> it.
>
> Signed-off-by: NeilBrown

...

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e5f76da8cd4e..24678d6e308d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5237,7 +5237,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>
>  	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
>  		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
> -		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
> +		" unevictable:%lu dirty:%lu writeback:%lu unstable:0\n"
>  		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
>  		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
>  		" free:%lu free_pcp:%lu free_cma:%lu\n",
> @@ -5250,7 +5250,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  		global_node_page_state(NR_UNEVICTABLE),
>  		global_node_page_state(NR_FILE_DIRTY),
>  		global_node_page_state(NR_WRITEBACK),
> -		global_node_page_state(NR_UNSTABLE_NFS),
>  		global_node_page_state(NR_SLAB_RECLAIMABLE),
>  		global_node_page_state(NR_SLAB_UNRECLAIMABLE),
>  		global_node_page_state(NR_FILE_MAPPED),
> @@ -5283,7 +5282,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  			" anon_thp: %lukB"
>  #endif
>  			" writeback_tmp:%lukB"
> -			" unstable:%lukB"
> +			" unstable:0kB"
>  			" all_unreclaimable? %s"
>  			"\n",
>  			pgdat->node_id,
> @@ -5305,7 +5304,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  			K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR),
>  #endif
>  			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
> -			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
>  			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
>  				"yes" : "no");
>  	}

These are just page allocator splats on OOM. I don't think preserving
'unstable' in these reports is needed.

> @@ -1707,8 +1706,16 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
>  static void *vmstat_next(struct seq_file *m, void *arg, loff_t *pos)
>  {
>  	(*pos)++;
> -	if (*pos >= NR_VMSTAT_ITEMS)
> +	if (*pos >= NR_VMSTAT_ITEMS) {
> +		/*
> +		 * Deprecated counters which are no longer represented
> +		 * in vmstat arrays. We just lie about them to be always
> +		 * 0 to not break userspace which might expect them in
> +		 * the output.
> +		 */
> +		seq_puts(m, "nr_unstable 0");
>  		return NULL;
> +	}
>  	return (unsigned long *)m->private + *pos;
>  }

Umm, how is this supposed to work?
vmstat_next() should return the next element of the sequence, not write
anything into the seq_file - that's the job of vmstat_show(). Looking at the
seq_read() implementation it may actually end up working, but I wouldn't bet
much on it, especially in corner cases such as when we are just about to fill
the user buffer and then need to restart reading close to the end of the
vmstat file.

Michal, wouldn't it be cleaner to include NR_VM_DEPRECATED_ITEMS in
NR_VMSTAT_ITEMS, have the names of these items in vmstat_text, and just set
the appropriate number of 0 entries at the end of the array generated in
vmstat_start(), and be done with it? That seems conceptually simpler and the
overhead is minimal. Roughly something like the (completely untested) sketch
below my signature.

								Honza
--
Jan Kara
SUSE Labs, CR
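PS: Just to illustrate what I mean - this is a sketch only, nothing I have
compiled or tested, and the enum and item names below (vm_deprecated_item,
NR_VMSTAT_DEPRECATED_UNSTABLE) are made up for the example. The exact set of
*_ITEMS terms in NR_VMSTAT_ITEMS also depends on the tree; the point is only
that the deprecated count gets appended to it and that the tail of the
snapshot array is zeroed once in vmstat_start():

/* Counters we keep only as names in /proc/vmstat (illustrative enum) */
enum vm_deprecated_item {
	NR_VMSTAT_DEPRECATED_UNSTABLE,	/* was NR_UNSTABLE_NFS */
	NR_VM_DEPRECATED_ITEMS,
};

/* NR_VMSTAT_ITEMS grows by the deprecated slots */
#define NR_VMSTAT_ITEMS (NR_VM_ZONE_STAT_ITEMS + \
			 NR_VM_NUMA_STAT_ITEMS + \
			 NR_VM_NODE_STAT_ITEMS + \
			 NR_VM_WRITEBACK_STAT_ITEMS + \
			 NR_VM_EVENT_ITEMS + \
			 NR_VM_DEPRECATED_ITEMS)

/*
 * The end of vmstat_text[] keeps the old names so userspace parsers
 * still find them:
 *
 *	...
 *	/\* deprecated counters, always reported as 0 *\/
 *	"nr_unstable",
 * };
 */

static void *vmstat_start(struct seq_file *m, loff_t *pos)
{
	unsigned long *v;

	/*
	 * ... existing code allocating v (now NR_VMSTAT_ITEMS entries,
	 * i.e. including the deprecated slots) and filling in the
	 * zone/node/writeback/event counters ...
	 */

	/* Deprecated items have no backing counter; report them as 0. */
	memset((unsigned long *)m->private +
	       NR_VMSTAT_ITEMS - NR_VM_DEPRECATED_ITEMS,
	       0, NR_VM_DEPRECATED_ITEMS * sizeof(unsigned long));

	return (unsigned long *)m->private + *pos;
}

vmstat_next() and vmstat_show() then need no special casing at all.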