From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 22 Feb 2021 09:35:05 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, kuba@kernel.org
Subject: Re: [PATCH RFC] SUNRPC: Refresh rq_pages using a bulk page allocator
Message-ID: <20210222093505.GG3697@techsingularity.net>
References: <161340498400.7780.962495219428962117.stgit@klimt.1015granger.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <161340498400.7780.962495219428962117.stgit@klimt.1015granger.net>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Mon, Feb 15, 2021 at 11:06:07AM -0500, Chuck Lever wrote:
> Reduce the rate at which nfsd threads hammer on the page allocator.
> This improves throughput scalability by enabling the nfsd threads to
> run more independently of each other.
> 

Sorry this is taking so long; there is a lot going on.

This patch has prerequisites that are not in mainline, which makes it
harder to evaluate what the semantics of the API should be.

> @@ -659,19 +659,33 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>  		/* use as many pages as possible */
>  		pages = RPCSVC_MAXPAGES;
>  	}
> -	for (i = 0; i < pages ; i++)
> -		while (rqstp->rq_pages[i] == NULL) {
> -			struct page *p = alloc_page(GFP_KERNEL);
> -			if (!p) {
> -				set_current_state(TASK_INTERRUPTIBLE);
> -				if (signalled() || kthread_should_stop()) {
> -					set_current_state(TASK_RUNNING);
> -					return -EINTR;
> -				}
> -				schedule_timeout(msecs_to_jiffies(500));
> +
> +	for (needed = 0, i = 0; i < pages ; i++)
> +		if (!rqstp->rq_pages[i])
> +			needed++;
> +	if (needed) {
> +		LIST_HEAD(list);
> +
> +retry:
> +		alloc_pages_bulk(GFP_KERNEL, 0,
> +				 /* to test the retry logic: */
> +				 min_t(unsigned long, needed, 13),
> +				 &list);
> +		for (i = 0; i < pages; i++) {
> +			if (!rqstp->rq_pages[i]) {
> +				struct page *page;
> +
> +				page = list_first_entry_or_null(&list,
> +								struct page,
> +								lru);
> +				if (unlikely(!page))
> +					goto empty_list;
> +				list_del(&page->lru);
> +				rqstp->rq_pages[i] = page;
> +				needed--;
>  			}
> -			rqstp->rq_pages[i] = p;
>  		}
> +	}
>  	rqstp->rq_page_end = &rqstp->rq_pages[pages];
>  	rqstp->rq_pages[pages] = NULL; /* this might be seen in nfsd_splice_actor() */
>  

There is a conflict at the end where rq_page_end gets updated. The 5.11
code assumes that the loop around the allocator definitely gets all
the required pages. What tree is this patch based on and is it going in
during this merge window? While the conflict is "trivial" to resolve,
it would be buggy because on retry, "i" will be pointing to the wrong
index and pages potentially leak. Rather than guessing, I'd prefer to
base a series on code you've tested.
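
To illustrate the retry concern, here is a rough userspace sketch, not
kernel code and not based on any tested tree. struct fake_page,
fake_alloc_bulk() and fill_slots() are hypothetical stand-ins for
struct page, the bulk allocator and the refill loop. The point is only
that each pass rescans from index 0 and that nothing is left on the
list at the end of a pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct page linked on a list. */
struct fake_page {
	struct fake_page *next;
};

/*
 * Hypothetical stand-in for a bulk allocator: pushes at most
 * min(nr, cap) pages onto *list and returns how many it produced.
 * The cap models an allocator that may return only a subset.
 */
static unsigned int fake_alloc_bulk(unsigned int nr, unsigned int cap,
				    struct fake_page **list)
{
	unsigned int got = 0;

	if (nr > cap)
		nr = cap;
	while (got < nr) {
		struct fake_page *p = malloc(sizeof(*p));

		if (!p)
			break;
		p->next = *list;
		*list = p;
		got++;
	}
	return got;
}

/*
 * Fill every NULL slot in slots[0..nr_slots). Each pass recounts the
 * empty slots and rescans from index 0, so a retry never resumes from
 * a stale index. Anything still on the list after a pass is freed
 * rather than leaked. Returns the number of allocator calls made.
 * Assumes cap > 0 and that malloc() eventually succeeds; there is no
 * backoff here.
 */
static unsigned int fill_slots(struct fake_page **slots,
			       unsigned int nr_slots, unsigned int cap)
{
	unsigned int calls = 0;

	assert(cap > 0);
	for (;;) {
		struct fake_page *list = NULL;
		unsigned int i, needed = 0;

		for (i = 0; i < nr_slots; i++)
			if (!slots[i])
				needed++;
		if (!needed)
			return calls;

		fake_alloc_bulk(needed, cap, &list);
		calls++;

		/* Hand out pages to empty slots, starting from slot 0. */
		for (i = 0; i < nr_slots && list; i++) {
			if (!slots[i]) {
				slots[i] = list;
				list = list->next;
				slots[i]->next = NULL;
			}
		}

		/* Defensive: release leftovers instead of leaking them. */
		while (list) {
			struct fake_page *p = list;

			list = list->next;
			free(p);
		}
	}
}
```

With 30 empty slots and a cap of 13, mirroring the min_t() test hack in
the patch, this takes three allocator calls (13, 13 and 4 pages) and
leaves every slot populated.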

The slowpath for the bulk allocator also sucks a bit for the semantics
required by this caller. As the bulk allocator does not walk the zonelist,
it can return failures prematurely -- fine for an optimistic bulk allocator
that can return a subset of pages but not for this caller which really
wants those pages. The allocator may need NOFAIL-like semantics to walk
the zonelist if the caller really requires success or at least walk the
zonelist if the preferred zone is low on pages. This patch would also
need to preserve the schedule_timeout behaviour so it does not use a lot
of CPU time retrying allocations in the presence of memory pressure.
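
As a sketch of the semantics I have in mind, again in userspace and
under loud assumptions: zones are modelled as plain counters,
bulk_preferred() and bulk_walk() are hypothetical names for the
optimistic path and the zonelist-walking fallback, and backoff() stands
in for set_current_state() plus schedule_timeout(). None of this is the
real allocator API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES 3

/* Hypothetical model: each zone only tracks a count of free pages. */
struct zonelist_model {
	unsigned int free[NR_ZONES];	/* index 0 is the preferred zone */
};

/* Optimistic path: take pages from the preferred zone only. */
static unsigned int bulk_preferred(struct zonelist_model *zl, unsigned int nr)
{
	unsigned int got = nr < zl->free[0] ? nr : zl->free[0];

	zl->free[0] -= got;
	return got;
}

/* Fallback path: walk the remaining zones in order. */
static unsigned int bulk_walk(struct zonelist_model *zl, unsigned int nr)
{
	unsigned int z, got = 0;

	for (z = 1; z < NR_ZONES && got < nr; z++) {
		unsigned int want = nr - got;
		unsigned int take = want < zl->free[z] ? want : zl->free[z];

		zl->free[z] -= take;
		got += take;
	}
	return got;
}

/*
 * Obtain exactly nr pages, or give up when should_stop() says so.
 * The caller sleeps via backoff() between attempts instead of spinning.
 * Returns 0 on success, -EINTR if interrupted.
 */
static int bulk_must_succeed(struct zonelist_model *zl, unsigned int nr,
			     bool (*should_stop)(void *),
			     void (*backoff)(void *), void *ctx)
{
	while (nr) {
		nr -= bulk_preferred(zl, nr);
		if (nr)
			nr -= bulk_walk(zl, nr);
		if (!nr)
			break;
		if (should_stop(ctx))
			return -EINTR;
		backoff(ctx);
	}
	return 0;
}

/* Demo hooks: count sleeps and model reclaim refilling zone 0. */
struct demo_ctx {
	struct zonelist_model *zl;
	unsigned int sleeps;
	unsigned int max_sleeps;
	unsigned int refill;
};

static bool demo_should_stop(void *arg)
{
	struct demo_ctx *ctx = arg;

	return ctx->sleeps >= ctx->max_sleeps;
}

static void demo_backoff(void *arg)
{
	struct demo_ctx *ctx = arg;

	ctx->sleeps++;
	ctx->zl->free[0] += ctx->refill;
}
```

The ordering is the part that matters: try the preferred zone, fall
back to the rest of the zonelist, and only then sleep and retry, with a
way out if the thread is asked to stop.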

-- 
Mel Gorman
SUSE Labs