From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 22 Jan 2023 12:36:51 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Ira Weiny
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	"Fabio M. De Francesco"
Subject: Re: [RFC] memcpy_from_folio()
References: <63cb9105105ce_bd04c29434@iweiny-mobl.notmuch>
In-Reply-To: <63cb9105105ce_bd04c29434@iweiny-mobl.notmuch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Fri, Jan 20, 2023 at 11:15:17PM -0800, Ira Weiny wrote:
> Matthew Wilcox wrote:
> > I think I have a good folio replacement for memcpy_from_page().
> > One of the annoying things about dealing with multi-page folios is
> > that you can't kmap the entire folio, yet on systems without highmem,
> > you don't need to.  It's also somewhat annoying in the caller to keep
> > track of n/len/offset/pos/...
> >
> > I think this is probably the best option.  We could have a loop that
> > kmaps each page in the folio, but that seems like excessive complexity.
>
> Why?  IMO it is better to contain the complexity of highmem systems in
> any memcpy_[to,from]_folio() calls than to spread it around the kernel.

Sure, but look at the conversion that I posted.  It's actually simpler
than using the memcpy_from_page() API.

> > I'm happy to have highmem systems be less efficient, since they are
> > anyway.  Another potential area of concern is that folios can be quite
> > large, and maybe having preemption disabled while we copy 2MB of data
> > might be a bad thing.  If so, the API is fine with limiting the amount
> > of data we copy; we just need to find out that it is a problem and
> > decide what the correct limit is, if it's not folio_size().
>
> Why not map the pages only when needed?  I agree that keeping preemption
> disabled for a long time is a bad thing.  But kmap_local_page does not
> disable preemption, only migration.

Some of the scheduler people aren't terribly happy about even disabling
migration for a long time.  Is "copying 2MB of data" a long time?  If
I've done my sums correctly, my current laptop has 2x 16-bit
LP-DDR4-4267 DIMMs installed.  That works out to 17GB/s, so copying
2MB of data will take 118us.  Probably OK for even the most demanding
workload.

> Regardless, any looping on the maps is going to happen only on highmem
> systems, and we can map the pages only if/when needed.  Synchronization
> of the folio should be handled by the caller.  So it is fine to allow
> migration during memcpy_from_folio().
>
> So why not loop through the pages only when needed?
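Checking the sums above in userspace C (the DIMM figures — 2 channels x 16
bits wide x 4267 MT/s — are the ones quoted in this mail; the function names
are made up for the sketch):

```c
/* Theoretical peak bandwidth of 2x 16-bit LP-DDR4-4267, and the time
 * to copy some number of bytes at that rate.  Back-of-envelope only. */
static double lpddr4_bytes_per_sec(void)
{
        return 2 * (16.0 / 8) * 4267e6;         /* ~17.1 GB/s */
}

static double copy_time_us(double nbytes)
{
        return nbytes / lpddr4_bytes_per_sec() * 1e6;
}
```

copy_time_us(2e6) comes out a little over 117us, consistent with the ~118us
estimate above.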
So you're proposing re-enabling migration after calling
kmap_local_folio()?  I don't really understand.

> >
> >  fs/ext4/verity.c           | 16 +++++++---------
> >  include/linux/highmem.h    | 29 +++++++++++++++++++++++++++++
> >  include/linux/page-flags.h |  1 +
> >  3 files changed, 37 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> > index e4da1704438e..afe847c967a4 100644
> > --- a/fs/ext4/verity.c
> > +++ b/fs/ext4/verity.c
> > @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void *buf, size_t count,
> >  			loff_t pos)
> >  {
> >  	while (count) {
> > -		size_t n = min_t(size_t, count,
> > -				 PAGE_SIZE - offset_in_page(pos));
> > -		struct page *page;
> > +		struct folio *folio;
> > +		size_t n;
> >
> > -		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > +		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> >  				NULL);
>
> Is this an issue with how many pages get read into the page cache?  I
> went off on a tangent thinking this read the entire folio into the
> cache.  But I see now I was wrong.  If this is operating page by page,
> why change this function at all?

The folio may (indeed _should_) already be present in the cache,
otherwise the cache isn't doing a very good job.  If
read_mapping_folio() ends up having to allocate the folio, today it
only allocates a single-page folio.  But if somebody else allocated it
through the readahead code, and the filesystem supports multi-page
folios, then it will be larger than a single page.  All callers must
be prepared to handle a multi-page folio.
> > -		if (IS_ERR(page))
> > -			return PTR_ERR(page);
> > -
> > -		memcpy_from_page(buf, page, offset_in_page(pos), n);
> > +		if (IS_ERR(folio))
> > +			return PTR_ERR(folio);
> >
> > -		put_page(page);
> > +		n = memcpy_from_file_folio(buf, folio, pos, count);
> > +		folio_put(folio);
> >
> >  		buf += n;
> >  		pos += n;
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index 9fa462561e05..9917357b9e8f 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
> >  	kunmap_local(addr);
> >  }
> >
> > +/**
> > + * memcpy_from_file_folio - Copy some bytes from a file folio.
> > + * @to: The destination buffer.
> > + * @folio: The folio to copy from.
> > + * @pos: The position in the file.
> > + * @len: The maximum number of bytes to copy.
> > + *
> > + * Copy up to @len bytes from this folio.  This may be limited by PAGE_SIZE
>
> I have a problem with 'may be limited'.  How is the caller to know this?

... from the return value?

> Won't this propagate a lot of checks in the caller?  Effectively
> replacing one complexity in the callers for another?

Look at the caller I converted!  It _reduces_ the number of checks in
the caller.

> > + * if the folio comes from HIGHMEM, and by the size of the folio.
> > + *
> > + * Return: The number of bytes copied from the folio.
> > + */
> > +static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
> > +		loff_t pos, size_t len)
> > +{
> > +	size_t offset = offset_in_folio(folio, pos);
> > +	char *from = kmap_local_folio(folio, offset);
> > +
> > +	if (folio_test_highmem(folio))
> > +		len = min(len, PAGE_SIZE - offset);
> > +	else
> > +		len = min(len, folio_size(folio) - offset);
> > +
> > +	memcpy(to, from, len);
>
> Do we need flush_dcache_page() for the pages?

Why?  memcpy_from_page() doesn't have one.
> I gave this an attempt today before I realized read_mapping_folio()
> only reads a single page. :-(
>
> How does memcpy_from_file_folio() work beyond a single page?  And in
> that case, what is the point?  The more I think about this the more
> confused I get.

In the highmem case, we map a single page, and so cannot go beyond a
single page.  If the folio wasn't allocated from highmem, we're just
using its address directly, so we can access the whole folio.

Hope I've cleared up your confusions.
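To make that behaviour concrete, here is a small userspace model of the
semantics discussed above.  Everything here is a toy stand-in, not the kernel
code: the struct, the helper, and the caller loop are invented for the
sketch (names mirror the API under discussion), and it assumes 4KB pages and
a folio whose file offset is aligned to its size.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy folio: a plain buffer plus a flag modelling folio_test_highmem(). */
struct folio {
        const char *data;       /* the folio's memory */
        size_t size;            /* folio_size() */
        int highmem;            /* folio_test_highmem() */
};

/* Copy up to @len bytes from @folio starting at file position @pos.
 * A highmem folio can only be mapped one page at a time, so the copy
 * is clamped to the end of the current page; otherwise it is clamped
 * to the end of the folio.  Returns the number of bytes copied. */
static size_t toy_memcpy_from_file_folio(char *to, const struct folio *folio,
                                         long long pos, size_t len)
{
        size_t offset = (size_t)pos % folio->size;  /* offset_in_folio() */
        size_t max;

        if (folio->highmem)
                max = PAGE_SIZE - offset % PAGE_SIZE;
        else
                max = folio->size - offset;
        if (len > max)
                len = max;
        memcpy(to, folio->data + offset, len);
        return len;
}

/* The caller-side loop from the pagecache_read() conversion: no manual
 * n/len/offset bookkeeping, just advance by what was actually copied. */
static void toy_read(char *buf, const struct folio *folio,
                     long long pos, size_t count)
{
        while (count) {
                size_t n = toy_memcpy_from_file_folio(buf, folio, pos, count);

                buf += n;
                pos += n;
                count -= n;
        }
}
```

On a non-highmem "folio", toy_read() finishes in one iteration; on a highmem
one, the same loop silently degrades to page-by-page copies — which is the
trade-off the thread is debating.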