From: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To: Ira Weiny, Matthew Wilcox
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] memcpy_from_folio()
Date: Mon, 23 Jan 2023 18:50:04 +0100
Message-ID: <3660839.MHq7AAxBmi@suse>
References: <63cb9105105ce_bd04c29434@iweiny-mobl.notmuch>

On Sunday, 22 January 2023 13:36:51 CET Matthew Wilcox wrote:
> On Fri, Jan 20, 2023 at 11:15:17PM -0800, Ira Weiny wrote:
> > Matthew Wilcox wrote:
> > > I think I have a good folio replacement for memcpy_from_page(). One of
> > > the annoying things about dealing with multi-page folios is that you
> > > can't kmap the entire folio, yet on systems without highmem, you don't
> > > need to. It's also somewhat annoying in the caller to keep track
> > > of n/len/offset/pos/...
> > >
> > > I think this is probably the best option. We could have a loop that
> > > kmaps each page in the folio, but that seems like excessive complexity.
> >
> > Why? IMO better to contain the complexity of highmem systems into any
> > memcpy_[to,from]_folio() calls then spread them around the kernel.
>
> Sure, but look at the conversion that I posted. It's actually simpler
> than using the memcpy_from_page() API.
>
> > > I'm happy to have highmem systems be less efficient, since they are
> > > anyway. Another potential area of concern is that folios can be quite
> > > large and maybe having preemption disabled while we copy 2MB of data
> > > might be a bad thing. If so, the API is fine with limiting the amount
> > > of data we copy, we just need to find out that it is a problem and
> > > decide what the correct limit is, if it's not folio_size().
> >
> > Why not map the pages only when needed? I agree that keeping preemption
> > disabled for a long time is a bad thing. But kmap_local_page does not
> > disable preemption, only migration.
>
> Some of the scheduler people aren't terribly happy about even disabling
> migration for a long time. Is "copying 2MB of data" a long time? If I've
> done my sums correctly, my current laptop has 2x 16 bit LP-DDR4-4267
> DIMMs installed. That works out to 17GB/s and so copying 2MB of data
> will take 118us. Probably OK for even the most demanding workload.
>
> > Regardless any looping on the maps is going to only be on highmem systems
> > and we can map the pages only if/when needed. Synchronization of the folio
> > should be handled by the caller. So it is fine to all allow migration
> > during memcpy_from_folio().
> >
> > So why not loop through the pages only when needed?
>
> So you're proposing re-enabling migration after calling
> kmap_local_folio()? I don't really understand.
>
> > >  fs/ext4/verity.c           | 16 +++++++---------
> > >  include/linux/highmem.h    | 29 +++++++++++++++++++++++++++++
> > >  include/linux/page-flags.h |  1 +
> > >  3 files changed, 37 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> > > index e4da1704438e..afe847c967a4 100644
> > > --- a/fs/ext4/verity.c
> > > +++ b/fs/ext4/verity.c
> > > @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void *buf, size_t count,
> > >  			   loff_t pos)
> > >  {
> > >  	while (count) {
> > > -		size_t n = min_t(size_t, count,
> > > -				 PAGE_SIZE - offset_in_page(pos));
> > > -		struct page *page;
> > > +		struct folio *folio;
> > > +		size_t n;
> > >
> > > -		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > > +		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> > >  					 NULL);
> >
> > Is this an issue with how many pages get read into the page
> > cache? I went off on a tangent thinking this read the entire folio into
> > the cache. But I see now I was wrong. If this is operating page by page
> > why change this function at all?
>
> The folio may (indeed _should_) be already present in the cache, otherwise
> the cache isn't doing a very good job. If read_mapping_folio() ends up
> having to allocate the folio, today it only allocates a single page folio.
> But if somebody else allocated it through the readahead code, and the
> filesystem supports multi-page folios, then it will be larger than a
> single page. All callers must be prepared to handle a multi-page folio.
>
> > > -		if (IS_ERR(page))
> > > -			return PTR_ERR(page);
> > > -
> > > -		memcpy_from_page(buf, page, offset_in_page(pos), n);
> > > +		if (IS_ERR(folio))
> > > +			return PTR_ERR(folio);
> > >
> > > -		put_page(page);
> > > +		n = memcpy_from_file_folio(buf, folio, pos, count);
> > > +		folio_put(folio);
> > >
> > >  		buf += n;
> > >  		pos += n;
> > >
> > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > > index 9fa462561e05..9917357b9e8f 100644
> > > --- a/include/linux/highmem.h
> > > +++ b/include/linux/highmem.h
> > > @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
> > >  	kunmap_local(addr);
> > >  }
> > >
> > > +/**
> > > + * memcpy_from_file_folio - Copy some bytes from a file folio.
> > > + * @to: The destination buffer.
> > > + * @folio: The folio to copy from.
> > > + * @pos: The position in the file.
> > > + * @len: The maximum number of bytes to copy.
> > > + *
> > > + * Copy up to @len bytes from this folio. This may be limited by PAGE_SIZE
> >
> > I have a problem with 'may be limited'. How is the caller to know this?
>
> ... from the return value?
>
> > Won't this propagate a lot of checks in the caller? Effectively replacing
> > one complexity in the callers for another?
>
> Look at the caller I converted! It _reduces_ the amount of checks in
> the caller.
>
> > > + * if the folio comes from HIGHMEM, and by the size of the folio.
> > > + *
> > > + * Return: The number of bytes copied from the folio.
> > > + */
> > > +static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
> > > +		loff_t pos, size_t len)
> > > +{
> > > +	size_t offset = offset_in_folio(folio, pos);
> > > +	char *from = kmap_local_folio(folio, offset);
> > > +
> > > +	if (folio_test_highmem(folio))
> > > +		len = min(len, PAGE_SIZE - offset);
> > > +	else
> > > +		len = min(len, folio_size(folio) - offset);
> > > +
> > > +	memcpy(to, from, len);
> >
> > Do we need flush_dcache_page() for the pages?
>
> Why? memcpy_from_page() doesn't have one.
>
> > I gave this an attempt today before I realized read_mapping_folio() only
> > reads a single page. :-(
> >
> > How does memcpy_from_file_folio() work beyond a single page? And in that
> > case what is the point? The more I think about this the more confused I
> > get.
>
> In the highmem case, we map a single page and so cannot go beyond a
> single page. If the folio wasn't allocated from highmem, we're just
> using its address directly, so we can access the whole folio.
>
> Hope I've cleared up your confusions.

As you know, I'm not a memory management expert; I'm just a user of these
APIs. However, since I have been Cc'ed, I have read Matthew's RFC and the
dialogue that followed.

FWIW, I think I understand this proposal and I like it. Therefore, I also
hope to see it become a "real" patch.

Fabio
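
P.S.: To check my own understanding of the proposed API, here is how I read
the converted pagecache_read() once the quoted hunks are applied. This is
only my sketch, not part of Matthew's patch: the hunks stop at "pos += n;",
so I'm assuming the usual "count -= n;" tail and the final "return 0;"
survive unchanged, and the comment in the middle is mine.

static int pagecache_read(struct inode *inode, void *buf, size_t count,
			  loff_t pos)
{
	while (count) {
		struct folio *folio;
		size_t n;

		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
					   NULL);
		if (IS_ERR(folio))
			return PTR_ERR(folio);

		/*
		 * Copies at most one page if the folio is in highmem,
		 * up to the rest of the folio otherwise; the return
		 * value says how far we actually got.
		 */
		n = memcpy_from_file_folio(buf, folio, pos, count);
		folio_put(folio);

		buf += n;
		pos += n;
		count -= n;
	}
	return 0;
}

If that reading is right, a highmem system simply goes around the loop once
per page, while everyone else copies the whole folio in one pass, which is
exactly the simplification Matthew describes.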
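
P.P.S.: Matthew's sums also look right to me, assuming I read the DIMM spec
the same way:

	2 channels * 16 bits * 4267 MT/s = 2 * 2 B * 4267e6 /s ~= 17 GB/s
	2 MB / 17 GB/s ~= 118 us

So even copying a whole PMD-sized folio in one go stays on the order of a
hundred microseconds.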