From: Jan Kara <jack@suse.cz>
To: Haibo Li <haibo.li@mediatek.com>
Cc: linux-kernel@vger.kernel.org,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	AngeloGioacchino Del Regno
	<angelogioacchino.delregno@collabora.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, xiaoming.yu@mediatek.com
Subject: Re: [PATCH] mm/filemap.c:fix update prev_pos after one read request done
Date: Wed, 9 Aug 2023 18:44:46 +0200
Message-ID: <20230809164446.uwxryhrfbjka2lio@quack3>
In-Reply-To: <20230628110220.120134-1-haibo.li@mediatek.com>

On Wed 28-06-23 19:02:20, Haibo Li wrote:
> ra->prev_pos tracks the last visited byte in the previous read request.
> It is used in ondemand_readahead to check whether a read is sequential,
> and it thus affects the readahead window.
> 
> Since commit 06c0444290ce ("mm/filemap.c: generic_file_buffered_read()
> now uses find_get_pages_contig"), the update logic of prev_pos has
> changed: it is now updated after each return from filemap_get_pages.
> But the read request from userspace may not be fully completed at that
> point, and the prematurely updated prev_pos affects the subsequent
> readahead window.
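> 
> A simplified sketch of the pre-patch filemap_read() loop (details
> elided; see the diff below for the exact code):
> 
> 	do {
> 		/* fill fbatch from the page cache, kicking off readahead */
> 		error = filemap_get_pages(...);
> 		...
> 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
> 			...
> 			already_read += copied;
> 			iocb->ki_pos += copied;
> 			/* updated after every batch, even mid-request */
> 			ra->prev_pos = iocb->ki_pos;
> 		}
> 	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);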
> 
> The real-world problem is a performance drop of fsck_msdos between
> linux-5.4 and linux-5.15 (also present on linux-6.4).  Compared to
> linux-5.4, it takes about 110% of the time and reads about 140% of the
> pages.  The read pattern of fsck_msdos is not fully sequential.
> 
> The simplified read pattern of fsck_msdos looks like this (a userspace
> sketch follows the list):
> 1. read at page offset 0xa, size 0x1000
> 2. read at another page offset, e.g. 0x20, size 0x1000
> 3. read at page offset 0xa, size 0x4000
> 4. read at page offset 0xe, size 0x1000
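> 
> A minimal userspace sketch of that pattern (hypothetical; assumes an
> already-open file descriptor fd, <unistd.h> for pread(), and 4 KiB
> pages):
> 
> 	char buf[0x4000];
> 
> 	/* error handling omitted; page offsets are shifted by 12 */
> 	pread(fd, buf, 0x1000, (off_t)0x0a << 12);	/* step 1 */
> 	pread(fd, buf, 0x1000, (off_t)0x20 << 12);	/* step 2 */
> 	pread(fd, buf, 0x4000, (off_t)0x0a << 12);	/* step 3 */
> 	pread(fd, buf, 0x1000, (off_t)0x0e << 12);	/* step 4 */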
> 
> Here is what happens on linux-6.4:
> 1. after the read at page offset 0xa, size 0x1000
>     -> page ofs 0xa goes into the page cache
> 2. after the read at page offset 0x20, size 0x1000
>     -> page ofs 0x20 goes into the page cache
> 3. read at page offset 0xa, size 0x4000
>     -> filemap_get_pages reads ofs 0xa from the page cache and returns
>     -> prev_pos is updated to 0xb and the next loop iteration starts
>     -> filemap_get_pages tries to read ofs 0xb, size 0x3000
>     -> this hits the initial_readahead case in ondemand_readahead, since
>        prev_pos is the same as the requested ofs
>     -> 8 pages are read, with an async size of 5 pages
>        (PageReadahead flag set on page 0xe)
> 4. read at page offset 0xe, size 0x1000
>     -> hits page 0xe with the PageReadahead flag set, so ra_size is
>        doubled: 16 pages are read, with an async size of 16 pages
> Now it reads 24 pages while actually using only 5 of them
> 
> On linux-5.4:
> 1. the same as on 6.4
> 2. the same as on 6.4
> 3. read at page offset 0xa, size 0x4000
>     -> reads ofs 0xa from the page cache
>     -> reads ofs 0xb, size 0x3000 via page_cache_sync_readahead,
>        which reads 3 pages
>     -> prev_pos is updated to 0xd before generic_file_buffered_read
>        returns
> 4. read at page offset 0xe, size 0x1000
>     -> hits the initial_readahead case in ondemand_readahead, since
>        request ofs - prev_pos == 1
>     -> 4 pages are read, with an async size of 3 pages
> 
> Now it reads 7 pages while actually using only 5 of them.
> 
> In the demo above, the initial_readahead case is triggered by the offset
> of the user request on linux-5.4, while on linux-6.4 it may be triggered
> by the prev_pos update logic.
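> 
> For reference, the heuristic in ondemand_readahead() (mm/readahead.c)
> that both walkthroughs end up in is roughly the following (paraphrased;
> the exact code differs between kernel versions):
> 
> 	prev_index = (unsigned long long)ra->prev_pos >> PAGE_SHIFT;
> 	/*
> 	 * sequential cache miss
> 	 * trivial case: (index - prev_index) == 1
> 	 * unaligned reads: (index - prev_index) == 0
> 	 */
> 	if (index - prev_index <= 1UL)
> 		goto initial_readahead;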
> 
> To fix the performance drop, update prev_pos only after one read request
> is finished.
> 
> Signed-off-by: Haibo Li <haibo.li@mediatek.com>

Sorry for the delayed reply. This seems to have fallen through the cracks.
So if I understand your analysis right, you are complaining that a random
read larger than one page gets misdetected as a sequential read, and so
"larger than necessary" readahead happens. I tend to agree with your
analysis, and your solution looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Willy, any opinion? Andrew, can you pickup the patch if Willy doesn't
object?

								Honza

> ---
>  mm/filemap.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 83dda76d1fc3..16b2054eee71 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2670,6 +2670,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  	int i, error = 0;
>  	bool writably_mapped;
>  	loff_t isize, end_offset;
> +	loff_t last_pos = ra->prev_pos;
>  
>  	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
>  		return 0;
> @@ -2721,8 +2722,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  		 * When a read accesses the same folio several times, only
>  		 * mark it as accessed the first time.
>  		 */
> -		if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
> -							fbatch.folios[0]))
> +		if (!pos_same_folio(iocb->ki_pos, last_pos - 1,
> +				    fbatch.folios[0]))
>  			folio_mark_accessed(fbatch.folios[0]);
>  
>  		for (i = 0; i < folio_batch_count(&fbatch); i++) {
> @@ -2749,7 +2750,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  
>  			already_read += copied;
>  			iocb->ki_pos += copied;
> -			ra->prev_pos = iocb->ki_pos;
> +			last_pos = iocb->ki_pos;
>  
>  			if (copied < bytes) {
>  				error = -EFAULT;
> @@ -2763,7 +2764,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
>  
>  	file_accessed(filp);
> -
> +	ra->prev_pos = last_pos;
>  	return already_read ? already_read : error;
>  }
>  EXPORT_SYMBOL_GPL(filemap_read);
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

