linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Youling Tang <youling.tang@linux.dev>
To: Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, chizhiling@163.com,
	youling.tang@linux.dev, Youling Tang <tangyouling@kylinos.cn>,
	Chi Zhiling <chizhiling@kylinos.cn>
Subject: [PATCH] mm/filemap: Align last_index to folio size
Date: Fri, 11 Jul 2025 13:55:09 +0800	[thread overview]
Message-ID: <20250711055509.91587-1-youling.tang@linux.dev> (raw)

From: Youling Tang <tangyouling@kylinos.cn>

On XFS systems with pagesize=4K, blocksize=16K, and CONFIG_TRANSPARENT_HUGEPAGE
enabled, We observed the following readahead behaviors:
 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=64k count=1
 # ./tools/mm/page-types -r -L -f  /mnt/xfs/test
 foffset	offset	flags
 0	136d4c	__RU_l_________H______t_________________F_1
 1	136d4d	__RU_l__________T_____t_________________F_1
 2	136d4e	__RU_l__________T_____t_________________F_1
 3	136d4f	__RU_l__________T_____t_________________F_1
 ...
 c	136bb8	__RU_l_________H______t_________________F_1
 d	136bb9	__RU_l__________T_____t_________________F_1
 e	136bba	__RU_l__________T_____t_________________F_1
 f	136bbb	__RU_l__________T_____t_________________F_1   <-- first read
 10	13c2cc	___U_l_________H______t______________I__F_1   <-- readahead flag
 11	13c2cd	___U_l__________T_____t______________I__F_1
 12	13c2ce	___U_l__________T_____t______________I__F_1
 13	13c2cf	___U_l__________T_____t______________I__F_1
 ...
 1c	1405d4	___U_l_________H______t_________________F_1
 1d	1405d5	___U_l__________T_____t_________________F_1
 1e	1405d6	___U_l__________T_____t_________________F_1
 1f	1405d7	___U_l__________T_____t_________________F_1
 [ra_size = 32, req_count = 16, async_size = 16]

 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=60k count=1
 # ./page-types -r -L -f  /mnt/xfs/test
 foffset	offset	flags
 0	136048	__RU_l_________H______t_________________F_1
 ...
 c	110a40	__RU_l_________H______t_________________F_1
 d	110a41	__RU_l__________T_____t_________________F_1
 e	110a42	__RU_l__________T_____t_________________F_1   <-- first read
 f	110a43	__RU_l__________T_____t_________________F_1   <-- first readahead flag
 10	13e7a8	___U_l_________H______t_________________F_1
 ...
 20	137a00	___U_l_________H______t_______P______I__F_1   <-- second readahead flag (20 - 2f)
 21	137a01	___U_l__________T_____t_______P______I__F_1
 ...
 3f	10d4af	___U_l__________T_____t_______P_________F_1
 [first readahead: ra_size = 32, req_count = 15, async_size = 17]

When reading 64k data (same for 61-63k range, where last_index is page-aligned
in filemap_get_pages()), 128k readahead is triggered via page_cache_sync_ra()
and the PG_readahead flag is set on the next folio (the one containing 0x10 page).

When reading 60k data, 128k readahead is also triggered via page_cache_sync_ra().
However, in this case the readahead flag is set on the 0xf page. Although the
requested read size (req_count) is 60k, the actual read will be aligned to
folio size (64k), which triggers the readahead flag and initiates asynchronous
readahead via page_cache_async_ra(). This results in two readahead operations
totaling 256k.

The root cause is that when the requested size is smaller than the actual read
size (due to folio alignment), it triggers asynchronous readahead. By changing
last_index alignment from page size to folio size, we ensure the requested size
matches the actual read size, preventing the case where a single read operation
triggers two readahead operations.

After applying the patch:
 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=60k count=1
 # ./page-types -r -L -f  /mnt/xfs/test
 foffset	offset	flags
 0	136d4c	__RU_l_________H______t_________________F_1
 1	136d4d	__RU_l__________T_____t_________________F_1
 2	136d4e	__RU_l__________T_____t_________________F_1
 3	136d4f	__RU_l__________T_____t_________________F_1
 ...
 c	136bb8	__RU_l_________H______t_________________F_1
 d	136bb9	__RU_l__________T_____t_________________F_1
 e	136bba	__RU_l__________T_____t_________________F_1   <-- first read
 f	136bbb	__RU_l__________T_____t_________________F_1
 10	13c2cc	___U_l_________H______t______________I__F_1   <-- readahead flag
 11	13c2cd	___U_l__________T_____t______________I__F_1
 12	13c2ce	___U_l__________T_____t______________I__F_1
 13	13c2cf	___U_l__________T_____t______________I__F_1
 ...
 1c	1405d4	___U_l_________H______t_________________F_1
 1d	1405d5	___U_l__________T_____t_________________F_1
 1e	1405d6	___U_l__________T_____t_________________F_1
 1f	1405d7	___U_l__________T_____t_________________F_1
 [ra_size = 32, req_count = 16, async_size = 16]

The same phenomenon will occur when reading from 49k to 64k. Set the readahead
flag to the next folio.

Because the minimum order of folio in address_space equals the block size (at
least in xfs and bcachefs that already support bs > ps), having request_count
aligned to block size will not cause overread.

Co-developed-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
---
 include/linux/pagemap.h | 6 ++++++
 mm/filemap.c            | 5 +++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e63fbfbd5b0f..447bb264fd94 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -480,6 +480,12 @@ mapping_min_folio_nrpages(struct address_space *mapping)
 	return 1UL << mapping_min_folio_order(mapping);
 }
 
+static inline unsigned long
+mapping_min_folio_nrbytes(struct address_space *mapping)
+{
+	return mapping_min_folio_nrpages(mapping) << PAGE_SHIFT;
+}
+
 /**
  * mapping_align_index() - Align index for this mapping.
  * @mapping: The address_space.
diff --git a/mm/filemap.c b/mm/filemap.c
index 765dc5ef6d5a..56a8656b6f86 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2584,8 +2584,9 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	unsigned int flags;
 	int err = 0;
 
-	/* "last_index" is the index of the page beyond the end of the read */
-	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
+	/* "last_index" is the index of the folio beyond the end of the read */
+	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));
+	last_index >>= PAGE_SHIFT;
 retry:
 	if (fatal_signal_pending(current))
 		return -EINTR;
-- 
2.34.1



             reply	other threads:[~2025-07-11  5:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-11  5:55 Youling Tang [this message]
2025-07-11 16:08 ` David Hildenbrand
2025-07-14  8:16   ` Ryan Roberts
2025-07-14  9:33 ` Jan Kara
2025-08-12  9:08   ` Youling Tang
2025-09-03 22:50     ` Andrew Morton
2025-09-04  2:04       ` Youling Tang
2025-09-05 12:50       ` Jan Kara
2025-09-09  6:15         ` Youling Tang
2025-09-09  6:41           ` Andrew Morton
2025-07-14 22:34 ` Klara Modin
2025-07-14 22:43   ` Andrew Morton
2025-07-14 22:48     ` Klara Modin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250711055509.91587-1-youling.tang@linux.dev \
    --to=youling.tang@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=chizhiling@163.com \
    --cc=chizhiling@kylinos.cn \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tangyouling@kylinos.cn \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox