Date: Mon, 14 Jul 2025 11:33:06 +0200
From: Jan Kara
To: Youling Tang
Cc: Matthew Wilcox, Andrew Morton, Jan Kara,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, chizhiling@163.com, Youling Tang,
	Chi Zhiling
Subject: Re: [PATCH] mm/filemap: Align last_index to folio size
In-Reply-To: <20250711055509.91587-1-youling.tang@linux.dev>

On Fri 11-07-25 13:55:09, Youling Tang wrote:
> From: Youling Tang
>
> On XFS systems with pagesize=4K, blocksize=16K, and CONFIG_TRANSPARENT_HUGEPAGE
> enabled, We observed the following readahead behaviors:
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test of=/dev/null bs=64k count=1
> # ./tools/mm/page-types -r -L -f /mnt/xfs/test
> foffset offset  flags
> 0       136d4c  __RU_l_________H______t_________________F_1
> 1       136d4d  __RU_l__________T_____t_________________F_1
> 2       136d4e  __RU_l__________T_____t_________________F_1
> 3       136d4f  __RU_l__________T_____t_________________F_1
> ...
> c       136bb8  __RU_l_________H______t_________________F_1
> d       136bb9  __RU_l__________T_____t_________________F_1
> e       136bba  __RU_l__________T_____t_________________F_1
> f       136bbb  __RU_l__________T_____t_________________F_1   <-- first read
> 10      13c2cc  ___U_l_________H______t______________I__F_1   <-- readahead flag
> 11      13c2cd  ___U_l__________T_____t______________I__F_1
> 12      13c2ce  ___U_l__________T_____t______________I__F_1
> 13      13c2cf  ___U_l__________T_____t______________I__F_1
> ...
> 1c      1405d4  ___U_l_________H______t_________________F_1
> 1d      1405d5  ___U_l__________T_____t_________________F_1
> 1e      1405d6  ___U_l__________T_____t_________________F_1
> 1f      1405d7  ___U_l__________T_____t_________________F_1
> [ra_size = 32, req_count = 16, async_size = 16]
>
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test of=/dev/null bs=60k count=1
> # ./page-types -r -L -f /mnt/xfs/test
> foffset offset  flags
> 0       136048  __RU_l_________H______t_________________F_1
> ...
> c       110a40  __RU_l_________H______t_________________F_1
> d       110a41  __RU_l__________T_____t_________________F_1
> e       110a42  __RU_l__________T_____t_________________F_1   <-- first read
> f       110a43  __RU_l__________T_____t_________________F_1   <-- first readahead flag
> 10      13e7a8  ___U_l_________H______t_________________F_1
> ...
> 20      137a00  ___U_l_________H______t_______P______I__F_1   <-- second readahead flag (20 - 2f)
> 21      137a01  ___U_l__________T_____t_______P______I__F_1
> ...
> 3f      10d4af  ___U_l__________T_____t_______P_________F_1
> [first readahead: ra_size = 32, req_count = 15, async_size = 17]
>
> When reading 64k data (same for 61-63k range, where last_index is page-aligned
> in filemap_get_pages()), 128k readahead is triggered via page_cache_sync_ra()
> and the PG_readahead flag is set on the next folio (the one containing 0x10 page).
>
> When reading 60k data, 128k readahead is also triggered via page_cache_sync_ra().
> However, in this case the readahead flag is set on the 0xf page. Although the
> requested read size (req_count) is 60k, the actual read will be aligned to
> folio size (64k), which triggers the readahead flag and initiates asynchronous
> readahead via page_cache_async_ra(). This results in two readahead operations
> totaling 256k.
>
> The root cause is that when the requested size is smaller than the actual read
> size (due to folio alignment), it triggers asynchronous readahead. By changing
> last_index alignment from page size to folio size, we ensure the requested size
> matches the actual read size, preventing the case where a single read operation
> triggers two readahead operations.
>
> After applying the patch:
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test of=/dev/null bs=60k count=1
> # ./page-types -r -L -f /mnt/xfs/test
> foffset offset  flags
> 0       136d4c  __RU_l_________H______t_________________F_1
> 1       136d4d  __RU_l__________T_____t_________________F_1
> 2       136d4e  __RU_l__________T_____t_________________F_1
> 3       136d4f  __RU_l__________T_____t_________________F_1
> ...
> c       136bb8  __RU_l_________H______t_________________F_1
> d       136bb9  __RU_l__________T_____t_________________F_1
> e       136bba  __RU_l__________T_____t_________________F_1   <-- first read
> f       136bbb  __RU_l__________T_____t_________________F_1
> 10      13c2cc  ___U_l_________H______t______________I__F_1   <-- readahead flag
> 11      13c2cd  ___U_l__________T_____t______________I__F_1
> 12      13c2ce  ___U_l__________T_____t______________I__F_1
> 13      13c2cf  ___U_l__________T_____t______________I__F_1
> ...
> 1c      1405d4  ___U_l_________H______t_________________F_1
> 1d      1405d5  ___U_l__________T_____t_________________F_1
> 1e      1405d6  ___U_l__________T_____t_________________F_1
> 1f      1405d7  ___U_l__________T_____t_________________F_1
> [ra_size = 32, req_count = 16, async_size = 16]
>
> The same phenomenon will occur when reading from 49k to 64k. Set the readahead
> flag to the next folio.
>
> Because the minimum order of folio in address_space equals the block size (at
> least in xfs and bcachefs that already support bs > ps), having request_count
> aligned to block size will not cause overread.
>
> Co-developed-by: Chi Zhiling
> Signed-off-by: Chi Zhiling
> Signed-off-by: Youling Tang

I agree with the analysis of the problem but not quite with the solution.
See below.
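
For reference, the numbers in the two traces above can be reproduced with a
small userspace sketch. This is not kernel code. The helper names are
hypothetical, PAGE_SHIFT = 12 is taken from the pagesize=4K setup, and the
async_size rule is only inferred from the reported values (32 - 16 = 16 and
32 - 15 = 17), so treat it as an assumption about the initial readahead
window rather than as a copy of mm/readahead.c.

```c
#include <assert.h>

/* Assumption: 4K pages, as in the report (pagesize=4K). */
#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Unpatched rule: index of the page just beyond the end of the read. */
static unsigned long last_index_page_aligned(unsigned long pos,
					     unsigned long count)
{
	return (pos + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/*
 * Rule proposed by the patch: round the end of the read up to the minimum
 * folio size in bytes first, then convert to a page index.
 */
static unsigned long last_index_folio_aligned(unsigned long pos,
					      unsigned long count,
					      unsigned long min_folio_bytes)
{
	/* The mask trick below only works for powers of two. */
	assert(min_folio_bytes && !(min_folio_bytes & (min_folio_bytes - 1)));

	return ((pos + count + min_folio_bytes - 1) & ~(min_folio_bytes - 1))
		>> PAGE_SHIFT;
}

/* Assumed rule for the initial window, inferred from the traces. */
static unsigned long initial_async_size(unsigned long ra_size,
					unsigned long req_count)
{
	return ra_size > req_count ? ra_size - req_count : ra_size;
}

/* Page index where the readahead mark would go before any rounding. */
static unsigned long unrounded_mark(unsigned long start,
				    unsigned long ra_size,
				    unsigned long async_size)
{
	return start + ra_size - async_size;
}
```

With pos = 0, a 60k read gives last_index = 15, hence req_count = 15,
async_size = 17 and a mark at page 15 (0xf), which sits inside the last folio
the read itself touches. A 64k read gives req_count = 16 and a mark at page
16 (0x10), the first page of the next folio. Rounding to the 16K folio size
makes the 60k case behave like the 64k one.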

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 765dc5ef6d5a..56a8656b6f86 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2584,8 +2584,9 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  	unsigned int flags;
>  	int err = 0;
>
> -	/* "last_index" is the index of the page beyond the end of the read */
> -	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
> +	/* "last_index" is the index of the folio beyond the end of the read */
> +	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));
> +	last_index >>= PAGE_SHIFT;

I think that filemap_get_pages() shouldn't really be trying to guess what the
readahead code needs and round last_index based on min folio order. After
all, the situation isn't special to LBS filesystems. It can also happen that
the readahead mark ends up in the middle of a large folio for other reasons.
In fact, we already do have code in page_cache_ra_order() ->
ra_alloc_folio() that handles rounding of the index where the mark should be
placed, so your changes essentially try to outsmart that code, which is not
good. I think the solution should really be placed in page_cache_ra_order()
+ ra_alloc_folio() instead.

In fact, the problem you are trying to solve was kind of introduced (or at
least made more visible) by my commit ab4443fe3ca62 ("readahead: avoid
multiple marked readahead pages"). There I changed the code to round the
index down because I had convinced myself it doesn't matter and rounding
down is easier to handle in that place. But your example shows there are
cases where rounding down has weird consequences and rounding up would have
been better.

So I think we need to come up with a way to round up the index of the marked
folio to fix your case without reintroducing the problems mentioned in
commit ab4443fe3ca62.

								Honza
-- 
Jan Kara
SUSE Labs, CR
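
The round-down versus round-up difference discussed above can be sketched in
userspace like this. It is only an illustration under assumptions: folios of
4 pages (16K blocks on 4K pages, as in the report), a read starting at page
0, and made-up helper names that are not the kernel's. It does not attempt to
handle the problems that commit ab4443fe3ca62 fixed.

```c
#include <assert.h>

/* Round x down to a multiple of align (align must be a power of two). */
static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	assert(align && !(align & (align - 1)));
	return x & ~(align - 1);
}

/* Round x up to a multiple of align (align must be a power of two). */
static unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	assert(align && !(align & (align - 1)));
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical check: does a read of pages [0, req_count) touch the folio
 * that starts at marked_folio_start? If so, the mark is hit by the very
 * read that created the window and a second readahead is started.
 */
static int first_read_hits_mark(unsigned long marked_folio_start,
				unsigned long req_count)
{
	return marked_folio_start < req_count;
}
```

For the 60k case the unrounded mark is page 15. Rounding down to a 4-page
folio gives 12 (0xc), which the 15-page read touches, so the mark is hit
immediately. Rounding up gives 16 (0x10), which the read does not touch. For
the 64k case the mark is already at 16 and both roundings agree.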