From: Ryan Roberts <ryan.roberts@arm.com>
Date: Mon, 14 Jul 2025 09:16:02 +0100
Subject: Re: [PATCH] mm/filemap: Align last_index to folio size
To: David Hildenbrand, Youling Tang, Matthew Wilcox, Andrew Morton
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, chizhiling@163.com, Youling Tang, Chi Zhiling
Message-ID: <86a82f42-918c-45f8-ac49-2b1f341ee0d3@arm.com>
References: <20250711055509.91587-1-youling.tang@linux.dev>
On 11/07/2025 17:08, David Hildenbrand wrote:
> CCing Ryan, who recently fiddled with readahead.
>
>
> On 11.07.25 07:55, Youling Tang wrote:
>> From: Youling Tang
>>
>> On XFS systems with pagesize=4K, blocksize=16K, and CONFIG_TRANSPARENT_HUGEPAGE
>> enabled, we observed the following readahead behaviors:
>>   # echo 3 > /proc/sys/vm/drop_caches
>>   # dd if=test of=/dev/null bs=64k count=1
>>   # ./tools/mm/page-types -r -L -f /mnt/xfs/test
>>   foffset  offset  flags
>>   0    136d4c  __RU_l_________H______t_________________F_1
>>   1    136d4d  __RU_l__________T_____t_________________F_1
>>   2    136d4e  __RU_l__________T_____t_________________F_1
>>   3    136d4f  __RU_l__________T_____t_________________F_1
>>   ...
>>   c    136bb8  __RU_l_________H______t_________________F_1
>>   d    136bb9  __RU_l__________T_____t_________________F_1
>>   e    136bba  __RU_l__________T_____t_________________F_1
>>   f    136bbb  __RU_l__________T_____t_________________F_1   <-- first read
>>   10   13c2cc  ___U_l_________H______t______________I__F_1   <-- readahead flag
>>   11   13c2cd  ___U_l__________T_____t______________I__F_1
>>   12   13c2ce  ___U_l__________T_____t______________I__F_1
>>   13   13c2cf  ___U_l__________T_____t______________I__F_1
>>   ...
>>   1c   1405d4  ___U_l_________H______t_________________F_1
>>   1d   1405d5  ___U_l__________T_____t_________________F_1
>>   1e   1405d6  ___U_l__________T_____t_________________F_1
>>   1f   1405d7  ___U_l__________T_____t_________________F_1
>>   [ra_size = 32, req_count = 16, async_size = 16]
>>
>>   # echo 3 > /proc/sys/vm/drop_caches
>>   # dd if=test of=/dev/null bs=60k count=1
>>   # ./page-types -r -L -f /mnt/xfs/test
>>   foffset  offset  flags
>>   0    136048  __RU_l_________H______t_________________F_1
>>   ...
>>   c    110a40  __RU_l_________H______t_________________F_1
>>   d    110a41  __RU_l__________T_____t_________________F_1
>>   e    110a42  __RU_l__________T_____t_________________F_1   <-- first read
>>   f    110a43  __RU_l__________T_____t_________________F_1   <-- first readahead flag
>>   10   13e7a8  ___U_l_________H______t_________________F_1
>>   ...
>>   20   137a00  ___U_l_________H______t_______P______I__F_1   <-- second readahead flag (20 - 2f)
>>   21   137a01  ___U_l__________T_____t_______P______I__F_1
>>   ...
>>   3f   10d4af  ___U_l__________T_____t_______P_________F_1
>>   [first readahead: ra_size = 32, req_count = 15, async_size = 17]
>>
>> When reading 64k of data (the same holds for the 61-63k range, where
>> last_index is page-aligned in filemap_get_pages()), 128k of readahead is
>> triggered via page_cache_sync_ra() and the PG_readahead flag is set on the
>> next folio (the one containing the 0x10 page).
>>
>> When reading 60k of data, 128k of readahead is also triggered via
>> page_cache_sync_ra(). However, in this case the readahead flag is set on the
>> 0xf page. Although the requested read size (req_count) is 60k, the actual
>> read is aligned up to the folio size (64k), which hits the readahead flag
>> and initiates asynchronous readahead via page_cache_async_ra(). This results
>> in two readahead operations totaling 256k.
>>
>> The root cause is that when the requested size is smaller than the actual
>> read size (due to folio alignment), asynchronous readahead is triggered. By
>> changing the last_index alignment from page size to folio size, we ensure
>> the requested size matches the actual read size, preventing a single read
>> operation from triggering two readahead operations.

I recently fiddled with the mmap readahead paths, doing similar-ish things. I
haven't looked at the non-mmap paths so don't consider myself an expert here.
But what you are saying makes sense and superficially the solution looks good
to me, so:

Reviewed-by: Ryan Roberts

with one nit below...

>>
>> After applying the patch:
>>   # echo 3 > /proc/sys/vm/drop_caches
>>   # dd if=test of=/dev/null bs=60k count=1
>>   # ./page-types -r -L -f /mnt/xfs/test
>>   foffset  offset  flags
>>   0    136d4c  __RU_l_________H______t_________________F_1
>>   1    136d4d  __RU_l__________T_____t_________________F_1
>>   2    136d4e  __RU_l__________T_____t_________________F_1
>>   3    136d4f  __RU_l__________T_____t_________________F_1
>>   ...
>>   c    136bb8  __RU_l_________H______t_________________F_1
>>   d    136bb9  __RU_l__________T_____t_________________F_1
>>   e    136bba  __RU_l__________T_____t_________________F_1   <-- first read
>>   f    136bbb  __RU_l__________T_____t_________________F_1
>>   10   13c2cc  ___U_l_________H______t______________I__F_1   <-- readahead flag
>>   11   13c2cd  ___U_l__________T_____t______________I__F_1
>>   12   13c2ce  ___U_l__________T_____t______________I__F_1
>>   13   13c2cf  ___U_l__________T_____t______________I__F_1
>>   ...
>>   1c   1405d4  ___U_l_________H______t_________________F_1
>>   1d   1405d5  ___U_l__________T_____t_________________F_1
>>   1e   1405d6  ___U_l__________T_____t_________________F_1
>>   1f   1405d7  ___U_l__________T_____t_________________F_1
>>   [ra_size = 32, req_count = 16, async_size = 16]
>>
>> The same phenomenon occurs when reading anywhere from 49k to 64k: the
>> readahead flag is set on the next folio.
>>
>> Because the minimum folio size in an address_space equals the block size (at
>> least in xfs and bcachefs, which already support bs > ps), aligning
>> request_count to the block size will not cause overread.
>>
>> Co-developed-by: Chi Zhiling
>> Signed-off-by: Chi Zhiling
>> Signed-off-by: Youling Tang
>> ---
>>  include/linux/pagemap.h | 6 ++++++
>>  mm/filemap.c            | 5 +++--
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index e63fbfbd5b0f..447bb264fd94 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -480,6 +480,12 @@ mapping_min_folio_nrpages(struct address_space *mapping)
>>      return 1UL << mapping_min_folio_order(mapping);
>>  }
>>
>> +static inline unsigned long
>> +mapping_min_folio_nrbytes(struct address_space *mapping)
>> +{
>> +    return mapping_min_folio_nrpages(mapping) << PAGE_SHIFT;
>> +}
>> +
>>  /**
>>   * mapping_align_index() - Align index for this mapping.
>>   * @mapping: The address_space.
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 765dc5ef6d5a..56a8656b6f86 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2584,8 +2584,9 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>>      unsigned int flags;
>>      int err = 0;
>>
>> -    /* "last_index" is the index of the page beyond the end of the read */
>> -    last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
>> +    /* "last_index" is the index of the folio beyond the end of the read */

pedantic nit: I think you actually mean "the index of the first page within
the first minimum-sized folio beyond the end of the read"?

Thanks,
Ryan

>> +    last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));
>> +    last_index >>= PAGE_SHIFT;
>>  retry:
>>      if (fatal_signal_pending(current))
>>          return -EINTR;
>
>