Date: Tue, 15 Jul 2025 00:34:12 +0200
From: Klara Modin
To: Youling Tang
Cc: Matthew Wilcox, Andrew Morton, Jan Kara, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chizhiling@163.com, Youling Tang, Chi Zhiling
Subject: Re: [PATCH] mm/filemap: Align last_index to folio size
In-Reply-To: <20250711055509.91587-1-youling.tang@linux.dev>
Hi,

On 2025-07-11 13:55:09 +0800, Youling Tang wrote:
> From: Youling Tang
>
> On XFS systems with pagesize=4K, blocksize=16K, and CONFIG_TRANSPARENT_HUGEPAGE
> enabled, We observed the following readahead behaviors:
>  # echo 3 > /proc/sys/vm/drop_caches
>  # dd if=test of=/dev/null bs=64k count=1
>  # ./tools/mm/page-types -r -L -f /mnt/xfs/test
>  foffset  offset  flags
>  0        136d4c  __RU_l_________H______t_________________F_1
>  1        136d4d  __RU_l__________T_____t_________________F_1
>  2        136d4e  __RU_l__________T_____t_________________F_1
>  3        136d4f  __RU_l__________T_____t_________________F_1
>  ...
>  c        136bb8  __RU_l_________H______t_________________F_1
>  d        136bb9  __RU_l__________T_____t_________________F_1
>  e        136bba  __RU_l__________T_____t_________________F_1
>  f        136bbb  __RU_l__________T_____t_________________F_1  <-- first read
>  10       13c2cc  ___U_l_________H______t______________I__F_1  <-- readahead flag
>  11       13c2cd  ___U_l__________T_____t______________I__F_1
>  12       13c2ce  ___U_l__________T_____t______________I__F_1
>  13       13c2cf  ___U_l__________T_____t______________I__F_1
>  ...
>  1c       1405d4  ___U_l_________H______t_________________F_1
>  1d       1405d5  ___U_l__________T_____t_________________F_1
>  1e       1405d6  ___U_l__________T_____t_________________F_1
>  1f       1405d7  ___U_l__________T_____t_________________F_1
>  [ra_size = 32, req_count = 16, async_size = 16]
>
>  # echo 3 > /proc/sys/vm/drop_caches
>  # dd if=test of=/dev/null bs=60k count=1
>  # ./page-types -r -L -f /mnt/xfs/test
>  foffset  offset  flags
>  0        136048  __RU_l_________H______t_________________F_1
>  ...
>  c        110a40  __RU_l_________H______t_________________F_1
>  d        110a41  __RU_l__________T_____t_________________F_1
>  e        110a42  __RU_l__________T_____t_________________F_1  <-- first read
>  f        110a43  __RU_l__________T_____t_________________F_1  <-- first readahead flag
>  10       13e7a8  ___U_l_________H______t_________________F_1
>  ...
>  20       137a00  ___U_l_________H______t_______P______I__F_1  <-- second readahead flag (20 - 2f)
>  21       137a01  ___U_l__________T_____t_______P______I__F_1
>  ...
>  3f       10d4af  ___U_l__________T_____t_______P_________F_1
>  [first readahead: ra_size = 32, req_count = 15, async_size = 17]
>
> When reading 64k data (same for 61-63k range, where last_index is page-aligned
> in filemap_get_pages()), 128k readahead is triggered via page_cache_sync_ra()
> and the PG_readahead flag is set on the next folio (the one containing 0x10 page).
>
> When reading 60k data, 128k readahead is also triggered via page_cache_sync_ra().
> However, in this case the readahead flag is set on the 0xf page. Although the
> requested read size (req_count) is 60k, the actual read will be aligned to
> folio size (64k), which triggers the readahead flag and initiates asynchronous
> readahead via page_cache_async_ra(). This results in two readahead operations
> totaling 256k.
>
> The root cause is that when the requested size is smaller than the actual read
> size (due to folio alignment), it triggers asynchronous readahead.
> By changing last_index alignment from page size to folio size, we ensure the requested size
> matches the actual read size, preventing the case where a single read operation
> triggers two readahead operations.
>
> After applying the patch:
>  # echo 3 > /proc/sys/vm/drop_caches
>  # dd if=test of=/dev/null bs=60k count=1
>  # ./page-types -r -L -f /mnt/xfs/test
>  foffset  offset  flags
>  0        136d4c  __RU_l_________H______t_________________F_1
>  1        136d4d  __RU_l__________T_____t_________________F_1
>  2        136d4e  __RU_l__________T_____t_________________F_1
>  3        136d4f  __RU_l__________T_____t_________________F_1
>  ...
>  c        136bb8  __RU_l_________H______t_________________F_1
>  d        136bb9  __RU_l__________T_____t_________________F_1
>  e        136bba  __RU_l__________T_____t_________________F_1  <-- first read
>  f        136bbb  __RU_l__________T_____t_________________F_1
>  10       13c2cc  ___U_l_________H______t______________I__F_1  <-- readahead flag
>  11       13c2cd  ___U_l__________T_____t______________I__F_1
>  12       13c2ce  ___U_l__________T_____t______________I__F_1
>  13       13c2cf  ___U_l__________T_____t______________I__F_1
>  ...
>  1c       1405d4  ___U_l_________H______t_________________F_1
>  1d       1405d5  ___U_l__________T_____t_________________F_1
>  1e       1405d6  ___U_l__________T_____t_________________F_1
>  1f       1405d7  ___U_l__________T_____t_________________F_1
>  [ra_size = 32, req_count = 16, async_size = 16]
>
> The same phenomenon will occur when reading from 49k to 64k. Set the readahead
> flag to the next folio.
>
> Because the minimum order of folio in address_space equals the block size (at
> least in xfs and bcachefs that already support bs > ps), having request_count
> aligned to block size will not cause overread.
>
> Co-developed-by: Chi Zhiling
> Signed-off-by: Chi Zhiling
> Signed-off-by: Youling Tang

I bisected boot failures on two of my 32-bit systems to this patch.
> ---
>  include/linux/pagemap.h | 6 ++++++
>  mm/filemap.c            | 5 +++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e63fbfbd5b0f..447bb264fd94 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -480,6 +480,12 @@ mapping_min_folio_nrpages(struct address_space *mapping)
>  	return 1UL << mapping_min_folio_order(mapping);
>  }
>  
> +static inline unsigned long
> +mapping_min_folio_nrbytes(struct address_space *mapping)
> +{
> +	return mapping_min_folio_nrpages(mapping) << PAGE_SHIFT;
> +}
> +
>  /**
>   * mapping_align_index() - Align index for this mapping.
>   * @mapping: The address_space.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 765dc5ef6d5a..56a8656b6f86 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2584,8 +2584,9 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  	unsigned int flags;
>  	int err = 0;
>  
> -	/* "last_index" is the index of the page beyond the end of the read */
> -	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
> +	/* "last_index" is the index of the folio beyond the end of the read */
> +	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));

iocb->ki_pos is loff_t (long long), while pgoff_t is unsigned long, which is
only 32 bits wide on these systems. The rounded-up byte offset is therefore
truncated when it is stored in last_index, before the shift by PAGE_SHIFT.
This overflow seems to happen in practice, resulting in last_index being
before index.
The following diff resolves the issue for me:

diff --git a/mm/filemap.c b/mm/filemap.c
index 3c071307f40e..d2902be0b845 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2585,8 +2585,8 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	int err = 0;
 
 	/* "last_index" is the index of the folio beyond the end of the read */
-	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));
-	last_index >>= PAGE_SHIFT;
+	last_index = round_up(iocb->ki_pos + count,
+			mapping_min_folio_nrbytes(mapping)) >> PAGE_SHIFT;
 retry:
 	if (fatal_signal_pending(current))
 		return -EINTR;

Regards,
Klara Modin

> +	last_index >>= PAGE_SHIFT;
>  retry:
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> -- 
> 2.34.1
>