From: Yafang Shao
Date: Mon, 11 Nov 2024 22:28:09 +0800
Subject: Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead
To: David Hildenbrand
Cc: willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org, stable@vger.kernel.org
References: <20241108141710.9721-1-laoar.shao@gmail.com> <85cfc467-320f-4388-b027-2cbad85dfbed@redhat.com>
In-Reply-To: <85cfc467-320f-4388-b027-2cbad85dfbed@redhat.com>

On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand wrote:
>
> On 08.11.24 15:17, Yafang Shao wrote:
> > When testing large folio support with XFS on our servers, we observed that
> > only a few large folios are mapped when reading large files via mmap.
> > After a thorough analysis, I identified it was caused by the
> > `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
> > parameter is set to 128KB. After I tune it to 2MB, the large folio can
> > work as expected. However, I believe the large folio behavior should not be
> > dependent on the value of read_ahead_kb. It would be more robust if the
> > kernel can automatically adapt to it.
>
> Now I am extremely confused.
>
> Documentation/ABI/stable/sysfs-block:
>
> "[RW] Maximum number of kilobytes to read-ahead for filesystems on this
> block device."
>
>
> So, with your patch, will we also be changing the readahead size to
> exceed that, or simply allocate larger folios and not exceeding the
> readahead size (e.g., leaving them partially non-filled)?

Exceeding the readahead size for the MADV_HUGEPAGE case is
straightforward; this is what the current patch accomplishes.

>
> If you're also changing the readahead behavior to exceed the
> configuration parameter it would sound to me like "I am pushing the
> brake pedal and my car brakes; fix the brakes to adapt whether to brake
> automatically" :)
>
> Likely I am missing something here, and how the read_ahead_kb parameter
> is used after your patch.

The read_ahead_kb parameter continues to function for
non-MADV_HUGEPAGE scenarios, whereas special handling is required for
the MADV_HUGEPAGE case. It appears that we ought to update
Documentation/ABI/stable/sysfs-block to reflect the changes related to
large folios, correct?

>
> >
> >
> > With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a
> > sequential read on a 1GB file using MADV_HUGEPAGE, the differences in
> > /proc/meminfo are as follows:
> >
> > - before this patch
> >   FileHugePages: 18432 kB
> >   FilePmdMapped: 4096 kB
> >
> > - after this patch
> >   FileHugePages: 1067008 kB
> >   FilePmdMapped: 1048576 kB
> >
> > This shows that after applying the patch, the entire 1GB file is mapped to
> > huge pages.
> > The stable list is CCed, as without this patch, large folios
> > don’t function optimally in the readahead path.
> >
> > It's worth noting that if read_ahead_kb is set to a larger value that isn't
> > aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail to map
> > to hugepages.
> >
> > Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
> > Suggested-by: Matthew Wilcox
> > Signed-off-by: Yafang Shao
> > Cc: stable@vger.kernel.org
> >
> > ---
> >  mm/readahead.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > Changes:
> > v1->v2:
> > - Drop the align (Matthew)
> > - Improve commit log (Andrew)
> >
> > RFC->v1: https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@gmail.com/
> > - Simplify the code as suggested by Matthew
> >
> > RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.com/
> >
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 3dc6c7a128dd..9b8a48e736c6 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
> >  		return 4 * cur;
> >  	if (cur <= max / 2)
> >  		return 2 * cur;
> > +	if (cur > max)
> > +		return cur;
> >  	return max;
>
> Maybe something like
>
>         return max_t(unsigned long, cur, max);
>
> might be more readable (likely "max()" cannot be used because of the
> local variable name "max" ...).
>
>
> ... but it's rather weird having a "max" and then returning something
> larger than the "max" ... especially with code like

Indeed, that could lead to confusion ;)

>
> "ra->size = get_next_ra_size(ra, max_pages);"
>
>
> Maybe we can improve that by renaming "max_pages" / "max" to what it
> actually is supposed to be (which I haven't quite understood yet).

Perhaps a more straightforward solution would be to implement it
directly at the callsite, as demonstrated below?

diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..187efae95b02 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -642,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
 			1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		/*
+		 * Allow the actual size to exceed the readahead window for a
+		 * large folio.
+		 */
+		ra->size = get_next_ra_size(ra, max(max_pages, ra->size));
 		ra->async_size = ra->size;
 		goto readit;
 	}

--
Regards
Yafang
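
For reference, the two variants discussed in this thread can be compared with a small userspace model. This is only a sketch, not kernel code: the helper names (next_ra_size, next_ra_size_v2, next_ra_size_callsite) are hypothetical, the helpers take the current window size directly instead of a struct file_ra_state, and the first comparison (cur < max / 16) is not visible in the quoted hunk and is assumed here. All sizes are in pages.

```c
/*
 * Userspace model of the window ramp-up done by get_next_ra_size()
 * in mm/readahead.c. Assumption: the "cur < max / 16" step is not
 * shown in the quoted hunk and is reproduced from memory.
 */
static unsigned long next_ra_size(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;
}

/* v2 patch: do not shrink a window that is already larger than max. */
static unsigned long next_ra_size_v2(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	if (cur > max)
		return cur;
	return max;
}

/* Callsite variant: raise the limit to the current size before calling. */
static unsigned long next_ra_size_callsite(unsigned long cur,
					   unsigned long max_pages)
{
	unsigned long limit = max_pages > cur ? max_pages : cur;

	return next_ra_size(cur, limit);
}
```

As an illustration with assumed numbers: with 4KB pages, read_ahead_kb = 128KB gives max_pages = 32, and a 2MB folio corresponds to a current window of 512 pages. The unmodified helper shrinks that window back to 32 pages, while both the v2 change and the callsite variant keep it at 512. For windows at or below max_pages, all three behave the same.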