Date: Tue, 6 May 2025 12:45:10 +0200
From: Jan Kara
To: Ryan Roberts
Cc: Jan Kara, Andrew Morton, "Matthew Wilcox (Oracle)", Alexander Viro,
 Christian Brauner, David Hildenbrand, Dave Chinner, Catalin Marinas,
 Will Deacon, Kalesh Singh, Zi Yan, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [RFC PATCH v4 4/5] mm/readahead: Store folio order in struct
 file_ra_state
References: <20250430145920.3748738-1-ryan.roberts@arm.com>
 <20250430145920.3748738-5-ryan.roberts@arm.com>

On Tue 06-05-25 10:53:07, Ryan Roberts wrote:
> On 05/05/2025 10:52, Jan Kara wrote:
> > On Wed 30-04-25 15:59:17, Ryan Roberts wrote:
> >> Previously the folio order of the previous readahead request was
> >> inferred from the folio whose readahead marker was hit. But due to the
> >> way we have to round to non-natural boundaries sometimes, this first
> >> folio in the readahead block is often smaller than the preferred order
> >> for that request. This means that for cases where the initial sync
> >> readahead is poorly aligned, the folio order will ramp up much more
> >> slowly.
> >>
> >> So instead, let's store the order in struct file_ra_state so we are not
> >> affected by any required alignment. We previously made enough room in
> >> the struct for a 16 order field. This should be plenty big enough since
> >> we are limited to MAX_PAGECACHE_ORDER anyway, which is certainly never
> >> larger than ~20.
> >>
> >> Since we now pass order in struct file_ra_state, page_cache_ra_order()
> >> no longer needs its new_order parameter, so let's remove that.
> >>
> >> Worked example:
> >>
> >> Here we are touching pages 17-256 sequentially just as we did in the
> >> previous commit, but now that we are remembering the preferred order
> >> explicitly, we no longer have the slow ramp up problem.
> >> Note specifically that we no longer have 2 rounds (2x ~128K) of
> >> order-2 folios:
> >>
> >> TYPE   STARTOFFS    ENDOFFS       SIZE STARTPG   ENDPG  NRPG ORDER RA
> >> ----- ---------- ---------- ---------- ------- ------- ----- ----- --
> >> HOLE  0x00000000 0x00001000       4096       0       1     1
> >> FOLIO 0x00001000 0x00002000       4096       1       2     1     0
> >> FOLIO 0x00002000 0x00003000       4096       2       3     1     0
> >> FOLIO 0x00003000 0x00004000       4096       3       4     1     0
> >> FOLIO 0x00004000 0x00005000       4096       4       5     1     0
> >> FOLIO 0x00005000 0x00006000       4096       5       6     1     0
> >> FOLIO 0x00006000 0x00007000       4096       6       7     1     0
> >> FOLIO 0x00007000 0x00008000       4096       7       8     1     0
> >> FOLIO 0x00008000 0x00009000       4096       8       9     1     0
> >> FOLIO 0x00009000 0x0000a000       4096       9      10     1     0
> >> FOLIO 0x0000a000 0x0000b000       4096      10      11     1     0
> >> FOLIO 0x0000b000 0x0000c000       4096      11      12     1     0
> >> FOLIO 0x0000c000 0x0000d000       4096      12      13     1     0
> >> FOLIO 0x0000d000 0x0000e000       4096      13      14     1     0
> >> FOLIO 0x0000e000 0x0000f000       4096      14      15     1     0
> >> FOLIO 0x0000f000 0x00010000       4096      15      16     1     0
> >> FOLIO 0x00010000 0x00011000       4096      16      17     1     0
> >> FOLIO 0x00011000 0x00012000       4096      17      18     1     0
> >> FOLIO 0x00012000 0x00013000       4096      18      19     1     0
> >> FOLIO 0x00013000 0x00014000       4096      19      20     1     0
> >> FOLIO 0x00014000 0x00015000       4096      20      21     1     0
> >> FOLIO 0x00015000 0x00016000       4096      21      22     1     0
> >> FOLIO 0x00016000 0x00017000       4096      22      23     1     0
> >> FOLIO 0x00017000 0x00018000       4096      23      24     1     0
> >> FOLIO 0x00018000 0x00019000       4096      24      25     1     0
> >> FOLIO 0x00019000 0x0001a000       4096      25      26     1     0
> >> FOLIO 0x0001a000 0x0001b000       4096      26      27     1     0
> >> FOLIO 0x0001b000 0x0001c000       4096      27      28     1     0
> >> FOLIO 0x0001c000 0x0001d000       4096      28      29     1     0
> >> FOLIO 0x0001d000 0x0001e000       4096      29      30     1     0
> >> FOLIO 0x0001e000 0x0001f000       4096      30      31     1     0
> >> FOLIO 0x0001f000 0x00020000       4096      31      32     1     0
> >> FOLIO 0x00020000 0x00021000       4096      32      33     1     0
> >> FOLIO 0x00021000 0x00022000       4096      33      34     1     0
> >> FOLIO 0x00022000 0x00024000       8192      34      36     2     1
> >> FOLIO 0x00024000 0x00028000      16384      36      40     4     2
> >> FOLIO 0x00028000 0x0002c000      16384      40      44     4     2
> >> FOLIO 0x0002c000 0x00030000      16384      44      48     4     2
> >> FOLIO 0x00030000 0x00034000      16384      48      52     4     2
> >> FOLIO 0x00034000 0x00038000      16384      52      56     4     2
> >> FOLIO 0x00038000 0x0003c000      16384      56      60     4     2
> >> FOLIO 0x0003c000 0x00040000      16384      60      64     4     2
> >> FOLIO 0x00040000 0x00050000      65536      64      80    16     4
> >> FOLIO 0x00050000 0x00060000      65536      80      96    16     4
> >> FOLIO 0x00060000 0x00080000     131072      96     128    32     5
> >> FOLIO 0x00080000 0x000a0000     131072     128     160    32     5
> >> FOLIO 0x000a0000 0x000c0000     131072     160     192    32     5
> >> FOLIO 0x000c0000 0x000e0000     131072     192     224    32     5
> >> FOLIO 0x000e0000 0x00100000     131072     224     256    32     5
> >> FOLIO 0x00100000 0x00120000     131072     256     288    32     5
> >> FOLIO 0x00120000 0x00140000     131072     288     320    32     5 Y
> >> HOLE  0x00140000 0x00800000    7077888     320    2048  1728
> >>
> >> Signed-off-by: Ryan Roberts
> > 
> > ...
> > 
> >> @@ -469,6 +469,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
> >>  	int err = 0;
> >>  	gfp_t gfp = readahead_gfp_mask(mapping);
> >>  	unsigned int min_ra_size = max(4, mapping_min_folio_nrpages(mapping));
> >> +	unsigned int new_order = ra->order;
> >>  
> >>  	/*
> >>  	 * Fallback when size < min_nrpages as each folio should be
> >> @@ -483,6 +484,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
> >>  	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
> >>  	new_order = max(new_order, min_order);
> >>  
> >> +	ra->order = new_order;
> >> +
> >>  	/* See comment in page_cache_ra_unbounded() */
> >>  	nofs = memalloc_nofs_save();
> >>  	filemap_invalidate_lock_shared(mapping);
> >> @@ -525,6 +528,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
> >>  	 * ->readahead() may have updated readahead window size so we have to
> >>  	 * check there's still something to read.
> >>  	 */
> >> +	ra->order = 0;
> > 
> > Hum, so you reset desired folio order if readahead hit some pre-existing
> > pages in the page cache. Is this really desirable? Why not leave the
> > desired order as it was for the next request?
> 
> My aim was to not let order grow unbounded. When the filesystem doesn't support
> large folios we end up here (from the "goto fallback") and without this, order
> will just grow and grow (perhaps it doesn't matter though). I think we should
> keep this.

Yes, I agree that should be kept.

> But I guess your point is that we can also end up here when the filesystem does
> support large folios but there is an error. In that case, yes, I'll change to
> not reset order to 0; it has already been fixed up earlier in this path.

Right.

> How's this:
> 
> ---8<---
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 18972bc34861..0054ca18a815 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -475,8 +475,10 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  	 * Fallback when size < min_nrpages as each folio should be
>  	 * at least min_nrpages anyway.
>  	 */
> -	if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
> +	if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size) {
> +		ra->order = 0;
>  		goto fallback;
> +	}
> 
>  	limit = min(limit, index + ra->size - 1);
> 
> @@ -528,7 +530,6 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  	 * ->readahead() may have updated readahead window size so we have to
>  	 * check there's still something to read.
>  	 */
> -	ra->order = 0;
>  	if (ra->size > index - start)
>  		do_page_cache_ra(ractl, ra->size - (index - start),
>  				 ra->async_size);

Yes, this looks good to me.

								Honza
-- 
Jan Kara
SUSE Labs, CR
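
[Editor's illustration, not part of the thread.] The clamping that the quoted
hunks apply to the stored order can be sketched in plain userspace C. This is
a hedged sketch, not kernel code: the value of MAX_PAGECACHE_ORDER, the helper
name clamp_ra_order(), and the local ilog2 implementation are assumptions;
only the ilog2(ra->size) and min_order clamps mirror the hunk quoted above.

```c
/* Userspace sketch of the order clamping discussed above.
 * Assumptions: MAX_PAGECACHE_ORDER value (the mail only says "certainly
 * never larger than ~20") and all function names are illustrative. */

#define MAX_PAGECACHE_ORDER 20

/* Integer log2, mimicking the kernel's ilog2() for nonzero inputs. */
static unsigned int ilog2_u(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Clamp the desired folio order stored in file_ra_state against the
 * current readahead window size (in pages) and the mapping's minimum
 * folio order, as the quoted page_cache_ra_order() hunk does. */
static unsigned int clamp_ra_order(unsigned int order,
				   unsigned int ra_size_pages,
				   unsigned int min_order)
{
	if (order > MAX_PAGECACHE_ORDER)
		order = MAX_PAGECACHE_ORDER;
	/* A folio cannot be larger than the readahead window itself. */
	if (order > ilog2_u(ra_size_pages))
		order = ilog2_u(ra_size_pages);
	/* ...but also never smaller than the mapping's minimum order. */
	if (order < min_order)
		order = min_order;
	return order;
}
```

For example, a 32-page (128K) window caps the order at ilog2(32) = 5, which is
consistent with the order-5 folios in the worked example above once the window
has grown to 131072 bytes.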