Date: Mon, 5 May 2025 18:14:05 +0200
From: Jan Kara
To: Ryan Roberts
Cc: David Hildenbrand, Jan Kara, Andrew Morton,
	"Matthew Wilcox (Oracle)", Alexander Viro, Christian Brauner,
	Dave Chinner, Catalin Marinas, Will Deacon, Kalesh Singh, Zi Yan,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v4 1/5] mm/readahead: Honour new_order in
	page_cache_ra_order()
References: <20250430145920.3748738-1-ryan.roberts@arm.com>
	<20250430145920.3748738-2-ryan.roberts@arm.com>
	<48b4aa79-943b-46bc-ac24-604fdf998566@redhat.com>
	<12a74640-753a-4116-90f9-42ec0337f751@arm.com>
In-Reply-To: <12a74640-753a-4116-90f9-42ec0337f751@arm.com>

On Mon 05-05-25 13:51:48, Ryan Roberts wrote:
> On 05/05/2025 11:25, David Hildenbrand wrote:
> > On 05.05.25 12:09, Jan Kara wrote:
> >> On Mon 05-05-25 11:51:43, David Hildenbrand wrote:
> >>> On 30.04.25 16:59, Ryan Roberts wrote:
> >>>> page_cache_ra_order() takes a parameter called new_order, which is
> >>>> intended to express the preferred order of the folios that will be
> >>>> allocated for the readahead operation. Most callers indeed call this
> >>>> with their preferred new order. But page_cache_async_ra() calls it with
> >>>> the preferred order of the previous readahead request (actually the
> >>>> order of the folio that had the readahead marker, which may be smaller
> >>>> when alignment comes into play).
> >>>>
> >>>> And despite the parameter name, page_cache_ra_order() always treats it
> >>>> as the old order, adding 2 to it on entry. As a result, a cold readahead
> >>>> always starts with order-2 folios.
> >>>>
> >>>> Let's fix this behaviour by always passing in the *new* order.
> >>>>
> >>>> Worked example:
> >>>>
> >>>> Prior to the change, mmapping an 8MB file and touching each page
> >>>> sequentially resulted in the following, where we start with order-2
> >>>> folios for the first 128K then ramp up to order-4 for the next 128K,
> >>>> then get clamped to order-5 for the rest of the file because ra_pages is
> >>>> limited to 128K:
> >>>>
> >>>> TYPE    STARTOFFS     ENDOFFS       SIZE  STARTPG    ENDPG   NRPG  ORDER
> >>>> -----  ----------  ----------  ---------  -------  -------  -----  -----
> >>>> FOLIO  0x00000000  0x00004000      16384        0        4      4      2
> >>>> FOLIO  0x00004000  0x00008000      16384        4        8      4      2
> >>>> FOLIO  0x00008000  0x0000c000      16384        8       12      4      2
> >>>> FOLIO  0x0000c000  0x00010000      16384       12       16      4      2
> >>>> FOLIO  0x00010000  0x00014000      16384       16       20      4      2
> >>>> FOLIO  0x00014000  0x00018000      16384       20       24      4      2
> >>>> FOLIO  0x00018000  0x0001c000      16384       24       28      4      2
> >>>> FOLIO  0x0001c000  0x00020000      16384       28       32      4      2
> >>>> FOLIO  0x00020000  0x00030000      65536       32       48     16      4
> >>>> FOLIO  0x00030000  0x00040000      65536       48       64     16      4
> >>>> FOLIO  0x00040000  0x00060000     131072       64       96     32      5
> >>>> FOLIO  0x00060000  0x00080000     131072       96      128     32      5
> >>>> FOLIO  0x00080000  0x000a0000     131072      128      160     32      5
> >>>> FOLIO  0x000a0000  0x000c0000     131072      160      192     32      5
> >>>
> >>> Interesting, I would have thought we'd ramp up earlier.
> >>>
> >>>> ...
> >>>>
> >>>> After the change, the same operation results in the first 128K being
> >>>> order-0, then we start ramping up to order-2, -4, and finally get
> >>>> clamped at order-5:
> >>>>
> >>>> TYPE    STARTOFFS     ENDOFFS       SIZE  STARTPG    ENDPG   NRPG  ORDER
> >>>> -----  ----------  ----------  ---------  -------  -------  -----  -----
> >>>> FOLIO  0x00000000  0x00001000       4096        0        1      1      0
> >>>> FOLIO  0x00001000  0x00002000       4096        1        2      1      0
> >>>> FOLIO  0x00002000  0x00003000       4096        2        3      1      0
> >>>> FOLIO  0x00003000  0x00004000       4096        3        4      1      0
> >>>> FOLIO  0x00004000  0x00005000       4096        4        5      1      0
> >>>> FOLIO  0x00005000  0x00006000       4096        5        6      1      0
> >>>> FOLIO  0x00006000  0x00007000       4096        6        7      1      0
> >>>> FOLIO  0x00007000  0x00008000       4096        7        8      1      0
> >>>> FOLIO  0x00008000  0x00009000       4096        8        9      1      0
> >>>> FOLIO  0x00009000  0x0000a000       4096        9       10      1      0
> >>>> FOLIO  0x0000a000  0x0000b000       4096       10       11      1      0
> >>>> FOLIO  0x0000b000  0x0000c000       4096       11       12      1      0
> >>>> FOLIO  0x0000c000  0x0000d000       4096       12       13      1      0
> >>>> FOLIO  0x0000d000  0x0000e000       4096       13       14      1      0
> >>>> FOLIO  0x0000e000  0x0000f000       4096       14       15      1      0
> >>>> FOLIO  0x0000f000  0x00010000       4096       15       16      1      0
> >>>> FOLIO  0x00010000  0x00011000       4096       16       17      1      0
> >>>> FOLIO  0x00011000  0x00012000       4096       17       18      1      0
> >>>> FOLIO  0x00012000  0x00013000       4096       18       19      1      0
> >>>> FOLIO  0x00013000  0x00014000       4096       19       20      1      0
> >>>> FOLIO  0x00014000  0x00015000       4096       20       21      1      0
> >>>> FOLIO  0x00015000  0x00016000       4096       21       22      1      0
> >>>> FOLIO  0x00016000  0x00017000       4096       22       23      1      0
> >>>> FOLIO  0x00017000  0x00018000       4096       23       24      1      0
> >>>> FOLIO  0x00018000  0x00019000       4096       24       25      1      0
> >>>> FOLIO  0x00019000  0x0001a000       4096       25       26      1      0
> >>>> FOLIO  0x0001a000  0x0001b000       4096       26       27      1      0
> >>>> FOLIO  0x0001b000  0x0001c000       4096       27       28      1      0
> >>>> FOLIO  0x0001c000  0x0001d000       4096       28       29      1      0
> >>>> FOLIO  0x0001d000  0x0001e000       4096       29       30      1      0
> >>>> FOLIO  0x0001e000  0x0001f000       4096       30       31      1      0
> >>>> FOLIO  0x0001f000  0x00020000       4096       31       32      1      0
> >>>> FOLIO  0x00020000  0x00024000      16384       32       36      4      2
> >>>> FOLIO  0x00024000  0x00028000      16384       36       40      4      2
> >>>> FOLIO  0x00028000  0x0002c000      16384       40       44      4      2
> >>>> FOLIO  0x0002c000  0x00030000      16384       44       48      4      2
> >>>> FOLIO  0x00030000  0x00034000      16384       48       52      4      2
> >>>> FOLIO  0x00034000  0x00038000      16384       52       56      4      2
> >>>> FOLIO  0x00038000  0x0003c000      16384       56       60      4      2
> >>>> FOLIO  0x0003c000  0x00040000      16384       60       64      4      2
> >>>> FOLIO  0x00040000  0x00050000      65536       64       80     16      4
> >>>> FOLIO  0x00050000  0x00060000      65536       80       96     16      4
> >>>> FOLIO  0x00060000  0x00080000     131072       96      128     32      5
> >>>> FOLIO  0x00080000  0x000a0000     131072      128      160     32      5
> >>>> FOLIO  0x000a0000  0x000c0000     131072      160      192     32      5
> >>>> FOLIO  0x000c0000  0x000e0000     131072      192      224     32      5
> >>>
> >>> Similar here, do you know why we don't ramp up earlier? Allocating that many
> >>> order-0 + order-2 pages looks a bit suboptimal to me for a sequential read.
> >>
> >> Note that this is reading through mmap using the mmap readahead code. If
> >> you use standard read(2), the readahead window starts small as well and
> >> ramps up along with the desired order so we don't allocate that many small
> >> order pages in that case.
>
> That does raise an interesting question though; why do we use a fixed size
> window for mmap? It feels like we could start with a smaller window and ramp it
> up as order ramps up too, capped to the end of the vma.
>
> Although perhaps that is an investigation for another day... My main motivation
> here was to be consistent about what page_cache_ra_order()'s new_order means,
> and to actually implement the algorithm that was originally intended - start
> from 0 and ramp up +2 on each readahead marker.

Well, in my opinion the whole mmap readahead logic would deserve some
remodelling :) because a lot of decisions there are quite disputable for
contemporary systems. But that's definitely for some other patchset...

								Honza
-- 
Jan Kara
SUSE Labs, CR
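
For readers following the worked example quoted above, the two folio layouts can be reproduced with a small model. This is only a sketch under stated assumptions, not the kernel code: it assumes a fixed 32-page mmap readahead window (128K with 4K pages), an order that grows by 2 from one window to the next, and a clamp at the largest order that fits in the window. The function name simulate_ramp and its parameters are hypothetical and exist only for this illustration.

```python
def simulate_ramp(total_pages, first_order, window_pages=32, step=2):
    """Simplified model of the folio-order ramp for sequential mmap readahead.

    Assumptions (not taken from kernel code): a fixed window of
    `window_pages` pages, one order per window, order += `step` per window,
    and a clamp at floor(log2(window_pages)).
    Returns a list of (start_page, end_page, order) tuples.
    """
    max_order = window_pages.bit_length() - 1  # 5 for a 32-page window
    folios = []
    order = first_order
    page = 0
    while page < total_pages:
        order = min(order, max_order)
        window_end = min(page + window_pages, total_pages)
        while page < window_end:
            # Clip the last folio to the window; the model does not try to
            # mimic how a partial tail is really handled.
            end = min(page + (1 << order), window_end)
            folios.append((page, end, order))
            page = end
        order += step
    return folios


# Before the change the order passed in was bumped by 2 on entry,
# so a cold readahead effectively started at order 2.
before = simulate_ramp(192, first_order=2)
# After the change a cold readahead starts at order 0.
after = simulate_ramp(224, first_order=0)

for start, end, order in after:
    print(f"FOLIO  pages {start:4d}-{end:4d}  order {order}")
```

Under these assumptions the model gives per-window orders of 2, 4, 5, 5, ... before the change and 0, 2, 4, 5, 5, ... after it, which matches the STARTPG/ENDPG/ORDER columns of the two tables in the quoted patch description.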