From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F207DC3ABBC for ; Tue, 6 May 2025 15:06:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2D3D66B0085; Tue, 6 May 2025 11:06:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 283886B0088; Tue, 6 May 2025 11:06:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 174856B0089; Tue, 6 May 2025 11:06:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id EC9D76B0085 for ; Tue, 6 May 2025 11:06:28 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id EE2E9591DB for ; Tue, 6 May 2025 15:06:29 +0000 (UTC) X-FDA: 83412809298.13.8C5887D Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf12.hostedemail.com (Postfix) with ESMTP id CF3124000C for ; Tue, 6 May 2025 15:06:27 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf12.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1746543988; a=rsa-sha256; cv=none; b=PAU2fJupe7OeaV9AGsrlEwNrr6MUXoHu6meCAAxpkGcd3a67WBHEO8AUwI7PomIN5NiVBE bcJI0+84uQ+kpK5uo480gE0aa+RkcP9mrJZzmr821FkVghTyEamePe5JH+VScU4ft6/Qmq R4haLRW+9KZFXLdudR8q3np7vZwwiho= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf12.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1746543988; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=03sboroZOiGWJJeHwah0kDWKPUeFJl/eE5PXKxoa2NY=; b=bk4k4Dbur+ZQO6Iw+tSpL4CRLl7Y5uqxdafvKYedGy7vGlxQAw/bC23LJq0pgxTQKGZZ/S yKkxuodtqxpk+uZTx+0RP1PNtrTLofrIAyUHot9YL+eCVT2Xed05jwL5v4ehG7uODwqc2O xFYk/pE9M+U8A/OrQNkx1jwsSNDbKs4= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5D9FB339; Tue, 6 May 2025 08:06:16 -0700 (PDT) Received: from [10.1.29.178] (XHFQ2J9959.cambridge.arm.com [10.1.29.178]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3C0A93F58B; Tue, 6 May 2025 08:06:23 -0700 (PDT) Message-ID: Date: Tue, 6 May 2025 16:06:21 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v4 4/5] mm/readahead: Store folio order in struct file_ra_state Content-Language: en-GB To: David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Alexander Viro , Christian Brauner , Jan Kara , Dave Chinner , Catalin Marinas , Will Deacon , Kalesh Singh , Zi Yan Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org References: <20250430145920.3748738-1-ryan.roberts@arm.com> <20250430145920.3748738-5-ryan.roberts@arm.com> <2b1ea3d9-6c9b-4700-ae21-5f65565a995a@arm.com> <5ea287f0-24cb-4ad4-8448-6e397fbf1ec8@redhat.com> From: Ryan Roberts In-Reply-To: <5ea287f0-24cb-4ad4-8448-6e397fbf1ec8@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: CF3124000C X-Stat-Signature: xkhdgjki9eefbmk71n45xsr35d1hbhwc X-Rspam-User: X-HE-Tag: 1746543987-777822 X-HE-Meta: U2FsdGVkX1/KXN0/svm/hXsy4W83gCzvNGhVYmWuTQx/BHOOXDIn8aEoLPaReycmYYl/91LwBit7+bA2Tq0RBvnakttgcGWik1ObvIvkoxIFWnc+gfDoFTIp9HR+41f0kpDjiIm9tzb2rVRJFGBafdJ2Eys9TqqwZCGmB2zIr8UrTCQSEwYFIcmcLaLbBlM9LktEaYDNyPZYQ0aeCEfxT6ZLAMrNhdeZCcDlK621id7yXJs9guCp+ufdjwAnM56UZR4SyK9nHv0dn9GiA3M1IJtDoYiGpa2R2KXTf5hw8udeIL5mCBXhG2HDR5UqOeeGb6sLZY3K2pB781N48Q5asYVGcx3gX7+RcW6sLLIPyvO3xvkjYzY5ohtF7cH0D33YiGyy9bkci/glPizWTbTJcEE+3TFrl7rDCoGwkMjbJvhKUvRI645PsZW0EI7slS3+buF5Eo+yf8HhpGAMDuVw3Oytp2M/E2MCooDdm03LeouKKtxFbuWH2hXuBe4fJHeLakuklaPtltNB6+aA/a1dMxAP8sWTS9umEyjjjvEU2bhJsqQVHV439/WHnsl0VdJ43QLXCVX918JvdpSnRgHa1S8umXyEAB3VsGYvwvQcO8ubydIFAl7ZG9Hy5tTw7I0Mmk7DTrdPD2lnS/H0yvpChLQbaXIPobY8RbmrTrAOel1ffi+XZp0T5ss+JYiXDc9YsDPLyrQmP6cQBBJtJkajmJZdi5swHfZ1EpSJeOmKXZcSofvcm0hHreQPbnOe8iwz/z+9gVnYrnukx7ZvzwrLKIqTblqd+dIwHh6W3sN8JSvNLoKnX8EkkJp2inIuMwxtUlD4uN4XhfxXiv6zddSvWz/OmWSdZbFlFO0Iho082wxLq+uD8MA3tmvob2tCpjZmgRc7ggD0drMO6u27OwBnh78n+2PCc3as27dx2sfnxvaPPVNGe+RVVaCgCCv7VqptY6UUcjbQI1W0aQGO4P5 OnMmrIKS V5QhjglS+GM/dpBqf3d+H7pMTce+oMx48gIZWlJ6F47+ZnrXF4RRTxaw0U4Df6v4sLmq3JFR9Ub8LswPlocaeR/vh5yOPa1pYnHrK/R3xJd4nrk2X95g+rZtrMq51HscROo4KGCG5b6ejYb25oB/RWO+ZG4VHP5XkDUiLFR+f9QX/EOMXq+wpesiTsOFTL0lJ63hO4cJsiA/41N2AsOhNg48GoeBWeOsBj6eBgcwYcgxrVPjBafty/SFdtFCxPqRn5tmIlhXU8xXLoX+IMtKU5R5FNg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 06/05/2025 15:24, David Hildenbrand wrote: > On 06.05.25 12:03, Ryan Roberts wrote: >> On 05/05/2025 11:08, David Hildenbrand wrote: >>> On 30.04.25 16:59, Ryan Roberts wrote: >>>> Previously the folio order of the previous readahead request was >>>> inferred from the folio who's readahead marker was hit. But due to the >>>> way we have to round to non-natural boundaries sometimes, this first >>>> folio in the readahead block is often smaller than the preferred order >>>> for that request. This means that for cases where the initial sync >>>> readahead is poorly aligned, the folio order will ramp up much more >>>> slowly. >>>> >>>> So instead, let's store the order in struct file_ra_state so we are not >>>> affected by any required alignment. We previously made enough room in >>>> the struct for a 16 order field. This should be plenty big enough since >>>> we are limited to MAX_PAGECACHE_ORDER anyway, which is certainly never >>>> larger than ~20. >>>> >>>> Since we now pass order in struct file_ra_state, page_cache_ra_order() >>>> no longer needs it's new_order parameter, so let's remove that. >>>> >>>> Worked example: >>>> >>>> Here we are touching pages 17-256 sequentially just as we did in the >>>> previous commit, but now that we are remembering the preferred order >>>> explicitly, we no longer have the slow ramp up problem. Note >>>> specifically that we no longer have 2 rounds (2x ~128K) of order-2 >>>> folios: >>>> >>>> TYPE    STARTOFFS     ENDOFFS        SIZE  STARTPG    ENDPG   NRPG  ORDER  RA >>>> -----  ----------  ----------  ----------  -------  -------  -----  -----  -- >>>> HOLE   0x00000000  0x00001000        4096        0        1      1 >>>> FOLIO  0x00001000  0x00002000        4096        1        2      1      0 >>>> FOLIO  0x00002000  0x00003000        4096        2        3      1      0 >>>> FOLIO  0x00003000  0x00004000        4096        3        4      1      0 >>>> FOLIO  0x00004000  0x00005000        4096        4        5      1      0 >>>> FOLIO  0x00005000  0x00006000        4096        5        6      1      0 >>>> FOLIO  0x00006000  0x00007000        4096        6        7      1      0 >>>> FOLIO  0x00007000  0x00008000        4096        7        8      1      0 >>>> FOLIO  0x00008000  0x00009000        4096        8        9      1      0 >>>> FOLIO  0x00009000  0x0000a000        4096        9       10      1      0 >>>> FOLIO  0x0000a000  0x0000b000        4096       10       11      1      0 >>>> FOLIO  0x0000b000  0x0000c000        4096       11       12      1      0 >>>> FOLIO  0x0000c000  0x0000d000        4096       12       13      1      0 >>>> FOLIO  0x0000d000  0x0000e000        4096       13       14      1      0 >>>> FOLIO  0x0000e000  0x0000f000        4096       14       15      1      0 >>>> FOLIO  0x0000f000  0x00010000        4096       15       16      1      0 >>>> FOLIO  0x00010000  0x00011000        4096       16       17      1      0 >>>> FOLIO  0x00011000  0x00012000        4096       17       18      1      0 >>>> FOLIO  0x00012000  0x00013000        4096       18       19      1      0 >>>> FOLIO  0x00013000  0x00014000        4096       19       20      1      0 >>>> FOLIO  0x00014000  0x00015000        4096       20       21      1      0 >>>> FOLIO  0x00015000  0x00016000        4096       21       22      1      0 >>>> FOLIO  0x00016000  0x00017000        4096       22       23      1      0 >>>> FOLIO  0x00017000  0x00018000        4096       23       24      1      0 >>>> FOLIO  0x00018000  0x00019000        4096       24       25      1      0 >>>> FOLIO  0x00019000  0x0001a000        4096       25       26      1      0 >>>> FOLIO  0x0001a000  0x0001b000        4096       26       27      1      0 >>>> FOLIO  0x0001b000  0x0001c000        4096       27       28      1      0 >>>> FOLIO  0x0001c000  0x0001d000        4096       28       29      1      0 >>>> FOLIO  0x0001d000  0x0001e000        4096       29       30      1      0 >>>> FOLIO  0x0001e000  0x0001f000        4096       30       31      1      0 >>>> FOLIO  0x0001f000  0x00020000        4096       31       32      1      0 >>>> FOLIO  0x00020000  0x00021000        4096       32       33      1      0 >>>> FOLIO  0x00021000  0x00022000        4096       33       34      1      0 >>>> FOLIO  0x00022000  0x00024000        8192       34       36      2      1 >>>> FOLIO  0x00024000  0x00028000       16384       36       40      4      2 >>>> FOLIO  0x00028000  0x0002c000       16384       40       44      4      2 >>>> FOLIO  0x0002c000  0x00030000       16384       44       48      4      2 >>>> FOLIO  0x00030000  0x00034000       16384       48       52      4      2 >>>> FOLIO  0x00034000  0x00038000       16384       52       56      4      2 >>>> FOLIO  0x00038000  0x0003c000       16384       56       60      4      2 >>>> FOLIO  0x0003c000  0x00040000       16384       60       64      4      2 >>>> FOLIO  0x00040000  0x00050000       65536       64       80     16      4 >>>> FOLIO  0x00050000  0x00060000       65536       80       96     16      4 >>>> FOLIO  0x00060000  0x00080000      131072       96      128     32      5 >>>> FOLIO  0x00080000  0x000a0000      131072      128      160     32      5 >>>> FOLIO  0x000a0000  0x000c0000      131072      160      192     32      5 >>>> FOLIO  0x000c0000  0x000e0000      131072      192      224     32      5 >>>> FOLIO  0x000e0000  0x00100000      131072      224      256     32      5 >>>> FOLIO  0x00100000  0x00120000      131072      256      288     32      5 >>>> FOLIO  0x00120000  0x00140000      131072      288      320     32      5  Y >>>> HOLE   0x00140000  0x00800000     7077888      320     2048   1728 >>>> >>>> Signed-off-by: Ryan Roberts >>>> --- >>>>    include/linux/fs.h |  2 ++ >>>>    mm/filemap.c       |  6 ++++-- >>>>    mm/internal.h      |  3 +-- >>>>    mm/readahead.c     | 18 +++++++++++------- >>>>    4 files changed, 18 insertions(+), 11 deletions(-) >>>> >>>> diff --git a/include/linux/fs.h b/include/linux/fs.h >>>> index 44362bef0010..cde482a7270a 100644 >>>> --- a/include/linux/fs.h >>>> +++ b/include/linux/fs.h >>>> @@ -1031,6 +1031,7 @@ struct fown_struct { >>>>     *      and so were/are genuinely "ahead".  Start next readahead when >>>>     *      the first of these pages is accessed. >>>>     * @ra_pages: Maximum size of a readahead request, copied from the bdi. >>>> + * @order: Preferred folio order used for most recent readahead. >>> >>> Looking at other members, and how it relates to the other members, should we >>> call this something like "ra_prev_order" / "prev_ra_order" to distinguish it >>> from !ra members and indicate the "most recent" semantics similar to "prev_pos"? >> >> As you know, I'm crap at naming, but... >> >> start, size, async_size and order make up the parameters for the "most recent" >> readahead request. Where "most recent" includes "current" once passed into >> page_cache_ra_order(). The others don't include "ra" or "prev" in their name so >> wasn't sure it was necessary here. >> >> ra_pages is a bit different; that's not part of the request, it's a (dynamic) >> ceiling to use when creating requests. >> >> Personally I'd leave it as is, but no strong opinion. > > I'm fine with it staying that way; I was merely trying to make sense of it all ... > > > ... maybe a better description of the parameters might make the semantics easier > to grasp. > > ""most recent" includes "current" once passed into page_cache_ra_order()" > > is *really* hard to digest :) > >> >>> >>> Just a thought while digging through this patch ... >>> >>> ... >>> >>>> --- a/mm/filemap.c >>>> +++ b/mm/filemap.c >>>> @@ -3222,7 +3222,8 @@ static struct file *do_sync_mmap_readahead(struct >>>> vm_fault *vmf) >>>>            if (!(vm_flags & VM_RAND_READ)) >>>>                ra->size *= 2; >>>>            ra->async_size = HPAGE_PMD_NR; >>>> -        page_cache_ra_order(&ractl, ra, HPAGE_PMD_ORDER); >>>> +        ra->order = HPAGE_PMD_ORDER; >>>> +        page_cache_ra_order(&ractl, ra); >>>>            return fpin; >>>>        } >>>>    #endif >>>> @@ -3258,8 +3259,9 @@ static struct file *do_sync_mmap_readahead(struct >>>> vm_fault *vmf) >>>>        ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2); >>>>        ra->size = ra->ra_pages; >>>>        ra->async_size = ra->ra_pages / 4; >>>> +    ra->order = 0; >>>>        ractl._index = ra->start; >>>> -    page_cache_ra_order(&ractl, ra, 0); >>>> +    page_cache_ra_order(&ractl, ra); >>>>        return fpin; >>>>    } >>> >>> Why not let page_cache_ra_order() consume the order and update ra->order (or >>> however it will be called :) ) internally? >> >> You mean continue to pass new_order as a parameter to page_cache_ra_order()? The >> reason I did it the way I'm doing it is because I thought it would be weird for >> the caller of page_cache_ra_order() to set up all the parameters (start, size, >> async_size) of the request except for order... > > Agreed. As above, I think we might do better with the description of these > parameters in general ... > > or even document how page_cache_ra_order() acts on these inputs? OK let me try to work something up for the next version...