Date: Mon, 24 Feb 2025 15:14:04 +0100
From: Jan Kara
To: Kalesh Singh
Cc: lsf-pc@lists.linux-foundation.org, "open list:MEMORY MANAGEMENT",
    linux-fsdevel, Suren Baghdasaryan, David Hildenbrand, Lorenzo Stoakes,
    "Liam R. Howlett", Juan Yescas, android-mm, Matthew Wilcox,
    Vlastimil Babka, Michal Hocko
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Optimizing Page Cache Readahead Behavior

Hello!

On Fri 21-02-25 13:13:15, Kalesh Singh via Lsf-pc wrote:
> Problem Statement
> ===============
>
> Readahead can result in unnecessary page cache pollution for mapped
> regions that are never accessed. Current mechanisms to disable
> readahead lack granularity and rather operate at the file or VMA
> level. This proposal seeks to initiate discussion at LSFMM to explore
> potential solutions for optimizing page cache/readahead behavior.
>
>
> Background
> =========
>
> The read-ahead heuristics on file-backed memory mappings can
> inadvertently populate the page cache with pages corresponding to
> regions that user-space processes are known never to access e.g ELF
> LOAD segment padding regions. While these pages are ultimately
> reclaimable, their presence precipitates unnecessary I/O operations,
> particularly when a substantial quantity of such regions exists.
>
> Although the underlying file can be made sparse in these regions to
> mitigate I/O, readahead will still allocate discrete zero pages when
> populating the page cache within these ranges. These pages, while
> subject to reclaim, introduce additional churn to the LRU. This
> reclaim overhead is further exacerbated in filesystems that support
> "fault-around" semantics, that can populate the surrounding pages’
> PTEs if found present in the page cache.
>
> While the memory impact may be negligible for large files containing a
> limited number of sparse regions, it becomes appreciable for many
> small mappings characterized by numerous holes. This scenario can
> arise from efforts to minimize vm_area_struct slab memory footprint.

OK, I agree the behavior you describe exists. But do you have some
real-world numbers showing its extent? I'm not looking for artificial
numbers - sure, bad cases can be constructed - but how big a practical
problem is this? If you can show that an average Android phone has 10% of
these useless pages in memory, then that's one thing and we should be
looking for some general solution. If it is more like 0.1%, then why
bother?

> Limitations of Existing Mechanisms
> ===========================
>
> fadvise(..., POSIX_FADV_RANDOM, ...): disables read-ahead for the
> entire file, rather than specific sub-regions. The offset and length
> parameters primarily serve the POSIX_FADV_WILLNEED [1] and
> POSIX_FADV_DONTNEED [2] cases.
>
> madvise(..., MADV_RANDOM, ...): Similarly, this applies on the entire
> VMA, rather than specific sub-regions. [3]
> Guard Regions: While guard regions for file-backed VMAs circumvent
> fault-around concerns, the fundamental issue of unnecessary page cache
> population persists. [4]

Somewhere else in the thread you complain about readahead extending past
the VMA. That's relatively easy to avoid, at least for readahead triggered
from filemap_fault() (i.e., do_async_mmap_readahead() and
do_sync_mmap_readahead()). I agree we could do that, and it seems like a
relatively uncontroversial change. Note that if someone accesses the file
through the standard read(2) or write(2) syscalls or through a different
memory mapping, the limits won't apply, but such combinations of access
are not that common anyway.

Regarding controlling readahead for various portions of the file - I'm
skeptical.
In my opinion it would require too much bookkeeping on the kernel side
for such a niche use case (but maybe your numbers will show it isn't as
niche as I think :)). I can imagine you could just completely turn off
kernel readahead for the file and do your special readahead from
userspace - I think you could use either userfaultfd for triggering it or
the new fanotify FAN_PRE_ACCESS events.

								Honza
-- 
Jan Kara
SUSE Labs, CR
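
For illustration only, the "turn off kernel readahead and prefetch from
userspace" idea above could be sketched roughly as follows. This is a
minimal sketch, not a tested design: it uses Python's os.posix_fadvise()
wrapper around posix_fadvise(2), the helper names (page_align,
disable_kernel_readahead, prefetch_ranges) are made up for this example,
and the triggering side (userfaultfd or fanotify pre-access events) is
left out entirely. The list of wanted ranges is assumed to come from the
application, e.g. the non-padding parts of ELF LOAD segments.

```python
import os


def page_align(ranges, page_size):
    """Round (offset, length) byte ranges out to page boundaries.

    Overlapping or adjacent ranges are merged. Returns a sorted list of
    (offset, length) tuples. Hypothetical helper for this sketch.
    """
    merged = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # nothing to prefetch for an empty range
        start = (offset // page_size) * page_size
        end = -(-(offset + length) // page_size) * page_size  # round up
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]


def disable_kernel_readahead(fd):
    """Advise random access for the whole file.

    Offset 0 with length 0 covers the file up to its end. The quoted
    proposal describes this advice as disabling read-ahead file-wide.
    """
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)


def prefetch_ranges(fd, ranges, page_size=None):
    """Ask the kernel to read in only the ranges userspace cares about.

    Returns the page-aligned ranges that were actually requested.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    wanted = page_align(ranges, page_size)
    for offset, length in wanted:
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
    return wanted
```

A caller would open the file, call disable_kernel_readahead(fd) once, and
then call prefetch_ranges(fd, [(offset, length), ...]) whenever its
trigger fires. Whether this actually beats the kernel heuristics would
need the same real-world numbers asked for above.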