Date: Thu, 27 Feb 2025 22:12:50 +0000
From: Matthew Wilcox
To: Dave Chinner
Cc: Kalesh Singh, Lorenzo Stoakes, Jan Kara, lsf-pc@lists.linux-foundation.org,
 "open list:MEMORY MANAGEMENT", linux-fsdevel, Suren Baghdasaryan,
 David Hildenbrand, "Liam R. Howlett", Juan Yescas, android-mm,
 Vlastimil Babka, Michal Hocko, "Cc: Android Kernel"
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Optimizing Page Cache Readahead Behavior
References: <3bd275ed-7951-4a55-9331-560981770d30@lucifer.local> <82fbe53b-98c4-4e55-9eeb-5a013596c4c6@lucifer.local>

On Tue, Feb 25, 2025 at 10:56:21AM +1100, Dave Chinner wrote:
> > From the previous discussions that Matthew shared [7], it seems like
> > Dave proposed an alternative to moving the extents to the VFS layer to
> > invert the IO read path operations [8]. Maybe this is a more
> > approachable solution since there is precedent for the same in the
> > write path?
> >
> > [7] https://lore.kernel.org/linux-fsdevel/Zs97qHI-wA1a53Mm@casper.infradead.org/
> > [8] https://lore.kernel.org/linux-fsdevel/ZtAPsMcc3IC1VaAF@dread.disaster.area/
>
> Yes, if we are going to optimise away redundant zeros being stored
> in the page cache over holes, we need to know where the holes in the
> file are before the page cache is populated.

Well, you shot that down when I started trying to flesh it out:
https://lore.kernel.org/linux-fsdevel/Zs+2u3%2FUsoaUHuid@dread.disaster.area/

> As for efficient hole tracking in the mapping tree, I suspect that
> we should be looking at using exceptional entries in the mapping
> tree for holes, not inserting multiple references to the zero folio.
> i.e. the important information for data storage optimisation is that
> the region covers a hole, not that it contains zeros.

The xarray is very much optimised for storing power-of-two sized &
aligned objects.  It makes no sense to try to track extents using the
mapping tree.  Now, if we abandon the radix tree for the maple tree, we
could talk about storing zero extents in the same data structure.
But that's a big change with potentially significant downsides.
It's something I want to play with, but I'm a little busy right now.

> For buffered reads, all that is required when such an exceptional
> entry is returned is a memset of the user buffer. For buffered
> writes, we simply treat it like a normal folio allocating write and
> replace the exceptional entry with the allocated (and zeroed) folio.

... and unmap the zero page from any mappings.

> For read page faults, the zero page gets mapped (and maybe
> accounted) via the vma rather than the mapping tree entry. For write
> faults, a folio gets allocated and the exception entry replaced
> before we call into ->page_mkwrite().
>
> Invalidation simply removes the exceptional entries.

... and unmap the zero page from any mappings.