Date: Tue, 25 Feb 2025 10:56:21 +1100
From: Dave Chinner
To: Kalesh Singh
Cc: Lorenzo Stoakes, Jan Kara, lsf-pc@lists.linux-foundation.org,
    "open list:MEMORY MANAGEMENT", linux-fsdevel, Suren Baghdasaryan,
    David Hildenbrand, "Liam R. Howlett", Juan Yescas, android-mm,
    Matthew Wilcox, Vlastimil Babka, Michal Hocko, "Cc: Android Kernel"
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Optimizing Page Cache Readahead Behavior
References: <3bd275ed-7951-4a55-9331-560981770d30@lucifer.local>
    <82fbe53b-98c4-4e55-9eeb-5a013596c4c6@lucifer.local>

On Mon, Feb 24, 2025 at 01:36:50PM -0800, Kalesh Singh wrote:
> Another possible way we can look at this: in the regressions shared
> above by the ELF padding regions, we are able to make these regions
> sparse (for *almost* all cases) -- solving the shared-zero page
> problem for file mappings, would also eliminate much of this overhead.
> So perhaps we should tackle this angle? If that's a more tangible
> solution ?
>
> From the previous discussions that Matthew shared [7], it seems like
> Dave proposed an alternative to moving the extents to the VFS layer to
> invert the IO read path operations [8]. Maybe this is a move
> approachable solution since there is precedence for the same in the
> write path?
>
> [7] https://lore.kernel.org/linux-fsdevel/Zs97qHI-wA1a53Mm@casper.infradead.org/
> [8] https://lore.kernel.org/linux-fsdevel/ZtAPsMcc3IC1VaAF@dread.disaster.area/

Yes, if we are going to optimise away redundant zeros being stored
in the page cache over holes, we need to know where the holes in the
file are before the page cache is populated.

As for efficient hole tracking in the mapping tree, I suspect that
we should be looking at using exceptional entries in the mapping
tree for holes, not inserting multiple references to the zero folio.
i.e.
the important information for data storage optimisation is that
the region covers a hole, not that it contains zeros.

For buffered reads, all that is required when such an exceptional
entry is returned is a memset of the user buffer. For buffered
writes, we simply treat it like a normal folio allocating write and
replace the exceptional entry with the allocated (and zeroed) folio.

For read page faults, the zero page gets mapped (and maybe
accounted) via the vma rather than the mapping tree entry. For write
faults, a folio gets allocated and the exceptional entry replaced
before we call into ->page_mkwrite().

Invalidation simply removes the exceptional entries.

This largely gets rid of needing to care about the zero page outside
of mmap() context where something needs to be mapped into the
userspace mm context. Let the page fault/mm context substitute the
zero page in the PTE mappings where necessary, but we don't need to
use and/or track the zero page in the page cache itself....

FWIW, this also lends itself to storing unwritten extent information
in exceptional entries. One of the problems we have is that unwritten
extents can contain either zeros (been read) or data (been
overwritten in memory, but not flushed to disk). This is the problem
that SEEK_DATA has to navigate - it has to walk the page cache over
unwritten extents to determine if there is data over the unwritten
extent or not.

In this case, an exceptional entry gets added on read, which is then
replaced with an actual folio on write. Now SEEK_DATA can easily and
safely determine where the data actually lies over the unwritten
extent with a mapping tree walk instead of having to load and lock
each folio to check whether it is dirty or not....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
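
For illustration only, here is a rough userspace sketch of the
hole-entry scheme described above, written in Python rather than
kernel C. It is a toy model under stated assumptions: the mapping tree
is a plain dict, a "folio" is a bytearray, and ToyMapping, HOLE,
note_hole() and the other names are hypothetical and do not correspond
to any existing kernel interface. Page faults are not modelled.

```python
PAGE_SIZE = 4096


class _HoleEntry:
    """Hypothetical stand-in for an exceptional (non-folio) tree entry."""
    __slots__ = ()


HOLE = _HoleEntry()


class ToyMapping:
    """Toy page cache: page index -> HOLE marker or bytearray 'folio'."""

    def __init__(self):
        self.tree = {}

    def note_hole(self, index):
        # Assumption: the filesystem reported a hole (or an unread
        # unwritten extent) at this index before the cache was populated.
        self.tree.setdefault(index, HOLE)

    def buffered_read(self, index):
        entry = self.tree.get(index)
        if entry is HOLE:
            # No folio is needed: the "user buffer" is just zero-filled.
            return bytes(PAGE_SIZE)
        if entry is None:
            # A real read would ask the filesystem here; not modelled.
            raise LookupError("index %d is not cached" % index)
        return bytes(entry)

    def buffered_write(self, index, offset, data):
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write must fit inside one page")
        entry = self.tree.get(index)
        if entry is HOLE or entry is None:
            # Allocating write: replace the marker with a zeroed folio.
            entry = bytearray(PAGE_SIZE)
            self.tree[index] = entry
        entry[offset:offset + len(data)] = data

    def seek_data(self, start_index):
        # Tree walk only: a real folio means data, a HOLE marker does
        # not. No folio contents are inspected.
        for index in sorted(self.tree):
            if index >= start_index and self.tree[index] is not HOLE:
                return index
        return None

    def invalidate(self, index):
        # Dropping an exceptional entry is just removing it.
        if self.tree.get(index) is HOLE:
            del self.tree[index]
```

In this model, marking indexes 0-3 as holes and then writing into
index 2 leaves HOLE markers at 0, 1 and 3 and a single folio at 2, so
seek_data(0) finds index 2 without touching any page contents.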