From: Kalesh Singh
Date: Fri, 21 Feb 2025 13:13:15 -0800
Subject: [LSF/MM/BPF TOPIC] Optimizing Page Cache Readahead Behavior
To: lsf-pc@lists.linux-foundation.org, "open list:MEMORY MANAGEMENT", linux-fsdevel
Cc: Suren Baghdasaryan, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Juan Yescas, android-mm, Matthew Wilcox, Vlastimil Babka, Michal Hocko

Hi organizers of LSF/MM,

I realize this is a late submission, but I was hoping there might still be a chance to have this topic considered for discussion.
Problem Statement
=================

Readahead can pollute the page cache with pages from mapped regions that are never accessed. The existing mechanisms for disabling readahead lack granularity: they operate on an entire file or VMA rather than on a sub-range. This proposal seeks to start a discussion at LSF/MM about potential solutions for optimizing page cache/readahead behavior.

Background
==========

The readahead heuristics for file-backed memory mappings can inadvertently populate the page cache with pages from regions that user-space processes are known never to access, e.g. ELF LOAD segment padding regions. Although these pages are ultimately reclaimable, reading them in causes unnecessary I/O, particularly when there are many such regions.

The underlying file can be made sparse in these regions to avoid that I/O, but readahead still allocates a separate zero-filled page for every page it brings into the page cache within these ranges. These pages, while reclaimable, add churn to the LRU. The reclaim overhead is further exacerbated on filesystems that support "fault-around" semantics, which can map the PTEs of surrounding pages if those pages are present in the page cache.

While the memory impact may be negligible for large files containing a few sparse regions, it becomes appreciable for many small mappings with numerous holes. This scenario can arise from efforts to reduce the vm_area_struct slab memory footprint, i.e. to keep the number of VMAs low.

Limitations of Existing Mechanisms
==================================

fadvise(..., POSIX_FADV_RANDOM, ...): disables readahead for the entire file rather than for specific sub-regions. The offset and length parameters are used mainly by the POSIX_FADV_WILLNEED [1] and POSIX_FADV_DONTNEED [2] cases.
madvise(..., MADV_RANDOM, ...): similarly, this applies to the entire VMA rather than to specific sub-regions [3]. Applying it to a sub-range splits the VMA, which works against keeping the VMA count low.

Guard regions: while guard regions for file-backed VMAs avoid the fault-around concern, the fundamental issue of unnecessary page cache population persists [4].

Empirical Demonstration
=======================

Below is a simple script that demonstrates the issue. Assume that the last 20 pages of the file are a region known never to be accessed (perhaps a guard region). cachestat is a simple C program I wrote that prints nr_cache for the entire file using the new cachestat() syscall [5]. Page counts assume 4 KiB pages.

$ cat pollute_page_cache.sh
#!/bin/bash

FILE="myfile.txt"

echo "Creating sparse file of size 25 pages"
truncate -s 100k "$FILE"

apparent_size=$(ls -lahs "$FILE" | awk '{ print $6 }')
echo "Apparent Size: $apparent_size"

real_size=$(ls -lahs "$FILE" | awk '{ print $1 }')
echo "Real Size: $real_size"

nr_cached=$(./cachestat "$FILE" | grep nr_cache: | awk '{ print $2 }')
echo "Number cached pages: $nr_cached"

echo "Reading first 5 pages..."
head -c 20k "$FILE" > /dev/null

nr_cached=$(./cachestat "$FILE" | grep nr_cache: | awk '{ print $2 }')
echo "Number cached pages: $nr_cached"

rm "$FILE"

-------

$ ./pollute_page_cache.sh
Creating sparse file of size 25 pages
Apparent Size: 100K
Real Size: 0
Number cached pages: 0
Reading first 5 pages...
Number cached pages: 25

Thanks,
Kalesh

[1] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/fadvise.c#L96
[2] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/fadvise.c#L113
[3] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/madvise.c#L1277
[4] https://lore.kernel.org/r/cover.1739469950.git.lorenzo.stoakes@oracle.com/
[5] https://lore.kernel.org/r/20230503013608.2431726-3-nphamcs@gmail.com/
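The cachestat helper called by pollute_page_cache.sh is not included in this posting. For anyone who wants to reproduce the numbers, here is a minimal C sketch of an equivalent lookup. It is not the original program. It assumes Linux 6.5 or later. The function name file_nr_cache() and the local struct names are hypothetical, and the fallback syscall number 451 is an assumption that holds for the generic syscall table (x86_64, arm64).

```c
#define _GNU_SOURCE /* for the syscall() declaration in <unistd.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* cachestat() landed in Linux 6.5. If the libc headers predate it, fall
 * back to 451, its number in the generic syscall table (assumption: some
 * architectures may differ). */
#ifndef __NR_cachestat
#define __NR_cachestat 451
#endif

/* Local mirrors of struct cachestat_range / struct cachestat from
 * <linux/mman.h>, so the sketch also builds against older UAPI headers. */
struct cs_range {
	uint64_t off;
	uint64_t len;
};

struct cs_stats {
	uint64_t nr_cache;
	uint64_t nr_dirty;
	uint64_t nr_writeback;
	uint64_t nr_evicted;
	uint64_t nr_recently_evicted;
};

/*
 * file_nr_cache() - hypothetical helper: number of pages of @path currently
 * in the page cache, or -1 with errno set on failure (e.g. ENOENT, or
 * ENOSYS on kernels without cachestat()).
 */
long long file_nr_cache(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* off = 0 and len = 0 ask cachestat() to cover the file up to EOF. */
	struct cs_range range = { .off = 0, .len = 0 };
	struct cs_stats stats = { 0 };
	long ret = syscall(__NR_cachestat, fd, &range, &stats, 0);
	int saved_errno = errno;

	close(fd);
	if (ret != 0) {
		errno = saved_errno;
		return -1;
	}
	return (long long)stats.nr_cache;
}
```

Wrapping this in a main() that prints "nr_cache: %lld\n" for argv[1] would match the grep/awk parsing in the script. Passing len = 0 rather than the file size keeps the helper independent of st_size.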