Date: Tue, 3 Jun 2025 14:05:11 +0800
From: Haoran Zhu
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: Matthew Wilcox, Andrew Morton, Jan Kara
Subject: [Question] mmap_miss not increasing in mmap random reads

Hi all,

While examining mm/filemap.c, I noticed that file->f_ra.mmap_miss does not
increase as expected under mmap-based random read workloads, which prevents
readahead from being disabled even when it is clearly ineffective.

Test case: a 4GB file is mmap'd and randomly accessed in a KVM guest with
2GB RAM. The benchmark code is attached at the end.

I used the following bpftrace script to monitor readahead activity:

    kfunc:vmlinux:do_page_cache_ra
    {
        printf("size: %d start: %d mmap_miss: %d from %s\n",
               args->ractl->file->f_ra.size,
               args->ractl->file->f_ra.start,
               args->ractl->file->f_ra.mmap_miss,
               comm);
    }

The result is that mmap_miss remains low and readahead remains enabled.
This appears to be due to the logic in mm/filemap.c:filemap_map_pages()
that treats the folios mapped around a faulted-in page as asynchronous hits
and subtracts them from mmap_miss:

    mmap_miss_saved = READ_ONCE(file->f_ra.mmap_miss);
    if (mmap_miss >= mmap_miss_saved)
        WRITE_ONCE(file->f_ra.mmap_miss, 0);
    else
        WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss_saved - mmap_miss);

This suppresses mmap_miss growth even when the faults are clearly
synchronous. I commented out the above block, re-ran the test, and saw the
benchmark time drop from ~6200 ms to ~1500 ms, indicating that readahead
was being wrongly retained.

Jan Kara previously mentioned a similar issue in [1]:

> I see, OK. But that's a (longstanding) bug in how mmap_miss is handled. Can
> you please test whether attached patches fix the trashing for you? At least
> now I can see mmap_miss properly increments when we are hitting uncached
> pages...

[1] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/

So my questions are:

1. Is this mmap_miss suppression intentional?
2. Was the design intended to avoid false positives for disabling
   readahead?
3. Would it make sense to reclassify the "asynchronous hits" in
   filemap_map_pages() to exclude those resulting directly from the
   current fault?

Benchmark (randread.c):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define PAGE_SIZE 4096

/* Flush dirty data and drop the page cache so the run starts cold (needs root). */
void clear_page_cache(void)
{
	sync();
	int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
	if (fd == -1) {
		perror("open");
		return;
	}
	if (write(fd, "3\n", 2) == -1) {
		perror("write");
	}
	close(fd);
}

/* Touch nr pseudo-random byte offsets inside the mapping. */
void rand_read(const char *memblock, uint64_t size, uint64_t nr)
{
	for (uint64_t i = 0; i < nr; i++) {
		uint64_t pos = ((uint64_t)rand()) * rand() % size;
		/* The comparison keeps the load from being optimized away. */
		if (memblock[pos] == '7')
			printf("Magic number!\n");
	}
}

long long get_time_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

int main(int argc, char *argv[])
{
	if (argc < 2) {
		fprintf(stderr, "Usage: %s <file> [num_accesses]\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		perror("open file");
		return 1;
	}

	struct stat sb;
	if (fstat(fd, &sb) == -1) {
		perror("fstat");
		return 1;
	}

	const char *memblock = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (memblock == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	uint64_t nr_access = (argc > 2) ? strtoull(argv[2], NULL, 10) : (512 * 1024);

	clear_page_cache();

	long long start = get_time_ms();
	rand_read(memblock, sb.st_size, nr_access);
	long long end = get_time_ms();

	printf("Rand Read Time: %lldms\n", end - start);
	return 0;
}

Reproduction steps:

1. Save the above code as randread.c
2. # gcc -O2 -o randread randread.c
3. # fallocate -l 4G testfile
4. # ./randread testfile 524288
5. Example output: Rand Read Time: 1400ms

Thanks,
Haoran Zhu