linux-mm.kvack.org archive mirror
From: Haoran Zhu <zhr1502@sjtu.edu.cn>
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>
Subject: [Question] mmap_miss not increasing in mmap random reads
Date: Tue, 3 Jun 2025 14:05:11 +0800
Message-ID: <aD6Ql3KA6u9B58lg@nixos.>

Hi all,

While examining mm/filemap.c, I noticed that file->f_ra.mmap_miss does not increase as expected under mmap-based random read workloads. As a result, readahead is never disabled, even when it is clearly ineffective.

Test case: a 4GB file is mmap'd and randomly accessed in a KVM guest with 2GB of RAM. The benchmark code is attached at the end. I used the following bpftrace script to monitor readahead activity:

    kfunc:vmlinux:do_page_cache_ra {
        printf("size: %d start: %d mmap_miss: %d from %s\n",
               args->ractl->file->f_ra.size,
               args->ractl->file->f_ra.start,
               args->ractl->file->f_ra.mmap_miss,
               comm);
    }

The result is that mmap_miss remains low and readahead remains enabled. This appears to be due to the logic in filemap_map_pages() (mm/filemap.c), which treats the folios surrounding a faulted-in page as asynchronous hits and subtracts them from mmap_miss:

    mmap_miss_saved = READ_ONCE(file->f_ra.mmap_miss);
    if (mmap_miss >= mmap_miss_saved)
        WRITE_ONCE(file->f_ra.mmap_miss, 0);
    else
        WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss_saved - mmap_miss);

This suppresses mmap_miss growth even when the faults are clearly synchronous misses. When I commented out the above block and re-ran the test, the benchmark time dropped from ~6200 ms to ~1500 ms, which suggests that readahead was being retained when it should have been disabled.

Jan Kara previously mentioned a similar issue in [1]:

> I see, OK. But that's a (longstanding) bug in how mmap_miss is handled. Can
> you please test whether attached patches fix the trashing for you? At least
> now I can see mmap_miss properly increments when we are hitting uncached
> pages...

[1] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/

So my questions are:
1. Is this mmap_miss suppression intentional?
2. Was the design intended to avoid false positives for disabling readahead?
3. Would it make sense to reclassify the "asynchronous hits" in filemap_map_pages() to exclude those resulting directly from the current fault?

Benchmark below.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <time.h>
    #include <string.h>

    #define PAGE_SIZE 4096

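    /* Flush dirty data, then drop the page cache, dentries and inodes
     * (writing to drop_caches requires root). */
    void clear_page_cache(void) {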
        sync();
        int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
        if (fd == -1) {
            perror("open");
            return;
        }
        if (write(fd, "3\n", 2) == -1) {
            perror("write");
        }
        close(fd);
    }

    void rand_read(const char *memblock, uint64_t size, uint64_t nr) {
        for (uint64_t i = 0; i < nr; i++) {
            uint64_t pos = ((uint64_t)rand()) * rand() % size;
            if (memblock[pos] == '7') printf("Magic number!\n");
        }
    }

    long long get_time_ms() {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "Usage: %s <filename> [num_accesses]\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
            perror("open file");
            return 1;
        }

        struct stat sb;
        if (fstat(fd, &sb) == -1) {
            perror("fstat");
            return 1;
        }

        const char *memblock = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (memblock == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        uint64_t nr_access = (argc > 2) ? strtoull(argv[2], NULL, 10) : (512 * 1024);

        clear_page_cache();

        long long start = get_time_ms();
        rand_read(memblock, sb.st_size, nr_access);
        long long end = get_time_ms();

        printf("Rand Read Time: %lldms\n", end - start);
        return 0;
    }

Reproduction steps:
1. Save the above code as randread.c
2. # gcc -O2 -o randread randread.c
3. # fallocate -l 4G testfile
4. # ./randread testfile 524288
5. Example output:

    Rand Read Time: 1400ms

Thanks,
Haoran Zhu

