From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89C31C77B6E for ; Fri, 14 Apr 2023 04:07:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 00C2E900002; Fri, 14 Apr 2023 00:07:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EFF656B0075; Fri, 14 Apr 2023 00:07:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E140A900002; Fri, 14 Apr 2023 00:07:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id D0D646B0072 for ; Fri, 14 Apr 2023 00:07:22 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 90D231402D6 for ; Fri, 14 Apr 2023 04:07:22 +0000 (UTC) X-FDA: 80678661924.17.B6CBAF9 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf26.hostedemail.com (Postfix) with ESMTP id 2BC8614000A for ; Fri, 14 Apr 2023 04:07:18 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b="F/IwI1ym"; spf=none (imf26.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681445241; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tv+CDHQFvPKKZxgkjLcvB69sMHqucHbBd3BQ0wZ/t0M=; b=jlkeKV8mjxGR7/9vWcOXchuHLL3Dgr83Ppf+lgHDs3r88VMYNbNWuJlFs6RJUGB4xamTDf jaicpwinvsCGbDsZlWcwhlRCNMyO3boRe0/M5rG17GzaJsRd+avdHmRAEtpMkpUxe+RE0X 9d5AVqtGcVabIDp4FSFsvmE2tkCel4A= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b="F/IwI1ym"; spf=none (imf26.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681445241; a=rsa-sha256; cv=none; b=HXMje/TQtlH0vR8yCSzT8YRcnAFsZZnwL13+8Oat9w5QQS5iA2SDDDYqEQ9P2D+IpieTso hLucK1qi+Kb2vy5BzHgNNYS5LH4MR9pv1PfOJPy2E1dJylz4imrGxyh7CdJuBmF9UTlZ7Q fhsUKgOkaJXAsderyhKzNhpnE272U7A= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Transfer-Encoding: Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date: Sender:Reply-To:Content-ID:Content-Description; bh=tv+CDHQFvPKKZxgkjLcvB69sMHqucHbBd3BQ0wZ/t0M=; b=F/IwI1ymt2MmhmHHJUgmZLoaHl c7g/JlW+QqBmLS8XETO0uMxvg1smjIXRN9UQJxKpYCHincZDcXe8FiLJoqWqMssCy8zWW7LMYr9dK 4G84jtRC8IgV2aNFsos1C2Nu7bljSFgk5ddEqUx+ev93GFr8nfJdD1m9DxyImrQfKVKXuZNYe1RFw SuPZ47PY5St/jUezOFqeGvm4/4t0/iJ5NnS7ArBH7qAYNFbNcp2qSj3NKnWFfVx6txntW93gYftNZ aISMSjBOEbedQ9JjTs/1rcpSk/8H3KmW8scYVcksWvYRPKutQPXPF7W8A0eeANzO4uwP2uis4JVMy UBejYs+A==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1pnAiJ-008Qwi-Gw; Fri, 14 Apr 2023 04:07:07 +0000 Date: Fri, 14 Apr 2023 05:07:07 +0100 From: Matthew Wilcox To: Bernd Schubert Cc: "linux-fsdevel@vger.kernel.org" , Miklos Szeredi , Andrew Morton , linux-mm@kvack.org Subject: Re: sequential 1MB mmap read ends in 1 page sync read-ahead Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: ekykwbjfzwikhnz5dxudt31ixd9w5ixr X-Rspamd-Queue-Id: 2BC8614000A X-HE-Tag: 1681445238-855060 X-HE-Meta: U2FsdGVkX1/7IqHOoni01kdZ8uD8sPIWc5t+TY8vnM9wD2Ej1a24GnsPEoQxdXKWqnZF72PbuxFK9pS5Z1Ai2KEyrXKs4AeD/e9at7iAU80pL0efHO1rnRojpRftaLaetu2cBfvln5lM8kfJtwgHXrr9fvWR29BfCzbBQq/K+zIASENlTIY8hIv3I5Oyjj+LK2ZNehqQarA/UUbEvc8qL78BnDE+hcn/WodyqTv3iU5axft4RqQTBcFQAjhj+WV6RIDptyYDQNJiNCoU1RExsq1j0YcAdR1wr8kh6V2rtBHy/HGy6SoHfux8+tB2M7XRLrtAmaX/IdJf3X9u3vidDooK09wmO3QiY4+WCOMd/3gl2fw6pHG1YOgSbFWQ8rvLDJchw+m4iTH5x7Z3XEj93MnkFrEv4JzpMb5B+4NrsoYJv4wfvbZaFePAGCVTySdItGRVoSjHK8SmpJq54VEiZf7sp2Y09ydzVsFNJ1YyQ1jldUZCx9lEd2olz+z4Og5X8ovmLGye9BdRbfryu7+mEHwxHE8TsTa7Y0Jt6OTWrc4kqGn+c26eZ/kZlvfxVbb1uhJPk/Wq64Y6XcOl/Ixj4N/0fuirWN99Joz8B7HITLIdKRQh2AbAsT0w9Gd8/jt7EZHKTi3GAm8Y+Uw1amAeUbPg6H4m7QHTpwG2q5qNn5M6yqZekO20vTu/EcrfCao000sdh1MXF5YyBq7tyFqb6YdxY4m7gW87uGHwvNbiqL4LkKkHstsae/YPPaEP8AJ/FulON5qGsHCmhX8vnbGETRWlDVxAqq2XP4XAfTMUmj2ZlGyjk2oY0t15gxdJLakvnkcC2V70KPX15bkjeasOWFFtt6shc4iF/TmE9sTlYqw2xsLYwnOaPRXi8z8nStRPKRdYVOWOlpjI3rg/gbL4tteapS5IElKaOVAjbRkRqwqkOn9bz4mLdHUceDkfjWHtSkrd7DU16qmqJ+RxHVM 4Pl781pP RSk9wZGbaicFAZt/koHURDLWG6kLuSYpujg2fEtMmgYxk+y/GLib37VeHotVxAaFiA+HpzTBdxqq1a9Bxm30/1pcr00F+cLTUUcCBHpGwH8LwkiQK0N51snk4Nr4bkC685fFO+gteCozRu8LM2RMvDtEJZTI1HqOtuLNfdpxGbvbUekIUtiWhBEW9L4g9R2NtHJC6 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Apr 13, 2023 at 11:33:09PM +0200, Bernd Schubert wrote: > Sorry, forgot to add Andrew and linux-mm into CC. > > On 4/13/23 23:27, Bernd Schubert wrote: > > Hello, > > > > I found a weird mmap read behavior while benchmarking the fuse-uring > > patches. > > I did not verify yet, but it does not look fuse specific. > > Basically, I started to check because fio results were much lower > > than expected (better with the new code, though) > > > > fio cmd line: > > fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse > > --directory=/scratch/dest/ --rw=read multi-file.fio > > > > > > bernd@squeeze1 test2>cat multi-file.fio > > [global] > > group_reporting > > bs=1M > > runtime=300 > > > > [test] > > > > This sequential fio sets POSIX_MADV_SEQUENTIAL and then does memcpy > > beginning at offset 0 in 1MB steps (verified with additional > > logging in fios engines/mmap.c). > > > > And additional log in fuse_readahead() gives > > > > [ 1396.215084] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 > > index=0 > > [ 1396.237466] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 > > index=255 > > [ 1396.263175] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 > > index=254 > > [ 1396.282055] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 > > index=253 > > ... > > [ 1496.353745] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 > > index=64 > > [ 1496.381105] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 > > index=511 > > [ 1496.397487] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 > > index=510 > > [ 1496.416385] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 > > index=509 > > ... > > > > Logging in do_sync_mmap_readahead() > > > > [ 1493.130764] do_sync_mmap_readahead:3015 ino=132 index=0 count=0 > > ras_start=0 ras_size=0 ras_async=0 ras_ra_pages=64 ras_mmap_miss=0 > > ras_prev_pos=-1 > > [ 1493.147173] do_sync_mmap_readahead:3015 ino=132 index=255 count=0 > > ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 > > ras_prev_pos=-1 > > [ 1493.165952] do_sync_mmap_readahead:3015 ino=132 index=254 count=0 > > ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 > > ras_prev_pos=-1 > > [ 1493.185566] do_sync_mmap_readahead:3015 ino=132 index=253 count=0 > > ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 > > ras_prev_pos=-1 > > ... > > [ 1496.341890] do_sync_mmap_readahead:3015 ino=132 index=64 count=0 > > ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 > > ras_prev_pos=-1 > > [ 1496.361385] do_sync_mmap_readahead:3015 ino=132 index=511 count=0 > > ras_start=96 ras_size=64 ras_async=64 ras_ra_pages=64 ras_mmap_miss=0 > > ras_prev_pos=-1 > > > > > > So we can see from fuse that it starts to read at page index 0, wants > > 64 pages (which is actually the double of bdi read_ahead_kb), then > > skips index 64...254) and immediately goes to index 255. For the mmaped > > memcpy pages are missing and then it goes back in 1 page steps to get > > these. > > > > A workaround here is to set read_ahead_kb in the bdi to a larger > > value, another workaround might be (untested) to increase the read-ahead > > window. Either of these two seem to be workarounds for the index order > > above. > > > > I understand that read-ahead gets limited by the bdi value (although > > exceeded above), but why does it go back in 1 page steps? My expectation > > would have been > > > > index=0  count=32 (128kb read-head) > > index=32 count=32 > > index=64 count=32 What I see with XFS is: fio-27518 [005] ..... 276.565025: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x23a8c ofs=0 order=2 fio-27518 [005] ..... 276.565035: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x19868 ofs=16384 order=2 fio-27518 [005] ..... 276.565036: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x237fc ofs=32768 order=2 fio-27518 [005] ..... 276.565038: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x27518 ofs=49152 order=2 fio-27518 [005] ..... 276.565039: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x14c7c ofs=65536 order=2 fio-27518 [005] ..... 276.565040: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x14338 ofs=81920 order=2 fio-27518 [005] ..... 276.565041: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x201fc ofs=98304 order=2 fio-27518 [005] ..... 276.565042: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x1fb98 ofs=114688 order=2 fio-27518 [005] ..... 276.565044: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x14510 ofs=131072 order=2 fio-27518 [005] ..... 276.565045: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x1e88c ofs=147456 order=2 fio-27518 [005] ..... 276.565046: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x26f00 ofs=163840 order=2 ... dev 8:32 ino 44 pfn=0x14f30 ofs=262144 order=4 fio-27518 [005] ..... 276.567718: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x145a0 ofs=327680 order=4 fio-27518 [005] ..... 276.567720: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x15730 ofs=393216 order=4 fio-27518 [005] ..... 276.567722: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x15e30 ofs=458752 order=4 fio-27518 [005] ..... 276.567942: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x18b00 ofs=524288 order=6 fio-27518 [005] ..... 276.569982: mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x15e40 ofs=786432 order=6 ... it then gets "stuck" at order-6, which is expected for a 256kB readahead window. This is produced by: echo 1 >/sys/kernel/tracing/events/filemap/mm_filemap_add_to_page_cache/enable fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse --directory=/mnt/scratch/ --rw=read multi-file.fio echo 0 >/sys/kernel/tracing/events/filemap/mm_filemap_add_to_page_cache/enable less /sys/kernel/tracing/trace