From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BB75C77B72 for ; Fri, 14 Apr 2023 19:52:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F24B8900003; Fri, 14 Apr 2023 15:52:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EADC4900002; Fri, 14 Apr 2023 15:52:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D004F900003; Fri, 14 Apr 2023 15:52:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id BAACC900002 for ; Fri, 14 Apr 2023 15:52:57 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 63C881402F3 for ; Fri, 14 Apr 2023 19:52:57 +0000 (UTC) X-FDA: 80681044794.17.F62EDE5 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) by imf12.hostedemail.com (Postfix) with ESMTP id 123B740015 for ; Fri, 14 Apr 2023 19:52:53 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=fastmail.fm header.s=fm3 header.b=VqOM8xKh; dkim=pass header.d=messagingengine.com header.s=fm3 header.b=jpMzWBtZ; dmarc=pass (policy=none) header.from=fastmail.fm; spf=pass (imf12.hostedemail.com: domain of bernd.schubert@fastmail.fm designates 64.147.123.24 as permitted sender) smtp.mailfrom=bernd.schubert@fastmail.fm ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681501974; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UwrEhkkjARKwmx+wt4KVFrRHwIPuRo3w6pp5461mGg4=; b=4yKFV1ST1NnCrR9ispkhXpAAe6+wrKT7SgZKm6NfpqmJ/D38M8SwKyeUpxcfOYmepVP+98 AHNHtTv5z7SgfbexuAaYHANlL+ZDWTkPSupnfbdSE+58ucUwTXlBIUZbQRYX/lhEZfJ7N4 Onl/MehRJUAvX5HnTTFEEJ7DraeklEY= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=fastmail.fm header.s=fm3 header.b=VqOM8xKh; dkim=pass header.d=messagingengine.com header.s=fm3 header.b=jpMzWBtZ; dmarc=pass (policy=none) header.from=fastmail.fm; spf=pass (imf12.hostedemail.com: domain of bernd.schubert@fastmail.fm designates 64.147.123.24 as permitted sender) smtp.mailfrom=bernd.schubert@fastmail.fm ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681501974; a=rsa-sha256; cv=none; b=kQZhXI4eeHZQ8GoRukF7RZKCP2xh4ChsQQIQlNMewupWAzWJ5JJxe9bz+g0OT5wv1yElma VAyBnM1PyfSfiUrLHv9hnY3OQYedne0pdPwLRXDbO+MPymKfV9lCy3h4iVnJDQh+6Ac/+V nRFViG5PROW1YHiFIpm+xYt8wKTIXVo= Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 30CD4320046E; Fri, 14 Apr 2023 15:52:49 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute4.internal (MEProxy); Fri, 14 Apr 2023 15:52:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fastmail.fm; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to; s=fm3; t= 1681501968; x=1681588368; bh=UwrEhkkjARKwmx+wt4KVFrRHwIPuRo3w6pp 5461mGg4=; b=VqOM8xKhsEQqlBuK3oQwjA63pW3LWO5qpXIwqyKUhswrvwsHlI4 q+Wk13Q/Ib5LoSBhXaUs1TAE9HSN/7Xz5xOxoEjvvLK7fN4Y+D5BGzAy6s8JpKDr v9pfyBY9HLmpRud0LO2VcCC/8+UfCKLtvHfQhKcvKPudbqmutKZ479yrLET8Zg7C sn4NL5/a/yxoWliz+a1TAA/M/v1s6zbP5CYBmkX4OVxqqnL+cWUJz+eWYCHenqzG UaCF2dDmo0Kqf96rapTNuoTGUeg5ecNwaCDeR90LwQFLsLRx/R30DZRGio4R0Vno 7s/bK9T2bA4Pe+Dm2356d0RPHcIUm9YHCSA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t= 1681501968; x=1681588368; bh=UwrEhkkjARKwmx+wt4KVFrRHwIPuRo3w6pp 5461mGg4=; b=jpMzWBtZnuVNpku/BGufbZbAtRqSdB2Xv/A7bD360pEUhb4r6WZ QyYWae1Nug+2Ujz++iUztjVdaXn3y5j/iTGDFwocNwKu5aEO4t23QGYIQHEBGk9L G2rwoWEshRs025EkoYHcQOV7wsk5ze1KbGzT3hkn5IAYpVUdqqAXQqvJPWBZyjKs cD5COVDdNI7snnAD5SI/46FGDYc4h7izjZfLglwWqEo/Fo3ERpcrMY/1CfBL6TTd Erjlsq4skWoc0fZr34r4IZ1XWNPGF9bg8Qygcv+8SNddZLL9N21UwExzKGnvKeUe GKhrAgGPGW55EXSZDcOiN5t8w6TZNKIm3Mg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrvdeltddgudegfecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefkffggfgfuhffvvehfjggtgfesthekredttdefjeenucfhrhhomhepuegv rhhnugcuufgthhhusggvrhhtuceosggvrhhnugdrshgthhhusggvrhhtsehfrghsthhmrg hilhdrfhhmqeenucggtffrrghtthgvrhhnpeduteejuedtvddtudfffeduudehvddvhfeg leehteevgfekhfelgefhfffgtedutdenucevlhhushhtvghrufhiiigvpedtnecurfgrrh grmhepmhgrihhlfhhrohhmpegsvghrnhgurdhstghhuhgsvghrthesfhgrshhtmhgrihhl rdhfmh X-ME-Proxy: Feedback-ID: id8a24192:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 14 Apr 2023 15:52:47 -0400 (EDT) Message-ID: Date: Fri, 14 Apr 2023 21:52:45 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: sequential 1MB mmap read ends in 1 page sync read-ahead Content-Language: en-US, de-DE From: Bernd Schubert To: Matthew Wilcox Cc: "linux-fsdevel@vger.kernel.org" , Miklos Szeredi , Andrew Morton , linux-mm@kvack.org References: <1e88b8ed-5f17-c42e-9646-6a97efd9f99c@fastmail.fm> In-Reply-To: <1e88b8ed-5f17-c42e-9646-6a97efd9f99c@fastmail.fm> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 123B740015 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: k8oyis9mu9m9k7h1u9ome3epgm36n3bn X-HE-Tag: 1681501973-100971 X-HE-Meta: U2FsdGVkX1/Ij8Fr26nD+K/UQzkj3Vg7CS595MYhTSuRQOkRrkLR/WVnI/dWRKKv7ybIh5h/1w6plLNi2Lq3WmLjE63XUUd1pAqLGaxcFsm9mk3rP4jhHiqSFv3vJ5UyR+gnpYmYPcYqBmXYFrTyZj6qnG8AejEdIWyoEcDZxZmBfS4ERBzuwHx1DJ3QVOcmzBVGljicutCpnigiahAr/ykE5LRxY1sMPogAsSUhmQcZa+dJJAqGN1GIOepI+m12OiKKEnH/QvWz/fwij6UwfanmtXj1jeKGBEHEszztt3eZFkvMMDz4G8uxgeoBIW5hidxGLxnZZ8Ty/QoKCbwyeyQJOLC7xfJaKPgnjEuViYw1DLtHRre6IQJLdCRN0q7565QA9u28m2gQm0K09vrlV3Tqb1+3IcjoijCTwkSneC80A9zSHaYAqWtwPTKx9C6lx4azxjVGICBIrP8lAMYcKTeTNS4SWDVZ/oM+Dwaq3Z9rl+QTyV37yqnTuyi3cAaXRU+SnLRjh62QyJ4vc8dtywts96ZZuX+yboxXhwByMvPU1EYsU8HCeE8Bj1F15OUOnzA2d+pLG9zJzt/pUZ9FS0f5XO1v5ghCiIfU5zCRoSZdqPs/yCNoW3QOhMmFrotcnDo9kzb1u9fWMwCjzHV/I1Z/V/CpVUqVAdYyufNHPXvcIIfU/iEEk9uFgWi07Rbthg5XcpcpwX/pVnH3ivxotnntEHAjND37CFX58nv6cxwV/4Y9+qoWWQ6AlbKk8kR3lJwUvSV82TAP6xH/ewOp5/6rZWPFCjSEiLNGi6c15FIIl8YcgLhHoSt58oNbrVYI/8GFFyp4cEoGCzwlu4LESllZauRrSy4Az0q9zqgkqIY9+k/CVWaalK67kXN3E/yOXRb9PR06olSbiMWeBVVGsUFxXgJloHiTNipTKBDdZ33cPpLi18Y7/ebB+1Aq55qSRSgD7xCPC2Nw3uNxmdU L2p8L9hT jpXOzju6ivnwhVfxuvlkG4FJ/cHX4tp0TKcx8TUHmCBnJS6XlsF3SYis+8lX42p9LHV/zUf/s9n42QQO3kJygbStwauGwl/p3sKPo6aOx3ve3q0wfAsm5bIWq3aSM6IwHyxEI/4ZtM/vJCBk4XZozloUN1Xfqwm1E/Obic2UNGO/ziwMT11qCQcsqMfsLAMEWT98XUsV/pnimqoPCr73g8cjpaTj3AQKg3ONdXslVO/sSGGJ5w2V3nRs05Ttsa84cJ5yOjQ40Fcb88pF9i3qsT4jArbi92gfL6UeKu0AiOD49pnoSh2y4L70MvJHPb75HWaVA X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 4/14/23 17:05, Bernd Schubert wrote: > > > On 4/14/23 06:07, Matthew Wilcox wrote: >> On Thu, Apr 13, 2023 at 11:33:09PM +0200, Bernd Schubert wrote: >>> Sorry, forgot to add Andrew and linux-mm into CC. >>> >>> On 4/13/23 23:27, Bernd Schubert wrote: >>>> Hello, >>>> >>>> I found a weird mmap read behavior while benchmarking the fuse-uring >>>> patches. >>>> I did not verify yet, but it does not look fuse specific. >>>> Basically, I started to check because fio results were much lower >>>> than expected (better with the new code, though) >>>> >>>> fio cmd line: >>>> fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse >>>> --directory=/scratch/dest/ --rw=read multi-file.fio >>>> >>>> >>>> bernd@squeeze1 test2>cat multi-file.fio >>>> [global] >>>> group_reporting >>>> bs=1M >>>> runtime=300 >>>> >>>> [test] >>>> >>>> This sequential fio sets POSIX_MADV_SEQUENTIAL and then does memcpy >>>> beginning at offset 0 in 1MB steps (verified with additional >>>> logging in fios engines/mmap.c). >>>> >>>> And additional log in fuse_readahead() gives >>>> >>>> [ 1396.215084] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 >>>> index=0 >>>> [ 1396.237466] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 >>>> index=255 >>>> [ 1396.263175] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>> index=254 >>>> [ 1396.282055] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>> index=253 >>>> ... >>>> [ 1496.353745] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>> index=64 >>>> [ 1496.381105] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 >>>> index=511 >>>> [ 1496.397487] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>> index=510 >>>> [ 1496.416385] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>> index=509 >>>> ... >>>> >>>> Logging in do_sync_mmap_readahead() >>>> >>>> [ 1493.130764] do_sync_mmap_readahead:3015 ino=132 index=0 count=0 >>>> ras_start=0 ras_size=0 ras_async=0 ras_ra_pages=64 ras_mmap_miss=0 >>>> ras_prev_pos=-1 >>>> [ 1493.147173] do_sync_mmap_readahead:3015 ino=132 index=255 count=0 >>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>> ras_prev_pos=-1 >>>> [ 1493.165952] do_sync_mmap_readahead:3015 ino=132 index=254 count=0 >>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>> ras_prev_pos=-1 >>>> [ 1493.185566] do_sync_mmap_readahead:3015 ino=132 index=253 count=0 >>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>> ras_prev_pos=-1 >>>> ... >>>> [ 1496.341890] do_sync_mmap_readahead:3015 ino=132 index=64 count=0 >>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>> ras_prev_pos=-1 >>>> [ 1496.361385] do_sync_mmap_readahead:3015 ino=132 index=511 count=0 >>>> ras_start=96 ras_size=64 ras_async=64 ras_ra_pages=64 ras_mmap_miss=0 >>>> ras_prev_pos=-1 >>>> >>>> >>>> So we can see from fuse that it starts to read at page index 0, wants >>>> 64 pages (which is actually the double of bdi read_ahead_kb), then >>>> skips index 64...254) and immediately goes to index 255. For the mmaped >>>> memcpy pages are missing and then it goes back in 1 page steps to get >>>> these. >>>> >>>> A workaround here is to set read_ahead_kb in the bdi to a larger >>>> value, another workaround might be (untested) to increase the >>>> read-ahead >>>> window. Either of these two seem to be workarounds for the index order >>>> above. >>>> >>>> I understand that read-ahead gets limited by the bdi value (although >>>> exceeded above), but why does it go back in 1 page steps? My >>>> expectation >>>> would have been >>>> >>>> index=0  count=32 (128kb read-head) >>>> index=32 count=32 >>>> index=64 count=32 >> >> What I see with XFS is: >> >>               fio-27518   [005] .....   276.565025: >> mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x23a8c ofs=0 order=2 ... >> mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x15e40 ofs=786432 >> order=6 >> >> ... it then gets "stuck" at order-6, which is expected for a 256kB >> readahead window. >> >> This is produced by: >> >> echo 1 >> >/sys/kernel/tracing/events/filemap/mm_filemap_add_to_page_cache/enable >> fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse >> --directory=/mnt/scratch/ --rw=read multi-file.fio >> echo 0 >> >/sys/kernel/tracing/events/filemap/mm_filemap_add_to_page_cache/enable >> less /sys/kernel/tracing/trace >> > > Thanks for looking at it Matthew! > > I see the same as on fuse on xfs - same output as I initially > posted (except the fuse logging, of course). > > With tracing and reduced file size to 2M > ... >              fio-3459    [011] ..... 65055.436534: > mm_filemap_add_to_page_cache: dev 252:16 ino 84 pfn=0x12cfb5 ofs=1052672 > order=0 > > And then it is stuck at order=0 > When I reduce bs to 4K I get similar results as you fio --size=10M --numjobs=1 --ioengine=mmap --output-format=normal,terse --directory=/scratch/source/ --rw=read multi-file.fio --bs=4K --group_reporting (so bs set to 4K) and a basically empty job file bernd@squeeze1 test2>cat multi-file.fio [global] [test] Up to bs=512K it works fine, 1M (and for what it matters already 768K) introduce the order=0 issue. Thanks, Bernd