From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EE03C77B72 for ; Fri, 14 Apr 2023 20:47:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 943BD6B0072; Fri, 14 Apr 2023 16:47:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F3746B0075; Fri, 14 Apr 2023 16:47:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 793F4900002; Fri, 14 Apr 2023 16:47:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 690906B0072 for ; Fri, 14 Apr 2023 16:47:46 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 3EEE3160414 for ; Fri, 14 Apr 2023 20:47:46 +0000 (UTC) X-FDA: 80681182932.15.D422169 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) by imf26.hostedemail.com (Postfix) with ESMTP id F38E1140003 for ; Fri, 14 Apr 2023 20:47:43 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=fastmail.fm header.s=fm3 header.b=ikTxL2C0; dkim=pass header.d=messagingengine.com header.s=fm3 header.b=ake9xZjb; spf=pass (imf26.hostedemail.com: domain of bernd.schubert@fastmail.fm designates 64.147.123.24 as permitted sender) smtp.mailfrom=bernd.schubert@fastmail.fm; dmarc=pass (policy=none) header.from=fastmail.fm ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681505264; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qMW0SdtJhp6yzgDZfkC4LhSYXOnwr8epq1JDN68nykc=; b=sYO4RJZNP8XFbVvoRM2e5bkTqUFPJRK9+1R5nvRF2JD0dJgPI+wSFT18zwGcYh3t6LFO1T GIb9pcxqG90aqnIid7i6WISIEcO2YVX5fO8lN0SrMVH2w5WyNbc3KWI+Cp38yBEA9Rq/Om r5SZh529bd4AEayqXWYyVE4Xbdfq/6g= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=fastmail.fm header.s=fm3 header.b=ikTxL2C0; dkim=pass header.d=messagingengine.com header.s=fm3 header.b=ake9xZjb; spf=pass (imf26.hostedemail.com: domain of bernd.schubert@fastmail.fm designates 64.147.123.24 as permitted sender) smtp.mailfrom=bernd.schubert@fastmail.fm; dmarc=pass (policy=none) header.from=fastmail.fm ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681505264; a=rsa-sha256; cv=none; b=VU5ty2lYziGjDUyRI45XJWFVzMx4LVqmoDO4MOvmdoCJDb0f5o4F+CGNAcyHrpACzf+Twg uJiiVAqqF52Zzxa06N2SZE39KWiQ2PmndGv2Bu74ik4ttj/ITBCAxbWK2aGxAul13ZGe9k iqluINROM8kmPAftg7wxjJ+4zgEt2UA= Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 55E94320024A; Fri, 14 Apr 2023 16:47:42 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Fri, 14 Apr 2023 16:47:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fastmail.fm; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to; s=fm3; t= 1681505261; x=1681591661; bh=qMW0SdtJhp6yzgDZfkC4LhSYXOnwr8epq1J DN68nykc=; b=ikTxL2C0oQdZ7wOCHGPKsSEAo3mE6aRT42rsZatMpqORXxWS+m9 FPesR89/urodHgwt0Qip6AyYzxjCfYBn+Uph9Al0K1YUk57x+tVsuV+0LEwamj85 /S5zQY/M3VzybOOLz8yZYXUh0RJ9hrcxCEKzxcxmttkVlmGetGaX71CAO6yup71X yXeO3sFNeXg0PFLvzF7s4ZMDYBFGYbit3y4G3syPDMGf72UjvAqonCwdY3zXKRvh aMA5FTBqnF7pgPDf/RP5z1ogZSi7xjlGWOyLcMavEgyiZ7uzGSnqAmD6klX8DA5D X5dsll5Gp4+wPKy/eBi+495APUafLQNjEFg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t= 1681505261; x=1681591661; bh=qMW0SdtJhp6yzgDZfkC4LhSYXOnwr8epq1J DN68nykc=; b=ake9xZjbMFzcdG+I7zT8KaZtDFBk2ujvJqjxssJ70W+PpqlcOVw tGwNwf5d9rWQxWDMNDVMxvnU+/z8XmBFXU8Mp3FmVcrNei5aHkjQsi9buDCUcwrF HTStcagjQvILQTl8iilLTcXViZbLUPDF9OoKXEvlwn+EF+9kVIIuyQlRoaA+VpGT E7jHOhd05osLv7tcpQNsVEK4gPbl52e3IfIkTAfLZPALYPeJAYj+d2+anM4gU9Oa FoLeWGoqTGijn5W1xdlYp9SDRpzPshibq9OpuGT11WAsfWvWk8m2MuCvc5tktl31 asuW0UZQFhDdn56rn73J9dyonvsJE6OyCFQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrvdeltddgudehhecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefkffggfgfuhffvvehfjggtgfesthekredttdefjeenucfhrhhomhepuegv rhhnugcuufgthhhusggvrhhtuceosggvrhhnugdrshgthhhusggvrhhtsehfrghsthhmrg hilhdrfhhmqeenucggtffrrghtthgvrhhnpeduteejuedtvddtudfffeduudehvddvhfeg leehteevgfekhfelgefhfffgtedutdenucevlhhushhtvghrufhiiigvpedtnecurfgrrh grmhepmhgrihhlfhhrohhmpegsvghrnhgurdhstghhuhgsvghrthesfhgrshhtmhgrihhl rdhfmh X-ME-Proxy: Feedback-ID: id8a24192:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 14 Apr 2023 16:47:40 -0400 (EDT) Message-ID: Date: Fri, 14 Apr 2023 22:47:39 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: sequential 1MB mmap read ends in 1 page sync read-ahead Content-Language: en-US, de-DE From: Bernd Schubert To: Matthew Wilcox Cc: "linux-fsdevel@vger.kernel.org" , Miklos Szeredi , Andrew Morton , linux-mm@kvack.org References: <1e88b8ed-5f17-c42e-9646-6a97efd9f99c@fastmail.fm> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: F38E1140003 X-Stat-Signature: jkigyynj41bjgd6nwaqbjqokzei4f1r7 X-Rspam-User: X-HE-Tag: 1681505263-407245 X-HE-Meta: U2FsdGVkX1/RspK69o9KAIkU9nkp08c/+OiGx5mO4Bc37JUyjsOiH9rYZdpIFBZ0koWFUjeL/kb6eP1w+oK6I8tRGXdPYcNXDQh9O8xHV+x0z0+R+CULVmmJDWcMgTWEw7W4HRh83alosIETqL7uV0DBPNhMGoRyMZizwchHKqNg8K0Mu31eoCV7kOqgSLzWjjiyq1OyIq65hmt5vmvnPJUgAjSQMxRIjK87oYHexDT6thJDMKq+7wYznvJ8xUs78uve/BaqDjOaRu1/uj8hv3+6I1NFdq2aRPNWcogHJfBg9mW1d9sp6Pqc4qwppN1N7HRKE/pbv0mQu0uj7BoYO74jnk8hF9NFjimHH9EB6d7IV1T+N0uL1UQQCvvcKAlep6LqsCOF9zVXv22pjB3tCWcIiSF6r375zXlhh92nXjTSp4vBX1Lfma3aCH3aS+8WhfHh8qLPFuY6/Dyise0pxKxUtUQIJFAecyv3D2Kv0NvoA9plsIam7JJ4nl+U4yb5YghwnG/nLkvv+QXF8MUS1ZxHdX23W4H2qsJNrlnYCICRY+7NTrWM/DZaJScbdsrNlAvcqAOROagP5flhKkRBBrpDk4RpM0sCH61VCgQUC6XznC6cdSlxa7wgW/yD//GBj3E/ONKRnUQ68T33PKiCJaEr7nDz5e/jIc4AQeHx9ZYp0KurcqHTu9W2u7/L8WR9vV/e1DJsineQcof0UMQSycWH4BiMPdKseZd5V5whDV49ZF+195yLuJZpzwXnIH8RgcHrr7c46pxFB1Weosem4tZYl6pSosZE8HUYrRXiCanyyRmUi4rkkvUs9kQHPjDq5vdwaGOhqnBYu7aLp2pwlzSjsQMfP2Ywr6OMth5DxUASeISF+5MzX2sHj3MxYw/WpJVWTUGxpAiSYhQme00Td8T0qCm4QCW40WCKyJbu3J+oq8Q7j9zO/D0L4fZZrM8NMfb490xttZSV8T22Q0C 3Jzz7XqB SYPqtcs3MJWLoGOzMdcrPvk6jdJhWihw7Y9XuY/mXsEMVQhLpPJyf5YPg4FcgC8VzJ6vlVusdsbBGublpFxPb6Sd0LXKftp03ndBWbBFkFrEuLpQzU/1+EueOEtHK+DNSoKNMcSkPSneD21AwgkkpDPEQxs192/ex7EodCXP3fcU1EwbYs8vrx+wjlnOy2HL83GO8oqvMcJSZr/S8wB8oO+DSmog8R2E3HlqpkkMEQF59g/84Sa14aNxvdp/eyCiOMDVP6lm8GFTp61PB+rUkDRpxL1Zv0xDpRjzV/SjaB0sMpdFRqFQazwU5+TB9N+DB40+Q X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 4/14/23 21:52, Bernd Schubert wrote: > > > On 4/14/23 17:05, Bernd Schubert wrote: >> >> >> On 4/14/23 06:07, Matthew Wilcox wrote: >>> On Thu, Apr 13, 2023 at 11:33:09PM +0200, Bernd Schubert wrote: >>>> Sorry, forgot to add Andrew and linux-mm into CC. >>>> >>>> On 4/13/23 23:27, Bernd Schubert wrote: >>>>> Hello, >>>>> >>>>> I found a weird mmap read behavior while benchmarking the fuse-uring >>>>> patches. >>>>> I did not verify yet, but it does not look fuse specific. >>>>> Basically, I started to check because fio results were much lower >>>>> than expected (better with the new code, though) >>>>> >>>>> fio cmd line: >>>>> fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse >>>>> --directory=/scratch/dest/ --rw=read multi-file.fio >>>>> >>>>> >>>>> bernd@squeeze1 test2>cat multi-file.fio >>>>> [global] >>>>> group_reporting >>>>> bs=1M >>>>> runtime=300 >>>>> >>>>> [test] >>>>> >>>>> This sequential fio sets POSIX_MADV_SEQUENTIAL and then does memcpy >>>>> beginning at offset 0 in 1MB steps (verified with additional >>>>> logging in fios engines/mmap.c). >>>>> >>>>> And additional log in fuse_readahead() gives >>>>> >>>>> [ 1396.215084] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 >>>>> index=0 >>>>> [ 1396.237466] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 >>>>> index=255 >>>>> [ 1396.263175] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>>> index=254 >>>>> [ 1396.282055] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>>> index=253 >>>>> ... >>>>> [ 1496.353745] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>>> index=64 >>>>> [ 1496.381105] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 >>>>> index=511 >>>>> [ 1496.397487] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>>> index=510 >>>>> [ 1496.416385] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 >>>>> index=509 >>>>> ... >>>>> >>>>> Logging in do_sync_mmap_readahead() >>>>> >>>>> [ 1493.130764] do_sync_mmap_readahead:3015 ino=132 index=0 count=0 >>>>> ras_start=0 ras_size=0 ras_async=0 ras_ra_pages=64 ras_mmap_miss=0 >>>>> ras_prev_pos=-1 >>>>> [ 1493.147173] do_sync_mmap_readahead:3015 ino=132 index=255 count=0 >>>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>>> ras_prev_pos=-1 >>>>> [ 1493.165952] do_sync_mmap_readahead:3015 ino=132 index=254 count=0 >>>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>>> ras_prev_pos=-1 >>>>> [ 1493.185566] do_sync_mmap_readahead:3015 ino=132 index=253 count=0 >>>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>>> ras_prev_pos=-1 >>>>> ... >>>>> [ 1496.341890] do_sync_mmap_readahead:3015 ino=132 index=64 count=0 >>>>> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 >>>>> ras_prev_pos=-1 >>>>> [ 1496.361385] do_sync_mmap_readahead:3015 ino=132 index=511 count=0 >>>>> ras_start=96 ras_size=64 ras_async=64 ras_ra_pages=64 ras_mmap_miss=0 >>>>> ras_prev_pos=-1 >>>>> >>>>> >>>>> So we can see from fuse that it starts to read at page index 0, wants >>>>> 64 pages (which is actually the double of bdi read_ahead_kb), then >>>>> skips index 64...254) and immediately goes to index 255. For the >>>>> mmaped >>>>> memcpy pages are missing and then it goes back in 1 page steps to get >>>>> these. >>>>> >>>>> A workaround here is to set read_ahead_kb in the bdi to a larger >>>>> value, another workaround might be (untested) to increase the >>>>> read-ahead >>>>> window. Either of these two seem to be workarounds for the index order >>>>> above. >>>>> >>>>> I understand that read-ahead gets limited by the bdi value (although >>>>> exceeded above), but why does it go back in 1 page steps? My >>>>> expectation >>>>> would have been >>>>> >>>>> index=0  count=32 (128kb read-head) >>>>> index=32 count=32 >>>>> index=64 count=32 >>> >>> What I see with XFS is: >>> >>>               fio-27518   [005] .....   276.565025: >>> mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x23a8c ofs=0 order=2 > > ... > >>> mm_filemap_add_to_page_cache: dev 8:32 ino 44 pfn=0x15e40 ofs=786432 >>> order=6 >>> >>> ... it then gets "stuck" at order-6, which is expected for a 256kB >>> readahead window. >>> >>> This is produced by: >>> >>> echo 1 >>> >/sys/kernel/tracing/events/filemap/mm_filemap_add_to_page_cache/enable >>> fio --size=1G --numjobs=1 --ioengine=mmap >>> --output-format=normal,terse --directory=/mnt/scratch/ --rw=read >>> multi-file.fio >>> echo 0 >>> >/sys/kernel/tracing/events/filemap/mm_filemap_add_to_page_cache/enable >>> less /sys/kernel/tracing/trace >>> >> >> Thanks for looking at it Matthew! >> >> I see the same as on fuse on xfs - same output as I initially >> posted (except the fuse logging, of course). >> >> With tracing and reduced file size to 2M >> > > ... > >>               fio-3459    [011] ..... 65055.436534: >> mm_filemap_add_to_page_cache: dev 252:16 ino 84 pfn=0x12cfb5 >> ofs=1052672 order=0 >> >> And then it is stuck at order=0 >> > > When I reduce bs to 4K I get similar results as you > > fio --size=10M --numjobs=1 --ioengine=mmap --output-format=normal,terse > --directory=/scratch/source/ --rw=read multi-file.fio --bs=4K > --group_reporting > > (so bs set to 4K) > > and a basically empty job file > > bernd@squeeze1 test2>cat multi-file.fio > [global] > > [test] > > > > Up to bs=512K it works fine, 1M (and for what it matters > already 768K) introduce the order=0 issue. Hmm, I replaced memcpy with dumb version, that copies byte by byte - problem gone. Is it possible that the optimized memcpy causes caused kind of random memory access and confuses mm / readahead? And somehow your memcpy or system is not doing that? Thanks, Bernd