Date: Tue, 26 Nov 2024 16:06:13 +0100
From: Jan Kara
To: Anders Blomdell
Cc: Philippe Troin, Jan Kara, "Matthew Wilcox (Oracle)", Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, NeilBrown
Subject: Re: Regression in NFS probably due to very large amounts of readahead
Message-ID: <20241126150613.a4b57y2qmolapsuc@quack3>
References: <49648605-d800-4859-be49-624bbe60519d@gmail.com>
 <3b1d4265b384424688711a9259f98dec44c77848.camel@fifi.org>
 <4bb8bfe1-5de6-4b5d-af90-ab24848c772b@gmail.com>
 <20241126103719.bvd2umwarh26pmb3@quack3>
In-Reply-To: <20241126103719.bvd2umwarh26pmb3@quack3>

On Tue 26-11-24 11:37:19, Jan Kara wrote:
> On Tue 26-11-24 09:01:35, Anders Blomdell wrote:
> > On 2024-11-26 02:48, Philippe Troin wrote:
> > > On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
> > > > When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
> > > > we got terrible performance (lots of nfs: server x.x.x.x not
> > > > responding).
> > > > What triggered this problem was virtual machines with NFS-mounted
> > > > qcow2 disks that often triggered large readaheads that generates
> > > > long streaks of disk I/O of 150-600 MB/s (4 ordinary HDD's) that
> > > > filled up the buffer/cache area of the machine.
> > > >
> > > > A git bisect gave the following suspect:
> > > >
> > > > git bisect start
> > >
> > > 8< snip >8
> > >
> > > > # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
> > > > readahead: properly shorten readahead when falling back to
> > > > do_page_cache_ra()
> > >
> > > Thank you for taking the time to bisect, this issue has been bugging
> > > me, but it's been non-deterministic, and hence hard to bisect.
> > >
> > > I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
> > > slightly different setups:
> > >
> > > (1) On machines mounting NFSv3 shared drives.
> > > The symptom here is a
> > > "nfs server XXX not responding, still trying" that never recovers
> > > (while the server remains pingable and other NFSv3 volumes from the
> > > hanging server can be mounted).
> > >
> > > (2) On VMs running over qemu-kvm, I see very long stalls (can be up to
> > > several minutes) on random I/O. These stalls eventually recover.
> > >
> > > I've built a 6.11.10 kernel with
> > > 7c877586da3178974a8a94577b6045a48377ff25 reverted and I'm back to
> > > normal (no more NFS hangs, no more VM stalls).
> > >
> > Some printk debugging, seems to indicate that the problem
> > is that the entity 'ra->size - (index - start)' goes
> > negative, which then gets cast to a very large unsigned
> > 'nr_to_read' when calling 'do_page_cache_ra'. Where the true
> > bug is still eludes me, though.
>
> Thanks for the report, bisection and debugging! I think I see what's going
> on. read_pages() can go and reduce ra->size when ->readahead() callback
> failed to read all folios prepared for reading and apparently that's what
> happens with NFS and what can lead to negative argument to
> do_page_cache_ra(). Now at this point I'm of the opinion that updating
> ra->size / ra->async_size does more harm than good (because those values
> show *desired* readahead to happen, not exact number of pages read),
> furthermore it is problematic because ra can be shared by multiple
> processes and so updates are inherently racy. If we indeed need to store
> number of read pages, we could do it through ractl which is call-site local
> and used for communication between readahead generic functions and callers.
> But I have to do some more history digging and code reading to understand
> what is using this logic in read_pages().

Hum, checking the history, the update of ra->size was added by Neil two
years ago in 9fd472af84ab ("mm: improve cleanup when ->readpages doesn't
process all pages").
Neil, the changelog reads as if there was some real motivation behind
updating ra->size in read_pages(). What was it? I somewhat disagree with
reducing ra->size in read_pages() because it seems like the wrong place to
do that; if we do need something like it, the readahead window sizing logic
should rather be changed to take that into account. But it all depends on
what the real rationale behind reducing ra->size in read_pages() was...

								Honza
-- 
Jan Kara
SUSE Labs, CR