Date: Tue, 26 Nov 2024 16:00:02 +0100
From: Jan Kara
To: Anders Blomdell
Cc: Jan Kara, Philippe Troin, "Matthew Wilcox (Oracle)", Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: Regression in NFS probably due to very large amounts of readahead
Message-ID: <20241126150002.o6fbe4yei4fwsehz@quack3>
References: <49648605-d800-4859-be49-624bbe60519d@gmail.com>
	<3b1d4265b384424688711a9259f98dec44c77848.camel@fifi.org>
	<4bb8bfe1-5de6-4b5d-af90-ab24848c772b@gmail.com>
	<20241126103719.bvd2umwarh26pmb3@quack3>
	<6777d050-99a2-4f3c-b398-4b4271c427d5@gmail.com>
In-Reply-To: <6777d050-99a2-4f3c-b398-4b4271c427d5@gmail.com>

On Tue 26-11-24 13:49:09, Anders Blomdell wrote:
>
>
> On 2024-11-26 11:37, Jan Kara wrote:
> > On Tue 26-11-24 09:01:35, Anders Blomdell wrote:
> > > On 2024-11-26 02:48, Philippe Troin wrote:
> > > > On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
> > > > > When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
> > > > > we got terrible performance (lots of nfs: server x.x.x.x not
> > > > > responding).
> > > > > What triggered this problem was virtual machines with NFS-mounted
> > > > > qcow2 disks that often triggered large readaheads that generates long
> > > > > streaks of disk I/O of 150-600 MB/s (4 ordinary HDD's) that filled up
> > > > > the buffer/cache area of the machine.
> > > > >
> > > > > A git bisect gave the following suspect:
> > > > >
> > > > > git bisect start
> > > >
> > > > 8< snip >8
> > > >
> > > > > # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
> > > > > readahead: properly shorten readahead when falling back to
> > > > > do_page_cache_ra()
> > > >
> > > > Thank you for taking the time to bisect, this issue has been bugging
> > > > me, but it's been non-deterministic, and hence hard to bisect.
> > > >
> > > > I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
> > > > slightly different setups:
> > > >
> > > > (1) On machines mounting NFSv3 shared drives. The symptom here is a
> > > > "nfs server XXX not responding, still trying" that never recovers
> > > > (while the server remains pingable and other NFSv3 volumes from the
> > > > hanging server can be mounted).
> > > >
> > > > (2) On VMs running over qemu-kvm, I see very long stalls (can be up to
> > > > several minutes) on random I/O. These stalls eventually recover.
> > > >
> > > > I've built a 6.11.10 kernel with
> > > > 7c877586da3178974a8a94577b6045a48377ff25 reverted and I'm back to
> > > > normal (no more NFS hangs, no more VM stalls).
> > > >
> > > Some printk debugging, seems to indicate that the problem
> > > is that the entity 'ra->size - (index - start)' goes
> > > negative, which then gets cast to a very large unsigned
> > > 'nr_to_read' when calling 'do_page_cache_ra'. Where the true
> > > bug is still eludes me, though.
> >
> > Thanks for the report, bisection and debugging! I think I see what's going
> > on. read_pages() can go and reduce ra->size when ->readahead() callback
> > failed to read all folios prepared for reading and apparently that's what
> > happens with NFS and what can lead to negative argument to
> > do_page_cache_ra(). Now at this point I'm of the opinion that updating
> > ra->size / ra->async_size does more harm than good (because those values
> > show *desired* readahead to happen, not exact number of pages read),
> > furthermore it is problematic because ra can be shared by multiple
> > processes and so updates are inherently racy. If we indeed need to store
> > number of read pages, we could do it through ractl which is call-site local
> > and used for communication between readahead generic functions and callers.
> > But I have to do some more history digging and code reading to understand
> > what is using this logic in read_pages().
> >
> > 								Honza
> Good, look forward to a quick revert, and don't forget to CC GKH, so I
> get kernels recent that work ASAP.

Well, Greg won't merge any patch until it is upstream. I've now sent the
revert to Andrew (the MM maintainer); once it lands in Linus' tree, Greg
will pick it up, since the stable tree is CCed.

								Honza
-- 
Jan Kara
SUSE Labs, CR