Date: Tue, 26 Nov 2024 11:37:19 +0100
From: Jan Kara
To: Anders Blomdell
Cc: Philippe Troin, Jan Kara, "Matthew Wilcox (Oracle)", Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: Regression in NFS probably due to very large amounts of readahead
Message-ID: <20241126103719.bvd2umwarh26pmb3@quack3>

On Tue 26-11-24 09:01:35, Anders Blomdell wrote:
> On 2024-11-26 02:48, Philippe Troin wrote:
> > On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
> > > When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
> > > we got terrible performance (lots of "nfs: server x.x.x.x not
> > > responding").
> > > What triggered this problem was virtual machines with NFS-mounted
> > > qcow2 disks that often triggered large readaheads, which generate
> > > long streaks of disk I/O of 150-600 MB/s (4 ordinary HDDs) that
> > > filled up the buffer/cache area of the machine.
> > >
> > > A git bisect gave the following suspect:
> > >
> > > git bisect start
> >
> > 8< snip >8
> >
> > > # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
> > > readahead: properly shorten readahead when falling back to
> > > do_page_cache_ra()
> >
> > Thank you for taking the time to bisect. This issue has been bugging
> > me, but it's been non-deterministic, and hence hard to bisect.
> >
> > I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
> > slightly different setups:
> >
> > (1) On machines mounting NFSv3 shared drives. The symptom here is a
> > "nfs server XXX not responding, still trying" that never recovers
> > (while the server remains pingable and other NFSv3 volumes from the
> > hanging server can be mounted).
> >
> > (2) On VMs running over qemu-kvm, I see very long stalls (up to
> > several minutes) on random I/O. These stalls eventually recover.
> >
> > I've built a 6.11.10 kernel with
> > 7c877586da3178974a8a94577b6045a48377ff25 reverted and I'm back to
> > normal (no more NFS hangs, no more VM stalls).
> >
> Some printk debugging seems to indicate that the problem
> is that the expression 'ra->size - (index - start)' goes
> negative, which then gets cast to a very large unsigned
> 'nr_to_read' when calling 'do_page_cache_ra'. Where the true
> bug lies still eludes me, though.

Thanks for the report, bisection and debugging! I think I see what's going
on. read_pages() can reduce ra->size when the ->readahead() callback fails
to read all folios prepared for reading. Apparently that's what happens
with NFS, and it can lead to a negative argument being passed to
do_page_cache_ra().

Now at this point I'm of the opinion that updating ra->size /
ra->async_size does more harm than good, because those values describe the
*desired* readahead, not the exact number of pages read. Furthermore, it is
problematic because ra can be shared by multiple processes, so updates are
inherently racy. If we indeed need to store the number of pages read, we
could do it through ractl, which is local to the call site and is used for
communication between the generic readahead functions and their callers.
But I have to do some more history digging and code reading to understand
what is using this logic in read_pages().

Honza
--
Jan Kara
SUSE Labs, CR