From: Anders Blomdell
To: Jan Kara
Cc: Philippe Troin, "Matthew Wilcox (Oracle)", Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: Regression in NFS probably due to very large amounts of readahead
Date: Tue, 26 Nov 2024 13:49:09 +0100
Message-ID: <6777d050-99a2-4f3c-b398-4b4271c427d5@gmail.com>
In-Reply-To: <20241126103719.bvd2umwarh26pmb3@quack3>
References: <49648605-d800-4859-be49-624bbe60519d@gmail.com>
 <3b1d4265b384424688711a9259f98dec44c77848.camel@fifi.org>
 <4bb8bfe1-5de6-4b5d-af90-ab24848c772b@gmail.com>
 <20241126103719.bvd2umwarh26pmb3@quack3>

On 2024-11-26 11:37, Jan Kara wrote:
> On Tue 26-11-24 09:01:35, Anders Blomdell wrote:
>> On 2024-11-26 02:48, Philippe Troin wrote:
>>> On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
>>>> When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
>>>> we got terrible performance (lots of nfs: server x.x.x.x not
>>>> responding).
>>>> What triggered this problem was virtual machines with NFS-mounted
>>>> qcow2 disks that often triggered large readaheads that generate
>>>> long streaks of disk I/O of 150-600 MB/s (4 ordinary HDDs) that
>>>> filled up the buffer/cache area of the machine.
>>>>
>>>> A git bisect gave the following suspect:
>>>>
>>>> git bisect start
>>>
>>> 8< snip >8
>>>
>>>> # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
>>>> readahead: properly shorten readahead when falling back to
>>>> do_page_cache_ra()
>>>
>>> Thank you for taking the time to bisect, this issue has been bugging
>>> me, but it's been non-deterministic, and hence hard to bisect.
>>>
>>> I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
>>> slightly different setups:
>>>
>>> (1) On machines mounting NFSv3 shared drives. The symptom here is a
>>> "nfs server XXX not responding, still trying" that never recovers
>>> (while the server remains pingable and other NFSv3 volumes from the
>>> hanging server can be mounted).
>>>
>>> (2) On VMs running over qemu-kvm, I see very long stalls (can be up to
>>> several minutes) on random I/O. These stalls eventually recover.
>>>
>>> I've built a 6.11.10 kernel with
>>> 7c877586da3178974a8a94577b6045a48377ff25 reverted and I'm back to
>>> normal (no more NFS hangs, no more VM stalls).
>>>
>> Some printk debugging seems to indicate that the problem
>> is that the expression 'ra->size - (index - start)' goes
>> negative, which then gets cast to a very large unsigned
>> 'nr_to_read' when calling 'do_page_cache_ra'. Where the true
>> bug is still eludes me, though.
>
> Thanks for the report, bisection and debugging! I think I see what's going
> on. read_pages() can go and reduce ra->size when ->readahead() callback
> failed to read all folios prepared for reading and apparently that's what
> happens with NFS and what can lead to negative argument to
> do_page_cache_ra().
> Now at this point I'm of the opinion that updating
> ra->size / ra->async_size does more harm than good (because those values
> show *desired* readahead to happen, not exact number of pages read),
> furthermore it is problematic because ra can be shared by multiple
> processes and so updates are inherently racy. If we indeed need to store
> number of read pages, we could do it through ractl which is call-site local
> and used for communication between readahead generic functions and callers.
> But I have to do some more history digging and code reading to understand
> what is using this logic in read_pages().
>
> Honza

Good, I look forward to a quick revert, and don't forget to CC GKH, so I get
recent kernels that work ASAP.

Regards

/Anders
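
For readers following the thread, the wraparound described in the printk
paragraph quoted above can be reproduced in isolation. What follows is a
minimal sketch in plain userspace C, not kernel code: the helper names
nr_to_read_wrapping and nr_to_read_clamped are hypothetical, the fixed-width
types are assumed stand-ins for the kernel's page index and ra->size types,
and the clamp is only an illustration of the arithmetic, not the fix
discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the expression quoted in the thread:
 *     ra->size - (index - start)
 * All operands are unsigned, so when more pages have been consumed than
 * ra_size allows, the subtraction wraps around instead of going negative.
 */
uint64_t nr_to_read_wrapping(uint32_t ra_size, uint64_t index, uint64_t start)
{
	assert(index >= start);	/* the consumed count assumes index >= start */
	return (uint64_t)ra_size - (index - start);
}

/*
 * Illustrative variant that never wraps: if the consumed page count
 * already reaches or exceeds ra_size, nothing is left to read.
 */
uint64_t nr_to_read_clamped(uint32_t ra_size, uint64_t index, uint64_t start)
{
	uint64_t consumed;

	assert(index >= start);
	consumed = index - start;
	return consumed >= ra_size ? 0 : ra_size - consumed;
}
```

With a window of 32 pages and 40 pages already consumed, the first helper
returns 2^64 - 8 (a huge page count), while the second returns 0. When only
10 pages have been consumed, both return 22.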