Date: Wed, 26 Mar 2025 16:19:32 +0000
From: Matthew Wilcox
To: Theodore Ts'o
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Chris Mason, Josef Bacik, Luis Chamberlain
Subject: Re: [LSF/MM/BPF Topic] Filesystem reclaim & memory allocation BOF
References: <20250326155522.GB1459574@mit.edu>

On Wed, Mar 26, 2025 at 11:55:22AM -0400, Theodore Ts'o wrote:
> On Wed, Mar 26, 2025 at 03:25:07PM +0000, Matthew Wilcox wrote:
> >
> > We've got three reports now (two are syzkaller kiddie stuff, but one's a
> > real workload) of a warning in the page allocator from filesystems
> > doing reclaim.
> > Essentially they're using GFP_NOFAIL from reclaim context. This got me
> > thinking about bs>PS and I realised that if we fix this, then we're
> > going to end up trying to do high-order GFP_NOFAIL allocations in the
> > memory reclaim path, and that is really no bueno.
> >
> > https://lore.kernel.org/linux-mm/20250326105914.3803197-1-matt@readmodwrite.com/
> >
> > I'll prepare a better explainer of the problem in advance of this.
>
> Thanks for proposing this as a last-minute LSF/MM topic!
>
> I was looking at this myself, and was going to reply to the mail
> thread above, but I'll do it here.
>
> From my perspective, the problem is that as part of memory reclaim,
> there is an attempt to shrink the inode cache, and there are cases
> where an inode's refcount was elevated (for example, because it was
> referenced by a dentry), and when the dentry gets flushed, now the
> inode can get evicted. But if the inode is one that has been deleted,
> then at eviction time the file system will try to release the blocks
> associated with the deleted file. This operation will require memory
> allocation, potential I/O, and perhaps waiting for a journal
> transaction to complete.
>
> So basically, there is a class of inodes which, if we are in reclaim,
> we should probably skip trying to evict, because there are very likely
> other inodes that are more likely to result in memory getting released
> expeditiously. And if we take a look at inode_lru_isolate(), there's
> already logic there about when inodes should be skipped rather than
> evicted. It's probably just a matter of adding some additional
> conditions there.

This is a helpful way of looking at the problem. I was looking at the
problem further down, where we've already entered evict_inode(). At that
point we can't fail. My proposal was going to be that the filesystem pin
the metadata that it would need to modify in order to evict the inode.
But avoiding entering evict_inode() at all is even better.
However, I can't see how inode_lru_isolate() can know whether (looking at
the three reports):

 - the ext4 inode table has been reclaimed and ext4 would need to allocate
   memory to reload the table from disc before it can evict this inode
 - the ext4 block bitmap has been reclaimed and ext4 would need to allocate
   memory to reload the bitmap from disc to discard the preallocation
 - the fat cluster information has been reclaimed and fat would need to
   allocate memory to reload the cluster from disc to update the cluster
   information

If we did have, say, a callback from inode_lru_isolate() to the filesystem
to find out whether the inode can be dropped without memory allocation,
that callback would have to pin the underlying memory so that it could not
be reclaimed between inode_lru_isolate() and evict_inode().

So maybe it makes sense for ->evict_inode() to change from returning void
to returning an errno, and then change the filesystems to not set
GFP_NOFAIL, and instead just decline to evict the inode.