Date: Wed, 26 Mar 2025 11:55:22 -0400
From: "Theodore Ts'o"
To: Matthew Wilcox
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, Chris Mason, Josef Bacik,
	Luis Chamberlain
Subject: Re: [LSF/MM/BPF Topic] Filesystem reclaim & memory allocation BOF
Message-ID: <20250326155522.GB1459574@mit.edu>

On Wed, Mar 26, 2025 at 03:25:07PM +0000, Matthew Wilcox wrote:
>
> We've got three reports now (two are syzkaller kiddie stuff, but one's a
> real workload) of a warning in the page allocator from filesystems
> doing reclaim.  Essentially they're using GFP_NOFAIL from reclaim
> context.  This got me thinking about bs>PS and I realised that if we fix
> this, then we're going to end up trying to do high order GFP_NOFAIL allocations
> in the memory reclaim path, and that is really no bueno.
>
> https://lore.kernel.org/linux-mm/20250326105914.3803197-1-matt@readmodwrite.com/
>
> I'll prepare a better explainer of the problem in advance of this.

Thanks for proposing this as a last-minute LSF/MM topic!

I was looking at this myself, and was going to reply to the mail
thread above, but I'll do it here.

From my perspective, the problem is that as part of memory reclaim,
there is an attempt to shrink the inode cache, and there are cases
where an inode's refcount was elevated (for example, because it was
referenced by a dentry), and when the dentry gets flushed, the inode
can now get evicted.  But if the inode is one that has been deleted,
then at eviction time the file system will try to release the blocks
associated with the deleted file.  This operation will require memory
allocation, potential I/O, and perhaps waiting for a journal
transaction to complete.

So basically, there is a class of inodes where, if we are in reclaim,
we should probably skip trying to evict them, because there are very
likely other inodes whose eviction will release memory more
expeditiously.  And if we take a look at inode_lru_isolate(), there's
logic there already about when inodes should be skipped rather than
evicted.  It's probably just a matter of adding some additional
conditions there.

This seems relatively straightforward; what am I missing?

> Required attendees: Ted, Luis, Chris, Josef, other people who've wrestled
> with this before.

Happy to be there!  :-)

					- Ted
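
To make the "skip this class of inodes" idea concrete, here is a rough
userspace model of the decision, not kernel code.  Everything in it
(struct model_inode, enum lru_verdict, model_lru_isolate() and
model_count_evictable()) is a hypothetical name made up for
illustration; the real inode_lru_isolate() has more state and different
types.  The assumption is that an unlinked inode (nlink == 0) is one
whose eviction would have to free blocks, so the reclaim path would
rather pass over it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
enum lru_verdict { VERDICT_EVICT, VERDICT_SKIP };

struct model_inode {
	unsigned int nlink;	/* 0 means the file has been unlinked */
	unsigned int refcount;	/* nonzero means someone still holds it */
	bool dirty;		/* would need writeback before eviction */
};

/* Decide whether a reclaim pass should try to evict this inode. */
enum lru_verdict model_lru_isolate(const struct model_inode *inode)
{
	assert(inode != NULL);

	/* Stand-in for the kind of checks that already exist. */
	if (inode->refcount > 0 || inode->dirty)
		return VERDICT_SKIP;

	/*
	 * Hypothetical extra condition: evicting an unlinked inode means
	 * releasing its blocks, which may allocate memory, do I/O, or
	 * wait on a journal transaction.  Skip it while in reclaim.
	 */
	if (inode->nlink == 0)
		return VERDICT_SKIP;

	return VERDICT_EVICT;
}

/* Count how many inodes in an array a reclaim pass would evict. */
size_t model_count_evictable(const struct model_inode *inodes, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (model_lru_isolate(&inodes[i]) == VERDICT_EVICT)
			count++;
	return count;
}
```

The model leaves open what eventually evicts the skipped inodes; some
non-reclaim context would still have to do that work.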