Date: Tue, 8 Apr 2025 09:00:38 +1000
From: Dave Chinner
To: Matthew Wilcox
Cc: Matt Fleming, adilger.kernel@dilger.ca, akpm@linux-foundation.org,
    linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    luka.2016.cs@gmail.com, tytso@mit.edu, Barry Song,
    kernel-team@cloudflare.com, Vlastimil Babka, Miklos Szeredi,
    Amir Goldstein, Qi Zheng, Roman Gushchin, Muchun Song
Subject: Re: Potential Linux Crash: WARNING in ext4_dirty_folio in Linux kernel v6.13-rc5
References: <20250326105914.3803197-1-matt@readmodwrite.com>

On Thu, Apr 03, 2025 at 06:12:26PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 03, 2025 at 01:29:44PM +0100, Matt Fleming wrote:
> > On Wed, Mar 26, 2025 at 10:59 AM Matt Fleming wrote:
> > > Hi there,
> > >
> > > I'm also seeing this PF_MEMALLOC WARN triggered from kswapd in 6.12.19.
> > >
> > > Does overlayfs need some kind of background inode reclaim support?
> >
> > Hey everyone, I know there was some off-list discussion last week at
> > LSFMM, but I don't think a definite solution has been proposed for the
> > below stacktrace.
>
> Hi Matt,
>
> We did have a substantial discussion at LSFMM and we just had another
> discussion on the ext4 call.  I'm going to try to summarise those
> discussions here, and people can jump in to correct me (I'm not really
> an expert on this part of MM-FS interaction).
>
> At LSFMM, we came up with a solution that doesn't work, so let's start
> with ideas that don't work:
>
>  - Allow PF_MEMALLOC to dip into the atomic reserves.  With large block
>    devices, we might end up doing emergency high-order allocations, and
>    that makes everybody nervous
>  - Only allow inode reclaim from kswapd and not from direct reclaim.

That's what GFP_NOFS does. We already rely on kswapd to do inode
reclaim rather than direct reclaim when filesystem cache pressure is
driving memory reclaim...

>    Your stack trace here is from kswapd, so obviously that doesn't work.
>  - Allow ->evict_inode to return an error.  At this point the inode has
>    been taken off the lists which means that somebody else may have
>    started to start constructing it again, and we can't just put it back
>    on the lists.

No. When ->evict_inode is called, the inode hasn't been taken off
the inode hash list. Hence the inode can still be found via cache
lookups whilst evict_inode() is running. However, the inode will
have I_FREEING set, so lookups will call wait_on_freeing_inode()
before retrying the lookup. They will get woken by the
inode_wake_up_bit() call in evict() that happens after
->evict_inode returns, so I_FREEING is what provides ->evict_inode
serialisation against new lookups trying to recreate the inode
whilst it is being torn down.
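
To make that ordering concrete, here is a toy userspace model in C. It
is only a sketch and not kernel code: every name in it (model_inode,
model_lookup and so on) is hypothetical, the "freeing" flag merely
stands in for I_FREEING, and the LOOKUP_MUST_WAIT result stands in for
a lookup sleeping in wait_on_freeing_inode() and retrying once woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; NOT kernel code. All names here are hypothetical. */
enum lookup_result {
    LOOKUP_FOUND,     /* live inode found in the hash */
    LOOKUP_MUST_WAIT, /* hashed but being freed: caller sleeps, then retries */
    LOOKUP_MISS       /* nothing hashed: caller may construct a new inode */
};

struct model_inode {
    unsigned long ino;
    bool hashed;  /* still visible to cache lookups */
    bool freeing; /* stands in for I_FREEING */
};

/* A cache lookup: a hashed inode that is being freed forces the caller to wait. */
enum lookup_result model_lookup(const struct model_inode *inode,
                                unsigned long ino)
{
    if (!inode->hashed || inode->ino != ino)
        return LOOKUP_MISS;
    return inode->freeing ? LOOKUP_MUST_WAIT : LOOKUP_FOUND;
}

/* Start of eviction: mark the inode as freeing but leave it hashed. */
void model_evict_begin(struct model_inode *inode)
{
    assert(inode->hashed);
    inode->freeing = true;
}

/*
 * End of eviction: the filesystem teardown has returned, so the inode
 * is unhashed. The real path would also wake any waiting lookups here.
 */
void model_evict_finish(struct model_inode *inode)
{
    assert(inode->freeing);
    inode->hashed = false;
}
```

Between model_evict_begin() and model_evict_finish() a lookup for the
same inode number can only ever see LOOKUP_MUST_WAIT, never
LOOKUP_MISS, so in this model nothing gets the chance to construct a
second copy of the inode while teardown is in progress.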
IOWs, nothing should be reconstructing the inode whilst evict() is
tearing it down because it can still be found in the inode hash.

>    Jan explained that _usually_ the reclaim path is not the last
>    holder of a reference to the inode.  What's happening here is that
>    we've lost a race where the dentry is being turned negative by
>    somebody else at the same time, and usually they'd have the last
>    reference and call evict.  But if the shrinker has the last
>    reference, it has to do the eviction.
>
> Jan does not think that Overlayfs is a factor here.  It may change
> the timing somewhat but should not make the race wider (nor
> narrower).
>
> Ideas still on the table:
>
>  - Convert all filesystems to use the XFS inode management scheme.
>    Nobody is thrilled by this large amount of work.

There is no need to do that.

>  - Find a simpler version of the XFS scheme to implement for other
>    filesystems.

If we push the last half of evict_inode() out to the background
thread (i.e. go async before remove_inode_hash() is called), then
new lookups will still serialise on the inode hash due to I_FREEING
being set. i.e. problems only arise if the inode is removed from
lookup visibility whilst it still has cleanup work pending.

e.g. have the filesystem provide a ->evict_inode_async() method
that either completes inode eviction directly or punts it to a
workqueue where it does the work and then completes inode eviction.
As long as all this work is done whilst the inode is marked
I_FREEING and is present in the inode hash, then new lookups will
serialise on the eviction work regardless of how it is scheduled.

It is likely we could simplify the XFS code by converting it over
to a mechanism like this, rather than playing the long-standing
"defer everything to background threads from ->destroy_inode()" game
that we currently do.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
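
A rough userspace sketch of the ->evict_inode_async() idea, in C. This
is not kernel code and not a proposed implementation: every name is
hypothetical, a singly linked list stands in for the workqueue, and
using an "in_reclaim" flag to decide when to defer is purely an
assumption made for illustration. The only property it tries to show
is that the inode stays hashed and marked as freeing until the
deferred cleanup has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; NOT kernel code. Every name below is hypothetical. */
struct sketch_inode {
    bool hashed;       /* still visible to cache lookups */
    bool freeing;      /* stands in for I_FREEING */
    bool cleanup_done; /* filesystem teardown work has finished */
    struct sketch_inode *next_deferred;
};

/* Stands in for a workqueue of pending evictions. */
static struct sketch_inode *deferred_head;

/* Final step: only now does the inode leave lookup visibility. */
void sketch_complete_eviction(struct sketch_inode *inode)
{
    assert(inode->freeing && inode->cleanup_done);
    inode->hashed = false; /* the real path would also wake waiters */
}

/*
 * Hypothetical ->evict_inode_async(): finish inline when that is safe,
 * otherwise queue the cleanup. Deferring on "in_reclaim" is an assumption.
 */
void sketch_evict_inode_async(struct sketch_inode *inode, bool in_reclaim)
{
    inode->freeing = true;
    if (in_reclaim) {
        inode->next_deferred = deferred_head;
        deferred_head = inode;
        return; /* still hashed, still marked freeing */
    }
    inode->cleanup_done = true;
    sketch_complete_eviction(inode);
}

/* Background worker: run deferred cleanup, then complete each eviction. */
int sketch_run_deferred(void)
{
    int completed = 0;

    while (deferred_head != NULL) {
        struct sketch_inode *inode = deferred_head;

        deferred_head = inode->next_deferred;
        inode->next_deferred = NULL;
        inode->cleanup_done = true;
        sketch_complete_eviction(inode);
        completed++;
    }
    return completed;
}

/* A lookup has to wait exactly when the inode is hashed and marked freeing. */
bool sketch_lookup_must_wait(const struct sketch_inode *inode)
{
    return inode->hashed && inode->freeing;
}
```

In this model a lookup issued between sketch_evict_inode_async() and
sketch_run_deferred() still finds the inode hashed and marked freeing,
so it waits, however late the background work happens to run.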