Date: Tue, 1 Oct 2024 12:22:59 +1000
From: Dave Chinner
To: Christian Theune
Cc: Linus Torvalds, Matthew Wilcox, Chris Mason, Jens Axboe,
	linux-mm@kvack.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
	folios since Dec 2021 (any kernel from 6.1 upwards)
References: <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk>
	<459beb1c-defd-4836-952c-589203b7005c@meta.com>
	<02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io>
In-Reply-To: <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io>

On Mon, Sep 30, 2024 at 07:34:39PM +0200, Christian Theune wrote:
> Hi,
>
> we’ve been running a number of VMs since last week on 6.11. We’ve
> encountered one hung task situation multiple times now that seems
> to be resolving itself after a bit of time, though. I do not see
> spinning CPU during this time.
>
> The situation seems to be related to cgroups-based IO throttling /
> weighting so far:

.....

> Sep 28 03:39:19 10 kernel: INFO: task nix-build:94696 blocked for more than 122 seconds.
> Sep 28 03:39:19 10 kernel: Not tainted 6.11.0 #1-NixOS
> Sep 28 03:39:19 10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 28 03:39:19 10 kernel: task:nix-build state:D stack:0 pid:94696 tgid:94696 ppid:94695 flags:0x00000002
> Sep 28 03:39:19 10 kernel: Call Trace:
> Sep 28 03:39:19 10 kernel:
> Sep 28 03:39:19 10 kernel: __schedule+0x3a3/0x1300
> Sep 28 03:39:19 10 kernel: schedule+0x27/0xf0
> Sep 28 03:39:19 10 kernel: io_schedule+0x46/0x70
> Sep 28 03:39:19 10 kernel: folio_wait_bit_common+0x13f/0x340
> Sep 28 03:39:19 10 kernel: folio_wait_writeback+0x2b/0x80
> Sep 28 03:39:19 10 kernel: truncate_inode_partial_folio+0x5e/0x1b0
> Sep 28 03:39:19 10 kernel: truncate_inode_pages_range+0x1de/0x400
> Sep 28 03:39:19 10 kernel: evict+0x29f/0x2c0
> Sep 28 03:39:19 10 kernel: do_unlinkat+0x2de/0x330

That's not what I'd call expected behaviour. By the time we are that
far through eviction of a newly unlinked inode, we've already
removed the inode from the writeback lists and we've supposedly
waited for all writeback to complete.

IOWs, there shouldn't be a cached folio in writeback state at this
point in time - we're supposed to have guaranteed all writeback has
already completed before we call truncate_inode_pages_final()....

So how are we getting a partial folio that is still under writeback
at this point in time?

-Dave.
-- 
Dave Chinner
david@fromorbit.com
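
For anyone sifting through several reports of this kind, the pattern in the trace quoted above (a wait on folio writeback sitting underneath evict()) can be checked mechanically. The following is only a rough sketch, not a kernel tool: it assumes journal lines in the same "kernel: symbol+0xoff/0xsize" shape as the quoted trace, and the names FRAME_RE, parse_frames and waits_for_writeback_under_evict are made up for illustration.

```python
import re

# Assumed frame shape, taken from the quoted trace:
#   "... kernel: folio_wait_writeback+0x2b/0x80"
# i.e. symbol, offset into the function, total function size.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<sym>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)\s*$"
)

# Frames copied from the report above (innermost frame first).
TRACE = """\
Sep 28 03:39:19 10 kernel: Call Trace:
Sep 28 03:39:19 10 kernel: __schedule+0x3a3/0x1300
Sep 28 03:39:19 10 kernel: schedule+0x27/0xf0
Sep 28 03:39:19 10 kernel: io_schedule+0x46/0x70
Sep 28 03:39:19 10 kernel: folio_wait_bit_common+0x13f/0x340
Sep 28 03:39:19 10 kernel: folio_wait_writeback+0x2b/0x80
Sep 28 03:39:19 10 kernel: truncate_inode_partial_folio+0x5e/0x1b0
Sep 28 03:39:19 10 kernel: truncate_inode_pages_range+0x1de/0x400
Sep 28 03:39:19 10 kernel: evict+0x29f/0x2c0
Sep 28 03:39:19 10 kernel: do_unlinkat+0x2de/0x330
"""


def parse_frames(lines):
    """Return (symbol, offset, size) tuples for every line that looks like a frame."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16))
            )
    return frames


def waits_for_writeback_under_evict(frames):
    """True if folio_wait_writeback appears deeper in the stack than evict."""
    syms = [sym for sym, _, _ in frames]
    if "folio_wait_writeback" not in syms or "evict" not in syms:
        return False
    # Frames are listed innermost first, so a smaller index means deeper.
    return syms.index("folio_wait_writeback") < syms.index("evict")


if __name__ == "__main__":
    frames = parse_frames(TRACE.splitlines())
    for sym, off, size in frames:
        print(f"{sym} (+{off:#x} of {size:#x})")
    print("writeback wait under evict:", waits_for_writeback_under_evict(frames))
```

This only flags traces with the same shape as the one reported here; it says nothing about why the folio is still under writeback.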