From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A59DED78307 for ; Mon, 2 Dec 2024 10:44:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E9AF26B0089; Mon, 2 Dec 2024 05:44:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E23C86B008A; Mon, 2 Dec 2024 05:44:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C9C436B008C; Mon, 2 Dec 2024 05:44:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id A86166B0089 for ; Mon, 2 Dec 2024 05:44:40 -0500 (EST) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 47A4AC1468 for ; Mon, 2 Dec 2024 10:44:40 +0000 (UTC) X-FDA: 82849684974.08.1B83BA7 Received: from mail.flyingcircus.io (mail.flyingcircus.io [212.122.41.197]) by imf29.hostedemail.com (Postfix) with ESMTP id 14D8D120011 for ; Mon, 2 Dec 2024 10:44:20 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=flyingcircus.io header.s=mail header.b=E5uGDr9f; spf=pass (imf29.hostedemail.com: domain of ct@flyingcircus.io designates 212.122.41.197 as permitted sender) smtp.mailfrom=ct@flyingcircus.io; dmarc=pass (policy=reject) header.from=flyingcircus.io ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733136269; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=yq8INn4zbqzqDhdWCGdL8z+1i1zAqXBLUq6KmP1JVYM=; b=kNpDgr8XtQ3r7TARdl1Qr5NkN6WHZ+zfl7hmabI0i1Yo0vDVW+RvN4PVMu9OdTDP/txM7+ 7WaEkyxbxm4ImZVQJ7eJW21ufiLCQTewnuH7VTMS3dSHQwpPRFDFXIDGHRocW4Y+KzD+Dm rDoK5kCJO5EsJddyd40+B0A0mwGYmC8= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=flyingcircus.io header.s=mail header.b=E5uGDr9f; spf=pass (imf29.hostedemail.com: domain of ct@flyingcircus.io designates 212.122.41.197 as permitted sender) smtp.mailfrom=ct@flyingcircus.io; dmarc=pass (policy=reject) header.from=flyingcircus.io ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1733136269; a=rsa-sha256; cv=none; b=VuxfbKPpDGcAqR/Jm+70um/9YOdmvmQ4APp0zR05+cMjYEhDd/m4uJ4CFY+ZTZ7R7iZknk 7v1/fVfEzNZkooUirbH2/DKraAi1Rmh5Mue3AMEXZ+9UTujFOrGbJG8TFGxClxuLsbJZGz tAYj4bTKIzbZkCz1g2YEioZ8yZayH/c= Content-Type: text/plain; charset=utf-8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=flyingcircus.io; s=mail; t=1733136274; bh=yq8INn4zbqzqDhdWCGdL8z+1i1zAqXBLUq6KmP1JVYM=; h=Subject:From:In-Reply-To:Date:Cc:References:To; b=E5uGDr9fYjfHo3lHmuz4fFFchfOz5KOxDw6oDLr7hssldy12vEJyZFgxGPbEZbkPr rboK6KdiyWavWJFdNZUq5qKRp3TXNLjF1TyCLrRw3Kj0FM6HwjXYvmoNJnqNjUIgnu huitjDunqC3Gf6IedTFVl+G2Z1QugCGmMdfHh7z8= Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.200.121\)) Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards) From: Christian Theune In-Reply-To: Date: Mon, 2 Dec 2024 11:44:11 +0100 Cc: Chris Mason , Dave Chinner , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, "linux-xfs@vger.kernel.org" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao , regressions@lists.linux.dev, regressions@leemhuis.info Content-Transfer-Encoding: quoted-printable Message-Id: <48D7686C-6BD4-4439-B7FD-411530802161@flyingcircus.io> References: <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com> <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io> <381863DE-17A7-4D4E-8F28-0F18A4CEFC31@flyingcircus.io> <0A480EBE-9B4D-49CC-9A32-3526F32426E6@flyingcircus.io> To: Linus Torvalds X-Rspamd-Queue-Id: 14D8D120011 X-Rspamd-Server: rspam12 X-Stat-Signature: bc9phpyoz4itbw4zq4ithn16ttqakot3 X-Rspam-User: X-HE-Tag: 1733136260-919647 X-HE-Meta: U2FsdGVkX1/gwTjn4Ttmbsv+/0z5RxGjoHe2M2nLBop8aP2y6P5JiFVBv7tXuH14qYJ3bDePBAFhhT8LO4LKH8rG7GzGUylFZPodgc3Spsi1Ui91ngXpy+bJqjrVGZ1/Li9o/ZOhUQN4pC7STHDzxf+Ou5xRo/icjeoi2u5vLuClOfYWa6P51NyIlZC8/RNTEDGKB//Jxq9o4kt8hS/QgiJ4QQXRsyodg1q1kO7bHWoKRKyxo3JJFyraF+wBEiVAMshnGjcz1U5nXbiwOlJwOHTvG4SCmLpZPyz8C2T7+kzibeikjXfhTBqjXcuGCFWeJlgEnRqkOBmzbtP6Ow3NACWiE3vjC2/JGBaF292JL6d/dyT0uX9wbftprvfoBB9gLw+swPjM2a37ogBW+whdgXL2pEz7rAq6bjJGtwKApPgJRvXWhOQZGQSpbwvV9zaXsUHr/WSKPGmBO/IzH08Mig+KoEpSWomWtUwkozgaGJb/YoaokI7+J64cmBn7fsgTv1ID0ihO8hXMQxDnmMxfGxLK7BGYLNSsL5oqC8uymJbiqa+BUY+53VkLA5DdVnr0nrnQEPrqXh/0vrJbuQtJ8Z8m3ZJkCfStPqDLcYwoxVBEVWrdPQf2Hu/fUQP5wks92M0dFCJ4q9zE0RoRA2fNoAXZpZ7dbdoe1JImMuFe06D4yqxmhmYrBORe12Uh+IhPa3o1KptiI3YBoAHf1FkUJINOu3KZKQLuxrAP0IR1rCDVC928AEEIENqdi9IuRp0Am8yEifYuj27CYGugdx++VUNk5p3db7rVSC4Mna+ZNVld4i3U0mQlHNiQ/k4d/pYP8wglrAu6ZheVDDCFq7fZhWBM+9pIxLoISQ8FuwiQ3V29zk7zo3IgyIy8cfP+zafex+nNtW8DhspxhqHgRkWlhr8xACpf5ISCaRsu6ZPsI5JMLI4qoC/OTjsUwPme7F1a3kAUY8T/VkXDsZwoilG v20sLNNA i1u7myzqQUExYbx1HshudwsH+a8H6VJrdtFyBRgDyKVecAMPxJJyPoiioTrrNj0q1p7R/kMlityS4YMg+YR8PjVpFD4M6uJnsMZWhW0ldgFwv7BA2IZPKINSFuj9sA7HngDLI/So1viSYOubhHSBXFdtdpMrjK/TKeSJBeHjztDTdzSBu9b5KdDbVzqXJtQFEqLr9llGboLX6kAHZdL2DwMRd1ghTR9SUEO0VBqcH3CWciOvTwXXtEsi+wavMujg+P3H1gPEkvozbV7rKfIpbrvC+PiXn7C+Y7uOV3dgy2Imhl4ZtlCyG7KD0RbTL2sdRY51trp0ZC7rabFl2shQ09Uvj+YtWJHuBWxWpJtd0UCknLLBhoFkXzw/NL85xVsg9lOZws3uGh+3JvGJim1qV5inlfd/EPzCnCHc9pxe5xqY77iTOYRfvHKDO+w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi, waking this thread up again: we=E2=80=99ve been running the original fix = on top of 6.11 for roughly 8 weeks now and have not had a single = occurence of this. I=E2=80=99d be willing to call this as fixed.=20 @Linus: we didn=E2=80=99t specify an actual deadline, but I guess 8 week = without any hit is good enough? My plan would be to migrate our fleet to 6.6 now. AFAICT the relevant = patch series is the one in https://lore.kernel.org/all/20240415171857.19244-4-ryncsn@gmail.com/T/#u = and was released in 6.6.54. I=E2=80=99d like to revive the discussion on the second issue, though, = as it ended with Linus=E2=80=99 last post and I couldn=E2=80=99t find whether this may have been followed up = elsewhere or still needs to be worked on? Christian > On 12. Oct 2024, at 19:01, Linus Torvalds = wrote: >=20 > On Fri, 11 Oct 2024 at 06:06, Chris Mason wrote: >>=20 >> - Linus's starvation observation. It doesn't feel like there's = enough >> load to cause this, especially given us sitting in truncate, where it >> should be pretty unlikely to have multiple procs banging on the page = in >> question. >=20 > Yeah, I think the starvation can only possibly happen in > fdatasync-like paths where it's waiting for existing writeback without > holding the page lock. And while Christian has had those backtraces > too, the truncate path is not one of them. >=20 > That said, just because I wanted to see how nasty it is, I looked into > changing the rules for folio_wake_bit(). >=20 > Christian, just to clarify, this is not for you to test - this is > very experimental - but maybe Willy has comments on it. >=20 > Because it *might* be possible to do something like the attached, > where we do the page flags changes atomically but without any locks if > there are no waiters, but if there is a waiter on the page, we always > clear the page flag bit atomically under the waitqueue lock as we wake > up the waiter. >=20 > I changed the name (and the return value) of the > folio_xor_flags_has_waiters() function to just not have any > possibility of semantic mixup, but basically instead of doing the xor > atomically and unconditionally (and returning whether we had waiters), > it now does it conditionally only if we do *not* have waiters, and > returns true if successful. >=20 > And if there were waiters, it moves the flag clearing into the wakeup = function. >=20 > That in turn means that the "while whiteback" loop can go back to be > just a non-looping "if writeback", and folio_wait_writeback() can't > get into any starvation with new writebacks always showing up. >=20 > The reason I say it *might* be possible to do something like this is > that it changes __folio_end_writeback() to no longer necessarily clear > the writeback bit under the XA lock. If there are waiters, we'll clear > it later (after releasing the lock) in the caller. >=20 > Willy? What do you think? Clearly this now makes PG_writeback not > synchronized with the PAGECACHE_TAG_WRITEBACK tag, but the reason I > think it might be ok is that the code that *sets* the PG_writeback bit > in __folio_start_writeback() only ever starts with a page that isn't > under writeback, and has a >=20 > VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio); >=20 > at the top of the function even outside the XA lock. So I don't think > these *need* to be synchronized under the XA lock, and I think the > folio flag wakeup atomicity might be more important than the XA > writeback tag vs folio writeback bit. >=20 > But I'm not going to really argue for this patch at all - I wanted to > look at how bad it was, I wrote it, I'm actually running it on my > machine now and it didn't *immediately* blow up in my face, so it > *may* work just fine. >=20 > The patch is fairly simple, and apart from the XA tagging issue is > seems very straightforward. I'm just not sure it's worth synchronizing > one part just to at the same time de-synchronize another.. >=20 > Linus > <0001-Test-atomic-folio-bit-waiting.patch> Liebe Gr=C3=BC=C3=9Fe, Christian Theune --=20 Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 Flying Circus Internet Operations GmbH =C2=B7 https://flyingcircus.io Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian Theune, = Christian Zagrodnick