Date: Mon, 30 Sep 2024 21:56:56 +0100
From: Matthew Wilcox
To: Linus Torvalds
Cc: Christian Theune, Dave Chinner, Chris Mason, Jens Axboe, linux-mm@kvack.org, "linux-xfs@vger.kernel.org", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
References: <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io> <295BE120-8BF4-41AE-A506-3D6B10965F2B@flyingcircus.io>

On Mon, Sep 30, 2024 at 01:12:37PM -0700, Linus Torvalds wrote:
> It's basically been that way forever.
> The code has changed many times,
> but we've basically always had that "wait on bit will wait not until
> the next wakeup, but until it actually sees the bit being clear".
>
> And by "always" I mean "going back at least to before the git tree". I
> didn't search further. It's not new.
>
> The only reason I pointed at that (relatively recent) commit from 2021
> is that when we rewrote the page bit waiting logic (for some unrelated
> horrendous scalability issues with tens of thousands of pages on wait
> queues), the rewritten code _tried_ to not do it, and instead go "we
> were woken up by a bit clear op, so now we've waited enough".
>
> And that then caused problems as explained in that commit c2407cf7d22d
> ("mm: make wait_on_page_writeback() wait for multiple pending
> writebacks") because the wakeups aren't atomic wrt the actual bit
> setting/clearing/testing.

Could we break out if folio->mapping has changed? Clearly if it has,
we're no longer waiting for the folio we thought we were waiting for,
but for a folio which now belongs to a different file.

maybe this:

+void __folio_wait_writeback(struct address_space *mapping, struct folio *folio)
+{
+	while (folio_test_writeback(folio) && folio->mapping == mapping) {
+		trace_folio_wait_writeback(folio, mapping);
+		folio_wait_bit(folio, PG_writeback);
+	}
+}

[...]

 void folio_wait_writeback(struct folio *folio)
 {
-	while (folio_test_writeback(folio)) {
-		trace_folio_wait_writeback(folio, folio_mapping(folio));
-		folio_wait_bit(folio, PG_writeback);
-	}
+	__folio_wait_writeback(folio->mapping, folio);
 }
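
To illustrate the difference between the two loops, here is a minimal single-threaded userspace sketch. It is not kernel code: `model_folio`, `model_mapping`, `model_event`, `model_wait_writeback()` and `model_demo_sleeps()` are hypothetical names invented for this illustration, and each "sleep" simply replays the next entry of a scripted history instead of calling folio_wait_bit(). It ignores locking, reference counts and memory ordering, so it says nothing about whether the check is actually safe in the kernel; it only shows that a waiter which remembers the original mapping can stop as soon as the folio belongs to a different file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct model_mapping {
	int id;
};

struct model_folio {
	bool writeback;                 /* plays the role of PG_writeback */
	struct model_mapping *mapping;  /* plays the role of folio->mapping */
};

/* State the waiter observes after each wakeup, in order. */
struct model_event {
	bool writeback;
	struct model_mapping *mapping;
};

/*
 * Wait loop over a scripted history. Each "sleep" consumes one event and
 * copies it into the folio. Returns the number of sleeps taken.
 * check_mapping == false models the existing loop; true models the proposal.
 */
static size_t model_wait_writeback(struct model_folio *folio,
				   struct model_mapping *mapping,
				   bool check_mapping,
				   const struct model_event *events,
				   size_t nr_events)
{
	size_t sleeps = 0;

	assert(folio && mapping && events);

	while (folio->writeback &&
	       (!check_mapping || folio->mapping == mapping)) {
		if (sleeps == nr_events)
			break;	/* script exhausted; a real waiter would keep sleeping */
		folio->writeback = events[sleeps].writeback;
		folio->mapping = events[sleeps].mapping;
		sleeps++;
	}
	return sleeps;
}

/*
 * Scenario: the folio is under writeback for file A. By the time the waiter
 * runs, the folio has been reused for file B and is under writeback again.
 */
static size_t model_demo_sleeps(bool check_mapping)
{
	struct model_mapping file_a = { .id = 1 };
	struct model_mapping file_b = { .id = 2 };
	const struct model_event script[] = {
		{ true,  &file_b },	/* reused for B, new writeback in flight */
		{ true,  &file_b },	/* B's writeback still running */
		{ false, &file_b },	/* B's writeback finally done */
	};
	struct model_folio folio = { .writeback = true, .mapping = &file_a };

	return model_wait_writeback(&folio, &file_a, check_mapping, script,
				    sizeof(script) / sizeof(script[0]));
}
```

In this scripted scenario, model_demo_sleeps(false) keeps sleeping until file B's writeback finishes, while model_demo_sleeps(true) returns after the first wakeup because the mapping no longer matches.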