From: Linus Torvalds
Date: Mon, 30 Sep 2024 11:46:30 -0700
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
To: Christian Theune
Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
In-Reply-To: <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io>

On Mon, 30 Sept 2024 at 10:35, Christian Theune wrote:
>
> Sep 27 00:51:20 13 kernel: folio_wait_bit_common+0x13f/0x340
> Sep 27 00:51:20 13 kernel: folio_wait_writeback+0x2b/0x80

Gaah. Every single case you point to is that folio_wait_writeback() case.

And this might be an old old annoyance.

folio_wait_writeback() is insane. It does

        while (folio_test_writeback(folio)) {
                trace_folio_wait_writeback(folio, folio_mapping(folio));
                folio_wait_bit(folio, PG_writeback);
        }

and the reason that is insane is that PG_writeback isn't some kind of
exclusive state. So folio_wait_bit() will return once somebody has
ended writeback, but *new* writeback can easily have been started
afterwards. So then we go back to wait...

And even after it eventually returns (possibly after having waited for
hundreds of other processes writing back that folio - imagine lots of
other threads doing writes to it and 'fdatasync()' or whatever) the
caller *still* can't actually assume that the writeback bit is clear,
because somebody else might have started writeback again.

Anyway, it's insane, but it's insane for a *reason*. We've tried to
fix this before, long before it was a folio op. See commit
c2407cf7d22d ("mm: make wait_on_page_writeback() wait for multiple
pending writebacks").

IOW, this code is known-broken and might have extreme unfairness
issues (although I had blissfully forgotten about it), because while
the actual writeback *bit* itself is set and cleared atomically, the
wakeup for the bit is asynchronous and can be delayed almost
arbitrarily, so you can get basically spurious wakeups that were from
a previous bit clear.

So the "wait many times" is crazy, but it's sadly a necessary crazy as
things are right now.
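
To make the repeated-wait behaviour of the loop quoted above concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: struct fake_folio, fake_wait_bit(), fake_wait_writeback(), pending_restarts and locked_by_waiter are all hypothetical names invented for illustration, and the "late wakeup" is reduced to the assumption that another writer may set the bit again before the waiter gets to recheck it. The locked_by_waiter flag models a caller that holds the folio lock, assuming that lock keeps new writeback from starting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio; only the writeback state is modelled. */
struct fake_folio {
	bool writeback;        /* models the PG_writeback bit */
	int pending_restarts;  /* times other writers will restart writeback */
	bool locked_by_waiter; /* waiter holds the lock that excludes new writeback */
};

/*
 * Models folio_wait_bit(): "sleep" until the current writeback ends.
 * Assumption: by the time the waiter runs again, another writer may
 * already have set the bit again, unless the waiter holds the lock.
 */
static void fake_wait_bit(struct fake_folio *f)
{
	assert(f != NULL);
	f->writeback = false;           /* the writeback we slept on has ended */
	if (!f->locked_by_waiter && f->pending_restarts > 0) {
		f->pending_restarts--;
		f->writeback = true;    /* somebody else started a new writeback */
	}
}

/* Same shape as the quoted loop; returns how many times the waiter slept. */
static int fake_wait_writeback(struct fake_folio *f)
{
	int waits = 0;

	assert(f != NULL);
	while (f->writeback) {
		fake_wait_bit(f);
		waits++;
	}
	return waits;
}
```

In this model a waiter without the lock sleeps once per competing writeback plus once for the original one, so the number of waits is bounded only by how many writers keep showing up. A waiter that holds the lock sleeps exactly once.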

Now, many callers hold the page lock while doing this, and in that
case new writeback cases shouldn't happen, and so repeating the loop
should be extremely limited.

But "many" is not "all". For example, __filemap_fdatawait_range() very
much doesn't hold the lock on the pages it waits for, so afaik this
can cause that unfairness and starvation issue.

That said, while every one of your traces is for that
folio_wait_writeback(), the last one is for the truncate case, and
that one *does* hold the page lock and so shouldn't see this potential
unfairness issue.

So the code here is questionable, and might cause some issues, but the
starvation of folio_wait_writeback() can't explain _all_ the cases you
see.

                Linus