From: Linus Torvalds
Date: Mon, 30 Sep 2024 16:53:05 -0700
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
To: Matthew Wilcox
Cc: Christian Theune, Dave Chinner, Chris Mason, Jens Axboe, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info

On Mon, 30 Sept 2024 at 13:57, Matthew Wilcox wrote:
>
> Could we break out if folio->mapping has changed? Clearly if it has,
> we're no longer waiting for the folio we thought we were waiting for,
> but for a folio which now belongs to a different file.

Sounds like a sane check to me, but it's also not clear that this
would make any difference.

The most likely reason for starvation I can see is that a slow thread
(possibly due to cgroup throttling like Christian alluded to) would
simply be continually unlucky, because every time it gets woken up,
some other thread has already dirtied the data and caused writeback
again.

I would think that kind of behavior (perhaps some DB transaction
header kind of folio) would be more likely than the mapping changing
(and then remaining under writeback for some other mapping).

But I really don't know.

I would much prefer to limit the folio_wait_bit() loop based on
something else. For example, the basic reason for that loop (unless
there is some other hidden one) is that the folio writeback bit is not
atomic wrt the wakeup.

Maybe we could *make* it atomic, by simply taking the folio waitqueue
lock before clearing the bit? (Only if it has the "waiters" bit set,
of course!)

Handwavy.

Anyway, this writeback handling is nasty. folio_end_writeback() has a
big comment about the subtle folio reference issue too, and ignoring
that, we also have this:

        if (__folio_end_writeback(folio))
                folio_wake_bit(folio, PG_writeback);

(which is the cause of the non-atomicity: __folio_end_writeback() will
clear the bit, and return the "did we have waiters", and then
folio_wake_bit() will get the waitqueue lock and wake people up).

And notice how __folio_end_writeback() clears the bit with

        ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);

which does that "clear bit and look if it had waiters" atomically. But
that function then has a comment that says

         * This must only be used for flags which are changed with the folio
         * lock held. For example, it is unsafe to use for PG_dirty as that
         * can be set without the folio lock held. [...]

but the code that uses it here does *NOT* hold the folio lock.

I think the comment is wrong, and the code is fine (the important
point is that the folio lock _serialized_ the writers, and while
clearing doesn't hold the folio lock, you can't clear it without
setting it, and setting the writeback flag *does* hold the folio
lock).

So my point is not that this code is wrong, but that this code is all
kinds of subtle and complex. I think it would be good to change the
rules so that we serialize with waiters, but being complex and subtle
means it sounds all kinds of nasty.

                 Linus