Date: Wed, 6 Dec 2023 21:34:49 +1100
From: Dave Chinner
To: Christoph Hellwig
Cc: Baokun Li, Jan Kara, linux-mm@kvack.org, linux-ext4@vger.kernel.org,
    tytso@mit.edu, adilger.kernel@dilger.ca, willy@infradead.org,
    akpm@linux-foundation.org, ritesh.list@gmail.com,
    linux-kernel@vger.kernel.org, yi.zhang@huawei.com,
    yangerkun@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read
References: <20231202091432.8349-1-libaokun1@huawei.com> <20231204121120.mpxntey47rluhcfi@quack3>

On Wed, Dec 06, 2023 at 01:02:43AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 06, 2023 at 07:35:35PM +1100, Dave Chinner wrote:
> > Mixing overlapping buffered reads with direct writes - especially
> > partial block extending DIO writes - is a recipe for data
> > corruption. It's not a matter of if, it's a matter of when.
> >
> > Fundamentally, when you have overlapping write IO involving DIO, the
> > result of the overlapping IOs is undefined. One cannot control
> > submission order, the order that the overlapping IOs hit the
> > media, or completion ordering that might clear flags like unwritten
> > extents. The only guarantee that we give in this case is that we
> > won't expose stale data from the disk to the user read.
>
> Btw, one thing we could do to kill these races forever is to track if
> there are any buffered openers for an inode and just fall back to
> buffered I/O for that case. With that and an inode_dio_wait for
> when opening for buffered I/O we'd avoid the races and various crazy
> workarounds entirely.

That's basically what Solaris did 20-25 years ago. The inode held a
flag that indicated what IO was being done, and if the "buffered"
flag was set (either through mmap() based access or buffered
read/write syscalls) then direct IO would also do buffered IO until
the flag was cleared and the cache cleaned and invalidated.

That had .... problems. Largely they were performance problems -
unpredictable IO latency and CPU overhead for IO meant applications
would randomly miss SLAs. The application would see IO suddenly lose
all concurrency, go real slow and/or burn lots more CPU when the
inode switched to buffered mode.

I'm not sure that's a particularly viable model given that the raw IO
throughput of even cheap modern SSDs largely exceeds the capability
of buffered IO through the page cache. The differences in
concurrency, latency and throughput between buffered and DIO modes
will be even more stark today than they were 20 years ago....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
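
For illustration only, the per-inode mode switch described above can be
modelled in a few lines of userspace C. This is a toy sketch, not kernel
or Solaris code: every name here (struct toy_inode, toy_dio_path() and so
on) is hypothetical, and the point at which the flag gets cleared (on the
first direct IO after the last buffered user has gone away) is an
assumption, since the message does not say when that happened.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode state; none of these names exist in the kernel. */
struct toy_inode {
	int  buffered_users;  /* active buffered openers / mmap users */
	int  cached_pages;    /* pages currently held in the cache */
	int  dirty_pages;     /* cached pages not yet written back */
	bool buffered_mode;   /* the "buffered" flag from the description */
};

enum io_path { IO_DIRECT, IO_BUFFERED };

/* Buffered access (read/write syscall or mmap) sets the flag. */
static void toy_buffered_open(struct toy_inode *ino)
{
	ino->buffered_users++;
	ino->buffered_mode = true;
}

/* A buffered write populates and dirties the cache. */
static void toy_buffered_write(struct toy_inode *ino, int pages)
{
	ino->cached_pages += pages;
	ino->dirty_pages += pages;
}

static void toy_buffered_close(struct toy_inode *ino)
{
	assert(ino->buffered_users > 0);
	ino->buffered_users--;
}

/*
 * Assumption: the flag may only be cleared once no buffered users remain,
 * and only after the cache has been cleaned (written back) and invalidated.
 */
static bool toy_try_clear_buffered(struct toy_inode *ino)
{
	if (ino->buffered_users > 0)
		return false;
	ino->dirty_pages = 0;   /* stand-in for writeback */
	ino->cached_pages = 0;  /* stand-in for invalidation */
	ino->buffered_mode = false;
	return true;
}

/* Decide which path a direct IO request actually takes. */
static enum io_path toy_dio_path(struct toy_inode *ino)
{
	if (ino->buffered_mode && !toy_try_clear_buffered(ino))
		return IO_BUFFERED;  /* DIO silently degrades to buffered IO */
	return IO_DIRECT;
}
```

In this model a caller issuing direct IO cannot tell in advance which
path toy_dio_path() will pick, because that depends on what other users
of the inode are doing. That is the source of the unpredictable latency
and CPU overhead described above.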