From: Andres Freund <andres@anarazel.de>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>, Pankaj Raghav <pankaj.raghav@linux.dev>,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org,
lsf-pc@lists.linux-foundation.org, djwong@kernel.org,
john.g.garry@oracle.com, willy@infradead.org, hch@lst.de,
ritesh.list@gmail.com, Luis Chamberlain <mcgrof@kernel.org>,
dchinner@redhat.com, Javier Gonzalez <javier.gonz@samsung.com>,
gost.dev@samsung.com, tytso@mit.edu, p.raghav@samsung.com,
vi.shah@samsung.com
Subject: Re: [LSF/MM/BPF TOPIC] Buffered atomic writes
Date: Tue, 17 Feb 2026 13:42:41 -0500
Message-ID: <yamn4f3oympcvc4otmzrrjgwxd5xanvk5j376dojky76lrkfgv@agfxv32zyfvr>
In-Reply-To: <aZSzJs3WIuV4SQJp@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>

Hi,

On 2026-02-17 23:57:50 +0530, Ojaswin Mujoo wrote:
> From my mental model and very high level understanding of Postgres' WAL
> model [1] I am under the impression that for moving from full page
> writes to RWF_ATOMIC, we would need to ensure that the **disk** write IO
> of any data buffer should go in an untorn fashion.

Right.

> Now, coming to your example, IIUC here we can actually tolerate doing
> the 2nd write above non-atomically because it is already a sort of full
> page write in the journal.
>
> So let's say we do something like:
>
> 0. Buffer has some initial value on disk
> 1. Write new rows into buffer
> 2. Write the buffer as RWF_ATOMIC
> 3. Overwrite the complete buffer which will journal all the contents
> 4. Write the buffer as non RWF_ATOMIC
> 5. Crash
>
> I think it is still possible to satisfy my assumption of **disk** IO
> being untorn. For example, we could have an RWF_ATOMIC implementation
> where the data on disk after a crash is either the initial state from 0.
> or the new value after 4. This is not strictly the old-or-new
> semantic, but it still ensures the data is consistent.

The way I understand Jan is that, unless we are careful with the write in 4),
the write for 2) could still be in progress, with the copy from userspace to
the pagecache for 4) happening in the middle of the DMA for the write from 2),
leading to a torn page on disk, even though the disk itself behaved
correctly.

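
To make that interleaving concrete, here is a toy model in Python. It is not
kernel code and does not reflect how any filesystem implements writeback; the
block size, the page size, and the names writeback(), in_flight and overwrite
are all made up for illustration. It only shows that if the pagecache page is
overwritten while an earlier write of that page is still being transferred,
the device can faithfully store a mix of both versions. Which step the
in-flight write belongs to does not matter for the model.

```python
BLOCK = 4      # bytes per simulated device block (assumption)
NBLOCKS = 4    # device blocks per database page (assumption)


def writeback(page_cache, disk, overwrite_at=None, new_contents=None):
    """Transfer the page to 'disk' block by block, as DMA would.

    If overwrite_at is given, the pagecache page is replaced with
    new_contents just before block number overwrite_at is transferred,
    modelling a buffered write that lands mid-transfer.
    """
    for i in range(NBLOCKS):
        if overwrite_at == i:
            page_cache[:] = new_contents
        lo, hi = i * BLOCK, (i + 1) * BLOCK
        disk[lo:hi] = page_cache[lo:hi]


initial = b"0" * (BLOCK * NBLOCKS)    # page contents already on disk
in_flight = b"A" * (BLOCK * NBLOCKS)  # page contents being written back
overwrite = b"B" * (BLOCK * NBLOCKS)  # later full-page overwrite

disk = bytearray(initial)
page_cache = bytearray(in_flight)

# The overwrite reaches the pagecache after two blocks have been sent.
writeback(page_cache, disk, overwrite_at=2, new_contents=overwrite)

# The device stored exactly what it was handed, yet the page on disk
# matches none of the three consistent versions.
torn = bytes(disk) not in (initial, in_flight, overwrite)
print(bytes(disk), "torn" if torn else "consistent")
```

In this model the device never tears anything itself; the mixed page comes
purely from the memory being modified while it is being read for the transfer.
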
> My naive understanding says that as long as disk has consistent/untorn
> data, like above, we can recover via the journal.

Yes, if that were true, we could recover. But if my understanding of Jan's
concern is right, that'd not necessarily be guaranteed.

Greetings,

Andres Freund