From: Andres Freund <andres@anarazel.de>
To: Jan Kara <jack@suse.cz>
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Pankaj Raghav <pankaj.raghav@linux.dev>,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org,
lsf-pc@lists.linux-foundation.org, djwong@kernel.org,
john.g.garry@oracle.com, willy@infradead.org, hch@lst.de,
ritesh.list@gmail.com, Luis Chamberlain <mcgrof@kernel.org>,
dchinner@redhat.com, Javier Gonzalez <javier.gonz@samsung.com>,
gost.dev@samsung.com, tytso@mit.edu, p.raghav@samsung.com,
vi.shah@samsung.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Buffered atomic writes
Date: Mon, 16 Feb 2026 10:57:44 -0500
Message-ID: <fh3hxyhoezqzcsabqjr2glft2uvrx5bkyx6pejek3uskpm5ow4@zym4kmg5o2bm>
In-Reply-To: <w3vwdaygcz3prsxwv43blo4co666mragpdwaxihbirt5stl4vr@agyz4mnaxghj>

Hi,

On 2026-02-16 12:38:59 +0100, Jan Kara wrote:
> On Fri 13-02-26 19:02:39, Ojaswin Mujoo wrote:
> > Another thing that came up is to consider using write through semantics
> > for buffered atomic writes, where we are able to transition page to
> > writeback state immediately after the write and prevent any other users from
> > modifying the data till writeback completes. This might affect performance
> > since we won't be able to batch similar atomic IOs but maybe
> > applications like postgres would not mind this too much. If we go with
> > this approach, we will be able to avoid worrying too much about other
> > users changing atomic data underneath us.
> >
> > An argument against this however is that it is the user's responsibility to
> > not do non-atomic IO over an atomic range, and doing so should be considered a
> > userspace usage error. This is similar to how there are ways users can
> > tear a dio if they perform overlapping writes [1].
>
> Yes, I was wondering whether the write-through semantics would make sense
> as well.

As outlined in
https://lore.kernel.org/all/zzvybbfy6bcxnkt4cfzruhdyy6jsvnuvtjkebdeqwkm6nfpgij@dlps7ucza22s/
that is something that would be useful for postgres even orthogonally to
atomic writes.

If this were the path to go with, I'd suggest adding an RWF_WRITETHROUGH and
requiring it to be set when using RWF_ATOMIC on a buffered write. That way,
if the kernel were to eventually support buffered atomic writes without
immediate writeback, the semantics to userspace wouldn't suddenly change.

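To make the suggested rule concrete, here is a minimal sketch of the flag
check, purely as an illustration. RWF_WRITETHROUGH does not exist today, so
its value below is an arbitrary placeholder, and check_rwf_flags() is a
hypothetical helper rather than actual kernel code. Accepting
RWF_WRITETHROUGH on its own, and alongside direct IO, is also just an
assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Same value as the existing uapi definition; defined here so the
 * sketch builds without recent kernel headers. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/* HYPOTHETICAL: this flag is only a proposal, the value is a placeholder. */
#ifndef RWF_WRITETHROUGH
#define RWF_WRITETHROUGH 0x40000000
#endif

/*
 * Hypothetical validation of per-write flags.
 *
 * A buffered write (is_direct_io == false) that asks for RWF_ATOMIC must
 * also pass RWF_WRITETHROUGH, so that a later kernel supporting buffered
 * atomic writes without immediate writeback would not silently change
 * what existing callers get.
 *
 * Returns 0 if the combination is acceptable, -EINVAL otherwise.
 */
static int check_rwf_flags(int rwf_flags, bool is_direct_io)
{
	bool atomic = (rwf_flags & RWF_ATOMIC) != 0;
	bool writethrough = (rwf_flags & RWF_WRITETHROUGH) != 0;

	if (atomic && !is_direct_io && !writethrough)
		return -EINVAL;

	/* Assumption: write-through without RWF_ATOMIC is allowed, since
	 * it would be useful independently of atomic writes. */
	return 0;
}
```

Under this sketch, a buffered pwritev2() with only RWF_ATOMIC would be
rejected, while RWF_ATOMIC | RWF_WRITETHROUGH would be accepted, and atomic
direct IO writes would keep working as they do now.
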
> Intuitively it should make things simpler because you could
> practically reuse the atomic DIO write path. Only that you'd first copy
> data into the page cache and issue dio write from those folios. No need for
> special tracking of which folios actually belong together in atomic write,
> no need for cluttering standard folio writeback path, in case atomic write
> cannot happen (e.g. because you cannot allocate appropriately aligned
> blocks) you get the error back right away, ...
>
> Of course this all depends on whether such semantics would be actually
> useful for users such as PostgreSQL.

I think it would be useful for many workloads.

As noted in the linked message, there are some workloads where I am not sure
how the gains/costs would balance out (with a small PG buffer pool in a write
heavy workload, we'd lose the ability to have the kernel avoid redundant
writes). It's possible that we could develop some heuristics to fall back to
doing our own torn-page avoidance in such cases, although it's not immediately
obvious to me what that heuristic would be. It's also not that common a
workload; it's *much* more common to have a read heavy workload that has to
overflow into the kernel page cache, due to not being able to dedicate
sufficient memory to postgres.

Greetings,

Andres Freund