ksummit.lists.linux.dev archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	NeilBrown <neilb@suse.de>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Eric Sandeen <sandeen@sandeen.net>,
	Steven Rostedt <rostedt@goodmis.org>,
	Guenter Roeck <linux@roeck-us.net>,
	Christoph Hellwig <hch@infradead.org>,
	ksummit@lists.linux.dev, linux-fsdevel@vger.kernel.org
Subject: Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
Date: Wed, 20 Sep 2023 08:57:19 +1000
Message-ID: <ZQonT/MQgCIg+oZP@dread.disaster.area>
In-Reply-To: <ZQku4dvmtO56BvCr@casper.infradead.org>

On Tue, Sep 19, 2023 at 06:17:21AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 11:15:54AM +1000, Dave Chinner wrote:
> > This was easy to do with iomap based filesystems because they don't
> > carry per-block filesystem structures for every folio cached in page
> > cache - we carry a single object per folio that holds the 2 bits of
> > per-filesystem block state we need for each block the folio maps.
> > Compare that to a bufferhead - it uses 56 bytes of memory per
> > filesystem block that is cached.
> 
> 56?!  What kind of config do you have?  It's 104 bytes on Debian:
> buffer_head          936   1092    104   39    1 : tunables    0    0    0 : slabdata     28     28      0
> 
> Maybe you were looking at a 32-bit system; most of the elements are
> word-sized (pointers, size_t or long)

Perhaps so, it's been years since I actually paid attention to the
exact size of a bufferhead (XFS completely moved away from them back
in 2018). Regardless, underestimating the size of the bufferhead
doesn't materially change the reasons iomap is a better choice for
filesystems running on modern storage hardware...
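To put numbers on the per-block overhead being discussed, here is a
minimal C sketch of the arithmetic. It assumes Willy's 104-byte figure
for a 64-bit buffer_head and the 2 bits of per-block state mentioned
above; it ignores the fixed per-folio header of iomap's state object and
any word-size rounding, so the bitmap figure is a lower bound. The
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-block costs: 104 bytes per 64-bit buffer_head (from the
 * slabinfo quoted above) and 2 bits of iomap-style state per block.
 * The per-folio header of the state object is deliberately ignored. */
#define BH_BYTES_PER_BLOCK   104u
#define IOMAP_BITS_PER_BLOCK 2u

/* Number of filesystem blocks a folio of folio_bytes spans. */
static size_t blocks_per_folio(size_t folio_bytes, size_t block_bytes)
{
	assert(block_bytes != 0 && folio_bytes % block_bytes == 0);
	return folio_bytes / block_bytes;
}

/* Memory needed for one buffer_head per cached block. */
static size_t bh_overhead(size_t folio_bytes, size_t block_bytes)
{
	return blocks_per_folio(folio_bytes, block_bytes) * BH_BYTES_PER_BLOCK;
}

/* Memory needed for the per-block state bits, rounded up to whole bytes. */
static size_t bitmap_overhead(size_t folio_bytes, size_t block_bytes)
{
	size_t bits = blocks_per_folio(folio_bytes, block_bytes) *
		      IOMAP_BITS_PER_BLOCK;
	return (bits + 7) / 8;
}
```

For a 2 MiB folio of 4 KiB blocks (512 blocks), this gives 53,248 bytes
of bufferheads versus 128 bytes of state bits.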

> > So we have to consider that maybe it is less work to make high-order
> > folios work with bufferheads. And that's where we start to get into
> > the maintenance problems with old filesystems using bufferheads -
> > how do we ensure that the changes for high-order folio support in
> > bufferheads do not break any of these old filesystems that use
> > them?
> 
> I don't think we can do it.  Regardless of the question you're proposing
> here, the model where we complete a BIO, then walk every buffer_head
> attached to the folio to determine if we can now mark the folio as being
> (uptodate / not-under-writeback) just doesn't scale when you attach more
> than tens of BHs to the folio.  It's one bit per BH rather than having
> a summary bitmap like iomap has.

*nod*

I said as much earlier in the email:

"The pointer-chasing model that per-block bufferhead iteration requires
to update state and retrieve mapping information just does not scale
to marshalling millions of objects a second through the page cache."
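The scaling difference Willy describes can be sketched in simplified
userspace C. The struct below is a hypothetical stand-in for struct
buffer_head that keeps only a b_this_page-style circular link and an
uptodate flag; it is not the kernel's structure or its actual end_io
code. The first function walks every buffer on the folio after a
completion; the second answers the same question from a packed bitmap
with one bit per block.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct buffer_head. As
 * with b_this_page, the buffers of one folio form a circular list. */
struct toy_bh {
	struct toy_bh *this_page; /* next buffer on the same folio */
	bool uptodate;            /* stands in for BH_Uptodate */
};

/* Bufferhead style: after one block's I/O completes, chase pointers
 * through every buffer on the folio to see if the folio is now uptodate. */
static bool folio_uptodate_walk(struct toy_bh *head)
{
	struct toy_bh *bh = head;

	assert(head != NULL);
	do {
		if (!bh->uptodate)
			return false;
		bh = bh->this_page;
	} while (bh != head);
	return true;
}

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Summary-bitmap style: one bit per block packed into words, so the same
 * question is answered by scanning nblocks / BITS_PER_WORD words. */
static bool folio_uptodate_bitmap(const unsigned long *bits, size_t nblocks)
{
	size_t full = nblocks / BITS_PER_WORD;
	size_t rem = nblocks % BITS_PER_WORD;

	for (size_t i = 0; i < full; i++)
		if (bits[i] != ~0UL)
			return false;
	if (rem) {
		unsigned long mask = (1UL << rem) - 1;
		if ((bits[full] & mask) != mask)
			return false;
	}
	return true;
}
```

The walk dereferences one separately allocated object per block on every
completion, while the bitmap check reads one word per 64 blocks on a
64-bit machine.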


> I have been thinking about splitting the BH into two pieces, something
> like this:
> 
> struct buffer_head_head {
> 	spinlock_t b_lock;
> 	struct buffer_head *buffers;
> 	unsigned long state[];
> };
> 
> and remove BH_Uptodate and BH_Dirty in favour of setting bits in state
> like iomap does.

Yes, that would make it similar to the way iomap works, but I think
it then creates more problems, in that bufferhead state is used for
per-block locking and blocking waits. I don't really want to think
much more about how complex stuff like __block_write_full_folio()
becomes with this model...
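A rough userspace sketch of the state half of the proposed split might
look like the following. The b_lock and buffers fields are omitted, and
the layout (uptodate bits for all blocks, followed by dirty bits for all
blocks) is an assumption modelled on how iomap arranges its per-folio
state; all names are hypothetical. It models only the replacements for
BH_Uptodate and BH_Dirty, not the per-block lock and wait state that
the reply above is concerned about.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the proposed buffer_head_head; only state[]
 * is modelled. Assumed layout: bits [0, nblocks) are per-block uptodate,
 * bits [nblocks, 2 * nblocks) are per-block dirty. */
struct bh_head_sketch {
	unsigned int nblocks;
	unsigned long state[];
};

static struct bh_head_sketch *bhh_alloc(unsigned int nblocks)
{
	size_t words = (2 * (size_t)nblocks + BITS_PER_WORD - 1) / BITS_PER_WORD;
	struct bh_head_sketch *h =
		calloc(1, sizeof(*h) + words * sizeof(unsigned long));

	if (h)
		h->nblocks = nblocks;
	return h;
}

static void set_bit_ul(unsigned long *map, size_t nr)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static bool test_bit_ul(const unsigned long *map, size_t nr)
{
	return map[nr / BITS_PER_WORD] & (1UL << (nr % BITS_PER_WORD));
}

/* Mark blocks [first, first + count) uptodate. */
static void bhh_set_range_uptodate(struct bh_head_sketch *h,
				   unsigned int first, unsigned int count)
{
	assert(first + count <= h->nblocks);
	for (unsigned int i = first; i < first + count; i++)
		set_bit_ul(h->state, i);
}

/* Mark blocks [first, first + count) dirty; dirty bits follow uptodate. */
static void bhh_set_range_dirty(struct bh_head_sketch *h,
				unsigned int first, unsigned int count)
{
	assert(first + count <= h->nblocks);
	for (unsigned int i = first; i < first + count; i++)
		set_bit_ul(h->state, h->nblocks + i);
}

static bool bhh_block_dirty(const struct bh_head_sketch *h, unsigned int block)
{
	return test_bit_ul(h->state, h->nblocks + block);
}

/* True once every block's uptodate bit is set. */
static bool bhh_fully_uptodate(const struct bh_head_sketch *h)
{
	for (unsigned int i = 0; i < h->nblocks; i++)
		if (!test_bit_ul(h->state, i))
			return false;
	return true;
}
```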

> But, as you say, there are a lot of filesystems that would need to be
> audited and probably modified.

Yes, this is the common problem all these "modernise old API" ideas
end up at - this is the primary issue that needs to be sorted out,
and we're no closer to that now than when the thread started.

We can deal with this problem for filesystems that we can test. For
stuff we can't test and verify, we really have to start
considering the larger picture around shipping unverified code to
users.

Go read this article on LWN about new EU laws for software
development that aren't that far off being passed into law:

https://lwn.net/Articles/944300/

And it's clear that there are also current policy discussions going
through the US federal government that are, most likely, going to
end up in a similar place with respect to secure development
practices for critical software infrastructure like the Linux
kernel.

Now combine that with this one about the problem of bogus CVEs
(which could have been written about syzbot and filesystems!):

https://lwn.net/Articles/944209/

And it's pretty clear that the current issues with unmaintained code
will only get worse from here. All it will take is a CVE to be
issued on one of these unmaintained filesystems, and the safest
thing for us to do will be to remove the code to eliminate all
potential liability for it.

The basic message is that we aren't going to be able to ignore code
that we can't substantially verify for much longer.  We simply won't
have a choice about the code we ship: if it is not testable and
verified to the best of our abilities then nobody will risk
shipping it regardless of whether they have users or not.

That's the model the cybersecurity-industrial complex is pushing us
towards whether we like it or not. If this is the future in which we
develop software, then this has a substantial impact on any discussion
about how to manage old unmaintained, untestable code in any project
we work on, not just the Linux kernel...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
