From: Kent Overstreet <kent.overstreet@linux.dev>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Matthew Wilcox <willy@infradead.org>, NeilBrown <neilb@suse.de>,
Amir Goldstein <amir73il@gmail.com>,
paulmck@kernel.org, lsf-pc@lists.linux-foundation.org,
linux-mm@kvack.org,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Jan Kara <jack@suse.cz>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
Date: Thu, 29 Feb 2024 23:01:01 -0500
Message-ID: <vpyvfmlr2cc6oyinf676zgc7mdqbbul2mq67kvkfebze3f4ov2@ucp43ej3dlrh>
In-Reply-To: <3bykct7dzcduugy6kvp7n32sao4yavgbj2oui2rpidinst2zmn@e5qti5lkq25t>
On Thu, Feb 29, 2024 at 10:52:06PM -0500, Kent Overstreet wrote:
> On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > Or maybe you just want the syscall to return an error instead of
> > > blocking for an unbounded amount of time if userspace asks for
> > > something silly.
> >
> > Warning on allocations above a certain size without MAY_FAIL would seem to
> > cover all those cases. If there is a case for requiring instant
> > allocation, you always have GFP_ATOMIC, and, I suppose, we could even
> > do a bounded reclaim allocation where it tries for a certain time then
> > fails.
>
> Then you're baking in this weird constant into all your algorithms that
> doesn't scale as machine memory sizes and working set sizes increase.
>
> > > Honestly, relying on the OOM killer and saying that because of it
> > > we no longer have to write and test our error paths is a lazy cop-out.
> >
> > OOM Killer is the most extreme outcome. Usually reclaim (hugely
> > simplified) dumps clean cache first and tries the shrinkers then tries
> > to write out dirty cache. Only after that hasn't found anything after
> > a few iterations will the OOM killer get activated.
>
> All your caches dumped and the machine grinds to a halt and then a
> random process gets killed instead of simply _failing the allocation_.
>
> > > The same kind of thinking got us overcommit, where yes we got an
> > > increase in efficiency, but the cost was that everyone started
> > > assuming and relying on overcommit, so now it's impossible to run
> > > without overcommit enabled except in highly controlled environments.
> >
> > That might be true for your use case, but it certainly isn't true for a
> > cheap hosting cloud using containers: overcommit is where you make your
> > money, so it's absolutely standard operating procedure. I wouldn't
> > call cheap hosting a "highly controlled environment"; they're just
> > making a bet they won't get caught out too often.
>
> Reading comprehension fail. Reread what I wrote.
>
> > > And that means allocation failure as an effective signal is just
> > > completely busted in userspace. If you want to write code in
> > > userspace that uses as much memory as is available and no more, you
> > > _can't_, because system behaviour goes to shit if you have overcommit
> > > enabled or a bunch of memory gets wasted if overcommit is disabled
> > > because everyone assumes that's just what you do.
> >
> > OK, this seems to be specific to your use case again, because if you
> > look at what the major user space processes like web browsers do, they
> > allocate way over the physical memory available to them for cache and
> > assume the kernel will take care of it. Making failure a signal for
> > being over the working set would cause all these applications to
> > segfault almost immediately.
>
> Again, reread what I wrote. You're restating what I wrote and completely
> missing the point.
>
> > > Let's _not_ go that route in the kernel. I have pointy sticks to
> > > brandish at people who don't want to deal with properly handling
> > > errors.
> >
> > Error legs are the least exercised and most bug-prone, and therefore
> > exploit-prone, pieces of code in C. If we can get rid of them, we should.
>
> Fuck no.
>
> Having working error paths is _basic_, and learning how to test your
> code is also basic. If you can't be bothered to do that you shouldn't be
> writing kernel code.
>
> We are giving up far too much by going down the route of "oh, just kill
> stuff if we screwed the pooch and overcommitted".
>
> I don't fucking care if it's what the big cloud providers want because
> it's convenient for them, some of us actually do care about reliability.
>
> By just saying "oh, the OOM killer will save us" what you're doing is
> making it nearly impossible to fully utilize a machine without having
> stuff randomly killed.
>
> Fuck. That.
And besides all that, as a practical matter you can't just "not have
error paths" because, like you said, you'd still have to have a max size
where you WARN() - and _fail the allocation_ - and you've still got to
unwind.
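For illustration, a minimal userspace C sketch of that point, assuming a hypothetical size cap (MAX_ALLOC_SIZE) and hypothetical helpers (capped_alloc(), ctx_init(), ctx_exit()) that are not kernel APIs: even when an oversized request is refused with a warning instead of blocking, the caller still needs a goto-style unwind for whatever it had already allocated.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap standing in for "a max size where you WARN()". */
#define MAX_ALLOC_SIZE (1UL << 20)

struct ctx {
	char *buf;
	int *table;
};

/* Hypothetical size-capped allocator: warn and fail rather than block. */
static void *capped_alloc(size_t size)
{
	if (size > MAX_ALLOC_SIZE) {
		fprintf(stderr, "WARN: allocation of %zu bytes exceeds cap\n", size);
		return NULL;
	}
	return malloc(size);
}

/* Sets up two resources; on failure, undoes whatever was already set up. */
static int ctx_init(struct ctx *c, size_t buf_size, size_t nr_entries)
{
	c->buf = capped_alloc(buf_size);
	if (!c->buf)
		goto err;

	c->table = capped_alloc(nr_entries * sizeof(*c->table));
	if (!c->table)
		goto err_free_buf;

	memset(c->buf, 0, buf_size);
	return 0;

err_free_buf:
	/* The second allocation failed: the first one must be released. */
	free(c->buf);
	c->buf = NULL;
err:
	return -ENOMEM;
}

static void ctx_exit(struct ctx *c)
{
	free(c->table);
	free(c->buf);
}
```

The labels release resources in reverse order of acquisition, so each failure point only jumps to the label that undoes what came before it.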
The OOM killer can't kill processes while they're stuck blocking on an
allocation that will never return in the kernel.
I think we can safely nip this idea in the bud.
Test your damn error paths...
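As a minimal sketch of what testing error paths can look like, assuming a hypothetical test_malloc() wrapper that fails the Nth call (a userspace stand-in, not the kernel's own fault-injection interface) and a hypothetical widget_init() under test: the harness fails each allocation site in turn, checks that the function reports -ENOMEM, and then runs once with no injected failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical fault-injection knobs: fail the fail_at-th call if > 0. */
static int fail_at;
static int nr_calls;

static void *test_malloc(size_t size)
{
	if (fail_at > 0 && ++nr_calls == fail_at)
		return NULL;
	return malloc(size);
}

struct widget {
	int *a;
	int *b;
	int *c;
};

/* Code under test: three allocations, each with its own unwind label. */
static int widget_init(struct widget *w)
{
	w->a = test_malloc(sizeof(*w->a));
	if (!w->a)
		goto err;
	w->b = test_malloc(sizeof(*w->b));
	if (!w->b)
		goto err_free_a;
	w->c = test_malloc(sizeof(*w->c));
	if (!w->c)
		goto err_free_b;
	return 0;

err_free_b:
	free(w->b);
err_free_a:
	free(w->a);
err:
	return -ENOMEM;
}

static void widget_exit(struct widget *w)
{
	free(w->c);
	free(w->b);
	free(w->a);
}

/*
 * Fails each of nr_sites allocation sites in turn, then runs once with no
 * injected failure. Returns how many runs behaved as expected.
 */
static int exercise_error_paths(int nr_sites)
{
	int ok = 0;

	for (int i = 1; i <= nr_sites + 1; i++) {
		struct widget w;
		int expect_fail = i <= nr_sites;

		fail_at = expect_fail ? i : 0;
		nr_calls = 0;

		int ret = widget_init(&w);
		if (expect_fail ? ret == -ENOMEM : ret == 0)
			ok++;
		if (ret == 0)
			widget_exit(&w);
	}
	fail_at = 0;
	return ok;
}
```

Running the same harness under a leak checker such as Valgrind or AddressSanitizer's LeakSanitizer would also flag an unwind path that forgets a free().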