From: "NeilBrown" <neilb@suse.de>
To: "Kent Overstreet" <kent.overstreet@linux.dev>
Cc: "James Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Amir Goldstein" <amir73il@gmail.com>,
	paulmck@kernel.org, lsf-pc@lists.linux-foundation.org,
	linux-mm@kvack.org,
	"linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
	"Jan Kara" <jack@suse.cz>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
Date: Fri, 01 Mar 2024 15:09:09 +1100	[thread overview]
Message-ID: <170926614942.24797.13632376785557689080@noble.neil.brown.name> (raw)
In-Reply-To: <vpyvfmlr2cc6oyinf676zgc7mdqbbul2mq67kvkfebze3f4ov2@ucp43ej3dlrh>

On Fri, 01 Mar 2024, Kent Overstreet wrote:
> On Thu, Feb 29, 2024 at 10:52:06PM -0500, Kent Overstreet wrote:
> > On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > > Or maybe you just want the syscall to return an error instead of
> > > > blocking for an unbounded amount of time if userspace asks for
> > > > something silly.
> > > 
> > > Warn on allocation above a certain size without MAY_FAIL would seem to
> > > cover all those cases.  If there is a case for requiring instant
> > > allocation, you always have GFP_ATOMIC, and, I suppose, we could even
> > > do a bounded reclaim allocation where it tries for a certain time then
> > > fails.
> > 
> > Then you're baking this weird constant into all your algorithms, one
> > that doesn't scale as machine memory sizes and working set sizes
> > increase.
> > 
> > > > Honestly, relying on the OOM killer and saying that because that now
> > > > we don't have to write and test your error paths is a lazy cop out.
> > > 
> > > The OOM killer is the most extreme outcome.  Usually reclaim (hugely
> > > simplified) dumps clean cache first, then tries the shrinkers, then
> > > tries to write out dirty cache.  Only after that has failed to find
> > > anything for a few iterations will the OOM killer get activated.
> > 
> > All your caches get dumped and the machine grinds to a halt, and then a
> > random process gets killed instead of simply _failing the allocation_.
> > 
> > > > The same kind of thinking got us overcommit, where yes we got an
> > > > increase in efficiency, but the cost was that everyone started
> > > > assuming and relying on overcommit, so now it's impossible to run
> > > > without overcommit enabled except in highly controlled environments.
> > > 
> > > That might be true for your use case, but it certainly isn't true for a
> > > cheap hosting cloud using containers: overcommit is where you make your
> > > money, so it's absolutely standard operating procedure.  I wouldn't
> > > call cheap hosting a "highly controlled environment"; they're just
> > > making a bet they won't get caught out too often.
> > 
> > Reading comprehension fail. Reread what I wrote.
> > 
> > > > And that means allocation failure as an effective signal is just
> > > > completely busted in userspace. If you want to write code in
> > > > userspace that uses as much memory as is available and no more, you
> > > > _can't_, because system behaviour goes to shit if you have overcommit
> > > > enabled or a bunch of memory gets wasted if overcommit is disabled
> > > > because everyone assumes that's just what you do.
> > > 
> > > OK, this seems to be specific to your use case again, because if you
> > > look at what the major user space processes like web browsers do, they
> > > allocate way over the physical memory available to them for cache and
> > > assume the kernel will take care of it.  Making failure a signal for
> > > being over the working set would cause all these applications to
> > > segfault almost immediately.
> > 
> > Again, reread what I wrote. You're restating what I wrote and completely
> > missing the point.
> > 
> > > > Let's _not_ go that route in the kernel. I have pointy sticks to
> > > > brandish at people who don't want to deal with properly handling
> > > > errors.
> > > 
> > > Error legs are the least exercised and most bug-prone, and therefore
> > > exploit-prone, pieces of code in C.  If we can get rid of them, we
> > > should.
> > 
> > Fuck no.
> > 
> > Having working error paths is _basic_, and learning how to test your
> > code is also basic. If you can't be bothered to do that you shouldn't be
> > writing kernel code.
> > 
> > We are giving up far too much by going down the route of "oh, just kill
> > stuff if we screwed the pooch and overcommitted".
> > 
> > I don't fucking care if it's what the big cloud providers want because
> > it's convenient for them, some of us actually do care about reliability.
> > 
> > By just saying "oh, the OOM killer will save us" what you're doing is
> > making it nearly impossible to fully utilize a machine without having
> > stuff randomly killed.
> > 
> 
> And besides all that, as a practical matter you can't just "not have
> error paths" because, like you said, you'd still have to have a max size
> where you WARN() - and _fail the allocation_ - and you've still got to
> unwind.

No.  You warn and DON'T fail the allocation.  Just like lockdep warns of
possible deadlocks but lets you continue.
These will (mostly) be found in development and changed to use
__GFP_RETRY_MAYFAIL, with appropriate error-handling paths added.
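
To make that concrete, here is a minimal sketch of the pattern - the
function name and element type are hypothetical, purely for illustration:

#include <linux/slab.h>

/* Hypothetical caller, for illustration only. */
static int alloc_foo_table(size_t nr, u64 **out)
{
        /*
         * A plain GFP_KERNEL allocation of modest size effectively never
         * returns NULL: it loops in reclaim (and can end up invoking the
         * OOM killer) rather than failing.  Adding __GFP_RETRY_MAYFAIL
         * makes the allocator try hard but return NULL on failure, which
         * forces the caller to carry a real error path.  (GFP_ATOMIC, as
         * mentioned above, is the separate case of contexts that cannot
         * sleep at all.)
         */
        u64 *p = kmalloc_array(nr, sizeof(*p),
                               GFP_KERNEL | __GFP_RETRY_MAYFAIL);
        if (!p)
                return -ENOMEM;
        *out = p;
        return 0;
}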


> 
> The OOM killer can't kill processes while they're stuck blocking on an
> allocation that will never return in the kernel.

But it can depopulate the user address space (I think) - the oom_reaper
frees a victim's anonymous memory without waiting for the task to exit.

NeilBrown


> 
> I think we can safely nip this idea in the bud.
> 
> Test your damn error paths...
> 
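
For concreteness: the usual way to exercise those error paths is the
kernel's fault-injection framework.  A sketch, assuming a kernel built
with CONFIG_FAILSLAB and CONFIG_FAULT_INJECTION_DEBUG_FS, and debugfs
mounted at /sys/kernel/debug:

  # Fail every 100th eligible slab allocation, with no limit on how
  # many failures are injected, and log each injected failure:
  echo 100 > /sys/kernel/debug/failslab/probability
  echo 100 > /sys/kernel/debug/failslab/interval
  echo -1  > /sys/kernel/debug/failslab/times
  echo 1   > /sys/kernel/debug/failslab/verbose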



Thread overview: 52+ messages
2024-02-27 18:56 Paul E. McKenney
2024-02-27 19:19 ` [Lsf-pc] " Amir Goldstein
2024-02-27 22:59   ` Paul E. McKenney
2024-03-01  3:28     ` Kent Overstreet
2024-03-05  2:43       ` Paul E. McKenney
2024-03-05  2:56       ` Yosry Ahmed
2024-02-28 19:37   ` Matthew Wilcox
2024-02-29  1:29     ` Dave Chinner
2024-02-29  4:20       ` Kent Overstreet
2024-02-29  4:17     ` Kent Overstreet
2024-02-29  4:24       ` Matthew Wilcox
2024-02-29  4:44         ` Kent Overstreet
2024-03-01  2:16     ` NeilBrown
2024-03-01  2:39       ` Kent Overstreet
2024-03-01  2:48         ` Matthew Wilcox
2024-03-01  3:09           ` Kent Overstreet
2024-03-01  3:33             ` James Bottomley
2024-03-01  3:52               ` Kent Overstreet
2024-03-01  4:01                 ` Kent Overstreet
2024-03-01  4:09                   ` NeilBrown [this message]
2024-03-01  4:18                     ` Kent Overstreet
2024-03-01  4:18                   ` James Bottomley
2024-03-01  4:08                 ` James Bottomley
2024-03-01  4:15                   ` Kent Overstreet
2024-03-05  2:54           ` Yosry Ahmed
2024-03-01  5:54       ` Dave Chinner
2024-03-01 20:20         ` Kent Overstreet
2024-03-01 23:47           ` NeilBrown
2024-03-02  0:02             ` Kent Overstreet
2024-03-02 11:33               ` Tetsuo Handa
2024-03-02 16:53                 ` Matthew Wilcox
2024-03-03 22:45               ` NeilBrown
2024-03-03 22:54                 ` Kent Overstreet
2024-03-04  0:20                 ` Dave Chinner
2024-03-04  1:16                   ` NeilBrown
2024-03-04  0:35                 ` Matthew Wilcox
2024-03-04  1:27                   ` NeilBrown
2024-03-04  2:05                   ` Kent Overstreet
2024-03-12 14:46                 ` Vlastimil Babka
2024-03-12 22:09                   ` NeilBrown
2024-03-20 18:32                   ` Dan Carpenter
2024-03-20 18:48                     ` Vlastimil Babka
2024-03-20 18:55                       ` Matthew Wilcox
2024-03-20 19:07                         ` Kent Overstreet
2024-03-20 19:14                           ` Matthew Wilcox
2024-03-20 19:33                             ` Kent Overstreet
2024-03-20 19:09                     ` Kent Overstreet
2024-03-21  6:27                 ` Dan Carpenter
2024-03-22  1:47                   ` NeilBrown
2024-03-22  6:13                     ` Dan Carpenter
2024-03-24 22:31                       ` NeilBrown
2024-03-25  8:43                         ` Dan Carpenter
