linux-mm.kvack.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Kent Overstreet <kent.overstreet@linux.dev>,
	Matthew Wilcox <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>, Amir Goldstein <amir73il@gmail.com>,
	 paulmck@kernel.org, lsf-pc@lists.linux-foundation.org,
	linux-mm@kvack.org,
	 linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
Date: Fri, 01 Mar 2024 10:33:59 +0700
Message-ID: <a43cf329bcfad3c52540fe33e35e2e65b0635bfd.camel@HansenPartnership.com>
In-Reply-To: <wpof7womk7rzsqeox63pquq7jfx4qdyb3t45tqogcvxvfvdeza@ospqr2yemjah>

On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> On Fri, Mar 01, 2024 at 02:48:52AM +0000, Matthew Wilcox wrote:
> > On Thu, Feb 29, 2024 at 09:39:17PM -0500, Kent Overstreet wrote:
> > > On Fri, Mar 01, 2024 at 01:16:18PM +1100, NeilBrown wrote:
> > > > Insisting that GFP_KERNEL allocations never returned NULL would
> > > > allow us to remove a lot of untested error handling code....
> > > 
> > > If memcg ever gets enabled for all kernel side allocations we
> > > might start seeing failures of GFP_KERNEL allocations.
> > 
> > Why would we want that behaviour?  A memcg-limited allocation
> > should behave like any other allocation -- block until we've freed
> > some other memory in this cgroup, either by swap or killing or ...
> 
> It's not uncommon to have a more efficient way of doing something if
> you can allocate more memory, but still have the ability to run in a
> more bounded amount of space if you need to; I write code like this
> quite often.

The cgroup design is to do what we do usually, but within settable hard
and soft limits.  So if the kernel could make GFP_KERNEL wait without
failing, the cgroup would mirror that.

> Or maybe you just want the syscall to return an error instead of
> blocking for an unbounded amount of time if userspace asks for
> something silly.

Warning on any allocation above a certain size made without MAY_FAIL
would seem to cover all those cases.  If there is a case for requiring
instant allocation, you always have GFP_ATOMIC and, I suppose, we could
even do a bounded reclaim allocation that tries for a certain time and
then fails.
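
As a rough illustration of "tries for a certain time then fails", here
is a userspace sketch, not kernel code.  alloc_bounded(), try_alloc_fn
and flaky_alloc() are hypothetical names invented for this example, and
the 1 ms sleep only stands in for waiting on reclaim to make progress:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback type: one allocation attempt, NULL on failure. */
typedef void *(*try_alloc_fn)(size_t size, void *ctx);

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: keep retrying try_alloc() until it succeeds or
 * timeout_ms has elapsed, then give up and return NULL.
 */
void *alloc_bounded(size_t size, long long timeout_ms,
		    try_alloc_fn try_alloc, void *ctx)
{
	struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };
	long long deadline;

	assert(try_alloc != NULL);
	deadline = now_ms() + timeout_ms;

	for (;;) {
		void *p = try_alloc(size, ctx);

		if (p)
			return p;
		if (now_ms() >= deadline)
			return NULL;
		/* Stand-in for "wait for reclaim to free something". */
		nanosleep(&pause, NULL);
	}
}

/*
 * Demo allocator: fails while the int counter behind ctx is positive,
 * decrementing it each time, then falls through to malloc().
 */
void *flaky_alloc(size_t size, void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return NULL;
	}
	return malloc(size);
}
```

The caller still has to handle NULL, so this bounds the wait rather
than removing the error leg.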

> Honestly, relying on the OOM killer and saying that because that now
> we don't have to write and test your error paths is a lazy cop out.

The OOM killer is the most extreme outcome.  Usually reclaim (hugely
simplified) dumps clean cache first, then tries the shrinkers, then
tries to write out dirty cache.  Only when that has found nothing after
a few iterations does the OOM killer get activated.
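
That ordering can be sketched as a toy model.  This is not how the
kernel's reclaim code is structured; struct toy_mem, toy_reclaim() and
the page counts are made up for illustration, and treating written-out
dirty pages as becoming clean pages that the next pass can drop is an
assumption of the toy:

```c
#include <assert.h>
#include <stddef.h>

/* Toy pools of pages, purely illustrative. */
struct toy_mem {
	size_t clean_cache;	/* droppable immediately */
	size_t shrinkable;	/* freeable via shrinkers */
	size_t dirty_cache;	/* must be written out first */
};

enum reclaim_result { RECLAIM_OK, RECLAIM_OOM };

/* Remove up to 'want' pages from a pool; return how many were taken. */
static size_t take(size_t *pool, size_t want)
{
	size_t got = *pool < want ? *pool : want;

	*pool -= got;
	return got;
}

/*
 * Try to free 'need' pages in the order described above.  RECLAIM_OOM
 * means "this is where the OOM killer would be invoked".
 */
enum reclaim_result toy_reclaim(struct toy_mem *m, size_t need,
				int max_iterations)
{
	assert(m != NULL);

	for (int i = 0; i < max_iterations; i++) {
		/* 1. Drop clean cache. */
		need -= take(&m->clean_cache, need);
		/* 2. Ask the shrinkers. */
		if (need)
			need -= take(&m->shrinkable, need);
		/* 3. Write out dirty cache; it becomes clean cache that
		 *    the next pass can drop (toy assumption). */
		if (need)
			m->clean_cache += take(&m->dirty_cache, need);
		if (need == 0)
			return RECLAIM_OK;
	}
	return RECLAIM_OOM;
}
```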

> The same kind of thinking got us overcommit, where yes we got an
> increase in efficiency, but the cost was that everyone started
> assuming and relying on overcommit, so now it's impossible to run
> without overcommit enabled except in highly controlled environments.

That might be true for your use case, but it certainly isn't true for a
cheap hosting cloud using containers: overcommit is where you make your
money, so it's absolutely standard operating procedure.  I wouldn't
call cheap hosting a "highly controlled environment"; they're just
making a bet they won't get caught out too often.

> And that means allocation failure as an effective signal is just
> completely busted in userspace. If you want to write code in
> userspace that uses as much memory as is available and no more, you
> _can't_, because system behaviour goes to shit if you have overcommit
> enabled or a bunch of memory gets wasted if overcommit is disabled
> because everyone assumes that's just what you do.

OK, this seems to be specific to your use case again, because if you
look at what the major user space processes like web browsers do, they
allocate way over the physical memory available to them for cache and
assume the kernel will take care of it.  Making failure a signal for
being over the working set would cause all these applications to
segfault almost immediately.

I think what you're asking for is an API to try to calculate what the
current available headroom in the working set would be?  That's highly
heuristic, but the mm people might have an idea how to do it.
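
For comparison, one rough estimate that already exists is the
MemAvailable field in /proc/meminfo.  It is system-wide, not per cgroup
and not a working-set measure, so it is not the API discussed here,
only an example of what a heuristic headroom number looks like from
userspace.  The function names below are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse the MemAvailable value (in kB) out of /proc/meminfo-style
 * text.  Returns -1 if the field is missing or malformed.
 */
long long parse_mem_available_kb(const char *meminfo)
{
	const char *key = "MemAvailable:";
	const char *line;
	long long kb;

	assert(meminfo != NULL);

	/* Walk the text one line at a time looking for the key. */
	for (line = meminfo; line; ) {
		if (strncmp(line, key, strlen(key)) == 0) {
			if (sscanf(line + strlen(key), "%lld", &kb) == 1)
				return kb;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/meminfo and return MemAvailable in kB, or -1 on error. */
long long read_mem_available_kb(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_mem_available_kb(buf);
}
```

The number can be stale by the time the caller acts on it, which is
part of why any such interface stays heuristic.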

> Let's _not_ go that route in the kernel. I have pointy sticks to
> brandish at people who don't want to deal with properly handling
> errors.

Error legs are the least exercised and the most bug-prone, and
therefore exploit-prone, pieces of code in C.  If we can get rid of
them, we should.

James



