From: "Daniel Phillips" <daniel.raymond.phillips@gmail.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Daniel Phillips <phillips@phunq.net>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Matt Mackall <mpm@selenic.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
David Miller <davem@davemloft.net>,
Andrew Morton <akpm@linux-foundation.org>,
Daniel Phillips <phillips@google.com>
Subject: Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK
Date: Wed, 8 Aug 2007 03:37:31 -0400 [thread overview]
Message-ID: <4a5909270708080037n32be2a73k5c28d33bb02f770b@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0708071513290.3683@schroedinger.engr.sgi.com>
On 8/7/07, Christoph Lameter <clameter@sgi.com> wrote:
> > > AFAICT: This patchset is not throttling processes but failing
> > > allocations.
> >
> > Failing allocations? Where do you see that? As far as I can see,
> > Peter's patch set allows allocations to fail exactly where the user has
> > always specified they may fail, and in no new places. If there is a
> > flaw in that logic, please let us know.
>
> See the code added to slub: Allocations are satisfied from the reserve
> patch or they are failing.
First off, what I know about this patch set is, it works. Without it,
ddsnap deadlocks on network IO, and with it, no deadlock. We are
niggling over how it works, not whether it works, or whether it is
needed.
Next a confession: I have not yet applied this particular patch and
read the resulting file. However, the algorithm it is supposed to
implement with respect to slab is:
1. If the allocation can be satisified in the usual way, do that.
2. Otherwise, if the GFP flags do not include __GFP_MEMALLOC or
PF_MEMALLOC is not set, fail the allocation
3. Otherwise, if the memcache's reserve quota is not reached,
satisfy the request, allocating a new page from the MEMALLOC reserve,
but the memcache's reserve counter and succeed
4. Die. We are the walking dead. Do whatever we feel like, for
example fail allocation. This is a kernel bug, if we are lucky enough
of the kernel will remain running to get some diagnostics.
If it does not implement that algorithm, please shout.
> > > The patchset does not reconfigure the memory reserves as
> > > expected.
> >
> > What do you mean by that? Expected by who?
>
> What would be expected it some recalculation of min_freekbytes?
I still do not know exactly what you are talking about. The patch set
provides a means of adjusting the global memalloc reserve when a
memalloc pool user starts or stops. We could leave that part entirely
out of the patch set and just rely on the reserve being "big enough"
as we have done since the dawn of time. That would leave less to
niggle about and the reserve adjustment mechanism could be submitted
later. Would that be better?
> > > And I suspect that we
> > > have the same issues as in earlier releases with various corner cases
> > > not being covered.
> >
> > Do you have an example?
>
> Try NUMA constraints and zone limitations.
Are you worried about a correctness issue that would prevent the
machine from operating, or are you just worried about allocating
reserve pages to the local node for performance reasons?
> > > Code is added that is supposedly not used.
> >
> > What makes you think that?
>
> Because the argument is that performance does not matter since the code
> patchs are not used.
Used, yes. Maybe heavily, maybe not. The point is, with current
kernels you would never get to these new code paths because your
machine would have slowed to a crawl or deadlocked. So not having the
perfect NUMA implementation of the new memory reserves is perhaps a
new performance issue for NUMA, but it is no show stopper, and
according to design, existing code paths are not degraded by any
measurable extent. Deadlock in current kernels is a showstopper, that
is what this patch set fixes.
Since there is no regression for NUMA here (there is not supposed to
be anyway) I do not think that adding more complexity to the patch set
to optimize NUMA in this corner case that you could never even get to
before is warranted. That is properly a follow-on NUMA-specific
patch, analogous to a per-arch optimization.
> > > If it ever is on a large config then we are in very deep trouble by
> > > the new code paths themselves that serialize things in order to give
> > > some allocations precendence over the other allocations that are made
> > > to fail ....
> >
> > You mean by allocating the reserve memory on the wrong node in NUMA?
>
> No I mean all 1024 processors of our system running into this fail/succeed
> thingy that was added.
If an allocation now fails that would have succeeded in the past, the
patch set is buggy. I can't say for sure one way or another at this
time of night. If you see something, could you please mention a
file/line number?
> > That is on a code path that avoids destroying your machine performance
> > or killing the machine entirely as with current kernels, for which a
>
> As far as I know from our systems: The current kernels do not kill the
> machine if the reserves are configured the right way.
Current kernels deadlock on a regular basis doing fancy block IO. I
am not sure what you mean.
> > few cachelines pulled to another node is a small price to pay. And you
> > are free to use your special expertise in NUMA to make those fallback
> > paths even more efficient, but first you need to understand what they
> > are doing and why.
>
> There is your problem. The justification is not clear at all and the
> solution likely causes unrelated problems.
Well I hope the justification is clear now. Not deadlocking is a very
good thing, and we have a before and after test case. Getting late
here, Peter's email shift starts now ;-)
Regards,
Daniel
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-08-08 7:37 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-06 10:29 [PATCH 00/10] foundations for reserve-based allocation Peter Zijlstra
2007-08-06 10:29 ` [PATCH 01/10] mm: gfp_to_alloc_flags() Peter Zijlstra
2007-08-06 10:29 ` [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK Peter Zijlstra
2007-08-06 18:11 ` Christoph Lameter
2007-08-06 18:21 ` Daniel Phillips
2007-08-06 18:31 ` Peter Zijlstra
2007-08-06 18:43 ` Daniel Phillips
2007-08-06 19:11 ` Christoph Lameter
2007-08-06 19:31 ` Peter Zijlstra
2007-08-06 20:12 ` Christoph Lameter
2007-08-06 18:42 ` Christoph Lameter
2007-08-06 18:48 ` Daniel Phillips
2007-08-06 18:51 ` Christoph Lameter
2007-08-06 19:15 ` Daniel Phillips
2007-08-06 20:12 ` Matt Mackall
2007-08-06 20:19 ` Christoph Lameter
2007-08-06 20:26 ` Peter Zijlstra
2007-08-06 21:05 ` Christoph Lameter
2007-08-06 22:59 ` Daniel Phillips
2007-08-06 23:14 ` Christoph Lameter
2007-08-06 23:49 ` Daniel Phillips
2007-08-07 22:18 ` Christoph Lameter
2007-08-08 7:24 ` Peter Zijlstra
2007-08-08 18:06 ` Christoph Lameter
2007-08-08 7:37 ` Daniel Phillips [this message]
2007-08-08 18:09 ` Christoph Lameter
2007-08-09 18:41 ` Daniel Phillips
2007-08-09 18:49 ` Christoph Lameter
2007-08-10 0:17 ` Daniel Phillips
2007-08-10 1:48 ` Christoph Lameter
2007-08-10 3:34 ` Daniel Phillips
2007-08-10 3:48 ` Christoph Lameter
2007-08-10 8:15 ` Daniel Phillips
2007-08-10 17:46 ` Christoph Lameter
2007-08-10 23:25 ` Daniel Phillips
2007-08-13 6:55 ` Daniel Phillips
2007-08-13 23:04 ` Christoph Lameter
2007-08-06 20:27 ` Andrew Morton
2007-08-06 23:16 ` Daniel Phillips
2007-08-06 22:47 ` Daniel Phillips
2007-08-06 10:29 ` [PATCH 03/10] mm: tag reseve pages Peter Zijlstra
2007-08-06 18:11 ` Christoph Lameter
2007-08-06 18:13 ` Daniel Phillips
2007-08-06 18:28 ` Peter Zijlstra
2007-08-06 19:34 ` Andi Kleen
2007-08-06 18:43 ` Christoph Lameter
2007-08-06 18:47 ` Peter Zijlstra
2007-08-06 18:59 ` Andi Kleen
2007-08-06 19:09 ` Christoph Lameter
2007-08-06 19:10 ` Andrew Morton
2007-08-06 19:16 ` Christoph Lameter
2007-08-06 19:38 ` Matt Mackall
2007-08-06 20:18 ` Andi Kleen
2007-08-06 10:29 ` [PATCH 04/10] mm: slub: add knowledge of reserve pages Peter Zijlstra
2007-08-08 0:13 ` Christoph Lameter
2007-08-08 1:44 ` Matt Mackall
2007-08-08 17:13 ` Christoph Lameter
2007-08-08 17:39 ` Andrew Morton
2007-08-08 17:57 ` Christoph Lameter
2007-08-08 18:46 ` Andrew Morton
2007-08-10 1:54 ` Daniel Phillips
2007-08-10 2:01 ` Christoph Lameter
2007-08-20 7:38 ` Peter Zijlstra
2007-08-20 7:43 ` Peter Zijlstra
2007-08-20 9:12 ` Pekka J Enberg
2007-08-20 9:17 ` Peter Zijlstra
2007-08-20 9:28 ` Pekka Enberg
2007-08-20 19:26 ` Christoph Lameter
2007-08-20 20:08 ` Peter Zijlstra
2007-08-06 10:29 ` [PATCH 05/10] mm: allow mempool to fall back to memalloc reserves Peter Zijlstra
2007-08-06 10:29 ` [PATCH 06/10] mm: kmem_estimate_pages() Peter Zijlstra
2007-08-06 10:29 ` [PATCH 07/10] mm: allow PF_MEMALLOC from softirq context Peter Zijlstra
2007-08-06 10:29 ` [PATCH 08/10] mm: serialize access to min_free_kbytes Peter Zijlstra
2007-08-06 10:29 ` [PATCH 09/10] mm: emergency pool Peter Zijlstra
2007-08-06 10:29 ` [PATCH 10/10] mm: __GFP_MEMALLOC Peter Zijlstra
2007-08-06 17:35 ` [PATCH 00/10] foundations for reserve-based allocation Daniel Phillips
2007-08-06 18:17 ` Peter Zijlstra
2007-08-06 18:40 ` Daniel Phillips
2007-08-06 19:31 ` Daniel Phillips
2007-08-06 19:36 ` Peter Zijlstra
2007-08-06 19:53 ` Daniel Phillips
2007-08-06 17:56 ` Christoph Lameter
2007-08-06 18:33 ` Peter Zijlstra
2007-08-06 20:23 ` Matt Mackall
2007-08-07 0:09 ` Daniel Phillips
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a5909270708080037n32be2a73k5c28d33bb02f770b@mail.gmail.com \
--to=daniel.raymond.phillips@gmail.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mpm@selenic.com \
--cc=phillips@google.com \
--cc=phillips@phunq.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox