Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0


From: mel@skynet.ie (Mel Gorman)
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Nicolas Mailhot <nicolas.mailhot@laposte.net>,
	Christoph Lameter <clameter@sgi.com>,
	Andy Whitcroft <apw@shadowen.org>,
	akpm@linux-foundation.org,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
Date: Wed, 16 May 2007 17:46:32 +0100
Message-ID: <20070516164631.GD10225@skynet.ie>
In-Reply-To: <464B26E8.3060404@yahoo.com.au>

On (17/05/07 01:44), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >On (17/05/07 00:04), Nick Piggin didst pronounce:
> >
> >>Mel Gorman wrote:
> 
> >>>I guess we should only set this for non kmalloc caches then. 
> >>>So move the call into kmem_cache_create? Would make the min order 3 on
> >>>most of my mm machines.
> >>>===
> >>
> >>You do not *know* if the slab is going to be allocated from. Or maybe it
> >>is a few times at bootup, or once every 10 minutes.
> >>
> >
> >
> >So is your primary issue with raise_kswapd_order() being called at the
> >time a cache is opened for use and instead it should be more selective?
> >
> >
> >>>The second part of what you say is that there could be a non-slab user of
> >>>high order allocs. That is true and expected. In that case, the existing
> >>>mechanism informs kswapd of the higher order as it does today so it can
> >>>reclaim at the higher order for a bit and enter direct reclaim if 
> >>>necessary.
> >>
> >>You seem to have broken the existing mechanism though.
> >>
> >
> >
> >How is it broken exactly? What has changed in this patch is that there
> >may be a minimum order that kswapd reclaims at. The same minimum number
> >of pages are kept free.
> 
> I mean with patch 2.
> 

Ok.

> 
> >If the watermark was totally ignored with the second patch, I would
> >understand but they are still obeyed. Even if it is an ALLOC_HIGH or
> >ALLOC_HARDER allocation, the watermarks are obeyed for order-0 so memory
> >does not get exhausted as that could cause a host of problems. The
> >difference is if this is a HIGH or HARDER allocation and the memory can be
> >granted without going below the order-0 watermarks, it'll succeed. Would it
> >be better if the lack of ALLOC_CPUSET was used to determine when only
> >order-0 watermarks should be obeyed?
> 
> But I don't know why you want to disobey higher order watermarks in the
> first place.

Because the original problem was bio_alloc() allocations failing and the OOM
log showed that the higher-order pages were available. Patch 2 addressed it
by letting these allocations succeed if the min watermark was not breached,
with the knowledge that kswapd was awake and reclaiming at the relevant
order. I think it may even have solved it without the kswapd change but the
kswapd change seemed sensible.

> *Those* are exactly the things that are going to be helpful
> to fix this problem of atomic higher order allocations failing or non
> atomic ones going into direct reclaim.
> 

And the intention was that non-atomic ones would go into direct reclaim
after kicking kswapd but the atomic allocations would at least succeed if
the memory was there as long as they don't totally mess up watermarks.

> 
> >>>It's not being replaced. That existing watermarking is still used. If it
> >>>was being replaced, the for loop in zone_watermark_ok() would have been
> >>>taken out.
> >>
> >>Patch 2 sure doesn't make it any better.
> >>
> >
> >
> >The second patch is simply saying "If you can satisfy the allocation
> >without going below the watermarks for order-0, then do it". Again, if it
> >used !(alloc_flags & ALLOC_CPUSET), would you be happier?
> 
> No ;)
> 

heh.

> 
> >>>My point is that when it does, a caller is still likely to enter direct
> >>>reclaim and kswapd can help prevent stalls if it pre-emptively reclaims
> >>>at an order known to be commonly used when free pages is below watermarks
> >>
> >>So we should increase the watermarks, and keep the existing, working
> >>code there and it will work for everyone, not just for slab, and it
> >>will not keep higher orders free if they are not needed.
> >>
> >
> >
> >Raising watermarks is no guarantee that a high-order allocation that can
> >sleep will occur at the right time to kick kswapd awake and that it'll get
> >back from whatever it's doing in time to spot the new order and start
> >reclaiming again.
> 
> You don't *need* a higher order allocation that can sleep in order
> to kick kswapd. Crikey, I keep saying this.
> 

Indeed, we seem to have got stuck in a loop of sorts.

I understand that kswapd gets kicked awake either way but there must be a
timing issue. Let's say we had a situation like

order-0 alloc
watermark hit => wake kswapd
order-0 alloc			kswapd reclaiming order 0
order-0 alloc			kswapd reclaiming order 0
order-3 alloc => kick kswapd for order 3
order-0 alloc			kswapd reclaiming order 0
order-3 alloc			kswapd reclaiming order 0
order-3 alloc			kswapd reclaiming order 0
order-3 alloc => highorder mark hit, fail

kswapd will keep reclaiming at order-0 until it completes a reclaim cycle,
spots the new order and starts over again. So there is a potentially
sizable window there where problems can hit. Right?
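
To make the window concrete, here is a toy userspace model of the sequence
above. It is only a sketch, not kernel code: it assumes kswapd looks at the
requested order only when it starts a new reclaim cycle, and CYCLE_LEN,
struct kswapd_model and the helper names are all made up for illustration.

```c
/*
 * Toy userspace model, not kernel code. Assumption: kswapd samples the
 * requested order only when a reclaim cycle starts, and a cycle lasts
 * CYCLE_LEN allocation "ticks". All names and numbers are hypothetical.
 */
#define CYCLE_LEN 6

struct kswapd_model {
	int requested_order;	/* highest order an allocator has asked for */
	int reclaim_order;	/* order the current cycle is reclaiming at */
	int ticks_left;		/* ticks until the current cycle completes */
};

/* An allocator that hits a watermark records the order it needs. */
static void wake_kswapd(struct kswapd_model *k, int order)
{
	if (order > k->requested_order)
		k->requested_order = order;
}

/* Advance one tick; a new order is only noticed at a cycle boundary. */
static void kswapd_tick(struct kswapd_model *k)
{
	if (k->ticks_left == 0) {
		k->reclaim_order = k->requested_order;
		k->ticks_left = CYCLE_LEN;
	}
	k->ticks_left--;
}

/*
 * Count the ticks, from the request onwards, during which kswapd is still
 * reclaiming below the requested order.
 */
static int stale_ticks(int request_tick, int order, int total_ticks)
{
	struct kswapd_model k = { 0, 0, 0 };
	int stale = 0;

	for (int t = 0; t < total_ticks; t++) {
		if (t == request_tick)
			wake_kswapd(&k, order);
		kswapd_tick(&k);
		if (t >= request_tick && k.reclaim_order < order)
			stale++;
	}
	return stale;
}
```

With a six-tick cycle, stale_ticks(2, 3, 12) counts four ticks in which
reclaim still runs at order 0 after the order-3 request, while a request that
lands exactly on a cycle boundary, stale_ticks(0, 3, 12), counts none.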

> >>>Well, if it could, order:3 allocation failure reports wouldn't occur
> >>>periodically.
> >>
> >>They are reports of failures, not failure to handle the failures.
> >>
> >
> >
> >If the failures were being handled correctly, why would it be logging at
> >all? They would have set __GFP_NOWARN and recovered silently.
> 
> Lots of places don't set __GFP_NOWARN but handle failures. Generally
> you want to keep the warning even for atomic allocations if it is
> a reasonably small order (0 or 1 or even 2).
> 

Fair enough

> The failures I have seen are not "networking stops working". They are
> "e1000 gives page allocation failures", and the replies have always
> been "that's not unexpected". Have you seen *any* of the former type?
> 

Admittedly, I don't recall complaints that networking totally failed. The
result should be that packets drop until such time that the allocations
start succeeding again.

> >>>It already reserves and still occasionally hits the problem.
> >>
> >>e1000 reserves page? It would have to use them in a manner that guaranteed
> >>timely return to the reserve pool like mempools. If it did that then it
> >>would not have a problem.
> >>
> >
> >
> >When I last looked, they kept a series of buffers in a ring buffer. My
> >understanding at the time was that this buffer regularly gets depleted
> >and refilled.
> 
> But refilled via the allocator, right? One which does not revert to a
> private stash if it cannot get a page.
> 

True.
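
For reference, the mempool-style behaviour Nick describes could be sketched
in userspace roughly as below. This is not how e1000 refills its ring and not
the kernel mempool API; struct reserve_pool, RESERVE_SIZE, try_alloc() and the
allocator_failing switch are hypothetical names used only to show the
"fall back to a private stash, refill the stash on free" pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_SIZE 4	/* hypothetical size of the private stash */

struct reserve_pool {
	void *slots[RESERVE_SIZE];
	int nr;			/* elements currently held in reserve */
	size_t elem_size;
};

/* Hypothetical switch standing in for an allocator under memory pressure. */
static bool allocator_failing;

static void *try_alloc(size_t size)
{
	return allocator_failing ? NULL : malloc(size);
}

/* Pre-fill the reserve so it is available when the allocator later fails. */
static bool pool_init(struct reserve_pool *p, size_t elem_size)
{
	p->nr = 0;
	p->elem_size = elem_size;
	while (p->nr < RESERVE_SIZE) {
		void *e = malloc(elem_size);

		if (!e) {
			while (p->nr > 0)
				free(p->slots[--p->nr]);
			return false;
		}
		p->slots[p->nr++] = e;
	}
	return true;
}

/* Try the normal allocator first and only then dip into the reserve. */
static void *pool_alloc(struct reserve_pool *p)
{
	void *e = try_alloc(p->elem_size);

	if (e)
		return e;
	if (p->nr > 0)
		return p->slots[--p->nr];
	return NULL;
}

/* Freed elements top the reserve back up before going to the allocator. */
static void pool_free(struct reserve_pool *p, void *e)
{
	if (p->nr < RESERVE_SIZE)
		p->slots[p->nr++] = e;
	else
		free(e);
}
```

The point of the pattern is that the reserve only helps if elements come back
through pool_free() promptly; a ring that is refilled straight from the
allocator has no such fallback.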

> 
> >>>>All this stuff used to work properly :(
> >>>>
> >>>
> >>>
> >>>It only came to light recently that there might be issues.
> >>
> >>I mean kswapd asynchronously freeing higher order pages proactively. We
> >>should get that working again first.
> >>
> >
> >
> >What do you suggest then?
> 
> Working out why it apparently isn't working, first. Then maybe look at
> raising watermarks (they get reduced fairly rapidly as the order increases,
> so it might just be that there is not enough at order-3).
> 

I believe it failed to work due to a combination of kswapd reclaiming at
the wrong order for a while and the fact that the watermarks are pretty
aggressive when it comes to higher orders. I'm trying to think of
alternative fixes but keep coming back to the current fix using 
!(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
the memory is there and above min watermarks at order-0.
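
As a rough illustration of both points, the userspace sketch below is loosely
modelled on the per-order loop in zone_watermark_ok(), with lowmem_reserve
left out. struct zone_model, watermark_ok() and watermark_ok_relaxed() are
hypothetical names, the flag values are illustrative, and the relaxed variant
is only one possible reading of the !(alloc_flags & ALLOC_CPUSET) idea, not
the code from patch 2.

```c
#include <stdbool.h>

#define MAX_ORDER	11
/* Illustrative flag values; only the names matter for this sketch. */
#define ALLOC_HARDER	0x10
#define ALLOC_HIGH	0x20
#define ALLOC_CPUSET	0x40

struct zone_model {
	long nr_free[MAX_ORDER];	/* free blocks at each order */
};

static long total_free_pages(const struct zone_model *z)
{
	long pages = 0;

	for (int o = 0; o < MAX_ORDER; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

/* ALLOC_HIGH and ALLOC_HARDER each lower the mark that must be kept free. */
static long effective_min(long mark, int alloc_flags)
{
	long min = mark;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	return min;
}

/* Existing-style check: the mark is halved for each order that is skipped. */
static bool watermark_ok(const struct zone_model *z, int order, long mark,
			 int alloc_flags)
{
	long min = effective_min(mark, alloc_flags);
	long free_pages = total_free_pages(z) - (1L << order) + 1;

	if (free_pages <= min)
		return false;
	for (int o = 0; o < order; o++) {
		/* Pages of this order cannot satisfy a higher-order request. */
		free_pages -= z->nr_free[o] << o;
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

/*
 * Hypothetical relaxed check: allocations without ALLOC_CPUSET (taken here
 * to mean !wait) only honour the order-0 mark, provided a block of at least
 * the requested order is actually free.
 */
static bool watermark_ok_relaxed(const struct zone_model *z, int order,
				 long mark, int alloc_flags)
{
	if (alloc_flags & ALLOC_CPUSET)
		return watermark_ok(z, order, mark, alloc_flags);

	if (total_free_pages(z) - (1L << order) + 1 <=
	    effective_min(mark, alloc_flags))
		return false;
	for (int o = order; o < MAX_ORDER; o++)
		if (z->nr_free[o] > 0)
			return true;
	return false;
}
```

In this model, a zone with 1000 free order-0 pages, two free order-3 blocks
and a mark of 128 fails watermark_ok() for an order-3 request with
ALLOC_HIGH | ALLOC_HARDER even though the blocks are there, while
watermark_ok_relaxed() lets it through because the order-0 mark is intact.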

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

