From: Andy Whitcroft <apw@shadowen.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@skynet.ie>,
	Nicolas Mailhot <nicolas.mailhot@laposte.net>,
	Christoph Lameter <clameter@sgi.com>,
	akpm@linux-foundation.org,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
Date: Thu, 17 May 2007 13:22:09 +0100
Message-ID: <464C48F1.3060903@shadowen.org>
In-Reply-To: <464BFF9D.809@yahoo.com.au>
Nick Piggin wrote:
> Mel Gorman wrote:
>> On (17/05/07 01:44), Nick Piggin didst pronounce:
>
>>>> If the watermark was totally ignored with the second patch, I would
>>>> understand but they are still obeyed. Even if it is an ALLOC_HIGH or
>>>> ALLOC_HARDER allocation, the watermarks are obeyed for order-0 so
>>>> memory does not get exhausted as that could cause a host of problems.
>>>> The difference is if this is a HIGH or HARDER allocation and the memory
>>>> can be granted without going below the order-0 watermarks, it'll
>>>> succeed. Would it be better if the lack of ALLOC_CPUSET was used to
>>>> determine when only order-0 watermarks should be obeyed?
>>>
>>> But I don't know why you want to disobey higher order watermarks in the
>>> first place.
>>
>>
>> Because the original problem was bio_alloc() allocations failing and
>> the OOM log showed that the higher-order pages were available. Patch 2
>> addressed it by succeeding these allocations if the min watermark was
>> not breached with the knowledge that kswapd was awake and reclaiming at
>> the relevant order. I think it may even have solved it without the
>> kswapd change but the kswapd change seemed sensible.
>
> But that just breaks the watermarks.
>
> It could be that the actual values of the watermarks as they are now are
> not very good ones, which is where the problem is coming from.
>
>
>>> *Those* are exactly the things that are going to be helpful
>>> to fix this problem of atomic higher order allocations failing or non
>>> atomic ones going into direct reclaim.
>>>
>>
>>
>> And the intention was that non-atomic ones would go into direct reclaim
>> after kicking kswapd but the atomic allocations would at least succeed
>> if the memory was there as long as they don't totally mess up watermarks.
>
> But we have 3 levels of watermarks, so you can keep a reserve for atomic
> allocations _and_ a buffer between the reclaim watermark and the direct
> reclaim watermark.
>
>
>>>> Raising watermarks is no guarantee that a high-order allocation that
>>>> can sleep will occur at the right time to kick kswapd awake and that
>>>> it'll get back from whatever it's doing in time to spot the new order
>>>> and start reclaiming again.
>>>
>>> You don't *need* a higher order allocation that can sleep in order
>>> to kick kswapd. Crikey, I keep saying this.
>>>
>>
>>
>> Indeed, we seem to have got stuck in a loop of sorts.
>>
>> I understand that kswapd gets kicked awake either way but there must be a
>> timing issue. Let's say we had a situation like
>>
>> order-0 alloc
>> watermark hit => wake kswapd
>> order-0 alloc                            kswapd reclaiming order 0
>> order-0 alloc                            kswapd reclaiming order 0
>> order-3 alloc => kick kswapd for order 3
>> order-0 alloc                            kswapd reclaiming order 0
>> order-3 alloc                            kswapd reclaiming order 0
>> order-3 alloc                            kswapd reclaiming order 0
>> order-3 alloc => highorder mark hit, fail
>>
>> kswapd will keep reclaiming at order-0 until it completes a reclaim cycle,
>> spots the new order and starts over again. So there is a potentially
>> sizable window there where problems can hit. Right?
>
> Take a look at the code. wakeup_kswapd and __alloc_pages.
>
> First, assume the zone is above high watermarks for order-0 and order-1.
> order-0 allocs...
> order-1 low watermark hit => don't care, not allocing order-1
> order-0 low watermark hit => wake kswapd reclaim order 0
> order-1 alloc => wakeup_kswapd raises kswapd_max_order to 1
> order-1 allocs continue to succeed until the min watermark is hit
> order-1 *atomic* allocs continue until the atomic reserve is hit
> order-1 memalloc allocs continue until no more order-1 pages left.

This represents the ideal. However, we never consider the reserves at
order-1 unless we get an order-1 allocation. With lots of order-0
allocations (the norm) we can run the order-1 availability well below
even the atomic reserve without anyone noticing, while the total reserve
stays above the order-0 low watermark. Here kswapd has been idle, as
there is only order-0 activity and we have enough order-0 pages. THEN an
order-1 allocation comes in, we are below the order-1 low watermark, we
wake kswapd, retry, discover we are below the atomic threshold and
_fail_ the allocation.
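
To make the scenario above concrete, here is a minimal toy model in C. It
is a sketch under stated assumptions, not kernel code: toy_watermark_ok()
only loosely follows the shape of the mainline zone_watermark_ok() check
(no lowmem_reserve, no alloc flags), and struct toy_zone,
toy_order0_churn() and toy_run() are hypothetical names invented for
illustration. Each churn round splits one order-1 block, keeps one half
allocated and frees the other, so the freed page cannot merge back.

```c
#include <assert.h>

#define TOY_NR_ORDERS 4	/* toy value, far smaller than a real kernel's */

/* Hypothetical stand-in for a zone: free block counts per order. */
struct toy_zone {
	unsigned long nr_free[TOY_NR_ORDERS];
};

static long total_free_pages(const struct toy_zone *z)
{
	long pages = 0;

	for (int o = 0; o < TOY_NR_ORDERS; o++)
		pages += (long)(z->nr_free[o] << o);
	return pages;
}

/* Returns 1 if an allocation of 'order' may proceed against 'mark'. */
static int toy_watermark_ok(const struct toy_zone *z, int order, long mark)
{
	long min = mark;
	long free_pages;

	assert(order >= 0 && order < TOY_NR_ORDERS);
	free_pages = total_free_pages(z) - (1L << order) + 1;

	if (free_pages <= min)
		return 0;
	for (int o = 0; o < order; o++) {
		/* Blocks smaller than the request cannot satisfy it. */
		free_pages -= (long)(z->nr_free[o] << o);
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}

/*
 * One round of order-0 churn: an order-1 block is split, one half stays
 * allocated and the other is freed. Its buddy is in use, so no merge.
 */
static void toy_order0_churn(struct toy_zone *z)
{
	if (z->nr_free[1] == 0)
		return;
	z->nr_free[1]--;
	z->nr_free[0]++;
}

/* Returns how many rounds saw the order-0 check fail (kswapd wakeups). */
static int toy_run(struct toy_zone *z, int rounds, long low_mark)
{
	int wakeups = 0;

	for (int i = 0; i < rounds; i++) {
		if (!toy_watermark_ok(z, 0, low_mark))
			wakeups++;
		toy_order0_churn(z);
	}
	return wakeups;
}
```

Starting from 40 free order-1 blocks and running 39 rounds against a low
mark of 20 pages, the order-0 check never trips, yet an order-1 check
then fails against both the low mark and a much smaller mark standing in
for the atomic threshold.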
>
> There really is (or should be) a proper watermarking system in place that
> provides the right buffering for higher order allocations.

I think this is a case of "should be", not "is".

>>> Working out why it apparently isn't working, first. Then maybe look at
>>> raising watermarks (they get reduced fairly rapidly as the order
>>> increases,
>>> so it might just be that there is not enough at order-3).
>>>
>>
>>
>> I believe it failed to work due to a combination of kswapd reclaiming at
>> the wrong order for a while and the fact that the watermarks are pretty
>> aggressive when it comes to higher orders. I'm trying to think of
>> alternative fixes but keep coming back to the current fix using
>> !(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
>> the memory is there and above min watermarks at order-0.
>
> kswapd reclaiming at the wrong order should be a bug. It should start
> reclaiming at the right order as soon as an allocation (atomic or not)
> goes through the "start reclaiming now" watermark.
>
> Now this is just looking at mainline code that has the kswapd_max_order,
> and kswapd doesn't actually reclaim "at" any order -- it just uses the
> kswapd_max_order to know when the required "stop reclaiming now" marks
> have been hit. If lumpy reclaim is not reclaiming at the right order,
> then it means it isn't refreshing from kswapd_max_order enough.

Yes, I believe all of this is working as designed. The problem is that
we treat order-0 and order-1 allocations as independent. We do not take
into account that we split order-1 blocks to make order-0 pages. We do
not check the order-1 reserve for order-0 allocations and so do not wake
kswapd early enough. Given the interdependent nature of the current
calculation, it is very hard to detect transitions at _other_ orders
when we allocate at any specific order.
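
One way to picture what checking other orders on every allocation might
involve is the sketch below. It is purely illustrative and not something
proposed in this thread: toy_highest_breached_order() and its parameters
are hypothetical, and the per-order scaling (mark >> order) is borrowed
from the shape of the existing watermark check in simplified form. It
walks every order up to a caller-chosen limit and reports the highest one
whose scaled mark is breached, which a caller could in principle use to
decide which order to wake kswapd for.

```c
#include <assert.h>

#define TOY_ORDERS 4	/* toy value, far smaller than a real kernel's */

/*
 * nr_free[o] is the number of free blocks of order o (hypothetical
 * layout). Returns the highest order in [0, max_check_order] whose
 * scaled mark is breached, or -1 if every checked order is fine.
 */
static int toy_highest_breached_order(const unsigned long *nr_free,
				      long mark, int max_check_order)
{
	long usable = 0;
	int breached = -1;

	assert(mark >= 0);
	assert(max_check_order >= 0 && max_check_order < TOY_ORDERS);

	for (int o = 0; o < TOY_ORDERS; o++)
		usable += (long)(nr_free[o] << o);

	for (int o = 0; o <= max_check_order; o++) {
		/* 'usable' counts only pages in blocks of order >= o. */
		if (usable <= (mark >> o))
			breached = o;
		usable -= (long)(nr_free[o] << o);
	}
	return breached;
}
```

With 39 free order-0 pages and a single free order-1 block, a mark of 20
pages looks fine at order-0 but is reported as breached at order-1, even
though the caller only asked on behalf of an order-0 allocation.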
Hmmmmmm.
-apw