From: Nick Piggin <npiggin@gmail.com>
To: Ying Han <yinghan@google.com>
Cc: Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>, Mel Gorman <mel@csn.ul.ie>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
Minchan Kim <minchan.kim@gmail.com>,
Hugh Dickins <hughd@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: Re: [RFC PATCH] do_try_to_free_pages() might enter infinite loop
Date: Tue, 1 May 2012 13:34:30 +1000
Message-ID: <CAPa8GCC1opy9u6NHy9m=1xU4EfRsHu8VN2kU-bXtRz=z_Mq0PA@mail.gmail.com>
In-Reply-To: <CALWz4ixeBq7cMoopukaRZxUmH1i0+L4xZ_49B0YpZ4iZuRC+Uw@mail.gmail.com>

On 25 April 2012 04:37, Ying Han <yinghan@google.com> wrote:
> On Mon, Apr 23, 2012 at 10:36 PM, Nick Piggin <npiggin@gmail.com> wrote:
>> On 24 April 2012 06:56, Ying Han <yinghan@google.com> wrote:
>>> This is not a patch intended to be merged at all, but an attempt to
>>> understand some logic in global direct reclaim.
>>>
>>> There is logic in global direct reclaim where, if reclaim fails at
>>> priority 0 and zone->all_unreclaimable is not set, direct reclaim starts
>>> over from DEF_PRIORITY. In some extreme cases we have seen the system
>>> hang, very likely because direct reclaim enters an infinite loop.
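>>>
>>> To spell out the path I mean, the tail of do_try_to_free_pages() looks
>>> roughly like this (paraphrased from mm/vmscan.c, not a verbatim copy):
>>>
>>>     if (sc->nr_reclaimed)
>>>         return sc->nr_reclaimed;
>>>
>>>     /* kswapd is frozen during hibernation and cannot have marked the
>>>      * zones all_unreclaimable, so skip the check in that case */
>>>     if (oom_killer_disabled)
>>>         return 0;
>>>
>>>     /* top priority shrink_zones still had more to do? don't OOM, then */
>>>     if (global_reclaim(sc) && !all_unreclaimable(zonelist, sc))
>>>         return 1;
>>>
>>>     return 0;
>>>
>>> That "return 1" reports progress to the page allocator, which retries the
>>> allocation, and the next reclaim attempt starts again from DEF_PRIORITY.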
>>
>> Very likely, or definitely? Can you reproduce it? What workload?
>
> No, we don't have a workload that reproduces it yet. Everything is based
> on the watchdog dump file :(
>
>>
>>>
>>> There has been a series of patches trying to fix similar issues, and the
>>> latest one has a good summary of all the efforts:
>>>
>>> commit 929bea7c714220fc76ce3f75bef9056477c28e74
>>> Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>>> Date: Thu Apr 14 15:22:12 2011 -0700
>>>
>>> vmscan: all_unreclaimable() use zone->all_unreclaimable as a name
>>>
>>> Kosaki explained the problem triggered by zone->all_unreclaimable and
>>> zone->pages_scanned being out of sync, where the latter was being checked
>>> by direct reclaim. However, even after that patch the problem remains:
>>> the setting of zone->all_unreclaimable is asynchronous with respect to
>>> whether the zone is actually reclaimable or not.
>>>
>>> The zone->all_unreclaimable flag is set by kswapd, which checks
>>> zone->pages_scanned in zone_reclaimable(). Is it possible to have
>>> zone->all_unreclaimable == false while the zone is actually
>>> unreclaimable? At least two cases come to mind (see the sketch after
>>> the list):
>>>
>>> 1. While kswapd is in its reclaim priority loop, someone frees a page on
>>> the zone, which ends up resetting pages_scanned.
>>>
>>> 2. kswapd is frozen for whatever reason. I noticed Kosaki's patch covered
>>> the hibernation case by checking oom_killer_disabled, but I am not sure
>>> that is everything we need to worry about. The key point is that direct
>>> reclaim relies on a flag which is set asynchronously by kswapd, and that
>>> does not sound safe.
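>>>
>>> For reference, the check and the reset I mean look roughly like this
>>> (paraphrased from mm/vmscan.c and mm/page_alloc.c, not verbatim):
>>>
>>>     /* kswapd treats a zone as reclaimable until it has scanned six
>>>      * times the reclaimable pages without freeing anything */
>>>     static bool zone_reclaimable(struct zone *zone)
>>>     {
>>>         return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
>>>     }
>>>
>>>     /* ...while freeing pages back into the zone resets the counter,
>>>      * e.g. in free_pcppages_bulk() / free_one_page(): */
>>>     zone->all_unreclaimable = 0;
>>>     zone->pages_scanned = 0;
>>>
>>> So a single free can make the zone look reclaimable to kswapd again, even
>>> though direct reclaim still cannot make any progress on it.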
>>>
>>> Instead of continuing to patch up the problem, I am wondering why we have
>>> the logic "not OOM, but keep trying reclaim after a priority 0 reclaim
>>> failure" in the first place:
>>>
>>> Here is the patch that introduced the logic initially:
>>>
>>> commit 408d85441cd5a9bd6bc851d677a10c605ed8db5f
>>> Author: Nick Piggin <npiggin@suse.de>
>>> Date: Mon Sep 25 23:31:27 2006 -0700
>>>
>>> [PATCH] oom: use unreclaimable info
>>>
>>> However, I couldn't find a detailed description of what problem the
>>> commit was trying to fix, and I wonder whether that problem still exists
>>> after 5 years. I would be happy to find that it doesn't, in which case we
>>> could consider reverting the initial patch.
>>
>> The problem we were having is that processes would be killed at seemingly
>> random points in time, under heavy swapping, but long before all swap was
>> used.
>>
>> The particular problem IIRC was related to testing a lot of guests on an s390
>> machine. I'm ashamed to have not included more information in the
>> changelog -- I suspect it was probably in a small batch of patches with a
>> description in the introductory mail and not properly placed into patches :(
>>
>> There have certainly been a lot of changes in the area since then, so I
>> couldn't be sure what would happen if this were taken out.
>>
>> I don't think the page allocator "try harder" logic was enough to solve the
>> problem, and I think it was around in some form even back then.
>>
>> The biggest problem is that it's not an exact science. It will never do the
>> right thing for everybody, sadly. Even if it is able to allocate pages at a
>> very slow rate, this is effectively as good as a hang for some users. For
>> others, they want to be able to manually intervene before anything is killed.
>>
>> Sorry if this isn't too helpful! Any ideas would be good. We possibly need
>> a way to describe these behaviours in an abstract way (i.e., not just
>> magic numbers), and to allow the user to tune them.
>
> Thank you Nick, this is helpful. I looked up the patches you mentioned,
> and I can see what problem they were trying to solve at the time. However,
> things have changed a lot, and it is hard to tell whether the problem
> still exists in the current kernel. Going through them one by one, I see
> that each patch has either been replaced by different logic, or the same
> logic has been implemented differently.
>
> For this particular patch, we now have code in the page allocator which
> retries before entering OOM. So I am wondering whether that would help the
> OOM situation you saw back then.
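>
> The retry I mean sits in __alloc_pages_slowpath(), which looks roughly
> like this (paraphrased from mm/page_alloc.c, not verbatim):
>
>     page = __alloc_pages_direct_reclaim(..., &did_some_progress);
>     if (page)
>         goto got_pg;
>
>     /* only consider OOM when reclaim made no progress at all */
>     if (!did_some_progress) {
>         if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
>             page = __alloc_pages_may_oom(...);
>             ...
>         }
>     }
>
>     /* otherwise wait for some writeback to complete and retry */
>     pages_reclaimed += did_some_progress;
>     if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
>         wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
>         goto rebalance;
>     }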

Well it's not doing exactly the same thing, actually. And note that the
problem was not about parallel OOM killing. The fact that page reclaim
made no progress the last time we called into it does not actually mean
that it cannot make _any_ progress.

My patch is more about detecting the latter case. I don't see equivalent
logic in the page allocator that would replace it.
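
The detection is the all_unreclaimable() walk at the end of
do_try_to_free_pages(), roughly (again paraphrased, not verbatim):

    static bool all_unreclaimable(struct zonelist *zonelist,
                                  struct scan_control *sc)
    {
        struct zoneref *z;
        struct zone *zone;

        for_each_zone_zonelist_nodemask(zone, z, zonelist,
                gfp_zone(sc->gfp_mask), sc->nodemask) {
            if (!populated_zone(zone))
                continue;
            if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
                continue;
            /* one zone kswapd hasn't given up on is enough */
            if (!zone->all_unreclaimable)
                return false;
        }

        return true;
    }

As long as one eligible zone has not been marked all_unreclaimable, we
keep trying rather than declare OOM.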

But again: this is not a question of correct or incorrect as far as I can
see, simply a matter of where you define "hopeless"! I could easily see
the need for a way to bias that (kill quickly, medium, try to never kill).