From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx186.postini.com [74.125.245.186]) by kanga.kvack.org (Postfix) with SMTP id CBE966B0044 for ; Mon, 23 Apr 2012 18:20:58 -0400 (EDT)
Received: by dadq36 with SMTP id q36so53842dad.8 for ; Mon, 23 Apr 2012 15:20:58 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <1335214564-17619-1-git-send-email-yinghan@google.com>
References: <1335214564-17619-1-git-send-email-yinghan@google.com>
From: KOSAKI Motohiro
Date: Mon, 23 Apr 2012 18:20:37 -0400
Message-ID:
Subject: Re: [RFC PATCH] do_try_to_free_pages() might enter infinite loop
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Ying Han
Cc: Michal Hocko , Johannes Weiner , Mel Gorman , KAMEZAWA Hiroyuki , Rik van Riel , Minchan Kim , Hugh Dickins , Nick Piggin , Andrew Morton , linux-mm@kvack.org

On Mon, Apr 23, 2012 at 4:56 PM, Ying Han wrote:
> This is not a patch targeted to be merged at all, but trying to understand
> a logic in global direct reclaim.
>
> There is a logic in global direct reclaim where reclaim fails on priority 0
> and zone->all_unreclaimable is not set, it will cause the direct to start over
> from DEF_PRIORITY. In some extreme cases, we've seen the system hang which is
> very likely caused by direct reclaim enters infinite loop.
>
> There have been serious patches trying to fix similar issue and the latest
> patch has good summary of all the efforts:
>
> commit 929bea7c714220fc76ce3f75bef9056477c28e74
> Author: KOSAKI Motohiro
> Date:   Thu Apr 14 15:22:12 2011 -0700
>
>    vmscan: all_unreclaimable() use zone->all_unreclaimable as a name
>
> Kosaki explained the problem triggered by async zone->all_unreclaimable and
> zone->pages_scanned where the later one was being checked by direct reclaim.
> However, after the patch, the problem remains where the setting of
> zone->all_unreclaimable is asynchronous with zone is actually reclaimable or not.
>
> The zone->all_unreclaimable flag is set by kswapd by checking zone->pages_scanned in
> zone_reclaimable(). Is that possible to have zone->all_unreclaimable == false while
> the zone is actually unreclaimable?
>
> 1. while kswapd in reclaim priority loop, someone frees a page on the zone. It
> will end up resetting the pages_scanned.
>
> 2. kswapd is frozen for whatever reason. I noticed Kosaki's covered the
> hibernation case by checking oom_killer_disabled, but not sure if that is
> everything we need to worry about. The key point here is that direct reclaim
> relies on a flag which is set by kswapd asynchronously, that doesn't sound safe.

If kswapd can be frozen in cases other than hibernation, why not add a
frozen check instead of the hibernation check? And when and why does that
happen?

>
> Instead of keep fixing the problem, I am wondering why we have the logic
> "not oom but keep trying reclaim w/ priority 0 reclaim failure" at the first place:
>
> Here is the patch introduced the logic initially:
>
> commit 408d85441cd5a9bd6bc851d677a10c605ed8db5f
> Author: Nick Piggin
> Date:   Mon Sep 25 23:31:27 2006 -0700
>
>    [PATCH] oom: use unreclaimable info
>
> However, I didn't find detailed description of what problem the commit trying
> to fix and wondering if the problem still exist after 5 years. I would be happy
> to see the later case where we can consider to revert the initial patch.

This patch fixed one of the false OOM issues. Consider:

1. thread-a reaches priority 0.
2. thread-b exits and frees a lot of pages.
3. thread-a calls out_of_memory().

This is not good, because by step 3 we have enough memory again.
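
To make the retry-versus-OOM decision in this thread concrete, here is a
toy userspace sketch in C. It is not the kernel code. struct toy_zone,
toy_shrink() and toy_direct_reclaim() are hypothetical names made up for
illustration, and the scan arithmetic is an assumption; only DEF_PRIORITY
and the all_unreclaimable flag come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12 /* starting reclaim priority, as in the kernel */

/* Toy stand-in for struct zone: only the state discussed in this thread. */
struct toy_zone {
	unsigned long free_pages;
	unsigned long reclaimable_pages;
	bool all_unreclaimable; /* in the kernel, set asynchronously by kswapd */
};

enum toy_result { TOY_PROGRESS, TOY_RETRY, TOY_OOM };

/*
 * Hypothetical single reclaim pass. Lower priority scans a larger share
 * of the zone; this shift is an assumption for the sketch only.
 */
static unsigned long toy_shrink(struct toy_zone *z, int priority)
{
	unsigned long scan;

	assert(priority >= 0 && priority <= DEF_PRIORITY);
	scan = z->reclaimable_pages >> priority;
	z->reclaimable_pages -= scan;
	z->free_pages += scan;
	return scan;
}

/*
 * Walk from DEF_PRIORITY down to 0. If even priority 0 reclaims nothing,
 * only report OOM when the zone is flagged all_unreclaimable; otherwise
 * ask the caller to start over from DEF_PRIORITY.
 */
static enum toy_result toy_direct_reclaim(struct toy_zone *z)
{
	int priority;

	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		if (toy_shrink(z, priority) > 0)
			return TOY_PROGRESS;
	}
	return z->all_unreclaimable ? TOY_OOM : TOY_RETRY;
}
```

In this model, TOY_RETRY is what protects thread-a from the false OOM in
the three steps above: the flag is still clear, so thread-a retries and
can pick up the pages thread-b freed. It also shows Ying's concern: if
nothing ever sets all_unreclaimable, a caller that loops on TOY_RETRY
never terminates.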