From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Wed, 19 Sep 2007 22:43:11 -0700 (PDT) From: David Rientjes Subject: Re: [patch 3/8] oom: save zonelist pointer for oom killer calls In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Tim Pepper Cc: Christoph Lameter , Andrew Morton , Andrea Arcangeli , Rik van Riel , linux-mm@kvack.org, pj@sgi.com List-ID: On Wed, 19 Sep 2007, Tim Pepper wrote: > When no zones in the current zonelist are on the list of OOM zones, > then all the current zones are added to the list of OOM zones...or > none of them depending on how badly OOM'd we are. Tricky. > That's not quite how I intended it to be read, but your analysis of the return values of try_set_zone_oom() are correct. It was intended to return non-zero (i.e. the OOM killer is still invoked) if the zonelist couldn't be added to our list simply because the kzalloc() failed. This is after the test to see if any of the zones are already marked as being in the OOM killer and they weren't (the only reason we didn't immediately return 0). So the return value is correct and the OOM killer should still be invoked. But yeah, it's cleaner if we change all_unreclaimable to an unsigned int flags and convert all current testers of the all_unreclaimable value to use it. Then we can simply set a bit, ZONE_OOM, to identify such zones. > If any single zone in the current zonelist matches in the list of OOM > zones, none of the current zones are added to the list of OOM zones. > Given the patch header comments, this was done on purpose. But > doesn't that leave your list of OOM zones incomplete and open you to > OOM killing in parallel on a given zone? > They aren't added to the list because we aren't going to invoke the OOM killer for them, we're going to return 0 to __alloc_pages(), the only caller of try_set_zone_oom(), and that will put the task to sleep and then retry the allocation that it failed on when it wakes up: if (!try_set_zoom_oom(zonelist)) { schedule_timeout_uninterruptible(1); goto restart; } The only time we return 1 is when none of the zones from the __alloc_pages() zonelist was found already to be marked in the OOM killer and thus it _prevents_ parallel OOM killings. > Or is that all ok in that you're trying to minimise needlessly OOM > killing something when possible but are willing to throw in the towel > when things are tending towards royally hosed? > The entire patchset is aimed toward serialization and trying to avoid needlessly killing tasks when killing one would alleviate the condition. The current OOM killer performs very badly for this. > At any rate this seems complex with subtly varying behaviour that left > me wondering if it really works as advertised. I imagine without the > kzmalloc and instead checking/setting bits in bitmasks the code would > be cleaner. > There's an easy way to check if it works as advertised, and that's to apply the patchset to Linus' latest git and trying it out. I'm fairly happy with the results and I consider it to be much better than the current behavior. I've tested it pretty thoroughly. But I do agree that checking bits in an unsigned int flags member of struct zone will be better, but I intend to still mimic the behavior of a trylock for serialization. try_set_zone_oom() will simply be implemented differently. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org