* Re: [patch 06/12] mm: oom_kill: simplify OOM killer locking
From: Hillf Danton @ 2015-03-25  8:05 UTC
To: Johannes Weiner
Cc: linux-kernel, linux-mm, Hillf Danton

> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -32,6 +32,8 @@ enum oom_scan_t {
>  /* Thread is the potential origin of an oom condition; kill first on oom */
>  #define OOM_FLAG_ORIGIN  ((__force oom_flags_t)0x1)
> 
> +extern struct mutex oom_lock;
> +
>  static inline void set_current_oom_origin(void)
>  {
>          current->signal->oom_flags |= OOM_FLAG_ORIGIN;
> @@ -60,9 +62,6 @@ extern void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>                               struct mem_cgroup *memcg, nodemask_t *nodemask,
>                               const char *message);
> 
> -extern bool oom_zonelist_trylock(struct zonelist *zonelist, gfp_t gfp_flags);
> -extern void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_flags);

Alternatively, expose three functions rather than the oom_lock mutex?

        bool oom_trylock(void);
        void oom_lock(void);
        void oom_unlock(void);

Hillf

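[Editorial note: a minimal, illustrative sketch of the three wrappers suggested
above - not part of the posted series or of Hillf's mail.  The backing mutex is
called oom_mutex here only to avoid clashing with the proposed oom_lock()
function name:]

        /* Hypothetical wrappers around a single global OOM mutex. */
        static DEFINE_MUTEX(oom_mutex);

        bool oom_trylock(void)
        {
                return mutex_trylock(&oom_mutex);
        }

        void oom_lock(void)
        {
                mutex_lock(&oom_mutex);
        }

        void oom_unlock(void)
        {
                mutex_unlock(&oom_mutex);
        }

[Callers such as __alloc_pages_may_oom() would then use oom_trylock() and
oom_unlock() instead of touching the mutex directly, keeping the lock private
to mm/oom_kill.c.]
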
* [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
From: Johannes Weiner @ 2015-03-25  6:17 UTC
To: linux-mm, linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Andrew Morton, Tetsuo Handa, Huang Ying,
    Andrea Arcangeli, Dave Chinner, Michal Hocko, Theodore Ts'o

Hi everybody,

in the recent past we've had several reports and discussions on how to
deal with allocations hanging in the allocator upon OOM.

The idea of this series is mainly to make the mechanism of detecting OOM
situations reliable enough that we can be confident about failing
allocations, and then leave the fallback strategy to the caller rather
than looping forever in the allocator.

The other part is trying to reduce the __GFP_NOFAIL deadlock rate, at
least for the short term while we don't have a reservation system yet.

Here is a breakdown of the proposed changes:

  mm: oom_kill: remove pointless locking in oom_enable()
  mm: oom_kill: clean up victim marking and exiting interfaces
  mm: oom_kill: remove misleading test-and-clear of known TIF_MEMDIE
  mm: oom_kill: remove pointless locking in exit_oom_victim()
  mm: oom_kill: generalize OOM progress waitqueue
  mm: oom_kill: simplify OOM killer locking
  mm: page_alloc: inline should_alloc_retry() contents

These are preparatory patches that clean up parts of the OOM killer and
the page allocator.  Filesystem folks and others who only care about
allocation semantics may want to skip over these.

  mm: page_alloc: wait for OOM killer progress before retrying

One of the hangs we have seen reported is from lower-order allocations
that loop infinitely in the allocator.  In an attempt to address that, it
has been proposed to limit the number of retry loops - possibly even make
that number configurable from userspace - and return NULL once we are
certain that the system is "truly OOM".  But it wasn't clear how high
that number needs to be to reliably determine a global OOM situation from
the perspective of an individual allocation.

An issue is that OOM killing is currently an asynchronous operation, and
the optimal retry count depends on how long it takes an OOM kill victim
to exit and release its memory - which of course varies with system load
and the exiting task.

To address this, this patch makes OOM killing synchronous and only
returns to the allocator once the victim has actually exited.  With that,
the allocator no longer requires retry loops just to poll for the victim
releasing memory.

  mm: page_alloc: private memory reserves for OOM-killing allocations

Once out_of_memory() is synchronous, there are still two issues that can
make determining system-wide OOM from a single allocation context
unreliable.  For one, concurrent allocations can swoop in right after a
kill and steal the memory, causing spurious allocation failures for
contexts that actually freed memory.  But also, the OOM victim could get
blocked on some state that the allocation is holding, which would delay
the release of the memory (and refilling of the reserves) until after the
allocation has completed.

This patch creates private reserves for allocations that have issued an
OOM kill.  Once these reserves run dry, it seems reasonable to assume
that other allocations are not succeeding anymore either.
  mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations

__GFP_NOFAIL allocations are an exacerbation of the
victim-stuck-behind-allocation scenario, because they will actually
deadlock.  To avoid this, or at least try to, give __GFP_NOFAIL
allocations access to not just the OOM reserves but also the system's
emergency reserves.

This is basically a poor man's reservation system, which could or should
be replaced later on with an explicit reservation system that e.g.
filesystems have control over for use by transactions.  It's obviously
not bulletproof and might still lock up, but it should greatly reduce the
likelihood.  AFAIK Andrea, whose idea this was, has been using this
successfully for some time.

  mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM

Another hang that was reported was from NOFS allocations.  The trouble
with these is that they can't issue or wait for writeback during page
reclaim, and so we don't want to OOM kill on their behalf.  However, with
such restrictions on making progress, they are prone to hangs.  This
patch makes NOFS allocations fail if reclaim can't free anything.

It would be good if the filesystem people could weigh in on whether they
can deal with failing GFP_NOFS allocations, or annotate the exceptions
with __GFP_NOFAIL etc.  It could well be that a middle ground is required
that allows using the OOM killer before giving up.

  mm: page_alloc: do not lock up low-order allocations upon OOM

With both OOM killing and "true OOM situation" detection more reliable,
this patch finally allows allocations up to order 3 to actually fail on
OOM and leave the fallback strategy to the caller - as opposed to the
current policy of hanging in the allocator.

Comments?

 drivers/staging/android/lowmemorykiller.c |   2 +-
 include/linux/mmzone.h                    |   2 +
 include/linux/oom.h                       |  12 +-
 kernel/exit.c                             |   2 +-
 mm/internal.h                             |   3 +-
 mm/memcontrol.c                           |  20 +--
 mm/oom_kill.c                             | 167 +++++++-----------------
 mm/page_alloc.c                           | 189 +++++++++++++---------------
 mm/vmstat.c                               |   2 +
 9 files changed, 154 insertions(+), 245 deletions(-)

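[Editorial note: purely illustrative, not code from the series - a hedged
sketch of the GFP_NOFS decision the cover letter describes.  The helper name
is made up; the actual series implements this inside the page allocator's
retry logic rather than as a separate function:]

        /*
         * Sketch: fail a !__GFP_FS allocation instead of OOM killing or
         * looping when reclaim could not make any progress.
         */
        static bool should_fail_nofs_alloc(gfp_t gfp_mask,
                                           unsigned long did_some_progress)
        {
                if (gfp_mask & __GFP_FS)
                        return false;   /* may use the OOM killer as usual */
                if (gfp_mask & __GFP_NOFAIL)
                        return false;   /* not allowed to fail */
                return !did_some_progress;      /* reclaim freed nothing */
        }
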
* [patch 06/12] mm: oom_kill: simplify OOM killer locking
From: Johannes Weiner @ 2015-03-25  6:17 UTC
To: linux-mm, linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Andrew Morton, Tetsuo Handa, Huang Ying,
    Andrea Arcangeli, Dave Chinner, Michal Hocko, Theodore Ts'o

The zonelist locking and the oom_sem are two overlapping locks that
are used to serialize global OOM killing against different things.

The historical zonelist locking serializes OOM kills from allocations
with overlapping zonelists against each other to prevent killing more
tasks than necessary in the same memory domain.  Only when neither
tasklists nor zonelists from two concurrent OOM kills overlap (tasks
in separate memcgs bound to separate nodes) are OOM kills allowed to
execute in parallel.

The younger oom_sem is a read-write lock to serialize OOM killing
against the PM code trying to disable the OOM killer altogether.

However, the OOM killer is a fairly cold error path, there is really
no reason to optimize for highly performant and concurrent OOM kills.
And the oom_sem is just flat-out redundant.

Replace both locking schemes with a single global mutex serializing
OOM kills regardless of context.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/oom.h |   5 +--
 mm/memcontrol.c     |  18 +++++---
 mm/oom_kill.c       | 127 +++++++++++-----------------------------------
 mm/page_alloc.c     |   8 ++--
 4 files changed, 44 insertions(+), 114 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index a8e6a498cbcb..7deecb7bca5e 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -32,6 +32,8 @@ enum oom_scan_t {
 /* Thread is the potential origin of an oom condition; kill first on oom */
 #define OOM_FLAG_ORIGIN  ((__force oom_flags_t)0x1)
 
+extern struct mutex oom_lock;
+
 static inline void set_current_oom_origin(void)
 {
         current->signal->oom_flags |= OOM_FLAG_ORIGIN;
@@ -60,9 +62,6 @@ extern void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
                              struct mem_cgroup *memcg, nodemask_t *nodemask,
                              const char *message);
 
-extern bool oom_zonelist_trylock(struct zonelist *zonelist, gfp_t gfp_flags);
-extern void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_flags);
-
 extern void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
                                int order, const nodemask_t *nodemask,
                                struct mem_cgroup *memcg);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aab5604e0ac4..9f280b9df848 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1530,6 +1530,8 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
         unsigned int points = 0;
         struct task_struct *chosen = NULL;
 
+        mutex_lock(&oom_lock);
+
         /*
          * If current has a pending SIGKILL or is exiting, then automatically
          * select it.  The goal is to allow it to allocate so that it may
@@ -1537,7 +1539,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
          */
         if (fatal_signal_pending(current) || task_will_free_mem(current)) {
                 mark_oom_victim(current);
-                return;
+                goto unlock;
         }
 
         check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, order, NULL, memcg);
@@ -1564,7 +1566,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
                         mem_cgroup_iter_break(memcg, iter);
                         if (chosen)
                                 put_task_struct(chosen);
-                        return;
+                        goto unlock;
                 case OOM_SCAN_OK:
                         break;
                 };
@@ -1585,11 +1587,13 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
                 css_task_iter_end(&it);
         }
 
-        if (!chosen)
-                return;
-        points = chosen_points * 1000 / totalpages;
-        oom_kill_process(chosen, gfp_mask, order, points, totalpages, memcg,
-                         NULL, "Memory cgroup out of memory");
+        if (chosen) {
+                points = chosen_points * 1000 / totalpages;
+                oom_kill_process(chosen, gfp_mask, order, points, totalpages,
+                                 memcg, NULL, "Memory cgroup out of memory");
+        }
+unlock:
+        mutex_unlock(&oom_lock);
 }
 
 #if MAX_NUMNODES > 1
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d3490b019d46..5cfda39b3268 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -42,7 +42,8 @@
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
-static DEFINE_SPINLOCK(zone_scan_lock);
+
+DEFINE_MUTEX(oom_lock);
 
 #ifdef CONFIG_NUMA
 /**
@@ -405,13 +406,12 @@ static atomic_t oom_victims = ATOMIC_INIT(0);
 static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait);
 
 bool oom_killer_disabled __read_mostly;
-static DECLARE_RWSEM(oom_sem);
 
 /**
  * mark_oom_victim - mark the given task as OOM victim
  * @tsk: task to mark
  *
- * Has to be called with oom_sem taken for read and never after
+ * Has to be called with oom_lock held and never after
  * oom has been disabled already.
  */
 void mark_oom_victim(struct task_struct *tsk)
@@ -460,14 +460,14 @@ bool oom_killer_disable(void)
          * Make sure to not race with an ongoing OOM killer
          * and that the current is not the victim.
          */
-        down_write(&oom_sem);
+        mutex_lock(&oom_lock);
         if (test_thread_flag(TIF_MEMDIE)) {
-                up_write(&oom_sem);
+                mutex_unlock(&oom_lock);
                 return false;
         }
 
         oom_killer_disabled = true;
-        up_write(&oom_sem);
+        mutex_unlock(&oom_lock);
 
         wait_event(oom_victims_wait, !atomic_read(&oom_victims));
 
@@ -634,52 +634,6 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
-/*
- * Try to acquire the OOM killer lock for the zones in zonelist.  Returns zero
- * if a parallel OOM killing is already taking place that includes a zone in
- * the zonelist.  Otherwise, locks all zones in the zonelist and returns 1.
- */
-bool oom_zonelist_trylock(struct zonelist *zonelist, gfp_t gfp_mask)
-{
-        struct zoneref *z;
-        struct zone *zone;
-        bool ret = true;
-
-        spin_lock(&zone_scan_lock);
-        for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask))
-                if (test_bit(ZONE_OOM_LOCKED, &zone->flags)) {
-                        ret = false;
-                        goto out;
-                }
-
-        /*
-         * Lock each zone in the zonelist under zone_scan_lock so a parallel
-         * call to oom_zonelist_trylock() doesn't succeed when it shouldn't.
-         */
-        for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask))
-                set_bit(ZONE_OOM_LOCKED, &zone->flags);
-
-out:
-        spin_unlock(&zone_scan_lock);
-        return ret;
-}
-
-/*
- * Clears the ZONE_OOM_LOCKED flag for all zones in the zonelist so that failed
- * allocation attempts with zonelists containing them may now recall the OOM
- * killer, if necessary.
- */
-void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_mask)
-{
-        struct zoneref *z;
-        struct zone *zone;
-
-        spin_lock(&zone_scan_lock);
-        for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask))
-                clear_bit(ZONE_OOM_LOCKED, &zone->flags);
-        spin_unlock(&zone_scan_lock);
-}
-
 /**
  * __out_of_memory - kill the "best" process when we run out of memory
  * @zonelist: zonelist pointer
@@ -693,8 +647,8 @@ void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_mask)
  * OR try to be smart about which process to kill.  Note that we
  * don't have to be perfect here, we just have to be good.
  */
-static void __out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
-                            int order, nodemask_t *nodemask, bool force_kill)
+bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
+                   int order, nodemask_t *nodemask, bool force_kill)
 {
         const nodemask_t *mpol_mask;
         struct task_struct *p;
@@ -704,10 +658,13 @@ static void __out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
         enum oom_constraint constraint = CONSTRAINT_NONE;
         int killed = 0;
 
+        if (oom_killer_disabled)
+                return false;
+
         blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
         if (freed > 0)
                 /* Got some memory back in the last second. */
-                return;
+                goto out;
 
         /*
          * If current has a pending SIGKILL or is exiting, then automatically
@@ -720,7 +677,7 @@ static void __out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
         if (current->mm &&
             (fatal_signal_pending(current) || task_will_free_mem(current))) {
                 mark_oom_victim(current);
-                return;
+                goto out;
         }
 
         /*
@@ -760,32 +717,8 @@ static void __out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
          */
         if (killed)
                 schedule_timeout_killable(1);
-}
-
-/**
- * out_of_memory -  tries to invoke OOM killer.
- * @zonelist: zonelist pointer
- * @gfp_mask: memory allocation flags
- * @order: amount of memory being requested as a power of 2
- * @nodemask: nodemask passed to page allocator
- * @force_kill: true if a task must be killed, even if others are exiting
- *
- * invokes __out_of_memory if the OOM is not disabled by oom_killer_disable()
- * when it returns false. Otherwise returns true.
- */
-bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
-                   int order, nodemask_t *nodemask, bool force_kill)
-{
-        bool ret = false;
-
-        down_read(&oom_sem);
-        if (!oom_killer_disabled) {
-                __out_of_memory(zonelist, gfp_mask, order, nodemask, force_kill);
-                ret = true;
-        }
-        up_read(&oom_sem);
-
-        return ret;
+
+        return true;
 }
 
 /*
@@ -795,27 +728,21 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
  */
 void pagefault_out_of_memory(void)
 {
-        struct zonelist *zonelist;
-
-        down_read(&oom_sem);
         if (mem_cgroup_oom_synchronize(true))
-                goto unlock;
+                return;
 
-        zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
-        if (oom_zonelist_trylock(zonelist, GFP_KERNEL)) {
-                if (!oom_killer_disabled)
-                        __out_of_memory(NULL, 0, 0, NULL, false);
-                else
-                        /*
-                         * There shouldn't be any user tasks runable while the
-                         * OOM killer is disabled so the current task has to
-                         * be a racing OOM victim for which oom_killer_disable()
-                         * is waiting for.
-                         */
-                        WARN_ON(test_thread_flag(TIF_MEMDIE));
+        if (!mutex_trylock(&oom_lock))
+                return;
 
-                oom_zonelist_unlock(zonelist, GFP_KERNEL);
+        if (!out_of_memory(NULL, 0, 0, NULL, false)) {
+                /*
+                 * There shouldn't be any user tasks runnable while the
+                 * OOM killer is disabled, so the current task has to
+                 * be a racing OOM victim for which oom_killer_disable()
+                 * is waiting for.
+                 */
+                WARN_ON(test_thread_flag(TIF_MEMDIE));
         }
-unlock:
-        up_read(&oom_sem);
+
+        mutex_unlock(&oom_lock);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 656379820190..9ebc760187ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2380,10 +2380,10 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
         *did_some_progress = 0;
 
         /*
-         * Acquire the per-zone oom lock for each zone.  If that
-         * fails, somebody else is making progress for us.
+         * Acquire the oom lock.  If that fails, somebody else is
+         * making progress for us.
          */
-        if (!oom_zonelist_trylock(ac->zonelist, gfp_mask)) {
+        if (!mutex_trylock(&oom_lock)) {
                 *did_some_progress = 1;
                 schedule_timeout_uninterruptible(1);
                 return NULL;
@@ -2428,7 +2428,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
                 || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
                 *did_some_progress = 1;
 out:
-        oom_zonelist_unlock(ac->zonelist, gfp_mask);
+        mutex_unlock(&oom_lock);
         return page;
 }

-- 
2.3.3

* Re: [patch 06/12] mm: oom_kill: simplify OOM killer locking
From: Michal Hocko @ 2015-03-26 13:31 UTC
To: Johannes Weiner
Cc: linux-mm, linux-fsdevel, linux-kernel, Linus Torvalds, Andrew Morton,
    Tetsuo Handa, Huang Ying, Andrea Arcangeli, Dave Chinner, Theodore Ts'o

On Wed 25-03-15 02:17:10, Johannes Weiner wrote:
> The zonelist locking and the oom_sem are two overlapping locks that
> are used to serialize global OOM killing against different things.
> 
> The historical zonelist locking serializes OOM kills from allocations
> with overlapping zonelists against each other to prevent killing more
> tasks than necessary in the same memory domain.  Only when neither
> tasklists nor zonelists from two concurrent OOM kills overlap (tasks
> in separate memcgs bound to separate nodes) are OOM kills allowed to
> execute in parallel.
> 
> The younger oom_sem is a read-write lock to serialize OOM killing
> against the PM code trying to disable the OOM killer altogether.
> 
> However, the OOM killer is a fairly cold error path, there is really
> no reason to optimize for highly performant and concurrent OOM kills.
> And the oom_sem is just flat-out redundant.
> 
> Replace both locking schemes with a single global mutex serializing
> OOM kills regardless of context.

OK, this is much simpler.

You have missed drivers/tty/sysrq.c which should take the lock as well.
ZONE_OOM_LOCKED can be removed as well.  __out_of_memory in the kerneldoc
should be renamed.

[...]
> @@ -795,27 +728,21 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>   */
>  void pagefault_out_of_memory(void)
>  {
> -        struct zonelist *zonelist;
> -
> -        down_read(&oom_sem);
>          if (mem_cgroup_oom_synchronize(true))
> -                goto unlock;
> +                return;

OK, so we are back to what David has asked previously.  We do not need
the lock for memcg and oom_killer_disabled because we know that no tasks
(except for potential oom victim) are lurking around at the time
oom_killer_disable() is called.  So I guess we want to stick a comment
into mem_cgroup_oom_synchronize before we check for oom_killer_disabled.

After those are fixed, feel free to add
Acked-by: Michal Hocko <mhocko@suse.cz>

-- 
Michal Hocko
SUSE Labs

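[Editorial note: for illustration, the drivers/tty/sysrq.c change Michal points
at could look roughly like the following - a sketch against the then-current
moom_callback(), using the out_of_memory() signature from the patch above; it
is not the actual follow-up patch:]

        static void moom_callback(struct work_struct *ignored)
        {
                const gfp_t gfp_mask = GFP_KERNEL;

                /* Serialize the sysrq-triggered kill against other OOM kills. */
                mutex_lock(&oom_lock);
                if (!out_of_memory(node_zonelist(first_memory_node, gfp_mask),
                                   gfp_mask, 0, NULL, true))
                        pr_info("OOM request ignored because killer is disabled\n");
                mutex_unlock(&oom_lock);
        }
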
* Re: [patch 06/12] mm: oom_kill: simplify OOM killer locking
From: Johannes Weiner @ 2015-03-26 15:17 UTC
To: Michal Hocko
Cc: linux-mm, linux-fsdevel, linux-kernel, Linus Torvalds, Andrew Morton,
    Tetsuo Handa, Huang Ying, Andrea Arcangeli, Dave Chinner, Theodore Ts'o

On Thu, Mar 26, 2015 at 02:31:11PM +0100, Michal Hocko wrote:
> On Wed 25-03-15 02:17:10, Johannes Weiner wrote:
> > The zonelist locking and the oom_sem are two overlapping locks that
> > are used to serialize global OOM killing against different things.
> > 
> > The historical zonelist locking serializes OOM kills from allocations
> > with overlapping zonelists against each other to prevent killing more
> > tasks than necessary in the same memory domain.  Only when neither
> > tasklists nor zonelists from two concurrent OOM kills overlap (tasks
> > in separate memcgs bound to separate nodes) are OOM kills allowed to
> > execute in parallel.
> > 
> > The younger oom_sem is a read-write lock to serialize OOM killing
> > against the PM code trying to disable the OOM killer altogether.
> > 
> > However, the OOM killer is a fairly cold error path, there is really
> > no reason to optimize for highly performant and concurrent OOM kills.
> > And the oom_sem is just flat-out redundant.
> > 
> > Replace both locking schemes with a single global mutex serializing
> > OOM kills regardless of context.
> 
> OK, this is much simpler.
> 
> You have missed drivers/tty/sysrq.c which should take the lock as well.
> ZONE_OOM_LOCKED can be removed as well.  __out_of_memory in the kerneldoc
> should be renamed.

Argh, an older version had the lock inside out_of_memory() and I never
updated the caller when I changed the rules.  Thanks.  I'll fix both.

> > @@ -795,27 +728,21 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> >   */
> >  void pagefault_out_of_memory(void)
> >  {
> > -        struct zonelist *zonelist;
> > -
> > -        down_read(&oom_sem);
> >          if (mem_cgroup_oom_synchronize(true))
> > -                goto unlock;
> > +                return;
> 
> OK, so we are back to what David has asked previously.  We do not need
> the lock for memcg and oom_killer_disabled because we know that no tasks
> (except for potential oom victim) are lurking around at the time
> oom_killer_disable() is called.  So I guess we want to stick a comment
> into mem_cgroup_oom_synchronize before we check for oom_killer_disabled.

I would prefer everybody that sets TIF_MEMDIE and kills a task to hold
the lock, including memcg.  Simplicity is one thing, but also a global
OOM kill might not even be necessary when it's racing with the memcg.

> After those are fixed, feel free to add
> Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks.

* Re: [patch 06/12] mm: oom_kill: simplify OOM killer locking
From: Michal Hocko @ 2015-03-26 16:07 UTC
To: Johannes Weiner
Cc: linux-mm, linux-fsdevel, linux-kernel, Linus Torvalds, Andrew Morton,
    Tetsuo Handa, Huang Ying, Andrea Arcangeli, Dave Chinner, Theodore Ts'o

On Thu 26-03-15 11:17:46, Johannes Weiner wrote:
> On Thu, Mar 26, 2015 at 02:31:11PM +0100, Michal Hocko wrote:
[...]
> > > @@ -795,27 +728,21 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> > >   */
> > >  void pagefault_out_of_memory(void)
> > >  {
> > > -        struct zonelist *zonelist;
> > > -
> > > -        down_read(&oom_sem);
> > >          if (mem_cgroup_oom_synchronize(true))
> > > -                goto unlock;
> > > +                return;
> > 
> > OK, so we are back to what David has asked previously.  We do not need
> > the lock for memcg and oom_killer_disabled because we know that no tasks
> > (except for potential oom victim) are lurking around at the time
> > oom_killer_disable() is called.  So I guess we want to stick a comment
> > into mem_cgroup_oom_synchronize before we check for oom_killer_disabled.
> 
> I would prefer everybody that sets TIF_MEMDIE and kills a task to hold
> the lock, including memcg.  Simplicity is one thing, but also a global
> OOM kill might not even be necessary when it's racing with the memcg.

Sure, I am fine with that.

> > After those are fixed, feel free to add
> > Acked-by: Michal Hocko <mhocko@suse.cz>
> 
> Thanks.

-- 
Michal Hocko
SUSE Labs