From: Michal Hocko <mhocko@suse.cz>
To: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Andrew Morton <akpm@linux-foundation.org>,
Cong Wang <xiyou.wangcong@gmail.com>,
David Rientjes <rientjes@google.com>,
Oleg Nesterov <oleg@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, Linux PM list <linux-pm@vger.kernel.org>
Subject: Re: [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend
Date: Wed, 5 Nov 2014 15:55:51 +0100 [thread overview]
Message-ID: <20141105145551.GF4527@dhcp22.suse.cz> (raw)
In-Reply-To: <20141105124620.GB4527@dhcp22.suse.cz>
On Wed 05-11-14 13:46:20, Michal Hocko wrote:
[...]
> From ef6227565fa65b52986c4626d49ba53b499e54d1 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Wed, 5 Nov 2014 11:49:14 +0100
> Subject: [PATCH] OOM, PM: make OOM detection in the freezer path raceless
>
> 5695be142e20 (OOM, PM: OOM killed task shouldn't escape PM suspend)
> has left a race window when OOM killer manages to note_oom_kill after
> freeze_processes checks the counter. The race window is quite small
> and really unlikely and deemed sufficient at the time of submission.
>
> Tejun wasn't happy about this partial solution though and insisted on
> a full solution. That requires the full OOM and freezer exclusion,
> though. This is done by this patch which introduces oom_sem RW lock.
> Page allocation OOM path takes the lock for reading because there might
> be concurrent OOM happening on disjunct zonelists. oom_killer_disabled
> check is moved right before out_of_memory is called because it was
> checked too early before and we do not want to hold the lock while doing
> the last attempt for allocation which might involve zone_reclaim.
This is incorrect because it would cause an endless allocation loop
because we really have to got to no_page if OOM is disabled.
> freeze_processes then takes the lock for write throughout the whole
> freezing process and OOM disabling.
>
> There is no need to recheck all the processes with the full
> synchronization anymore.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> include/linux/oom.h | 5 +++++
> kernel/power/process.c | 50 +++++++++-----------------------------------------
> mm/oom_kill.c | 17 -----------------
> mm/page_alloc.c | 24 ++++++++++++------------
> 4 files changed, 26 insertions(+), 70 deletions(-)
>
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index e8d6e1058723..350b9b2ffeec 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -73,7 +73,12 @@ extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> extern int register_oom_notifier(struct notifier_block *nb);
> extern int unregister_oom_notifier(struct notifier_block *nb);
>
> +/*
> + * oom_killer_disabled can be modified only under oom_sem taken for write
> + * and checked under read lock along with the full OOM handler.
> + */
> extern bool oom_killer_disabled;
> +extern struct rw_semaphore oom_sem;
>
> static inline void oom_killer_disable(void)
> {
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index 5a6ec8678b9a..befce9785233 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -108,30 +108,6 @@ static int try_to_freeze_tasks(bool user_only)
> return todo ? -EBUSY : 0;
> }
>
> -static bool __check_frozen_processes(void)
> -{
> - struct task_struct *g, *p;
> -
> - for_each_process_thread(g, p)
> - if (p != current && !freezer_should_skip(p) && !frozen(p))
> - return false;
> -
> - return true;
> -}
> -
> -/*
> - * Returns true if all freezable tasks (except for current) are frozen already
> - */
> -static bool check_frozen_processes(void)
> -{
> - bool ret;
> -
> - read_lock(&tasklist_lock);
> - ret = __check_frozen_processes();
> - read_unlock(&tasklist_lock);
> - return ret;
> -}
> -
> /**
> * freeze_processes - Signal user space processes to enter the refrigerator.
> * The current thread will not be frozen. The same process that calls
> @@ -142,7 +118,6 @@ static bool check_frozen_processes(void)
> int freeze_processes(void)
> {
> int error;
> - int oom_kills_saved;
>
> error = __usermodehelper_disable(UMH_FREEZING);
> if (error)
> @@ -157,27 +132,20 @@ int freeze_processes(void)
> pm_wakeup_clear();
> printk("Freezing user space processes ... ");
> pm_freezing = true;
> - oom_kills_saved = oom_kills_count();
> +
> + /*
> + * Need to exlude OOM killer from triggering while tasks are
> + * getting frozen to make sure none of them gets killed after
> + * try_to_freeze_tasks is done.
> + */
> + down_write(&oom_sem);
> error = try_to_freeze_tasks(true);
> if (!error) {
> __usermodehelper_set_disable_depth(UMH_DISABLED);
> oom_killer_disable();
> -
> - /*
> - * There might have been an OOM kill while we were
> - * freezing tasks and the killed task might be still
> - * on the way out so we have to double check for race.
> - */
> - if (oom_kills_count() != oom_kills_saved &&
> - !check_frozen_processes()) {
> - __usermodehelper_set_disable_depth(UMH_ENABLED);
> - printk("OOM in progress.");
> - error = -EBUSY;
> - } else {
> - printk("done.");
> - }
> + printk("done.\n");
> }
> - printk("\n");
> + up_write(&oom_sem);
> BUG_ON(in_atomic());
>
> if (error)
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5340f6b91312..bbf405a3a18f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -404,23 +404,6 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
> dump_tasks(memcg, nodemask);
> }
>
> -/*
> - * Number of OOM killer invocations (including memcg OOM killer).
> - * Primarily used by PM freezer to check for potential races with
> - * OOM killed frozen task.
> - */
> -static atomic_t oom_kills = ATOMIC_INIT(0);
> -
> -int oom_kills_count(void)
> -{
> - return atomic_read(&oom_kills);
> -}
> -
> -void note_oom_kill(void)
> -{
> - atomic_inc(&oom_kills);
> -}
> -
> #define K(x) ((x) << (PAGE_SHIFT-10))
> /*
> * Must be called while holding a reference to p, which will be released upon
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9cd36b822444..76095266c4b5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -243,6 +243,7 @@ void set_pageblock_migratetype(struct page *page, int migratetype)
> }
>
> bool oom_killer_disabled __read_mostly;
> +DECLARE_RWSEM(oom_sem);
>
> #ifdef CONFIG_DEBUG_VM
> static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
> @@ -2252,14 +2253,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> }
>
> /*
> - * PM-freezer should be notified that there might be an OOM killer on
> - * its way to kill and wake somebody up. This is too early and we might
> - * end up not killing anything but false positives are acceptable.
> - * See freeze_processes.
> - */
> - note_oom_kill();
> -
> - /*
> * Go through the zonelist yet one more time, keep very high watermark
> * here, this is only to catch a parallel oom killing, we must fail if
> * we're still under heavy pressure.
> @@ -2288,8 +2281,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> if (gfp_mask & __GFP_THISNODE)
> goto out;
> }
> - /* Exhausted what can be done so it's blamo time */
> - out_of_memory(zonelist, gfp_mask, order, nodemask, false);
> +
> + /*
> + * Exhausted what can be done so it's blamo time.
> + * Just make sure that we cannot race with oom_killer disabling
> + * e.g. PM freezer needs to make sure that no OOM happens after
> + * all tasks are frozen.
> + */
> + down_read(&oom_sem);
> + if (!oom_killer_disabled)
> + out_of_memory(zonelist, gfp_mask, order, nodemask, false);
> + up_read(&oom_sem);
>
> out:
> oom_zonelist_unlock(zonelist, gfp_mask);
> @@ -2716,8 +2718,6 @@ rebalance:
> */
> if (!did_some_progress) {
> if (oom_gfp_allowed(gfp_mask)) {
> - if (oom_killer_disabled)
> - goto nopage;
> /* Coredumps can quickly deplete all memory reserves */
> if ((current->flags & PF_DUMPCORE) &&
> !(gfp_mask & __GFP_NOFAIL))
> --
> 2.1.1
>
>
> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-11-05 14:55 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-21 7:27 [PATCH 0/4 -v2] OOM vs. freezer interaction fixes Michal Hocko
2014-10-21 7:27 ` [PATCH 1/4] freezer: Do not freeze tasks killed by OOM killer Michal Hocko
2014-10-21 12:04 ` Rafael J. Wysocki
2014-10-21 7:27 ` [PATCH 2/4] freezer: remove obsolete comments in __thaw_task() Michal Hocko
2014-10-21 12:04 ` Rafael J. Wysocki
2014-10-21 7:27 ` [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend Michal Hocko
2014-10-21 12:09 ` Rafael J. Wysocki
2014-10-21 13:14 ` Michal Hocko
2014-10-21 13:42 ` Rafael J. Wysocki
2014-10-21 14:11 ` Michal Hocko
2014-10-21 14:41 ` Rafael J. Wysocki
2014-10-21 14:29 ` Michal Hocko
2014-10-22 14:39 ` Rafael J. Wysocki
2014-10-22 14:22 ` Michal Hocko
2014-10-22 21:18 ` Rafael J. Wysocki
2014-10-26 18:49 ` Pavel Machek
2014-11-04 19:27 ` Tejun Heo
2014-11-05 12:46 ` Michal Hocko
2014-11-05 13:02 ` Tejun Heo
2014-11-05 13:31 ` Michal Hocko
2014-11-05 13:42 ` Michal Hocko
2014-11-05 14:14 ` Michal Hocko
2014-11-05 15:45 ` Michal Hocko
2014-11-05 15:44 ` Tejun Heo
2014-11-05 16:01 ` Michal Hocko
2014-11-05 16:29 ` Tejun Heo
2014-11-05 16:39 ` Michal Hocko
2014-11-05 16:54 ` Tejun Heo
2014-11-05 17:01 ` Tejun Heo
2014-11-06 13:05 ` Michal Hocko
2014-11-06 15:09 ` Tejun Heo
2014-11-06 16:01 ` Michal Hocko
2014-11-06 16:12 ` Tejun Heo
2014-11-06 16:31 ` Michal Hocko
2014-11-06 16:33 ` Tejun Heo
2014-11-06 16:58 ` Michal Hocko
2014-11-05 17:46 ` Michal Hocko
2014-11-05 17:55 ` Tejun Heo
2014-11-06 12:49 ` Michal Hocko
2014-11-06 15:01 ` Tejun Heo
2014-11-06 16:02 ` Michal Hocko
2014-11-06 16:28 ` Tejun Heo
2014-11-10 16:30 ` Michal Hocko
2014-11-12 18:58 ` [RFC 0/4] OOM vs PM freezer fixes Michal Hocko
2014-11-12 18:58 ` [RFC 1/4] OOM, PM: Do not miss OOM killed frozen tasks Michal Hocko
2014-11-14 17:55 ` Tejun Heo
2014-11-12 18:58 ` [RFC 2/4] OOM, PM: make OOM detection in the freezer path raceless Michal Hocko
2014-11-12 18:58 ` [RFC 3/4] OOM, PM: handle pm freezer as an OOM victim correctly Michal Hocko
2014-11-12 18:58 ` [RFC 4/4] OOM: thaw the OOM victim if it is frozen Michal Hocko
2014-11-14 20:14 ` [RFC 0/4] OOM vs PM freezer fixes Tejun Heo
2014-11-18 21:08 ` Michal Hocko
2014-11-18 21:10 ` [RFC 1/2] oom: add helper for setting and clearing TIF_MEMDIE Michal Hocko
2014-11-18 21:10 ` [RFC 2/2] OOM, PM: make OOM detection in the freezer path raceless Michal Hocko
2014-11-27 0:47 ` Rafael J. Wysocki
2014-12-02 22:08 ` Tejun Heo
2014-12-04 14:16 ` Michal Hocko
2014-12-04 14:44 ` Tejun Heo
2014-12-04 16:56 ` Michal Hocko
2014-12-04 17:18 ` Michal Hocko
2014-12-05 16:41 ` [PATCH 0/4] OOM vs PM freezer fixes Michal Hocko
2014-12-05 16:41 ` [PATCH -v2 1/5] oom: add helpers for setting and clearing TIF_MEMDIE Michal Hocko
2014-12-06 12:56 ` Tejun Heo
2014-12-07 10:13 ` Michal Hocko
2015-01-07 17:57 ` Tejun Heo
2015-01-07 18:23 ` Michal Hocko
2014-12-05 16:41 ` [PATCH -v2 2/5] OOM: thaw the OOM victim if it is frozen Michal Hocko
2014-12-06 13:06 ` Tejun Heo
2014-12-07 10:24 ` Michal Hocko
2014-12-07 10:45 ` Michal Hocko
2014-12-07 13:59 ` Tejun Heo
2014-12-07 18:55 ` Michal Hocko
2014-12-05 16:41 ` [PATCH -v2 3/5] PM: convert printk to pr_* equivalent Michal Hocko
2014-12-05 22:40 ` Rafael J. Wysocki
2014-12-07 10:26 ` Michal Hocko
2014-12-06 13:08 ` Tejun Heo
2014-12-05 16:41 ` [PATCH -v2 4/5] sysrq: " Michal Hocko
2014-12-06 13:09 ` Tejun Heo
2014-12-05 16:41 ` [PATCH -v2 5/5] OOM, PM: make OOM detection in the freezer path raceless Michal Hocko
2014-12-06 13:11 ` Tejun Heo
2014-12-07 10:11 ` Michal Hocko
2015-01-07 18:41 ` Tejun Heo
2015-01-07 18:48 ` Michal Hocko
2015-01-08 11:51 ` Michal Hocko
2014-12-07 10:09 ` [PATCH 0/4] OOM vs PM freezer fixes Michal Hocko
2014-12-07 13:55 ` Tejun Heo
2014-12-07 19:00 ` Michal Hocko
2014-12-18 16:27 ` Michal Hocko
2014-11-05 14:55 ` Michal Hocko [this message]
2014-10-26 18:40 ` [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend Pavel Machek
2014-10-21 7:27 ` [PATCH 4/4] PM: convert do_each_thread to for_each_process_thread Michal Hocko
2014-10-21 12:10 ` Rafael J. Wysocki
2014-10-21 13:19 ` Michal Hocko
2014-10-21 13:43 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141105145551.GF4527@dhcp22.suse.cz \
--to=mhocko@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-pm@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=rientjes@google.com \
--cc=rjw@rjwysocki.net \
--cc=tj@kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox