From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 25 Jun 2024 09:08:27 +0200
From: Michal Hocko <mhocko@suse.com>
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Shakeel Butt, Muchun Song,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 08/14] mm: memcg: move cgroup v1 oom handling code into memcontrol-v1.c
Message-ID: 
References: <20240625005906.106920-1-roman.gushchin@linux.dev>
	<20240625005906.106920-9-roman.gushchin@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240625005906.106920-9-roman.gushchin@linux.dev>
On Mon 24-06-24 17:59:00, Roman Gushchin wrote:
> Cgroup v1 supports a complicated mechanism for handling OOM in
> userspace, which is not supported by cgroup v2. Let's move the
> corresponding code into memcontrol-v1.c.
>
> Aside from mechanical code movement this patch introduces two new
> functions: memcg1_oom_prepare() and memcg1_oom_finish(). They
> implement the cgroup v1-specific parts of the common memcg OOM
> handling path.
>
> Signed-off-by: Roman Gushchin

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol-v1.c | 229 ++++++++++++++++++++++++++++++++++++++++++++-
>  mm/memcontrol-v1.h |   3 +-
>  mm/memcontrol.c    | 216 +-----------------------------------------
>  3 files changed, 231 insertions(+), 217 deletions(-)
>
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index d7b5c4c14732..253d49d5fb12 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -110,7 +110,13 @@ struct mem_cgroup_event {
>  	struct work_struct remove;
>  };
>  
> -extern spinlock_t memcg_oom_lock;
> +#ifdef CONFIG_LOCKDEP
> +static struct lockdep_map memcg_oom_lock_dep_map = {
> +	.name = "memcg_oom_lock",
> +};
> +#endif
> +
> +DEFINE_SPINLOCK(memcg_oom_lock);
>  
>  static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
>  					 struct mem_cgroup_tree_per_node *mctz,
> @@ -1469,7 +1475,7 @@ static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg)
>  	return 0;
>  }
>  
> -void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
> +static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
>  {
>  	struct mem_cgroup *iter;
>  
> @@ -1959,6 +1965,225 @@ void memcg1_css_offline(struct mem_cgroup *memcg)
>  	spin_unlock_irq(&memcg->event_list_lock);
>  }
>  
> +/*
> + * Check OOM-Killer is already running under our hierarchy.
> + * If someone is running, return false.
> + */
> +static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *iter, *failed = NULL;
> +
> +	spin_lock(&memcg_oom_lock);
> +
> +	for_each_mem_cgroup_tree(iter, memcg) {
> +		if (iter->oom_lock) {
> +			/*
> +			 * this subtree of our hierarchy is already locked
> +			 * so we cannot give a lock.
> +			 */
> +			failed = iter;
> +			mem_cgroup_iter_break(memcg, iter);
> +			break;
> +		} else
> +			iter->oom_lock = true;
> +	}
> +
> +	if (failed) {
> +		/*
> +		 * OK, we failed to lock the whole subtree so we have
> +		 * to clean up what we set up to the failing subtree
> +		 */
> +		for_each_mem_cgroup_tree(iter, memcg) {
> +			if (iter == failed) {
> +				mem_cgroup_iter_break(memcg, iter);
> +				break;
> +			}
> +			iter->oom_lock = false;
> +		}
> +	} else
> +		mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_);
> +
> +	spin_unlock(&memcg_oom_lock);
> +
> +	return !failed;
> +}
> +
> +static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *iter;
> +
> +	spin_lock(&memcg_oom_lock);
> +	mutex_release(&memcg_oom_lock_dep_map, _RET_IP_);
> +	for_each_mem_cgroup_tree(iter, memcg)
> +		iter->oom_lock = false;
> +	spin_unlock(&memcg_oom_lock);
> +}
> +
> +static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *iter;
> +
> +	spin_lock(&memcg_oom_lock);
> +	for_each_mem_cgroup_tree(iter, memcg)
> +		iter->under_oom++;
> +	spin_unlock(&memcg_oom_lock);
> +}
> +
> +static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *iter;
> +
> +	/*
> +	 * Be careful about under_oom underflows because a child memcg
> +	 * could have been added after mem_cgroup_mark_under_oom.
> +	 */
> +	spin_lock(&memcg_oom_lock);
> +	for_each_mem_cgroup_tree(iter, memcg)
> +		if (iter->under_oom > 0)
> +			iter->under_oom--;
> +	spin_unlock(&memcg_oom_lock);
> +}
> +
> +static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq);
> +
> +struct oom_wait_info {
> +	struct mem_cgroup *memcg;
> +	wait_queue_entry_t wait;
> +};
> +
> +static int memcg_oom_wake_function(wait_queue_entry_t *wait,
> +	unsigned mode, int sync, void *arg)
> +{
> +	struct mem_cgroup *wake_memcg = (struct mem_cgroup *)arg;
> +	struct mem_cgroup *oom_wait_memcg;
> +	struct oom_wait_info *oom_wait_info;
> +
> +	oom_wait_info = container_of(wait, struct oom_wait_info, wait);
> +	oom_wait_memcg = oom_wait_info->memcg;
> +
> +	if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) &&
> +	    !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg))
> +		return 0;
> +	return autoremove_wake_function(wait, mode, sync, arg);
> +}
> +
> +void memcg_oom_recover(struct mem_cgroup *memcg)
> +{
> +	/*
> +	 * For the following lockless ->under_oom test, the only required
> +	 * guarantee is that it must see the state asserted by an OOM when
> +	 * this function is called as a result of userland actions
> +	 * triggered by the notification of the OOM. This is trivially
> +	 * achieved by invoking mem_cgroup_mark_under_oom() before
> +	 * triggering notification.
> +	 */
> +	if (memcg && memcg->under_oom)
> +		__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
> +}
> +
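(Aside for other reviewers, a one-screen summary of the wait/wake
protocol implemented by the code being moved here -- summary mine, not
part of the patch; function names as in this hunk:

	/* charging task, in_user_fault and oom_kill_disable set */
	memcg1_oom_prepare();			/* records current->memcg_in_oom */
	...					/* charge fails, page fault unwinds */
	mem_cgroup_oom_synchronize(true);	/* sleeps KILLABLE on memcg_oom_waitq */

	/* userspace handler, once it has resolved the OOM */
	memcg_oom_recover();			/* sees ->under_oom, wakes the waiters */

memcg_oom_wake_function() only matches waiters in the same hierarchy
branch, so unrelated groups are not woken.)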
> +/**
> + * mem_cgroup_oom_synchronize - complete memcg OOM handling
> + * @handle: actually kill/wait or just clean up the OOM state
> + *
> + * This has to be called at the end of a page fault if the memcg OOM
> + * handler was enabled.
> + *
> + * Memcg supports userspace OOM handling where failed allocations must
> + * sleep on a waitqueue until the userspace task resolves the
> + * situation. Sleeping directly in the charge context with all kinds
> + * of locks held is not a good idea, instead we remember an OOM state
> + * in the task and mem_cgroup_oom_synchronize() has to be called at
> + * the end of the page fault to complete the OOM handling.
> + *
> + * Returns %true if an ongoing memcg OOM situation was detected and
> + * completed, %false otherwise.
> + */
> +bool mem_cgroup_oom_synchronize(bool handle)
> +{
> +	struct mem_cgroup *memcg = current->memcg_in_oom;
> +	struct oom_wait_info owait;
> +	bool locked;
> +
> +	/* OOM is global, do not handle */
> +	if (!memcg)
> +		return false;
> +
> +	if (!handle)
> +		goto cleanup;
> +
> +	owait.memcg = memcg;
> +	owait.wait.flags = 0;
> +	owait.wait.func = memcg_oom_wake_function;
> +	owait.wait.private = current;
> +	INIT_LIST_HEAD(&owait.wait.entry);
> +
> +	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
> +	mem_cgroup_mark_under_oom(memcg);
> +
> +	locked = mem_cgroup_oom_trylock(memcg);
> +
> +	if (locked)
> +		mem_cgroup_oom_notify(memcg);
> +
> +	schedule();
> +	mem_cgroup_unmark_under_oom(memcg);
> +	finish_wait(&memcg_oom_waitq, &owait.wait);
> +
> +	if (locked)
> +		mem_cgroup_oom_unlock(memcg);
> +cleanup:
> +	current->memcg_in_oom = NULL;
> +	css_put(&memcg->css);
> +	return true;
> +}
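For context, the userspace side of this contract looks roughly like the
sketch below -- mine and untested, not part of the patch. It assumes the
v1 memory controller is mounted at /sys/fs/cgroup/memory with a
hypothetical group "mygroup", and omits all error handling:

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/eventfd.h>

	int main(void)
	{
		const char *grp = "/sys/fs/cgroup/memory/mygroup";
		char path[256];
		uint64_t cnt;

		/* eventfd the kernel signals on each OOM event */
		int efd = eventfd(0, 0);

		snprintf(path, sizeof(path), "%s/memory.oom_control", grp);
		int ofd = open(path, O_RDWR);

		/* register "<event_fd> <control_fd>" with cgroup.event_control */
		snprintf(path, sizeof(path), "%s/cgroup.event_control", grp);
		int cfd = open(path, O_WRONLY);
		dprintf(cfd, "%d %d", efd, ofd);

		/* disable the in-kernel OOM killer; charging tasks now sleep
		 * in mem_cgroup_oom_synchronize() until we act */
		write(ofd, "1", 1);

		read(efd, &cnt, sizeof(cnt));	/* blocks until an OOM fires */

		/* resolve the OOM here (raise the limit, kill a task, ...);
		 * memcg_oom_recover() then wakes the sleeping tasks */
		return 0;
	}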
> +
> +
> +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked)
> +{
> +	/*
> +	 * We are in the middle of the charge context here, so we
> +	 * don't want to block when potentially sitting on a callstack
> +	 * that holds all kinds of filesystem and mm locks.
> +	 *
> +	 * cgroup1 allows disabling the OOM killer and waiting for outside
> +	 * handling until the charge can succeed; remember the context and put
> +	 * the task to sleep at the end of the page fault when all locks are
> +	 * released.
> +	 *
> +	 * On the other hand, in-kernel OOM killer allows for an async victim
> +	 * memory reclaim (oom_reaper) and that means that we are not solely
> +	 * relying on the oom victim to make a forward progress and we can
> +	 * invoke the oom killer here.
> +	 *
> +	 * Please note that mem_cgroup_out_of_memory might fail to find a
> +	 * victim and then we have to bail out from the charge path.
> +	 */
> +	if (READ_ONCE(memcg->oom_kill_disable)) {
> +		if (current->in_user_fault) {
> +			css_get(&memcg->css);
> +			current->memcg_in_oom = memcg;
> +		}
> +		return false;
> +	}
> +
> +	mem_cgroup_mark_under_oom(memcg);
> +
> +	*locked = mem_cgroup_oom_trylock(memcg);
> +
> +	if (*locked)
> +		mem_cgroup_oom_notify(memcg);
> +
> +	mem_cgroup_unmark_under_oom(memcg);
> +
> +	return true;
> +}
> +
> +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked)
> +{
> +	if (locked)
> +		mem_cgroup_oom_unlock(memcg);
> +}
> +
>  static int __init memcg1_init(void)
>  {
>  	int node;
> diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
> index ef1b7037cbdc..3de956b2422f 100644
> --- a/mm/memcontrol-v1.h
> +++ b/mm/memcontrol-v1.h
> @@ -87,9 +87,10 @@ enum res_type {
>  bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
>  				enum mem_cgroup_events_target target);
>  unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
> -void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
>  ssize_t memcg_write_event_control(struct kernfs_open_file *of,
>  				  char *buf, size_t nbytes, loff_t off);
>  
> +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked);
> +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked);
>  
>  #endif /* __MM_MEMCONTROL_V1_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 92fb72bbd494..8abd364ac837 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1616,130 +1616,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	return ret;
>  }
>  
> -#ifdef CONFIG_LOCKDEP
> -static struct lockdep_map memcg_oom_lock_dep_map = {
> -	.name = "memcg_oom_lock",
> -};
> -#endif
> -
> -DEFINE_SPINLOCK(memcg_oom_lock);
> -
> -/*
> - * Check OOM-Killer is already running under our hierarchy.
> - * If someone is running, return false.
> - */
> -static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg)
> -{
> -	struct mem_cgroup *iter, *failed = NULL;
> -
> -	spin_lock(&memcg_oom_lock);
> -
> -	for_each_mem_cgroup_tree(iter, memcg) {
> -		if (iter->oom_lock) {
> -			/*
> -			 * this subtree of our hierarchy is already locked
> -			 * so we cannot give a lock.
> -			 */
> -			failed = iter;
> -			mem_cgroup_iter_break(memcg, iter);
> -			break;
> -		} else
> -			iter->oom_lock = true;
> -	}
> -
> -	if (failed) {
> -		/*
> -		 * OK, we failed to lock the whole subtree so we have
> -		 * to clean up what we set up to the failing subtree
> -		 */
> -		for_each_mem_cgroup_tree(iter, memcg) {
> -			if (iter == failed) {
> -				mem_cgroup_iter_break(memcg, iter);
> -				break;
> -			}
> -			iter->oom_lock = false;
> -		}
> -	} else
> -		mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_);
> -
> -	spin_unlock(&memcg_oom_lock);
> -
> -	return !failed;
> -}
> -
> -static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
> -{
> -	struct mem_cgroup *iter;
> -
> -	spin_lock(&memcg_oom_lock);
> -	mutex_release(&memcg_oom_lock_dep_map, _RET_IP_);
> -	for_each_mem_cgroup_tree(iter, memcg)
> -		iter->oom_lock = false;
> -	spin_unlock(&memcg_oom_lock);
> -}
> -
> -static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg)
> -{
> -	struct mem_cgroup *iter;
> -
> -	spin_lock(&memcg_oom_lock);
> -	for_each_mem_cgroup_tree(iter, memcg)
> -		iter->under_oom++;
> -	spin_unlock(&memcg_oom_lock);
> -}
> -
> -static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg)
> -{
> -	struct mem_cgroup *iter;
> -
> -	/*
> -	 * Be careful about under_oom underflows because a child memcg
> -	 * could have been added after mem_cgroup_mark_under_oom.
> -	 */
> -	spin_lock(&memcg_oom_lock);
> -	for_each_mem_cgroup_tree(iter, memcg)
> -		if (iter->under_oom > 0)
> -			iter->under_oom--;
> -	spin_unlock(&memcg_oom_lock);
> -}
> -
> -static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq);
> -
> -struct oom_wait_info {
> -	struct mem_cgroup *memcg;
> -	wait_queue_entry_t wait;
> -};
> -
> -static int memcg_oom_wake_function(wait_queue_entry_t *wait,
> -	unsigned mode, int sync, void *arg)
> -{
> -	struct mem_cgroup *wake_memcg = (struct mem_cgroup *)arg;
> -	struct mem_cgroup *oom_wait_memcg;
> -	struct oom_wait_info *oom_wait_info;
> -
> -	oom_wait_info = container_of(wait, struct oom_wait_info, wait);
> -	oom_wait_memcg = oom_wait_info->memcg;
> -
> -	if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) &&
> -	    !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg))
> -		return 0;
> -	return autoremove_wake_function(wait, mode, sync, arg);
> -}
> -
> -void memcg_oom_recover(struct mem_cgroup *memcg)
> -{
> -	/*
> -	 * For the following lockless ->under_oom test, the only required
> -	 * guarantee is that it must see the state asserted by an OOM when
> -	 * this function is called as a result of userland actions
> -	 * triggered by the notification of the OOM. This is trivially
> -	 * achieved by invoking mem_cgroup_mark_under_oom() before
> -	 * triggering notification.
> -	 */
> -	if (memcg && memcg->under_oom)
> -		__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
> -}
> -
>  /*
>   * Returns true if successfully killed one or more processes. Though in some
>   * corner cases it can return true even without killing any process.
> @@ -1753,104 +1629,16 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
>  
>  	memcg_memory_event(memcg, MEMCG_OOM);
>  
> -	/*
> -	 * We are in the middle of the charge context here, so we
> -	 * don't want to block when potentially sitting on a callstack
> -	 * that holds all kinds of filesystem and mm locks.
> -	 *
> -	 * cgroup1 allows disabling the OOM killer and waiting for outside
> -	 * handling until the charge can succeed; remember the context and put
> -	 * the task to sleep at the end of the page fault when all locks are
> -	 * released.
> -	 *
> -	 * On the other hand, in-kernel OOM killer allows for an async victim
> -	 * memory reclaim (oom_reaper) and that means that we are not solely
> -	 * relying on the oom victim to make a forward progress and we can
> -	 * invoke the oom killer here.
> -	 *
> -	 * Please note that mem_cgroup_out_of_memory might fail to find a
> -	 * victim and then we have to bail out from the charge path.
> -	 */
> -	if (READ_ONCE(memcg->oom_kill_disable)) {
> -		if (current->in_user_fault) {
> -			css_get(&memcg->css);
> -			current->memcg_in_oom = memcg;
> -		}
> +	if (!memcg1_oom_prepare(memcg, &locked))
>  		return false;
> -	}
> -
> -	mem_cgroup_mark_under_oom(memcg);
>  
> -	locked = mem_cgroup_oom_trylock(memcg);
> -
> -	if (locked)
> -		mem_cgroup_oom_notify(memcg);
> -
> -	mem_cgroup_unmark_under_oom(memcg);
>  	ret = mem_cgroup_out_of_memory(memcg, mask, order);
>  
> -	if (locked)
> -		mem_cgroup_oom_unlock(memcg);
> +	memcg1_oom_finish(memcg, locked);
>  
>  	return ret;
>  }
>  
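The resulting common path is pleasantly compact. For the record, the
flow after this patch as I read it (summary mine, not from the patch):

	mem_cgroup_oom(memcg, mask, order)
		memcg1_oom_prepare(memcg, &locked)
			/* v1 with oom_kill_disable: arm current->memcg_in_oom,
			 * return false -> the charge path bails out and the end
			 * of the page fault calls mem_cgroup_oom_synchronize() */
		mem_cgroup_out_of_memory(memcg, mask, order)
		memcg1_oom_finish(memcg, locked)
			/* releases the hierarchy oom_lock if prepare took it */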
> -/**
> - * mem_cgroup_oom_synchronize - complete memcg OOM handling
> - * @handle: actually kill/wait or just clean up the OOM state
> - *
> - * This has to be called at the end of a page fault if the memcg OOM
> - * handler was enabled.
> - *
> - * Memcg supports userspace OOM handling where failed allocations must
> - * sleep on a waitqueue until the userspace task resolves the
> - * situation. Sleeping directly in the charge context with all kinds
> - * of locks held is not a good idea, instead we remember an OOM state
> - * in the task and mem_cgroup_oom_synchronize() has to be called at
> - * the end of the page fault to complete the OOM handling.
> - *
> - * Returns %true if an ongoing memcg OOM situation was detected and
> - * completed, %false otherwise.
> - */
> -bool mem_cgroup_oom_synchronize(bool handle)
> -{
> -	struct mem_cgroup *memcg = current->memcg_in_oom;
> -	struct oom_wait_info owait;
> -	bool locked;
> -
> -	/* OOM is global, do not handle */
> -	if (!memcg)
> -		return false;
> -
> -	if (!handle)
> -		goto cleanup;
> -
> -	owait.memcg = memcg;
> -	owait.wait.flags = 0;
> -	owait.wait.func = memcg_oom_wake_function;
> -	owait.wait.private = current;
> -	INIT_LIST_HEAD(&owait.wait.entry);
> -
> -	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
> -	mem_cgroup_mark_under_oom(memcg);
> -
> -	locked = mem_cgroup_oom_trylock(memcg);
> -
> -	if (locked)
> -		mem_cgroup_oom_notify(memcg);
> -
> -	schedule();
> -	mem_cgroup_unmark_under_oom(memcg);
> -	finish_wait(&memcg_oom_waitq, &owait.wait);
> -
> -	if (locked)
> -		mem_cgroup_oom_unlock(memcg);
> -cleanup:
> -	current->memcg_in_oom = NULL;
> -	css_put(&memcg->css);
> -	return true;
> -}
> -
>  /**
>   * mem_cgroup_get_oom_group - get a memory cgroup to clean up after OOM
>   * @victim: task to be killed by the OOM killer
> -- 
> 2.45.2

-- 
Michal Hocko
SUSE Labs