Message-ID: <7d7b3c01-4977-41fa-a19c-4e6399117e8e@huaweicloud.com>
Date: Thu, 19 Dec 2024 09:27:52 +0800
Subject: Re: [PATCH v1] memcg: fix soft lockup in the OOM process
To: Michal Hocko
Cc: Tejun Heo, akpm@linux-foundation.org, hannes@cmpxchg.org,
 yosryahmed@google.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev,
 muchun.song@linux.dev, davidf@vimeo.com, vbabka@suse.cz,
 handai.szj@taobao.com, rientjes@google.com, kamezawa.hiroyu@jp.fujitsu.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 chenridong@huawei.com, wangweiyang2@huawei.com
References: <20241217121828.3219752-1-chenridong@huaweicloud.com>
 <872c5042-01d6-4ff3-94bc-8df94e1e941c@huaweicloud.com>
 <02f7d744-f123-4523-b170-c2062b5746c8@huaweicloud.com>
From: Chen Ridong <chenridong@huaweicloud.com>
On 2024/12/18 18:22, Michal Hocko wrote:
> On Wed 18-12-24 17:00:38, Chen Ridong wrote:
>> On 2024/12/18 15:56, Michal Hocko wrote:
>>> On Wed 18-12-24 15:44:34, Chen Ridong wrote:
>>>> On 2024/12/17 20:54, Michal Hocko wrote:
>>>>> On Tue 17-12-24 12:18:28, Chen Ridong wrote:
>>>>> [...]
>>>>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>>>>> index 1c485beb0b93..14260381cccc 100644
>>>>>> --- a/mm/oom_kill.c
>>>>>> +++ b/mm/oom_kill.c
>>>>>> @@ -390,6 +390,7 @@ static int dump_task(struct task_struct *p, void *arg)
>>>>>>  	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(p, oc))
>>>>>>  		return 0;
>>>>>>
>>>>>> +	cond_resched();
>>>>>>  	task = find_lock_task_mm(p);
>>>>>>  	if (!task) {
>>>>>>  		/*
>>>>>
>>>>> This is called under the RCU read lock for the global OOM killer path,
>>>>> and I do not think you can schedule there. I do not remember the
>>>>> specifics of task traversal for the cgroup path, but I guess you might
>>>>> need to silence the soft lockup detector instead, or come up with a
>>>>> different iteration scheme.
>>>>
>>>> Thank you, Michal.
>>>>
>>>> I made a mistake. I originally added cond_resched in
>>>> mem_cgroup_scan_tasks, below the fn call, but on reconsideration that
>>>> may cause unnecessary scheduling for other callers of
>>>> mem_cgroup_scan_tasks. Therefore, I moved it into dump_task. However, I
>>>> missed the RCU lock taken on the global OOM path.
>>>>
>>>> I think we can use touch_nmi_watchdog in place of cond_resched, which
>>>> can silence the soft lockup detector. Do you think that is acceptable?
>>>
>>> It is certainly a way to go, though not the best one. Maybe we need
>>> different solutions for the global and the memcg OOMs. During a global
>>> OOM we rarely care about latency, as the whole system is likely to
>>> struggle. Memcg OOMs are much more likely. Having that many tasks in a
>>> memcg certainly calls for further partitioning, so if configured
>>> properly the OOM latency shouldn't be very visible. But I am wondering
>>> whether the cgroup task iteration could use cond_resched while the
>>> global one would touch_nmi_watchdog every N iterations. I might be
>>> missing something, but I do not see any locking required outside of
>>> css_task_iter_*.
>>
>> Do you mean something like this:
>
> I've had something like this (untested) in mind
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..37abc94abd2e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1167,10 +1167,14 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>  	for_each_mem_cgroup_tree(iter, memcg) {
>  		struct css_task_iter it;
>  		struct task_struct *task;
> +		unsigned int i = 0;
>
>  		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
> -		while (!ret && (task = css_task_iter_next(&it)))
> +		while (!ret && (task = css_task_iter_next(&it))) {
>  			ret = fn(task, arg);
> +			if (!(++i % 1000))
> +				cond_resched();
> +		}
>  		css_task_iter_end(&it);
>  		if (ret) {
>  			mem_cgroup_iter_break(memcg, iter);

Thank you for your patience. I had the same idea in mind. However, two
considerations led me to reconsider it:

1. I wasn't sure how often we should call cond_resched. Every 1000
iterations, or every 10000?
2. I don't think all callers of mem_cgroup_scan_tasks need cond_resched.
Only an expensive fn (e.g., dump_tasks) needs it. At least, I have not
encountered any issue except when fn is dump_tasks.

If you think this is acceptable, I will test and update the patch.

Best regards,
Ridong