Message-ID: <02f7d744-f123-4523-b170-c2062b5746c8@huaweicloud.com>
Date: Wed, 18 Dec 2024 17:00:38 +0800
Subject: Re: [PATCH v1] memcg: fix soft lockup in the OOM process
To: Michal Hocko, Tejun Heo
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, yosryahmed@google.com,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
 davidf@vimeo.com, vbabka@suse.cz, handai.szj@taobao.com, rientjes@google.com,
 kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, chenridong@huawei.com,
 wangweiyang2@huawei.com
References: <20241217121828.3219752-1-chenridong@huaweicloud.com>
 <872c5042-01d6-4ff3-94bc-8df94e1e941c@huaweicloud.com>
From: Chen Ridong

On 2024/12/18 15:56, Michal Hocko wrote:
> On Wed 18-12-24 15:44:34, Chen Ridong wrote:
>>
>>
>> On 2024/12/17 20:54, Michal Hocko wrote:
>>> On Tue 17-12-24 12:18:28, Chen Ridong wrote:
>>> [...]
>>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>>> index 1c485beb0b93..14260381cccc 100644
>>>> --- a/mm/oom_kill.c
>>>> +++ b/mm/oom_kill.c
>>>> @@ -390,6 +390,7 @@ static int dump_task(struct task_struct *p, void *arg)
>>>>  	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(p, oc))
>>>>  		return 0;
>>>>
>>>> +	cond_resched();
>>>>  	task = find_lock_task_mm(p);
>>>>  	if (!task) {
>>>>  		/*
>>>
>>> This is called from RCU read lock for the global OOM killer path and I
>>> do not think you can schedule there. I do not remember specifics of task
>>> traversal for cgroup path but I guess that you might need to silence the
>>> soft lockup detector instead or come up with a different iteration
>>> scheme.
>>
>> Thank you, Michal.
>>
>> I made a mistake. I added cond_resched in the mem_cgroup_scan_tasks
>> function below the fn, but after reconsideration, it may cause
>> unnecessary scheduling for other callers of mem_cgroup_scan_tasks.
>> Therefore, I moved it into the dump_task function. However, I missed the
>> RCU lock from the global OOM.
>>
>> I think we can use touch_nmi_watchdog in place of cond_resched, which
>> can silence the soft lockup detector. Do you think that is acceptable?
>
> It is certainly a way to go. Not the best one at that though. Maybe we
> need a different solution for the global and for the memcg OOMs. During
> the global OOM we rarely care about latency as the whole system is
> likely to struggle. Memcg OOMs are much more likely. Having that many
> tasks in a memcg certainly requires a further partitioning so if
> configured properly the OOM latency shouldn't be visible much. But I am
> wondering whether the cgroup task iteration could use cond_resched while
> the global one would touch_nmi_watchdog for every N iterations. I might
> be missing something but I do not see any locking required outside of
> css_task_iter_*.

Do you mean something like this:

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index d9061bd55436..9d197a731841 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5023,7 +5023,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 	}

 	spin_unlock_irqrestore(&css_set_lock, irqflags);
-
+	cond_resched();
 	return it->cur_task;
 }

@@ -433,8 +433,10 @@ static void dump_tasks(struct oom_control *oc)
 		struct task_struct *p;

 		rcu_read_lock();
-		for_each_process(p)
+		for_each_process(p) {
+			touch_nmi_watchdog();
 			dump_task(p, oc);
+		}
 		rcu_read_unlock();
 	}

The css_task_iter_* functions are used in many places, so we should be
very careful about adding cond_resched inside them. I don't see any RCU
or spinlock usage outside of css_task_iter_*; the only locks I found held
around it are mutexes, such as in cgroup_do_freeze.

Perhaps Tj will have some opinions on this?

Best regards,
Ridong