Message-ID: <0b6a3935-8b6c-4d11-bacc-31c1ba15b349@huaweicloud.com>
Date: Tue, 14 Jan 2025 20:13:37 +0800
Subject: Re: [PATCH v3] memcg: fix soft lockup in the OOM process
To: Vlastimil Babka, Michal Hocko, Andrew Morton
Cc: hannes@cmpxchg.org, yosryahmed@google.com, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, davidf@vimeo.com,
 handai.szj@taobao.com, rientjes@google.com, kamezawa.hiroyu@jp.fujitsu.com,
 RCU, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, chenridong@huawei.com, wangweiyang2@huawei.com
References: <20241224025238.3768787-1-chenridong@huaweicloud.com>
 <1ea309c1-d0f8-4209-b0b0-e69ad4e986ae@suse.cz>
 <58caaa4f-cf78-4d0f-af31-8a9277b6ebf5@huaweicloud.com>
 <20250113194546.3de1af46fa7a668111909b63@linux-foundation.org>
From: Chen Ridong

On 2025/1/14 17:20, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong wrote:
>>>
>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>>  	else {
>>>>>>  		struct task_struct *p;
>>>>>> +		int i = 0;
>>>>>>
>>>>>>  		rcu_read_lock();
>>>>>> -		for_each_process(p)
>>>>>> +		for_each_process(p) {
>>>>>> +			/* Avoid potential softlockup warning */
>>>>>> +			if ((++i & 1023) == 0)
>>>>>> +				touch_softlockup_watchdog();
>>>>>
>>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
>>>>
>>>> Yes, rcu stall was still detected.
>
> "was" or "would be"? I thought only the memcg case was observed, or was that
> some deliberate stress test of the global case? (or the pr_info() console
> stress test mentioned earlier, but created outside of the oom code?)
>

It's not easy to reproduce for global OOM, because the pr_info() console
stress test can also lead to other soft lockups or RCU warnings (not
caused by the OOM process) since the whole system is struggling. However,
if I add mdelay(1) in the dump_task() function (just to slow down
dump_task(), assuming this is what pr_info() would do) and trigger a
global OOM, RCU warnings can be observed.

I think this verifies that global OOM can trigger RCU warnings in
specific scenarios.

>>>> For global OOM, system is likely to struggle, do we have to do some
>>>> work to suppress RCU detection?
>>>
>>> rcu_cpu_stall_reset()?
>>
>> Do we really care about those? The code to iterate over all processes
>> under RCU is there (basically) since ever and yet we do not seem to have
>> many reports of stalls? Chen's situation is specific to memcg OOM and
>> touching the global case was mostly for consistency reasons.
>
> Then I'd rather not touch the global case then if it's theoretical? It's not
> even exactly consistent, given it's a cond_resched() in the memcg code (that
> can be eventually automatically removed once/if lazy preempt becomes the
> sole implementation), but the touch_softlockup_watchdog() would remain,
> while doing only half of the job?
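
For reference, the throttling pattern in the quoted hunk, `(++i & 1023) == 0`,
can be sketched in userspace as below. This is only an illustration under
stated assumptions, not the kernel code: `fake_touch_watchdog()` and
`scan_tasks()` are hypothetical names, and the stand-in merely counts how
often touch_softlockup_watchdog() would have been called for a given number
of tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for touch_softlockup_watchdog(); it only counts calls. */
static size_t watchdog_touches;

static void fake_touch_watchdog(void)
{
	watchdog_touches++;
}

/*
 * Walk nr_tasks items and "touch the watchdog" once per `interval` items.
 * interval must be a power of two, so that (i & (interval - 1)) == 0 is
 * equivalent to i % interval == 0 (1023 == 1024 - 1 in the quoted hunk).
 * Returns the number of touches performed during the walk.
 */
static size_t scan_tasks(size_t nr_tasks, size_t interval)
{
	size_t i = 0;

	assert(interval != 0 && (interval & (interval - 1)) == 0);

	watchdog_touches = 0;
	for (size_t n = 0; n < nr_tasks; n++) {
		/* Per-task work (dump_task() in the quoted hunk) would go here. */
		if ((++i & (interval - 1)) == 0)
			fake_touch_watchdog();
	}
	return watchdog_touches;
}
```

Calling `scan_tasks(nr_tasks, 1024)` mirrors the counter in the patch. The
sketch models only the counter; it says nothing about RCU stall detection,
which is the concern raised above, since the real loop stays inside
rcu_read_lock() for the whole walk.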