Date: Tue, 24 Dec 2024 15:06:51 -0800 (PST)
From: David Rientjes
To: Chen Ridong
cc: akpm@linux-foundation.org, mhocko@kernel.org, hannes@cmpxchg.org, yosryahmed@google.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, davidf@vimeo.com, vbabka@suse.cz, handai.szj@taobao.com, kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, chenridong@huawei.com, wangweiyang2@huawei.com
Subject: Re: [PATCH v3] memcg: fix soft lockup in the OOM process
In-Reply-To: <20241224025238.3768787-1-chenridong@huaweicloud.com>
Message-ID: <8cf29751-7c71-52ff-5492-0019ca7b0e02@google.com>
References: <20241224025238.3768787-1-chenridong@huaweicloud.com>

On Tue, 24 Dec 2024, Chen Ridong wrote:

> From: Chen Ridong
>
> A soft lockup issue was found in the product
> with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  vprintk_emit+0x193/0x280
>  printk+0x52/0x6e
>  dump_task+0x114/0x130
>  mem_cgroup_scan_tasks+0x76/0x100
>  dump_header+0x1fe/0x210
>  oom_kill_process+0xd1/0x100
>  out_of_memory+0x125/0x570
>  mem_cgroup_out_of_memory+0xb5/0xd0
>  try_charge+0x720/0x770
>  mem_cgroup_try_charge+0x86/0x180
>  mem_cgroup_try_charge_delay+0x1c/0x40
>  do_anonymous_page+0xb5/0x390
>  handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong

Looks fine to me, although we do a lot of process traversals for oom kill
selection as well and this hasn't ever popped up as a significant concern.
We have cases far beyond 56k processes.

No objection to the approach, however.
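
For illustration only, the batching pattern described in the quoted
changelog (yield once per 1000 iterations of a long traversal) could be
sketched in userspace C roughly as below. This is not the kernel patch:
sched_yield() is an assumed stand-in for cond_resched(), and scan_tasks(),
yield_point(), count_task() and YIELD_INTERVAL are hypothetical names
invented for this sketch. The global-OOM variant, which the changelog says
uses touch_softlockup_watchdog instead, is not modelled.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical batch size, mirroring "per 1000 iterations" in the changelog. */
#define YIELD_INTERVAL 1000

/* Counts how often the loop yielded; kept only to make the sketch observable. */
static size_t yield_count;

/* Assumed userspace stand-in for cond_resched(): let other threads run. */
static void yield_point(void)
{
	yield_count++;
	sched_yield();
}

/* Hypothetical per-task callback; a nonzero return stops the scan early. */
typedef int (*task_fn)(size_t task_id, void *arg);

/* Walk nr_tasks entries, yielding after every YIELD_INTERVAL of them. */
static int scan_tasks(size_t nr_tasks, task_fn fn, void *arg)
{
	int ret = 0;
	size_t i;

	assert(fn != NULL);

	for (i = 0; i < nr_tasks && !ret; i++) {
		/* Yield once per batch, not on every task. */
		if (i && i % YIELD_INTERVAL == 0)
			yield_point();
		ret = fn(i, arg);
	}
	return ret;
}

/* Example callback: count the tasks visited via the opaque argument. */
static int count_task(size_t task_id, void *arg)
{
	(void)task_id;
	(*(size_t *)arg)++;
	return 0;
}
```

With a size_t counter named visited, scan_tasks(56000, count_task, &visited)
visits all 56,000 entries and passes through the yield point 55 times, so
the cost of yielding stays small relative to the traversal itself.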