Date: Tue, 27 May 2025 03:03:34 -0700
From: Breno Leitao
To: Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
    Andrew Morton, Chen Ridong, Greg Kroah-Hartman, Michal Hocko,
    cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com,
    Michael van der Westhuizen, Usama Arif, Pavel Begunkov, Rik van Riel
Subject: Re: [PATCH] memcg: Always call cond_resched() after fn()
References: <20250523-memcg_fix-v1-1-ad3eafb60477@debian.org>
Hello Shakeel,

On Fri, May 23, 2025 at 11:21:39AM -0700, Shakeel Butt wrote:
> On Fri, May 23, 2025 at 10:21:06AM -0700, Breno Leitao wrote:
> > I am seeing soft lockups on certain machine types when a cgroup OOMs.
> > This happens because killing the processes on some machines can be
> > very slow, which causes the soft lockup and RCU stalls. It usually
> > happens when the cgroup has MANY processes and memory.oom.group is
> > set.
> >
> > Example I am seeing in real production:
> >
> > [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) ....
> > ....
> > [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) ....
> > [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982]
> > ....
> >
> > A quick look at why this is so slow suggests it is related to the
> > serial console flush on certain machine types. For all the crashes I
> > saw, the target CPU was in console_flush_all().
> >
> > In the case above, there are thousands of processes in the cgroup, and
> > the CPU soft locks up before the loop reaches the 1024-iteration limit
> > in the code (which would call cond_resched()). So calling
> > cond_resched() only every 1024 iterations is not sufficient.
> >
> > Remove the counter-based conditional rescheduling logic and call
> > cond_resched() unconditionally after each task iteration, i.e. after
> > fn() is called. This avoids the lockup regardless of how slow fn() is.
> >
> > Cc: Michael van der Westhuizen
> > Cc: Usama Arif
> > Cc: Pavel Begunkov
> > Suggested-by: Rik van Riel
> > Signed-off-by: Breno Leitao
> > Fixes: 46576834291869457 ("memcg: fix soft lockup in the OOM process")
>
> Can you share the call stack? I think, from the above, it is coming
> from oom_kill_memcg_member().

Sure, this is what I see at crash time:

[73963.996160] Memory cgroup out of memory: Killed process 177737 (adb) total-vm:24896kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:0
[73964.026146] Memory cgroup out of memory: Killed process 177738 (sh) total-vm:8064kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:0
[73964.055784] Memory cgroup out of memory: Killed process 177739 (adb) total-vm:24896kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:0
[73964.085773] Memory cgroup out of memory: Killed process 177740 (sh) total-vm:8064kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:0
[73964.115468] Memory cgroup out of memory: Killed process 177742 (adb) total-vm:24896kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:0
[73964.145375] watchdog: BUG: soft lockup - CPU#20 stuck for 26s! [node-linux-arm6:159076]
[73964.145376] CPU#20 Utilization every 4s during lockup:
[73964.145377] #1: 4% system, 0% softirq, 0% hardirq, 0% idle
[73964.145379] #2: 4% system, 0% softirq, 0% hardirq, 0% idle
[73964.145380] #3: 4% system, 0% softirq, 0% hardirq, 0% idle
[73964.145380] #4: 4% system, 0% softirq, 0% hardirq, 0% idle
[73964.145381] #5: 6% system, 0% softirq, 0% hardirq, 0% idle
[73964.145382] Modules linked in: vhost_vsock(E) ghes_edac(E) bpf_preload(E) tls(E) tcp_diag(E) inet_diag(E) sch_fq(E) act_gact(E) cls_bpf(E) ipmi_ssif(E) ipmi_devintf(E) crct10dif_ce(E) sm3_ce(E) sm3(E) sha3_ce(E) nvidia_cspmu(E) sha512_ce(E) sha512_arm64(E) arm_smmuv3_pmu(E) arm_cspmu_module(E) coresight_trbe(E) arm_spe_pmu(E) coresight_stm(E) coresight_tmc(E) ipmi_msghandler(E) coresight_etm4x(E) stm_core(E) coresight_funnel(E) spi_tegra210_quad(E) coresight(E) cppc_cpufreq(E) sch_fq_codel(E) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) tap(E) mpls_gso(E) mpls_iptunnel(E) mpls_router(E) fou(E) acpi_power_meter(E) loop(E) drm(E) backlight(E) drm_panel_orientation_quirks(E) autofs4(E) efivarfs(E)
[73964.145407] CPU: 20 UID: 0 PID: 159076 Comm: node-linux-arm6 Kdump: loaded Tainted: G E
[73964.145409] Tainted: [E]=UNSIGNED_MODULE
[73964.145410] Hardware name: .....
[73964.145411] pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
[73964.145412] pc : console_flush_all+0x3bc/0x550
[73964.145418] lr : console_flush_all+0x3b4/0x550
[73964.145419] sp : ffff80014baceff0
[73964.145420] x29: ffff80014bacf040 x28: 0000000000000001 x27: ffff800082d2d008
[73964.145421] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00017bd00000
[73964.145422] x23: 0000000000000000 x22: 0000000000000084 x21: ffff80008253a718
[73964.145423] x20: ffff800082526b08 x19: ffff80014bacf0ac x18: 0000000000000018
[73964.145424] x17: 0000000000000058 x16: 00000000ffffffff x15: 0000000000003638
[73964.145425] x14: 0000000000000001 x13: 65636f7270206465 x12: 6c6c694b203a7972
[73964.145427] x11: ffff800083086000 x10: ffff80008308625c x9 : 00000000ffffffff
[73964.145428] x8 : 0000000000000000 x7 : 205d383634353131 x6 : 312e34363933375b
[73964.145429] x5 : ffff8000830864f7 x4 : ffff80014bacee7f x3 : ffff8000803ce390
[73964.145430] x2 : 00000000000000aa x1 : 0000000000000000 x0 : 0000000000000000
[73964.145432] Call trace:
[73964.145433]  console_flush_all+0x3bc/0x550 (P)
[73964.145434]  console_unlock+0x90/0x188
[73964.145437]  vprintk_emit+0x3c8/0x560
[73964.145439]  vprintk_default+0x3c/0x50
[73964.145441]  vprintk+0x2c/0x40
[73964.145443]  _printk+0x50/0x68
[73964.145444]  __oom_kill_process+0x36c/0x5f8
[73964.145445]  oom_kill_memcg_member+0x54/0xb8
[73964.145447]  mem_cgroup_scan_tasks+0xa4/0x190
[73964.145449]  oom_kill_process+0x124/0x290
[73964.145450]  out_of_memory+0x194/0x4b8
[73964.145451]  mem_cgroup_out_of_memory+0xcc/0x110
[73964.145452]  __mem_cgroup_charge+0x5d8/0x9e0
[73964.145454]  filemap_add_folio+0x44/0xe0
[73964.145456]  alloc_extent_buffer+0x2a8/0xaa8
[73964.145458]  read_block_for_search+0x204/0x308
[73964.145460]  btrfs_search_slot+0x5bc/0x998
[73964.145463]  btrfs_lookup_file_extent+0x44/0x58
[73964.145465]  btrfs_get_extent+0x130/0x900
[73964.145467]  btrfs_do_readpage+0x2d8/0x798
[73964.145469]  btrfs_readahead+0x64/0x198
[73964.145470]  read_pages+0x58/0x370
[73964.145471]  page_cache_ra_unbounded+0x218/0x260
[73964.145473]  page_cache_ra_order+0x2d0/0x338
[73964.145474]  filemap_fault+0x418/0xe68
[73964.145475]  __do_fault+0xb0/0x270
[73964.145476]  do_pte_missing+0x73c/0x1148
[73964.145477]  handle_mm_fault+0x2c8/0xeb8
[73964.145478]  do_translation_fault+0x250/0x820
[73964.145479]  do_mem_abort+0x40/0xc8
[73964.145481]  el0_ia+0x60/0x100
[73964.145483]  el0t_64_sync_handler+0xe8/0x100
[73964.145484]  el0t_64_sync+0x168/0x170
[73964.145486] Kernel panic - not syncing: softlockup: hung tasks
[73964.145487] CPU: 20 UID: 0 PID: 159076 Comm: node-linux-arm6 Kdump: loaded Tainted: G EL
[73964.145489] Tainted: [E]=UNSIGNED_MODULE, [L]=SOFTLOCKUP
[73964.145490] Call trace:
[73964.145491]  show_stack+0x1c/0x30 (C)
[73964.145492]  dump_stack_lvl+0x38/0x80
[73964.145494]  panic+0x11c/0x370
[73964.145496]  watchdog_timer_fn+0x5c0/0x968
[73964.145498]  __hrtimer_run_queues+0x100/0x310
[73964.145500]  hrtimer_interrupt+0x110/0x400
[73964.145502]  arch_timer_handler_phys+0x34/0x50
[73964.145503]  handle_percpu_devid_irq+0x88/0x1c0
[73964.145505]  generic_handle_domain_irq+0x48/0x80
[73964.145506]  gic_handle_irq+0x4c/0x108
[73964.145507]  call_on_irq_stack+0x24/0x30
[73964.145509]  do_interrupt_handler+0x50/0x78
[73964.145511]  el1_interrupt+0x30/0x48
[73964.145512]  el1h_64_irq_handler+0x14/0x20
[73964.145513]  el1h_64_irq+0x6c/0x70
[73964.145514]  console_flush_all+0x3bc/0x550 (P)
[73964.145515]  console_unlock+0x90/0x188
[73964.145517]  vprintk_emit+0x3c8/0x560
[73964.145517]  vprintk_default+0x3c/0x50
[73964.145519]  vprintk+0x2c/0x40
[73964.145520]  _printk+0x50/0x68
[73964.145521]  __oom_kill_process+0x36c/0x5f8
[73964.145522]  oom_kill_memcg_member+0x54/0xb8
[73964.145524]  mem_cgroup_scan_tasks+0xa4/0x190
[73964.145525]  oom_kill_process+0x124/0x290
[73964.145526]  out_of_memory+0x194/0x4b8
[73964.145527]  mem_cgroup_out_of_memory+0xcc/0x110
[73964.145528]  __mem_cgroup_charge+0x5d8/0x9e0
[73964.145530]  filemap_add_folio+0x44/0xe0
[73964.145531]  alloc_extent_buffer+0x2a8/0xaa8
[73964.145532]  read_block_for_search+0x204/0x308
[73964.145534]  btrfs_search_slot+0x5bc/0x998
[73964.145536]  btrfs_lookup_file_extent+0x44/0x58
[73964.145538]  btrfs_get_extent+0x130/0x900
[73964.145540]  btrfs_do_readpage+0x2d8/0x798
[73964.145542]  btrfs_readahead+0x64/0x198
[73964.145544]  read_pages+0x58/0x370
[73964.145545]  page_cache_ra_unbounded+0x218/0x260
[73964.145546]  page_cache_ra_order+0x2d0/0x338
[73964.145547]  filemap_fault+0x418/0xe68
[73964.145548]  __do_fault+0xb0/0x270
[73964.145549]  do_pte_missing+0x73c/0x1148
[73964.145549]  handle_mm_fault+0x2c8/0xeb8
[73964.145550]  do_translation_fault+0x250/0x820
[73964.145552]  do_mem_abort+0x40/0xc8
[73964.145554]  el0_ia+0x60/0x100
[73964.145556]  el0t_64_sync_handler+0xe8/0x100
[73964.145557]  el0t_64_sync+0x168/0x170
[73964.145560] SMP: stopping secondary CPUs
[73964.145803] Starting crashdump kernel...
[73964.145804] Bye!

> Have you tried making __oom_kill_process not chatty? Basically instead
> of dumping to serial directly, use a local buffer and then dump it once
> it is full.

Not sure I followed you here. __oom_kill_process() is doing the following:

static void __oom_kill_process(struct task_struct *victim, const char *message)
{
	...
	pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n",

Would you print into a buffer instead, and then flush it all at once
(with pr_err())?

> Anyways, that would be a bit more involved and until then this seems
> fine to me.

Agree. Thanks for the review.
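
FWIW, here is a rough, untested sketch of how I read the "local buffer"
idea above, in case it helps the discussion: accumulate the per-victim
messages with scnprintf() into a caller-provided buffer and emit a single
pr_err() per flush, instead of one printk per killed task. The names
(oom_kill_log, OOM_KILL_LOG_SZ) and the message format are purely
illustrative, not the existing __oom_kill_process() output:

/* Illustrative sketch only -- names, sizes and format are made up. */
#define OOM_KILL_LOG_SZ	1024

struct oom_kill_log {
	char	buf[OOM_KILL_LOG_SZ];
	size_t	len;
};

static void oom_kill_log_flush(struct oom_kill_log *log)
{
	if (log->len) {
		/* One printk covering many victims, not one per victim. */
		pr_err("%s", log->buf);
		log->len = 0;
	}
}

static void oom_kill_log_add(struct oom_kill_log *log,
			     struct task_struct *victim)
{
	log->len += scnprintf(log->buf + log->len,
			      OOM_KILL_LOG_SZ - log->len,
			      "Killed process %d (%s)\n",
			      task_pid_nr(victim), victim->comm);

	/* Flush early so the next entry does not get truncated. */
	if (log->len > OOM_KILL_LOG_SZ - 128)
		oom_kill_log_flush(log);
}

The caller would flush once more at the end of the scan. Whether that
actually helps the serial console here is a separate question, of course.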