From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adrian Huang
X-Google-Original-From: Adrian Huang
To: Andrew Morton , Uladzislau Rezki , Christoph Hellwig
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Adrian Huang , Jiwei Sun
Subject: [PATCH 1/1] mm/vmalloc: Add preempt point in purge_vmap_node() when enabling kasan
Date: Fri, 5 Jul 2024 21:08:08 +0800
Message-Id: <20240705130808.1581-1-ahuang12@lenovo.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Adrian Huang

When compiling the kernel source with 'make -j $(nproc)' on a 256-core
machine running a KASAN-enabled kernel, the following soft lockup is
shown:

watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760]
CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95
Workqueue: events drain_vmap_area_work
RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0
Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75
RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202
RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949
RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50
RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800
R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39
R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0
Call Trace:
 ? watchdog_timer_fn+0x2cd/0x390
 ? __pfx_watchdog_timer_fn+0x10/0x10
 ? __hrtimer_run_queues+0x300/0x6d0
 ? sched_clock_cpu+0x69/0x4e0
 ? __pfx___hrtimer_run_queues+0x10/0x10
 ? srso_return_thunk+0x5/0x5f
 ? ktime_get_update_offsets_now+0x7f/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? hrtimer_interrupt+0x2ca/0x760
 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0
 ? sysvec_apic_timer_interrupt+0x6a/0x90
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? smp_call_function_many_cond+0x1d8/0xbb0
 ? __pfx_do_kernel_range_flush+0x10/0x10
 on_each_cpu_cond_mask+0x20/0x40
 flush_tlb_kernel_range+0x19b/0x250
 ? srso_return_thunk+0x5/0x5f
 ? kasan_release_vmalloc+0xa7/0xc0
 purge_vmap_node+0x357/0x820
 ? __pfx_purge_vmap_node+0x10/0x10
 __purge_vmap_area_lazy+0x5b8/0xa10
 drain_vmap_area_work+0x21/0x30
 process_one_work+0x661/0x10b0
 worker_thread+0x844/0x10e0
 ? srso_return_thunk+0x5/0x5f
 ? __kthread_parkme+0x82/0x140
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a5/0x370
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

Debugging Analysis:

1. [Crash] The call trace indicates that CPU #28 is waiting for other
   CPUs' responses after sending an IPI in order to flush the TLB.
   However, crash shows that the target CPU has already responded.

   A. CPU #28 is waiting for CPU #2's response.

      crash> cfd_data | grep -w 28
        [28]: ffff8883bc4469c0

      crash> struct call_function_data ffff8883bc4469c0
        struct call_function_data {
          csd = 0x4ca40,
          cpumask = 0xffff888103585e40,
          cpumask_ipi = 0xffff8881035858c0
        }

      crash> struct __call_single_data 0x4ca40:a | grep ffff8883bb74ca40
        [2]: ffff8883bb74ca40

   B. CPU #2 has responded because the bit 'CSD_FLAG_LOCK' of u_flags is
      cleared.

      crash> struct __call_single_data 0xffff8883bb74ca40
        struct __call_single_data {
          node = {
            ...
            {
              u_flags = 0,
              a_flags = {
                counter = 0
              }
            },
            ...
          },
          func = 0xffffffff8117b080 ,
          info = 0xffff8883bc444940
        }

   C. CPU #2 is running the userspace application 'nm'.

      crash> bt -c 2
      PID: 52035    TASK: ffff888194c21ac0  CPU: 2    COMMAND: "nm"
       #0 [ffffc90043157ea8] crash_nmi_callback at ffffffff81122f42
       #1 [ffffc90043157eb0] nmi_handle at ffffffff8108c988
       #2 [ffffc90043157f10] default_do_nmi at ffffffff835b3460
       #3 [ffffc90043157f30] exc_nmi at ffffffff835b3630
       #4 [ffffc90043157f50] asm_exc_nmi at ffffffff83601573
      RIP: 00007f6261b90d38  RSP: 00007ffe4d1cb180  RFLAGS: 00000202
      RAX: 0000000000000001  RBX: 6e6f2d7865646e69  RCX: 00007f626281f634
      RDX: 00007f6262ba7f67  RSI: 00007f626265f65e  RDI: 00007f62648f8a30
      RBP: 2d6f746c6e696874  R8:  00007f62618a4cc0  R9:  0000000000000001
      R10: 00007f627233e488  R11: 00007f627233d768  R12: 000055bee0ff4b90
      R13: 000055bee0fac650  R14: 00007ffe4d1cb280  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

   D. The soft lockup CPU iteratively traverses 128 vmap_nodes (128 bits
      are set) and flushes the TLB. This might be the time-consuming
      work.

      crash> p /x purge_nodes
      $1 = {
        bits = {0xffffffffffffffff, 0xffffffffffffffff, 0x0, ...}
      }

2. [Ftrace] In order to prove that the soft lockup CPU spends too much
   time iterating vmap_nodes and flushing the TLB when purging vm_area
   structures, the following ftrace log confirms the speculation (some
   info is trimmed):

     kworker: funcgraph_entry:               |  drain_vmap_area_work() {
     kworker: funcgraph_entry:               |    mutex_lock() {
     kworker: funcgraph_entry:    1.092 us   |      __cond_resched();
     kworker: funcgraph_exit:     3.306 us   |    }
     ... ...
     kworker: funcgraph_entry:               |    flush_tlb_kernel_range() {
     ... ...
     kworker: funcgraph_exit:  # 7533.649 us |    }
     ... ...
     kworker: funcgraph_entry:    2.344 us   |    mutex_unlock();
     kworker: funcgraph_exit:  $ 23871554 us |  }

   The drain_vmap_area_work() call spends over 23 seconds. There are
   2805 flush_tlb_kernel_range() calls in this ftrace log:
     * One is called in __purge_vmap_area_lazy().
     * The others are called in kasan_release_vmalloc().

3. Extending the soft lockup time can work around the issue (for
   example, # echo 60 > /proc/sys/kernel/watchdog_thresh). This
   confirms the above-mentioned speculation: drain_vmap_area_work()
   spends too much time.

Commit 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
introduced an effective vmap node logic. However, the current design
iterates over all 128 vmap_nodes and flushes the TLB on a single CPU if
vmap_lazy_nr < 2 * lazy_max_pages(). When KASAN is enabled, this can
trigger the soft lockup because the additional TLB flushes for the KASAN
vmalloc shadow take much more time if all 128 vmap nodes have a
non-empty purge list.

Fix the issue by adding a preemption point in purge_vmap_node() when
KASAN is enabled.
Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
Signed-off-by: Adrian Huang
Reviewed-and-tested-by: Jiwei Sun
---
 mm/vmalloc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d0cbdd7c1e5b..380bdc317c8d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2193,6 +2193,15 @@ static void purge_vmap_node(struct work_struct *work)
 	struct vmap_area *va, *n_va;
 	LIST_HEAD(local_list);
 
+	/*
+	 * Add the preemption point when enabling KASAN. Each vmap_area of
+	 * vmap nodes has to flush tlb when releasing vmalloc, which might
+	 * be the time-consuming work if lots of vmap nodes have the
+	 * available purge list.
+	 */
+	if (kasan_enabled())
+		cond_resched();
+
 	vn->nr_purged = 0;
 
 	list_for_each_entry_safe(va, n_va, &vn->purge_list, list) {
-- 
2.34.1