From: Adrian Huang
To: urezki@gmail.com
Cc: adrianhuang0701@gmail.com, ahuang12@lenovo.com, akpm@linux-foundation.org, hch@infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, sunjw10@lenovo.com
Subject: Re: [PATCH 1/1] mm/vmalloc: Add preempt point in purge_vmap_node() when enabling kasan
Date: Mon, 22 Jul 2024 19:50:54 +0800
Message-Id: <20240722115054.6295-1-ahuang12@lenovo.com>

> I tried to simulate the reported splat and i can reproduce it with KASAN
> enabled. I use qemu on my 64-core system, it allows me to specify 255
> cores while running VM. The kernel is 6.10.0-rc5.
>
> The kernel should be built with CONFIG_KASAN=y and CONFIG_KASAN_VMALLOC=y
>
> The "soft lockup" can be triggered when the kernel is compiled within a
> VM using 256 jobs and preemption is disabled:
>
> echo none > /sys/kernel/debug/sched/preempt
> make -C coding/linux.git/ -j256 bzImage
>
> watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760]
> CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95
> Workqueue: events drain_vmap_area_work
> RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0
> ...
>

Great to hear you're able to reproduce the issue.

I kept debugging, and the original patch
(https://lore.kernel.org/all/ZogS_04dP5LlRlXN@pc636/T/) shows that
purge_vmap_node() iteratively releases kasan vmalloc allocations and
flushes the TLB for each vmap_area. There are 2805
flush_tlb_kernel_range() calls in the ftrace log.
  * One is called in __purge_vmap_area_lazy().
  * The others are called in kasan_release_vmalloc(), which is called by
    purge_vmap_node().
    - [Rough calculation] Each flush_tlb_kernel_range() call takes about
      7.5ms.
      -- 2804 * 7.5ms = 21.03 seconds (that's why a soft lockup is
         triggered)

If we combine all TLB flush operations into one operation in the call
path 'purge_vmap_node()->kasan_release_vmalloc()', the running time of
drain_vmap_area_work() can be reduced greatly. The idea comes from the
flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). With this
change, the soft lockup is no longer triggered. Please refer to the
following patch.

Here is the test result based on 6.10:

[6.10 wo/ the patch below]
  1. ftrace latency profiling (record a trace if the latency > 20s):
     Commands
       echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh
       echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function
       echo function_graph > /sys/kernel/debug/tracing/current_tracer
       echo 1 > /sys/kernel/debug/tracing/tracing_on

  2. Run `make -j $(nproc)` to compile the kernel source

  3. Once the soft lockup is reproduced, check the ftrace log:
     cat /sys/kernel/debug/tracing/trace
        # tracer: function_graph
        #
        # CPU  DURATION                  FUNCTION CALLS
        # |     |   |                     |   |   |   |
          76) $ 50412985 us |    } /* __purge_vmap_area_lazy */
          76) $ 50412997 us |  } /* drain_vmap_area_work */
          76) $ 29165911 us |    } /* __purge_vmap_area_lazy */
          76) $ 29165926 us |  } /* drain_vmap_area_work */
          91) $ 53629423 us |    } /* __purge_vmap_area_lazy */
          91) $ 53629434 us |  } /* drain_vmap_area_work */
          91) $ 28121014 us |    } /* __purge_vmap_area_lazy */
          91) $ 28121026 us |  } /* drain_vmap_area_work */

[6.10 w/ the patch below]
  1. Repeat steps 1-2 in "[6.10 wo/ the patch below]"

  2. The soft lockup is not triggered and the ftrace log is empty.
     cat /sys/kernel/debug/tracing/trace
        # tracer: function_graph
        #
        # CPU  DURATION                  FUNCTION CALLS
        # |     |   |                     |   |   |   |

  3. Setting 'tracing_thresh' to 10/5 seconds does not produce any
     ftrace log.

  4. Setting 'tracing_thresh' to 1 second produces an ftrace log.
     cat /sys/kernel/tracing/trace
        # tracer: function_graph
        #
        # CPU  DURATION                  FUNCTION CALLS
        # |     |   |                     |   |   |   |
          51) $ 1019695 us |    } /* __purge_vmap_area_lazy */
          51) $ 1019703 us |  } /* drain_vmap_area_work */
         198) $ 1018707 us |    } /* __purge_vmap_area_lazy */
         198) $ 1018718 us |  } /* drain_vmap_area_work */

  5. The following test_vmalloc command runs without any issues:
     modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x1 nr_pages=4

Could you please test this patch in your VM environment?

---
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 70d6a8f6e25d..ddbf42a1a7b7 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -55,6 +55,9 @@ extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 int kasan_populate_early_shadow(const void *shadow_start,
 				const void *shadow_end);
 
+#define KASAN_VMALLOC_PAGE_RANGE 0x1 /* Apply existing page range */
+#define KASAN_VMALLOC_TLB_FLUSH  0x2 /* TLB flush */
+
 #ifndef kasan_mem_to_shadow
 static inline void *kasan_mem_to_shadow(const void *addr)
 {
@@ -511,7 +514,8 @@ void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
 int kasan_populate_vmalloc(unsigned long addr, unsigned long size);
 void kasan_release_vmalloc(unsigned long start, unsigned long end,
 			   unsigned long free_region_start,
-			   unsigned long free_region_end);
+			   unsigned long free_region_end,
+			   unsigned long flags);
 
 #else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
 
@@ -526,7 +530,8 @@ static inline int kasan_populate_vmalloc(unsigned long start,
 static inline void kasan_release_vmalloc(unsigned long start,
 					 unsigned long end,
 					 unsigned long free_region_start,
-					 unsigned long free_region_end) { }
+					 unsigned long free_region_end,
+					 unsigned long flags) { }
 
 #endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
 
@@ -561,7 +566,8 @@ static inline int kasan_populate_vmalloc(unsigned long start,
 static inline void kasan_release_vmalloc(unsigned long start,
 					 unsigned long end,
 					 unsigned long free_region_start,
-					 unsigned long free_region_end) { }
+					 unsigned long free_region_end,
+					 unsigned long flags) { }
 
 static inline void *kasan_unpoison_vmalloc(const void *start,
 					   unsigned long size,
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index d6210ca48dda..88d1c9dcb507 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -489,7 +489,8 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
  */
 void kasan_release_vmalloc(unsigned long start, unsigned long end,
 			   unsigned long free_region_start,
-			   unsigned long free_region_end)
+			   unsigned long free_region_end,
+			   unsigned long flags)
 {
 	void *shadow_start, *shadow_end;
 	unsigned long region_start, region_end;
@@ -522,12 +523,17 @@ void kasan_release_vmalloc(unsigned long start, unsigned long end,
 			__memset(shadow_start, KASAN_SHADOW_INIT, shadow_end - shadow_start);
 			return;
 		}
-		apply_to_existing_page_range(&init_mm,
+
+
+		if (flags & KASAN_VMALLOC_PAGE_RANGE)
+			apply_to_existing_page_range(&init_mm,
 					     (unsigned long)shadow_start,
 					     size, kasan_depopulate_vmalloc_pte,
 					     NULL);
-		flush_tlb_kernel_range((unsigned long)shadow_start,
-				       (unsigned long)shadow_end);
+
+		if (flags & KASAN_VMALLOC_TLB_FLUSH)
+			flush_tlb_kernel_range((unsigned long)shadow_start,
+					       (unsigned long)shadow_end);
 	}
 }
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e34ea860153f..d66e09135876 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2193,8 +2193,15 @@ static void purge_vmap_node(struct work_struct *work)
 	struct vmap_area *va, *n_va;
 	LIST_HEAD(local_list);
 
+	unsigned long start;
+	unsigned long end;
+
 	vn->nr_purged = 0;
 
+	start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start;
+
+	end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end;
+
 	list_for_each_entry_safe(va, n_va, &vn->purge_list, list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 		unsigned long orig_start = va->va_start;
@@ -2205,7 +2212,8 @@ static void purge_vmap_node(struct work_struct *work)
 
 		if (is_vmalloc_or_module_addr((void *)orig_start))
 			kasan_release_vmalloc(orig_start, orig_end,
-					      va->va_start, va->va_end);
+					      va->va_start, va->va_end,
+					      KASAN_VMALLOC_PAGE_RANGE);
 
 		atomic_long_sub(nr, &vmap_lazy_nr);
 		vn->nr_purged++;
@@ -2218,6 +2226,8 @@ static void purge_vmap_node(struct work_struct *work)
 		list_add(&va->list, &local_list);
 	}
 
+	kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH);
+
 	reclaim_list_global(&local_list);
 }
 
@@ -4726,7 +4736,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 							&free_vmap_area_list);
 			if (va)
 				kasan_release_vmalloc(orig_start, orig_end,
-					va->va_start, va->va_end);
+					va->va_start, va->va_end,
+					KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH);
 			vas[area] = NULL;
 		}
 
@@ -4776,7 +4787,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 				&free_vmap_area_list);
 		if (va)
 			kasan_release_vmalloc(orig_start, orig_end,
-				va->va_start, va->va_end);
+				va->va_start, va->va_end,
+				KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH);
 		vas[area] = NULL;
 		kfree(vms[area]);
 	}
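
For illustration only, the rough calculation in the message (2804 flushes at about 7.5ms each) and the batching idea can be modelled in user space. The sketch below is not kernel code: struct area, stub_flush_range(), flush_per_area(), flush_batched() and estimated_ms() are hypothetical names, the stub only counts calls, and the model assumes a fixed cost per flush regardless of range size, which may not hold on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a vmap_area: just an address range. */
struct area {
	unsigned long start;
	unsigned long end;
};

/* Number of times the stub flush has run since the last reset. */
static unsigned long nr_flushes;

/* Stub standing in for an expensive range flush; it only counts calls. */
static void stub_flush_range(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	nr_flushes++;
}

/* Unbatched model: one flush per area. Returns the number of flushes. */
static unsigned long flush_per_area(const struct area *a, size_t n)
{
	assert(a != NULL || n == 0);

	nr_flushes = 0;
	for (size_t i = 0; i < n; i++)
		stub_flush_range(a[i].start, a[i].end);
	return nr_flushes;
}

/*
 * Batched model: compute the range covering every area, then flush once.
 * Returns the number of flushes (0 for an empty list, otherwise 1).
 */
static unsigned long flush_batched(const struct area *a, size_t n)
{
	unsigned long start, end;

	assert(a != NULL || n == 0);

	nr_flushes = 0;
	if (n == 0)
		return 0;

	start = a[0].start;
	end = a[0].end;
	for (size_t i = 1; i < n; i++) {
		if (a[i].start < start)
			start = a[i].start;
		if (a[i].end > end)
			end = a[i].end;
	}
	stub_flush_range(start, end);
	return nr_flushes;
}

/* Estimated total time in ms, ASSUMING a fixed cost per flush. */
static double estimated_ms(unsigned long flushes, double ms_per_flush)
{
	return (double)flushes * ms_per_flush;
}
```

Under these assumptions, estimated_ms(2804, 7.5) gives 21030 ms, matching the 21.03 seconds estimated in the message, while the batched variant issues a single flush however many areas are on the list.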