Message-ID: <146508e9-db36-4c84-8ac6-1b3173b94194@huawei.com>
Date: Thu, 19 Oct 2023 17:47:13 +0800
Subject: Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
To: Uladzislau Rezki
CC: Marco Elver, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
 Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Christoph Hellwig,
 Lorenzo Stoakes
References: <20230906124234.134200-1-wangkefeng.wang@huawei.com>
 <4e2e075f-b74c-4daf-bf1a-f83fced742c4@huawei.com>
 <5b33515b-5fd2-4dc7-9778-e321484d2427@huawei.com>
 <7ab8839b-8f88-406e-b6e1-2c69c8967d4e@huawei.com>
From: Kefeng Wang <wangkefeng.wang@huawei.com>

On 2023/10/19 16:53, Uladzislau Rezki wrote:
> On Thu, Oct 19, 2023 at 03:26:48PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2023/10/19 14:17, Uladzislau Rezki wrote:
>>> On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2023/10/19 0:37, Marco Elver wrote:
>>>>> On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev wrote:
>>>>>>
>>>>>> The issue is easy to reproduce with a large vmalloc, kindly ping...
>>>>>>
>>>>>> On 2023/9/15 8:58, Kefeng Wang wrote:
>>>>>>> Hi all, any suggestions or comments? Many thanks.
>>>>>>>
>>>>>>> On 2023/9/6 20:42, Kefeng Wang wrote:
>>>>>>>> This is an RFC; even patch 3 is a hack to fix the soft lockup issue
>>>>>>>> when populating or depopulating PTEs over a large region. Looking
>>>>>>>> forward to your reply and advice, thanks.
>>>>>>>
>>>>>>> Here is the full stack, for populating PTEs:
>>>>>>>
>>>>>>> [ C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
>>>>>>> [ C3] Modules linked in: test(OE+)
>>>>>>> [ C3] irq event stamp: 320776
>>>>>>> [ C3] hardirqs last enabled at (320775): [] _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [ C3] hardirqs last disabled at (320776): [] el1_interrupt+0x38/0xa8
>>>>>>> [ C3] softirqs last enabled at (318174): [] __do_softirq+0x658/0x7ac
>>>>>>> [ C3] softirqs last disabled at (318169): [] ____do_softirq+0x18/0x30
>>>>>>> [ C3] CPU: 3 PID: 458 Comm: insmod Tainted: G OE 6.5.0+ #595
>>>>>>> [ C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>>>> [ C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> [ C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [ C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [ C3] sp : ffff800093386d70
>>>>>>> [ C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
>>>>>>> [ C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
>>>>>>> [ C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
>>>>>>> [ C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>>>>>> [ C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
>>>>>>> [ C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
>>>>>>> [ C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>>>>>> [ C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>>>>>> [ C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
>>>>>>> [ C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
>>>>>>> [ C3] Call trace:
>>>>>>> [ C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [ C3]  rmqueue_bulk+0x434/0x6b8
>>>>>>> [ C3]  get_page_from_freelist+0xdd4/0x1680
>>>>>>> [ C3]  __alloc_pages+0x244/0x508
>>>>>>> [ C3]  alloc_pages+0xf0/0x218
>>>>>>> [ C3]  __get_free_pages+0x1c/0x50
>>>>>>> [ C3]  kasan_populate_vmalloc_pte+0x30/0x188
>>>>>>> [ C3]  __apply_to_page_range+0x3ec/0x650
>>>>>>> [ C3]  apply_to_page_range+0x1c/0x30
>>>>>>> [ C3]  kasan_populate_vmalloc+0x60/0x70
>>>>>>> [ C3]  alloc_vmap_area.part.67+0x328/0xe50
>>>>>>> [ C3]  alloc_vmap_area+0x4c/0x78
>>>>>>> [ C3]  __get_vm_area_node.constprop.76+0x130/0x240
>>>>>>> [ C3]  __vmalloc_node_range+0x12c/0x340
>>>>>>> [ C3]  __vmalloc_node+0x8c/0xb0
>>>>>>> [ C3]  vmalloc+0x2c/0x40
>>>>>>> [ C3]  show_mem_init+0x1c/0xff8 [test]
>>>>>>> [ C3]  do_one_initcall+0xe4/0x500
>>>>>>> [ C3]  do_init_module+0x100/0x358
>>>>>>> [ C3]  load_module+0x2e64/0x2fc8
>>>>>>> [ C3]  init_module_from_file+0xec/0x148
>>>>>>> [ C3]  idempotent_init_module+0x278/0x380
>>>>>>> [ C3]  __arm64_sys_finit_module+0x88/0xf8
>>>>>>> [ C3]  invoke_syscall+0x64/0x188
>>>>>>> [ C3]  el0_svc_common.constprop.1+0xec/0x198
>>>>>>> [ C3]  do_el0_svc+0x48/0xc8
>>>>>>> [ C3]  el0_svc+0x3c/0xe8
>>>>>>> [ C3]  el0t_64_sync_handler+0xa0/0xc8
>>>>>>> [ C3]  el0t_64_sync+0x188/0x190
>>>>>>>
> This trace is stuck in rmqueue_bulk() because you request a
> huge alloc size. It has nothing to do with free_vmap_area_lock,
> it is about the bulk allocator. It gets stuck trying to satisfy
> such a demand.

Yes, this is not a spinlock issue: kasan_populate_vmalloc() simply spends
too much time in __apply_to_page_range() when the range is large, and it
can be fixed by adding a cond_resched() in kasan_populate_vmalloc(); see
patch 1 and the sketch below.
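To make that direction concrete, here is a rough, untested sketch (not
necessarily what patch 1 actually does). kasan_populate_shadow_range() and
KASAN_POPULATE_CHUNK are made-up names for illustration; only
apply_to_page_range() and kasan_populate_vmalloc_pte() come from the trace
above. The idea is to populate the shadow in bounded chunks and reschedule
between them:

/*
 * Illustrative sketch only, not the actual patch 1: populate the
 * vmalloc shadow in bounded chunks and reschedule between chunks so
 * one huge request cannot hog the CPU long enough to trip the watchdog.
 */
#define KASAN_POPULATE_CHUNK	(PAGE_SIZE * PTRS_PER_PTE)	/* made-up name */

static int kasan_populate_shadow_range(unsigned long start, unsigned long end)
{
	unsigned long next;
	int ret;

	while (start < end) {
		next = min(start + KASAN_POPULATE_CHUNK, end);

		/* kasan_populate_vmalloc_pte() allocates one shadow page per PTE */
		ret = apply_to_page_range(&init_mm, start, next - start,
					  kasan_populate_vmalloc_pte, NULL);
		if (ret)
			return ret;

		/* The vmalloc allocation path may sleep, so rescheduling is allowed */
		cond_resched();

		start = next;
	}

	return 0;
}

kasan_populate_vmalloc() would then call such a helper on the aligned shadow
range instead of issuing a single apply_to_page_range() over the whole thing.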
>
>
>>>>>>> and for depopulating PTEs:
>>>>>>>
>>>>>>> [ C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
>>>>>>> [ C6] Modules linked in: test(OE+)
>>>>>>> [ C6] irq event stamp: 39458
>>>>>>> [ C6] hardirqs last enabled at (39457): [] _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [ C6] hardirqs last disabled at (39458): [] el1_interrupt+0x38/0xa8
>>>>>>> [ C6] softirqs last enabled at (39420): [] __do_softirq+0x658/0x7ac
>>>>>>> [ C6] softirqs last disabled at (39415): [] ____do_softirq+0x18/0x30
>>>>>>> [ C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G OEL 6.5.0+ #595
>>>>>>> [ C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>>>> [ C6] Workqueue: events drain_vmap_area_work
>>>>>>> [ C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> [ C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [ C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [ C6] sp : ffff80008fe676b0
>>>>>>> [ C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
>>>>>>> [ C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
>>>>>>> [ C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
>>>>>>> [ C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>>>>>> [ C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
>>>>>>> [ C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
>>>>>>> [ C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>>>>>> [ C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>>>>>> [ C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
>>>>>>> [ C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
>>>>>>> [ C6] Call trace:
>>>>>>> [ C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [ C6]  free_pcppages_bulk+0x2bc/0x3e0
>>>>>>> [ C6]  free_unref_page_commit+0x1fc/0x290
>>>>>>> [ C6]  free_unref_page+0x184/0x250
>>>>>>> [ C6]  __free_pages+0x154/0x1a0
>>>>>>> [ C6]  free_pages+0x88/0xb0
>>>>>>> [ C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
>>>>>>> [ C6]  __apply_to_page_range+0x3ec/0x650
>>>>>>> [ C6]  apply_to_existing_page_range+0x1c/0x30
>>>>>>> [ C6]  kasan_release_vmalloc+0xa4/0x118
>>>>>>> [ C6]  __purge_vmap_area_lazy+0x4f4/0xe30
>>>>>>> [ C6]  drain_vmap_area_work+0x60/0xc0
>>>>>>> [ C6]  process_one_work+0x4cc/0xa38
>>>>>>> [ C6]  worker_thread+0x240/0x638
>>>>>>> [ C6]  kthread+0x1c8/0x1e0
>>>>>>> [ C6]  ret_from_fork+0x10/0x20
>>>>>>>
>>
>> See the call traces of the soft lockups: when the vmalloc buffer is
>> mapped/unmapped, KASAN has to populate and depopulate the vmalloc shadow
>> PTEs, which takes far longer than on a non-KASAN kernel. For the unmap
>> path there is already a cond_resched_lock() in __purge_vmap_area_lazy(),
>> but much more time is now spent under the spinlock (free_vmap_area_lock),
>> and we cannot add a cond_resched_lock() inside
>> kasan_depopulate_vmalloc_pte(). If the spinlock were converted to a
>> mutex, we could add a cond_resched() into the KASAN depopulate path;
>> that is why the conversion is made when KASAN is enabled. But the
>> conversion may not be correct, so is there a better solution?
>>
> I have at least the below thoughts:
>
> a) Add a max allowed threshold that a user can request via a vmalloc() call.
>    I do not think ~40G is a correct request.

I don't know; maybe some driver could map such a large range, but we did
hit this issue in QEMU, though the probability is very low.
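Coming back to the depopulate path described above: the point of the
spinlock-to-mutex conversion in this RFC is that the shadow PTE callback
could then reschedule itself. Below is a rough, untested sketch of that
idea, paraphrasing kasan_depopulate_vmalloc_pte() from mm/kasan/shadow.c
from memory (it is not the actual patch 3 and may not match the source
exactly):

/*
 * Rough sketch of the depopulate-side idea, not the actual patch 3:
 * once no spinlock is held by the caller, the PTE callback can
 * reschedule between shadow pages.
 */
static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
					void *unused)
{
	unsigned long page;

	page = (unsigned long)__va(pte_pfn(ptep_get(ptep)) << PAGE_SHIFT);

	spin_lock(&init_mm.page_table_lock);
	if (likely(!pte_none(ptep_get(ptep)))) {
		pte_clear(&init_mm, addr, ptep);
		free_page(page);
	}
	spin_unlock(&init_mm.page_table_lock);

	/*
	 * Only valid if kasan_release_vmalloc() is no longer called under
	 * free_vmap_area_lock: either the mutex conversion from this RFC,
	 * or the rework in suggestion (b) below.
	 */
	cond_resched();

	return 0;
}

The cond_resched() is only safe because __apply_to_page_range() on init_mm
does not itself hold a page-table lock across the callback; the blocker is
purely the free_vmap_area_lock held by __purge_vmap_area_lazy(), which is
exactly what suggestion (b) below addresses.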
>
> b) This can fix the unmap path:
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ef8599d394fd..988735da5c5c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1723,7 +1723,6 @@ static void purge_fragmented_blocks_allcpus(void);
>   */
>  static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  {
> -	unsigned long resched_threshold;
>  	unsigned int num_purged_areas = 0;
>  	struct list_head local_purge_list;
>  	struct vmap_area *va, *n_va;
> @@ -1747,36 +1746,32 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  			struct vmap_area, list)->va_end);
>  
>  	flush_tlb_kernel_range(start, end);
> -	resched_threshold = lazy_max_pages() << 1;
>  
> -	spin_lock(&free_vmap_area_lock);
>  	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
>  		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
>  		unsigned long orig_start = va->va_start;
>  		unsigned long orig_end = va->va_end;
>  
> +		if (is_vmalloc_or_module_addr((void *)orig_start))
> +			kasan_release_vmalloc(orig_start, orig_end,
> +					      va->va_start, va->va_end);
> +
>  		/*
>  		 * Finally insert or merge lazily-freed area. It is
>  		 * detached and there is no need to "unlink" it from
>  		 * anything.
>  		 */
> +		spin_lock(&free_vmap_area_lock);
>  		va = merge_or_add_vmap_area_augment(va, &free_vmap_area_root,
>  				&free_vmap_area_list);
> +		spin_unlock(&free_vmap_area_lock);
>  
>  		if (!va)
>  			continue;
>  
> -		if (is_vmalloc_or_module_addr((void *)orig_start))
> -			kasan_release_vmalloc(orig_start, orig_end,
> -					      va->va_start, va->va_end);
> -
>  		atomic_long_sub(nr, &vmap_lazy_nr);
>  		num_purged_areas++;
> -
> -		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
> -			cond_resched_lock(&free_vmap_area_lock);
>  	}
> -	spin_unlock(&free_vmap_area_lock);
>  
>  out:
>  	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
>

Thanks for your suggestion. But looking at kasan_release_vmalloc(), it
seems it needs to be called with free_vmap_area_lock held, according to
the comment at [1]. Marco and the KASAN maintainers, please help check
whether the approach above is safe.

[1] https://elixir.bootlin.com/linux/v6.6-rc6/source/mm/kasan/shadow.c#L491

>
> c) The bulk path I have not checked, but on a high level
>    kasan_populate_vmalloc() should take a breath between requests.
>
> --
> Uladzislau Rezki
>
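For completeness, a hypothetical reproducer along the lines of the "test"
module in the traces above. The real module is not part of this thread;
~40G is simply the figure mentioned earlier, and all names below are made
up for illustration:

/*
 * Hypothetical reproducer, loosely modelled on the "test" module in the
 * traces above: a single very large vmalloc() makes
 * kasan_populate_vmalloc() walk a huge shadow range in one go.
 */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

static void *big_buf;

static int __init big_vmalloc_init(void)
{
	/* ~40G, as mentioned in this thread; tune to the machine as needed */
	big_buf = vmalloc(40UL << 30);

	return big_buf ? 0 : -ENOMEM;
}

static void __exit big_vmalloc_exit(void)
{
	/* vfree() feeds the lazy purge, i.e. the depopulate path in the second trace */
	vfree(big_buf);
}

module_init(big_vmalloc_init);
module_exit(big_vmalloc_exit);
MODULE_LICENSE("GPL");

Loading such a module exercises the populate path from the first trace;
unloading it (once the deferred purge runs) exercises the depopulate path
from the second one.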