From: Marco Elver
Date: Wed, 18 Oct 2023 18:37:01 +0200
Subject: Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
To: Kefeng Wang
Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev@googlegroups.com, linux-mm@kvack.org
References: <20230906124234.134200-1-wangkefeng.wang@huawei.com> <4e2e075f-b74c-4daf-bf1a-f83fced742c4@huawei.com>

On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev wrote:
>
> The issue is easy to reproduce with large vmalloc, kindly ping...
>
> On 2023/9/15 8:58, Kefeng Wang wrote:
> > Hi All, any suggestions or comments? Many thanks.
> >
> > On 2023/9/6 20:42, Kefeng Wang wrote:
> >> This is an RFC; patch 3 in particular is a hack to fix the soft lockup
> >> when populating or depopulating PTEs over a large region. Looking
> >> forward to your replies and advice, thanks.
> >
> > Here is the full stack for populate pte:
> >
> > [ C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
> > [ C3] Modules linked in: test(OE+)
> > [ C3] irq event stamp: 320776
> > [ C3] hardirqs last enabled at (320775): [] _raw_spin_unlock_irqrestore+0x98/0xb8
> > [ C3] hardirqs last disabled at (320776): [] el1_interrupt+0x38/0xa8
> > [ C3] softirqs last enabled at (318174): [] __do_softirq+0x658/0x7ac
> > [ C3] softirqs last disabled at (318169): [] ____do_softirq+0x18/0x30
> > [ C3] CPU: 3 PID: 458 Comm: insmod Tainted: G OE 6.5.0+ #595
> > [ C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [ C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > [ C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > [ C3] sp : ffff800093386d70
> > [ C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
> > [ C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
> > [ C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
> > [ C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > [ C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
> > [ C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
> > [ C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > [ C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > [ C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
> > [ C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
> > [ C3] Call trace:
> > [ C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > [ C3]  rmqueue_bulk+0x434/0x6b8
> > [ C3]  get_page_from_freelist+0xdd4/0x1680
> > [ C3]  __alloc_pages+0x244/0x508
> > [ C3]  alloc_pages+0xf0/0x218
> > [ C3]  __get_free_pages+0x1c/0x50
> > [ C3]  kasan_populate_vmalloc_pte+0x30/0x188
> > [ C3]  __apply_to_page_range+0x3ec/0x650
> > [ C3]  apply_to_page_range+0x1c/0x30
> > [ C3]  kasan_populate_vmalloc+0x60/0x70
> > [ C3]  alloc_vmap_area.part.67+0x328/0xe50
> > [ C3]  alloc_vmap_area+0x4c/0x78
> > [ C3]  __get_vm_area_node.constprop.76+0x130/0x240
> > [ C3]  __vmalloc_node_range+0x12c/0x340
> > [ C3]  __vmalloc_node+0x8c/0xb0
> > [ C3]  vmalloc+0x2c/0x40
> > [ C3]  show_mem_init+0x1c/0xff8 [test]
> > [ C3]  do_one_initcall+0xe4/0x500
> > [ C3]  do_init_module+0x100/0x358
> > [ C3]  load_module+0x2e64/0x2fc8
> > [ C3]  init_module_from_file+0xec/0x148
> > [ C3]  idempotent_init_module+0x278/0x380
> > [ C3]  __arm64_sys_finit_module+0x88/0xf8
> > [ C3]  invoke_syscall+0x64/0x188
> > [ C3]  el0_svc_common.constprop.1+0xec/0x198
> > [ C3]  do_el0_svc+0x48/0xc8
> > [ C3]  el0_svc+0x3c/0xe8
> > [ C3]  el0t_64_sync_handler+0xa0/0xc8
> > [ C3]  el0t_64_sync+0x188/0x190
> >
> > and for depopulate pte:
> >
> > [ C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
> > [ C6] Modules linked in: test(OE+)
> > [ C6] irq event stamp: 39458
> > [ C6] hardirqs last enabled at (39457): [] _raw_spin_unlock_irqrestore+0x98/0xb8
> > [ C6] hardirqs last disabled at (39458): [] el1_interrupt+0x38/0xa8
> > [ C6] softirqs last enabled at (39420): [] __do_softirq+0x658/0x7ac
> > [ C6] softirqs last disabled at (39415): [] ____do_softirq+0x18/0x30
> > [ C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G OEL 6.5.0+ #595
> > [ C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [ C6] Workqueue: events drain_vmap_area_work
> > [ C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > [ C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > [ C6] sp : ffff80008fe676b0
> > [ C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
> > [ C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
> > [ C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
> > [ C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > [ C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
> > [ C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
> > [ C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > [ C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > [ C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
> > [ C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
> > [ C6] Call trace:
> > [ C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > [ C6]  free_pcppages_bulk+0x2bc/0x3e0
> > [ C6]  free_unref_page_commit+0x1fc/0x290
> > [ C6]  free_unref_page+0x184/0x250
> > [ C6]  __free_pages+0x154/0x1a0
> > [ C6]  free_pages+0x88/0xb0
> > [ C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
> > [ C6]  __apply_to_page_range+0x3ec/0x650
> > [ C6]  apply_to_existing_page_range+0x1c/0x30
> > [ C6]  kasan_release_vmalloc+0xa4/0x118
> > [ C6]  __purge_vmap_area_lazy+0x4f4/0xe30
> > [ C6]  drain_vmap_area_work+0x60/0xc0
> > [ C6]  process_one_work+0x4cc/0xa38
> > [ C6]  worker_thread+0x240/0x638
> > [ C6]  kthread+0x1c8/0x1e0
> > [ C6]  ret_from_fork+0x10/0x20
> >
> >
> >
> >>
> >> Kefeng Wang (3):
> >>   mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
> >>   mm: kasan: shadow: move free_page() out of page table lock
> >>   mm: kasan: shadow: HACK add cond_resched_lock() in
> >>     kasan_depopulate_vmalloc_pte()

The first 2 patches look ok, but yeah, the last is a hack. I also
don't have any better suggestions, only more questions.

Does this only happen on arm64? Do you have a minimal reproducer you
can share?
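
For a rough sense of scale, here is a back-of-the-envelope sketch (not taken from the patches). It assumes generic KASAN, where one shadow byte covers 8 bytes of memory, and it relies on the populate trace above, where each kasan_populate_vmalloc_pte() callback allocates a page via __get_free_pages(). The helper name shadow_pages_for() and the 4 KiB page size used in the note below are assumptions for illustration only; the reporter's actual configuration is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/*
 * Estimate how many shadow pages back a vmalloc region of `size` bytes.
 * Simplification: the region is assumed to be aligned so that its shadow
 * starts on a page boundary; unaligned regions can need one more page.
 */
static uint64_t shadow_pages_for(uint64_t size, uint64_t page_size)
{
	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	/* shadow bytes needed, rounded up to whole shadow bytes */
	uint64_t shadow_bytes =
		(size + (1ULL << SHADOW_SCALE_SHIFT) - 1) >> SHADOW_SCALE_SHIFT;

	/* shadow pages needed, rounded up to whole pages */
	return (shadow_bytes + page_size - 1) / page_size;
}
```

Under these assumptions, shadow_pages_for(1ULL << 30, 4096) gives 32768, i.e. a 1 GiB vmalloc() would mean roughly 32768 per-PTE callbacks in a single apply_to_page_range() walk, which gives an idea of how much work can accumulate without a reschedule point.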