From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 641C0C433FE for ; Tue, 11 Oct 2022 19:52:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 899536B0071; Tue, 11 Oct 2022 15:52:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 84A546B0073; Tue, 11 Oct 2022 15:52:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 69ADE8E0001; Tue, 11 Oct 2022 15:52:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 4CC196B0071 for ; Tue, 11 Oct 2022 15:52:47 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 24D631A03F4 for ; Tue, 11 Oct 2022 19:52:47 +0000 (UTC) X-FDA: 80009716374.28.1C8D35E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf20.hostedemail.com (Postfix) with ESMTP id 536931C003B for ; Tue, 11 Oct 2022 19:52:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665517965; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hfGmqn9tAmQbOGvVwdHJC7Su1j3vDwXMepRWl/TaZqw=; b=EbO2lzcYsmmqVbGYwBpkaBznfq7ZtnEn3obKlQDbBImuv6AKO0DtSNfW2+i9nBT4YTxbUK dPqLYiAyOfe95wbc7Ata6U+uqSRl08CzfbDNX0B8vcl7PI977kDEZuLD72u3AYrdX/3FTW 0qLgEeyX3YAk7Gm3+bguyLzqBN8O9LA= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-360-YTAkz0IlO6-DPT5hAiaqlA-1; Tue, 11 Oct 2022 15:52:44 -0400 X-MC-Unique: YTAkz0IlO6-DPT5hAiaqlA-1 Received: by mail-wr1-f69.google.com with SMTP id e19-20020adfa453000000b0022e3608df56so4247863wra.10 for ; Tue, 11 Oct 2022 12:52:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=hfGmqn9tAmQbOGvVwdHJC7Su1j3vDwXMepRWl/TaZqw=; b=DIiNIda0ZT+eHx5dxe5ODBlbOTkfhpnHy81xvUVSlnph9w2Xn5vGdwfRfgAcvd2QjW HtWw+EOR1I70wyBQ1RbJ2DJlqd/yXlXDLJjsGjkbMKSj2lDTlBVLUdHUC5vP8h+8ThAJ lRr4bU8fB0OLHIN/FPnJQ30YIcrzwPEPX5cAvqElzn1dxnFIVV5SRO1Mcz8W02pXCWaY CypxBHUm3eXAfxeJNRkS4hFXZKC0P4yR/6ZT86jFcYsYH7iNHZ9PgGMU+M3s0UeFBszE /KsYmGqzoTtX8HepjNtEAZ4iWcR0ZaQf7+8z3O05VzGwvzYMKj743XhFicKDa3VKbOf8 mgOQ== X-Gm-Message-State: ACrzQf04XJMOUiWeL8J0/GgPt3C0U81FGUj9Y6jEu7d4qB9bGZjoRlYP r5F2+LiTJebJu5ZVXsSa+fxB2RnpBokF5fb9BNPPCKRSiqmxJ3w5OViS7TCrMh4MVmt9JXtSe+Z 0uBPjoMD56nE= X-Received: by 2002:a05:600c:1c82:b0:3c6:c225:eb99 with SMTP id k2-20020a05600c1c8200b003c6c225eb99mr457463wms.23.1665517962542; Tue, 11 Oct 2022 12:52:42 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6tmBpkK+WJSOHUmn5VQDUNgWtbcIhrKpIC4zkf5Axhk/Kkbdp2gc31XE1NOZ07xjKX1LwwDQ== X-Received: by 2002:a05:600c:1c82:b0:3c6:c225:eb99 with SMTP id k2-20020a05600c1c8200b003c6c225eb99mr457447wms.23.1665517962217; Tue, 11 Oct 2022 12:52:42 -0700 (PDT) Received: from ?IPV6:2003:cb:c709:6900:f110:6527:aa46:a922? (p200300cbc7096900f1106527aa46a922.dip0.t-ipconnect.de. [2003:cb:c709:6900:f110:6527:aa46:a922]) by smtp.gmail.com with ESMTPSA id c8-20020a05600c0a4800b003b4fdbb6319sm21838765wmq.21.2022.10.11.12.52.41 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 11 Oct 2022 12:52:41 -0700 (PDT) Message-ID: <478c93f5-3f06-e426-9266-2c043c3658da@redhat.com> Date: Tue, 11 Oct 2022 21:52:40 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 To: Uladzislau Rezki Cc: Alexander Potapenko , Andrey Konovalov , "linux-mm@kvack.org" , Andrey Ryabinin , Dmitry Vyukov , Vincenzo Frascino , kasan-dev@googlegroups.com References: <8aaaeec8-14a1-cdc4-4c77-4878f4979f3e@redhat.com> <9ce8a3a3-8305-31a4-a097-3719861c234e@redhat.com> <6d75325f-a630-5ae3-5162-65f5bb51caf7@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665517966; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=hfGmqn9tAmQbOGvVwdHJC7Su1j3vDwXMepRWl/TaZqw=; b=FXOFvCePNWEgE47ugQEVjV8nWO8Q/47EnR0PXSb/kL+cD/oULSBnfke628pmqt+pxIjpDI 7IKi8/OF8RyHhhmIyM4PnMPMvKP/73lH5+cPVwt8d7d5Sy1cy9EUxMoPHLKk6/Mcttk+7R HO54o9KrrRomowA/E2kvg8H9t8LOMpA= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=EbO2lzcY; spf=pass (imf20.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665517966; a=rsa-sha256; cv=none; b=e1xTHoHHtpMhdAsl1s/jwjTKNbCXuvgx5L0hrvjBJTHiU3JTrVhNjgEzoekRHcBdhPOlH+ kOzDa2KGwpN1cTJNQsrWIqmlqzbPzzvdBozGMwq4j2LdL8ba5o9TE5TQyjCZblWYIirLal L/SZA0cs6aFGLNLi4XW3cl0YpWV44to= X-Rspam-User: X-Rspamd-Server: rspam11 Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=EbO2lzcY; spf=pass (imf20.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Stat-Signature: a7wf4kf3psjqy5xw1oufdk1owxyghy3b X-Rspamd-Queue-Id: 536931C003B X-HE-Tag: 1665517966-692697 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10.10.22 14:19, Uladzislau Rezki wrote: > On Mon, Oct 10, 2022 at 08:56:55AM +0200, David Hildenbrand wrote: >>>>> Maybe try to increase the module-section size to see if it solves the >>>>> problem. >>>> >>>> What would be the easiest way to do that? >>>> >>> Sorry for late answer. I was trying to reproduce it on my box. What i >>> did was trying to load all modules in my system with KASAN_INLINE option: >>> >> >> Thanks! >> >>> >>> #!/bin/bash >>> >>> # Exclude test_vmalloc.ko >>> MODULES_LIST=(`find /lib/modules/$(uname -r) -type f \ >>> \( -iname "*.ko" -not -iname "test_vmalloc*" \) | awk -F"/" '{print $NF}' | sed 's/.ko//'`) >>> >>> function moduleExist(){ >>> MODULE="$1" >>> if lsmod | grep "$MODULE" &> /dev/null ; then >>> return 0 >>> else >>> return 1 >>> fi >>> } >>> >>> i=0 >>> >>> for module_name in ${MODULES_LIST[@]}; do >>> sudo modprobe $module_name >>> >>> if moduleExist ${module_name}; then >>> ((i=i+1)) >>> echo "Successfully loaded $module_name counter $i" >>> fi >>> done >>> >>> >>> as you wrote it looks like it is not easy to reproduce. So i do not see >>> any vmap related errors. >> >> Yeah, it's quite mystery and only seems to trigger on these systems with a >> lot of CPUs. >> >>> >>> Returning back to the question. I think you could increase the MODULES_END >>> address and shift the FIXADDR_START little forward. See the dump_pagetables.c >>> But it might be they are pretty compact and located in the end. So i am not >>> sure if there is a room there. >> >> That's what I was afraid of :) >> >>> >>> Second. It would be good to understand if vmap only fails on allocating for a >>> module: >>> >>> >>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c >>> index dd6cdb201195..53026fdda224 100644 >>> --- a/mm/vmalloc.c >>> +++ b/mm/vmalloc.c >>> @@ -1614,6 +1614,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, >>> va->va_end = addr + size; >>> va->vm = NULL; >>> + trace_printk("-> alloc %lu size, align: %lu, vstart: %lu, vend: %lu\n", size, align, vstart, vend); >>> + >>> spin_lock(&vmap_area_lock); >>> >> >> I'll try grabbing a suitable system again and add some more debugging >> output. Might take a while, unfortunately. >> > Yes that makes sense. Especially to understand if it fails on the MODULES_VADDR > - MODULES_END range or somewhere else. According to your trace output it looks > like that but it would be good to confirm it by adding some traces. > > BTW, vmap code is lack of good trace events. Probably it is worth to add > some basic ones. Was lucky to grab that system again. Compiled a custom 6.0 kernel, whereby I printk all vmap allocation errors, including the range similarly to what you suggested above (but printk only on the failure path). So these are the failing allocations: # dmesg | grep " -> alloc" [ 168.862511] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.863020] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.863841] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.864562] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.864646] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.865688] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.865718] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.866098] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.866551] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.866752] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867147] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867210] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867312] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867650] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867767] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867815] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.867815] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.868059] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.868463] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.868822] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.868919] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.869843] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.869854] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.870174] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.870611] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.870806] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.870982] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 168.879000] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.449101] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.449834] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.450667] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.451539] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.452326] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.453239] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.454052] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.454697] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.454811] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.455575] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.455754] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.461450] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.805223] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.805507] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.929577] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.930389] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.931244] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.932035] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.932796] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.933592] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.934470] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.935344] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 169.970641] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.191600] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.191875] -> alloc 40960 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.241901] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.242708] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.243465] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.244211] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.245060] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.245868] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.246433] -> alloc 40960 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.246657] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.247451] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.248226] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.248902] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.249704] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.250497] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.251244] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.252076] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.587168] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 170.598995] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 171.865721] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 [ 172.138557] -> alloc 917504 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400 Really looks like only module vmap space. ~ 1 GiB of vmap module space ... I did try: diff --git a/mm/vmalloc.c b/mm/vmalloc.c index dd6cdb201195..199154a2228a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -72,6 +72,8 @@ early_param("nohugevmalloc", set_nohugevmalloc); static const bool vmap_allow_huge = false; #endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */ +static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0); + bool is_vmalloc_addr(const void *x) { unsigned long addr = (unsigned long)kasan_reset_tag(x); @@ -1574,7 +1576,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, struct vmap_area *va; unsigned long freed; unsigned long addr; - int purged = 0; int ret; BUG_ON(!size); @@ -1631,23 +1632,22 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, return va; overflow: - if (!purged) { + if (atomic_long_read(&vmap_lazy_nr)) { purge_vmap_area_lazy(); - purged = 1; goto retry; } freed = 0; blocking_notifier_call_chain(&vmap_notify_list, 0, &freed); - if (freed > 0) { - purged = 0; + if (freed > 0) goto retry; - } - if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) + if (!(gfp_mask & __GFP_NOWARN)) { pr_warn("vmap allocation for size %lu failed: use vmalloc= to increase size\n", size); + printk("-> alloc %lu size, align: %lu, vstart: %lu, vend: %lu\n", size, align, vstart, vend); + } kmem_cache_free(vmap_area_cachep, va); return ERR_PTR(-EBUSY); @@ -1690,8 +1690,6 @@ static unsigned long lazy_max_pages(void) return log * (32UL * 1024 * 1024 / PAGE_SIZE); } -static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0); - But that didn't help at all. That system is crazy: # lspci | wc -l 1117 What I find interesting is that we have these recurring allocations of similar sizes failing. I wonder if user space is capable of loading the same kernel module concurrently to trigger a massive amount of allocations, and module loading code only figures out later that it has already been loaded and backs off. My best guess would be that module loading is serialized completely, but for some reason, something seems to go wrong with a lot of concurrency ... -- Thanks, David / dhildenb