From: Uladzislau Rezki
Date: Sat, 15 Oct 2022 11:23:17 +0200
To: David Hildenbrand
Cc: Uladzislau Rezki, Alexander Potapenko, Andrey Konovalov, linux-mm@kvack.org,
    Andrey Ryabinin, Dmitry Vyukov, Vincenzo Frascino, kasan-dev@googlegroups.com
Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS

> > OK. It is related to a module vmap space allocation when a module is
> > inserted. I wonder why it requires 2.5MB for a module? It seems a lot
> > to me.
>
> Indeed. I assume KASAN can go wild when it instruments each and every
> memory access.
>
> > > Really looks like only module vmap space. ~ 1 GiB of vmap module space ...
> > >
> > If an allocation request for a module is 2.5MB, we can load ~400 modules
> > within 1GB of address space.
> >
> > "lsmod | wc -l"? How many modules does your system have?
>
> ~71, so not even close to 400.
>
> What I find interesting is that we have these recurring allocations of
> similar sizes failing.
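
(For reference, the estimate above checks out: 1 GiB of module space at
2.5 MiB per allocation gives 1024 MiB / 2.5 MiB ~= 409, i.e. roughly 400
module-sized allocations before the area is exhausted.)
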
> > > I wonder if user space is capable of loading the same kernel module
> > > concurrently to trigger a massive amount of allocations, and module
> > > loading code only figures out later that it has already been loaded
> > > and backs off.
> > >
> > If there is a request to allocate memory, it has to succeed unless
> > there is an error such as no space or no memory.
>
> Yes. But as I found out, we're really out of space, because module loading
> code allocates module VMAP space first, before verifying whether the module
> was already loaded or is concurrently getting loaded.
>
> See below.
>
> [...]
>
> > I wrote a small patch to dump the modules address space when a failure
> > occurs:
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 83b54beb12fa..88d323310df5 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
> >  	kmem_cache_free(vmap_area_cachep, va);
> >  }
> >  
> > +static void
> > +dump_modules_free_space(unsigned long vstart, unsigned long vend)
> > +{
> > +	unsigned long va_start, va_end;
> > +	unsigned int total = 0;
> > +	struct vmap_area *va;
> > +
> > +	if (vend != MODULES_END)
> > +		return;
> > +
> > +	trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
> > +
> > +	spin_lock(&free_vmap_area_lock);
> > +	list_for_each_entry(va, &free_vmap_area_list, list) {
> > +		va_start = (va->va_start > vstart) ? va->va_start : vstart;
> > +		va_end = (va->va_end < vend) ? va->va_end : vend;
> > +
> > +		if (va_start >= va_end)
> > +			continue;
> > +
> > +		if (va_start >= vstart && va_end <= vend) {
> > +			trace_printk("  va_free: 0x%lx - 0x%lx size=%lu\n",
> > +				va_start, va_end, va_end - va_start);
> > +			total += (va_end - va_start);
> > +		}
> > +	}
> > +
> > +	spin_unlock(&free_vmap_area_lock);
> > +	trace_printk("--- Total free: %u ---\n", total);
> > +}
> > +
> >  /*
> >   * Allocate a region of KVA of the specified size and alignment, within the
> >   * vstart and vend.
> > @@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> >  		goto retry;
> >  	}
> >  
> > -	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> > +	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> >  		pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
> >  			size);
> > +		dump_modules_free_space(vstart, vend);
> > +	}
> > +
> >  	kmem_cache_free(vmap_area_cachep, va);
> >  	return ERR_PTR(-EBUSY);
> >  }
>
> Thanks!
>
> I can spot the same module getting loaded over and over again concurrently
> from user space, only failing after all the allocations, when realizing
> that the module is in fact already loaded in add_unformed_module(), failing
> with -EEXIST.
>
> That looks quite inefficient. Here is how often user space tries to load
> the same module on that system. Note that I print *after* allocating module
> VMAP space.
>
OK. It explains the problem :)

Indeed, it is inefficient. Allocating first and only later figuring out that
the module is already there looks weird. Furthermore, an attack from user
space could be mounted this way.
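
For example, something along these lines (an illustrative user-space sketch
only; "example.ko" is a placeholder path, and it needs CAP_SYS_MODULE) would
fan out many concurrent load requests for one and the same module, each of
which reserves module VMAP space before the duplicate is noticed:

/*
 * Illustrative sketch only: N racing processes ask the kernel to load
 * the same module. Every finit_module() call reserves module VMAP
 * space before add_unformed_module() notices the duplicate, so all
 * but one caller eventually fail with EEXIST.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int i;

	for (i = 0; i < 128; i++) {
		if (fork() == 0) {
			/* "example.ko" stands in for any module file. */
			int fd = open("example.ko", O_RDONLY);

			/* No glibc wrapper; invoke the raw syscall. */
			if (fd < 0 || syscall(SYS_finit_module, fd, "", 0))
				perror("finit_module");
			_exit(0);
		}
	}

	while (wait(NULL) > 0)
		;

	return 0;
}
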
> # dmesg | grep Loading | cut -d" " -f5 | sort | uniq -c
>     896 acpi_cpufreq
>       1 acpi_pad
>       1 acpi_power_meter
>       2 ahci
>       1 cdrom
>       2 compiled-in
>       1 coretemp
>      15 crc32c_intel
>     307 crc32_pclmul
>       1 crc64
>       1 crc64_rocksoft
>       1 crc64_rocksoft_generic
>      12 crct10dif_pclmul
>      16 dca
>       1 dm_log
>       1 dm_mirror
>       1 dm_mod
>       1 dm_region_hash
>       1 drm
>       1 drm_kms_helper
>       1 drm_shmem_helper
>       1 fat
>       1 fb_sys_fops
>      14 fjes
>       1 fuse
>     205 ghash_clmulni_intel
>       1 i2c_algo_bit
>       1 i2c_i801
>       1 i2c_smbus
>       4 i40e
>       4 ib_core
>       1 ib_uverbs
>       4 ice
>     403 intel_cstate
>       1 intel_pch_thermal
>       1 intel_powerclamp
>       1 intel_rapl_common
>       1 intel_rapl_msr
>     399 intel_uncore
>       1 intel_uncore_frequency
>       1 intel_uncore_frequency_common
>      64 ioatdma
>       1 ipmi_devintf
>       1 ipmi_msghandler
>       1 ipmi_si
>       1 ipmi_ssif
>       4 irdma
>     406 irqbypass
>       1 isst_if_common
>     165 isst_if_mbox_msr
>     300 kvm
>     408 kvm_intel
>       1 libahci
>       2 libata
>       1 libcrc32c
>     409 libnvdimm
>       8 Loading
>       1 lpc_ich
>       1 megaraid_sas
>       1 mei
>       1 mei_me
>       1 mgag200
>       1 nfit
>       1 pcspkr
>       1 qrtr
>     405 rapl
>       1 rfkill
>       1 sd_mod
>       2 sg
>     409 skx_edac
>       1 sr_mod
>       1 syscopyarea
>       1 sysfillrect
>       1 sysimgblt
>       1 t10_pi
>       1 uas
>       1 usb_storage
>       1 vfat
>       1 wmi
>       1 x86_pkg_temp_thermal
>       1 xfs
>
> For each of these loading requests, we'll reserve module VMAP space, and
> free it once we realize later that the module was already previously
> loaded.
>
> So with a lot of CPUs, we might end up trying to load the same module so
> often at the same time that we actually run out of module VMAP space.
>
> I have a prototype patch that seems to fix this in module loading code.
>
Good! I am glad the problem can be solved :)

--
Uladzislau Rezki