From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25992C77B73 for ; Wed, 19 Apr 2023 07:15:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 69F3F8E0003; Wed, 19 Apr 2023 03:15:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 64E888E0001; Wed, 19 Apr 2023 03:15:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 517618E0003; Wed, 19 Apr 2023 03:15:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 40D8E8E0001 for ; Wed, 19 Apr 2023 03:15:18 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id F39FAA01FD for ; Wed, 19 Apr 2023 07:15:17 +0000 (UTC) X-FDA: 80697279474.03.8F9C585 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf01.hostedemail.com (Postfix) with ESMTP id 2264140025 for ; Wed, 19 Apr 2023 07:15:15 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linuxfoundation.org header.s=korg header.b=ssAsuQK+; spf=pass (imf01.hostedemail.com: domain of gregkh@linuxfoundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=gregkh@linuxfoundation.org; dmarc=pass (policy=none) header.from=linuxfoundation.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681888516; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=20CoBZOE7wVs4f4rufOIvqMB4chX9B8zLlrrXNkvLYE=; b=ysO/YNBN794Bk8UTe3XmaFNpARCiR/7BjhVMukRPB0hwN3PFp9Nf+Qso9N1hR0M3t6kVcZ GBrD55AB74V5ykqkEINB8DmwpbmwzDrebeYFWkdot6lmqwuM/JRA4eg8xnEZtT+qteRMI0 rndqBcFFUoVGCbLjgOhW5tHCw5tb8Zw= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=linuxfoundation.org header.s=korg header.b=ssAsuQK+; spf=pass (imf01.hostedemail.com: domain of gregkh@linuxfoundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=gregkh@linuxfoundation.org; dmarc=pass (policy=none) header.from=linuxfoundation.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681888516; a=rsa-sha256; cv=none; b=iBPc6431eSUegiQYy759nxThu8gOv1CKDfxIvVZlXAJTjrS/L8fXiFpo6nXtQMFfa1Hkkz TJKeBIear/YbXKGGcp54vvKIpka2caUNzo6DISVNnqDGQ6JoR+y+AuGX1eJ12kvQjkG2wd xKb9gpPxCE51VwmbnbIbZb56WUpHzyM= Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 0745963BB3; Wed, 19 Apr 2023 07:15:15 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE10BC433D2; Wed, 19 Apr 2023 07:15:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1681888514; bh=raTzdIyYvUwuNJ9Dt1Cuom2hZfleXfVay1boETzqlsg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ssAsuQK+EEa0ROEjk/687q6fRihZPoK9sha9JuRE/nv9sIBz1kbdxQZec5cZyqa+Q Tt4H6f+0CmvE/Jh/XoVYiqmgGl7BzYPrAtzH3/tqhxYNZvpg8VD+QPaHBSNUZOA6Tm Y9L5LQgIZDU/nThd5zYWwUxZiNTZa4HC9Hm+4lx4= Date: Wed, 19 Apr 2023 09:15:11 +0200 From: Greg KH To: Luis Chamberlain Cc: david@redhat.com, patches@lists.linux.dev, linux-modules@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, pmladek@suse.com, petr.pavlu@suse.com, prarit@redhat.com, torvalds@linux-foundation.org, rafael@kernel.org, christophe.leroy@csgroup.eu, tglx@linutronix.de, peterz@infradead.org, song@kernel.org, rppt@kernel.org, dave@stgolabs.net, willy@infradead.org, vbabka@suse.cz, mhocko@suse.com, dave.hansen@linux.intel.com, colin.i.king@gmail.com, jim.cromie@gmail.com, catalin.marinas@arm.com, jbaron@akamai.com, rick.p.edgecombe@intel.com Subject: Re: [PATCH] module: add debugging auto-load duplicate module support Message-ID: <2023041951-evolution-unwitting-1791@gregkh> References: <20230418204636.791699-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230418204636.791699-1-mcgrof@kernel.org> X-Rspamd-Queue-Id: 2264140025 X-Stat-Signature: xcstczgofdbicpk4cspfj4re1qy55j4e X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1681888515-879252 X-HE-Meta: U2FsdGVkX1+XdOgyGG6zknDMv9Lhsc8kmQtk6PJ/o3SmOOVE2AZlDuA9GYemsXWXDcJhz1zHbSypbHotvDzQCA0PjisZEa3TnY4YLskE/f0rVrwILCIW4Xhn200CrKlb2EiaFg+QqNetEZdoH5kMsufBwmQayyI4gVSqnQ7D+5t7ofHtZTC/vx8XmJWpJIlOpZKwIOPn37ePsVcdgJVa+NiDGDh03R9fEh2mRxZvwlFr0uh1+2mxLQwZB/r68D/SoVy9izLmz5N9dKV8LKWvza8pMd3ltzrQnCWgGg72isIFgLfIdqnOPHcQ6PYBpdAaEyqYdAlmEZe3rdCjWiztTbTsBXHHciZB3lDL5GKPCTBk4LUWvASt9l3dkYUIg3hk3Pt9EWI5QbWnCyB107m3G7ROwpw+nJ53Gw4g7YCC+YOshr5gVoZyl88XsseImZ6INyqYVa23ap/PXD/lGrc72KkooRADFVS5EnU5QrHpCShNepntS49/jOr1gPMGFJ3KD/7plhww2df+jBL0yXvoJyHyOa7fjuu461BjZdTpBRC/hgQRo8wDD8M+ykk/HvT6ooCv2KzS7Q7BwUgqS6RnRaaSd+vIDMZF7nVXRUdL8zbbWoQeiUpN4K4nFrR00H6ONbBhZ14FPxR77e4zRLLIj5dGHAvf8aiOqG8TDHihxDksW4FM3ad42GOhRKGGn8BPo2WtQnKT4/rzWe9lhgRafflrw+yaEyfWp56kkci2F78mDs5SDHYE8mjUXaj9M3vzw7U9swh+i4JEmtaPOHo3FRooNEEG0hpSQyzkb6n15bYLYkJ8NlYmBQ0GZ+qeMze5t6wcLzxZG6wbqBIgf8LtKuZc679aOYnIOalaH6PSx48lP+maFvxmXmukfv1UV0aArQmGIf7LxHyMla018oE+n6+Pe27yrMWB9UNaC4eTx9BgZkz2maQKBPA7NJE7stul3rfcwPcOxbaeYQT55bJ x6NjyA6G eJ/sgZueyEUTD9Zp0TvNTMark+CmxLb68JxGt9GGaMXfpNIemut0fNaSz5E0nN4zt/3GsLLp4glutxk9i+K+vcwS/+j28cDr03eXcfvugGtWT3Jukw2GCBJT3S3Y0mwXZLKmAF+F+a/QWeID3LF/jNYYN7eYF84WWqrGwJAamU6tH2qeKRhPIrZmfu0xCtUWpMgF03hY3F8xHMumHKsNyPnq7ktJcKgR4PcVaGBewIUVUTbZ8YST3dlCFxGEO8IqMvVs2zq9KoBZcutfHRInfdF/CnIdK53Zr7uNyzPfRRVsyhg0g5f8KNTJDueH/uF6SlHA3c9mHJyYhrPL9XhuxD+X4PTc13F6d//AQwD7VBr55DORtfZcFYAvxYQ9Vo5kjgPn7tM6HmR739C36QmGAw7ghGr4u5vhRh/+m1pXGtnM4Vn+IS/n2nZLuAERg8Fdaquyg7wNdxMqHTB4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Apr 18, 2023 at 01:46:36PM -0700, Luis Chamberlain wrote: > This adds debugging support to the kernel module auto-loader to > easily detect and deal with duplicate module requests. To aid > with possible bootup failure issues it will supress the waste > in virtual memory when races happen before userspace loads a > module and the kernel is still issuing requests for the same > module. > > Folks debugging virtual memory abuse on bootup can and should > enable this to see what WARN()s come on, to see if module > auto-loading is to blame for their woes. You get 72 columns for changelog text, so you can use it :) > > Signed-off-by: Luis Chamberlain > --- > > Changes on this patch since the last RFC: > > o dropped the kernel_read*() patch from this series moving to > punt the issues as a udev issue now that we have proof auto-loading > is not the issue > o some spell checks > > kernel/module/Kconfig | 40 +++++++ > kernel/module/Makefile | 1 + > kernel/module/dups.c | 234 +++++++++++++++++++++++++++++++++++++++ > kernel/module/internal.h | 15 +++ > kernel/module/kmod.c | 23 +++- > 5 files changed, 309 insertions(+), 4 deletions(-) > create mode 100644 kernel/module/dups.c > > diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig > index e6df183e2c80..cc146ef4a6ac 100644 > --- a/kernel/module/Kconfig > +++ b/kernel/module/Kconfig > @@ -59,6 +59,46 @@ config MODULE_STATS > > If unsure, say N. > > +config MODULE_AUTOLOAD_SUPRESS_DUPS MODULE_DEBUG_DUPLICATE perhaps? It has nothing to do with autoloading (other than that is what userspace is doing) and you aren't suppressing anything except throwing up warnings, right? > + bool "Debug duplicate modules with auto-loading" > + help > + Module autoloading allows in-kernel code to request modules through > + the *request_module*() API calls. This in turn just calls userspace > + modprobe. Although modprobe checks to see if a module is already > + loaded before trying to load a module there is a small time window in > + which multiple duplicate requests can end up in userspace and multiple > + modprobe calls race calling finit_module() around the same time for > + duplicate modules. The finit_module() system call can consume in the > + worst case more than twice the respective module size in virtual > + memory for each duplicate module requests. Although duplicate module > + requests are non-fatal virtual memory is a limited resource and each > + duplicate module request ends up just wasting virtual memory. It's not "wasted", as it is returned when the module is determined to be a duplicate. Otherwise everyone will want this enabled as they think it will actually save memory. > + This debugging facility will create WARN() splats for duplicate module > + requests to help identify if module auto-loading is the culprit to your > + woes. Since virtual memory abuse caused by duplicate module requests > + could render a system unusable this functionality will also suppresses > + the waste in virtual memory caused by duplicate requests by sharing > + races in requests for the same module to a single unified request. > + Once a non-wait request_module() call completes a module should be > + loaded and modprobe should simply not allow new finit_module() calls. > + > + Enable this functionality to try to debug virtual memory abuse during > + boot on systems and identify if the abuse was due to module > + auto-loading. > + > + If the first module request used request_module_nowait() we cannot > + use that as the anchor to wait for duplicate module requests, since > + users of request_module() do want a proper return value. If a call > + for the same module happened earlier with request_module() though, > + then a duplicate request_module_nowait() would be detected. > + > + You want to enable this if you want to debug and see if duplicate > + module auto-loading might be causing virtual memory abuse during > + bootup. A kernel trace will be provided for each duplicate request. > + > + Disable this if you are on production. "on production"? That does not translate well, how about: Only enable this for debugging system functionality, never have it enabled on real systems. or something like that? > + > endif # MODULE_DEBUG > > config MODULE_FORCE_LOAD > diff --git a/kernel/module/Makefile b/kernel/module/Makefile > index 52340bce497e..e8b121ac39cf 100644 > --- a/kernel/module/Makefile > +++ b/kernel/module/Makefile > @@ -10,6 +10,7 @@ KCOV_INSTRUMENT_module.o := n > obj-y += main.o > obj-y += strict_rwx.o > obj-y += kmod.o > +obj-$(CONFIG_MODULE_AUTOLOAD_SUPRESS_DUPS) += dups.o > obj-$(CONFIG_MODULE_DECOMPRESS) += decompress.o > obj-$(CONFIG_MODULE_SIG) += signing.o > obj-$(CONFIG_LIVEPATCH) += livepatch.o > diff --git a/kernel/module/dups.c b/kernel/module/dups.c > new file mode 100644 > index 000000000000..903ab7c7e8f4 > --- /dev/null > +++ b/kernel/module/dups.c > @@ -0,0 +1,234 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * kmod dups - the kernel module autoloader duplicate suppressor > + * > + * Copyright (C) 2023 Luis Chamberlain > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +DEFINE_MUTEX(kmod_dup_mutex); static? > +static LIST_HEAD(dup_kmod_reqs); > + > +struct kmod_dup_req { > + struct list_head list; > + char name[MODULE_NAME_LEN]; > + struct completion first_req_done; > + struct work_struct complete_work; > + struct delayed_work delete_work; > + int dup_ret; > +}; > + > +static struct kmod_dup_req *kmod_dup_request_lookup(char *module_name) Lock requirements? You should document that here. > +{ > + struct kmod_dup_req *kmod_req; > + > + list_for_each_entry_rcu(kmod_req, &dup_kmod_reqs, list, > + lockdep_is_held(&kmod_dup_mutex)) { > + if (strlen(kmod_req->name) == strlen(module_name) && > + !memcmp(kmod_req->name, module_name, strlen(module_name))) { > + return kmod_req; > + } > + } > + > + return NULL; > +} > + > +static void kmod_dup_request_delete(struct work_struct *work) > +{ > + struct kmod_dup_req *kmod_req; > + kmod_req = container_of(to_delayed_work(work), struct kmod_dup_req, delete_work); > + > + /* > + * The typical situation is a module successully loaded. In that > + * situation the module will be present already in userspace. If > + * new requests come in after that, userspace will already know the > + * module is loaded so will just return 0 right away. There is still > + * a small chance right after we delete this entry new request_module() > + * calls may happen after that, they can happen. These heuristics > + * are to protect finit_module() abuse for auto-loading, if modules > + * are still tryign to auto-load even if a module is already loaded, > + * that's on them, and those inneficiencies should not be fixed by > + * kmod. The inneficies there are a call to modprobe and modprobe > + * just returning 0. > + */ > + mutex_lock(&kmod_dup_mutex); > + list_del_rcu(&kmod_req->list); > + synchronize_rcu(); > + mutex_unlock(&kmod_dup_mutex); > + kfree(kmod_req); > +} > + > +static void kmod_dup_request_complete(struct work_struct *work) > +{ > + struct kmod_dup_req *kmod_req; > + > + kmod_req = container_of(work, struct kmod_dup_req, complete_work); > + > + /* > + * This will ensure that the kernel will let all the waiters get > + * informed its time to check the return value. It's time to > + * go home. > + */ > + complete_all(&kmod_req->first_req_done); > + > + /* > + * Now that we have allowed prior request_module() calls to go on > + * with life, let's schedule deleting this entry. We don't have > + * to do it right away, but we *eventually* want to do it so to not > + * let this linger forever as this is just a boot optimization for > + * possible abuses of vmalloc() incurred by finit_module() thrashing. > + */ > + queue_delayed_work(system_wq, &kmod_req->delete_work, 60 * HZ); > +} > + > +bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret) > +{ > + struct kmod_dup_req *kmod_req, *new_kmod_req; > + int ret; > + > + /* > + * Pre-allocate the entry in case we have to use it later > + * to avoid contention with the mutex. > + */ > + new_kmod_req = kzalloc(sizeof(*new_kmod_req), GFP_KERNEL); > + if (!new_kmod_req) > + return false; > + > + memcpy(new_kmod_req->name, module_name, strlen(module_name)); > + INIT_WORK(&new_kmod_req->complete_work, kmod_dup_request_complete); > + INIT_DELAYED_WORK(&new_kmod_req->delete_work, kmod_dup_request_delete); > + init_completion(&new_kmod_req->first_req_done); > + > + mutex_lock(&kmod_dup_mutex); > + > + kmod_req = kmod_dup_request_lookup(module_name); > + if (!kmod_req) { > + /* > + * If the first request that came through for a module > + * was with request_module_nowait() we cannot wait for it > + * and share its return value with other users which may > + * have used request_module() and need a proper return value > + * so just skip using them as an anchor. > + * > + * If a prior request to this one came through with > + * request_module() though, then a request_module_nowait() > + * would benefit from duplicate detection. > + */ > + if (!wait) { > + kfree(new_kmod_req); > + pr_warn("New request_module_nowait() for %s -- cannot track duplicates for this request\n", module_name); pr_dbg() as this is debugging? And did you set your pr_fmt value to make it obvious where these messages are coming from? > + mutex_unlock(&kmod_dup_mutex); > + return false; > + } > + > + /* > + * There was no duplicate, just add the request so we can > + * keep tab on duplicates later. > + */ > + pr_info("New request_module() for %s\n", module_name); Why not pr_dbg()? > + list_add_rcu(&new_kmod_req->list, &dup_kmod_reqs); > + mutex_unlock(&kmod_dup_mutex); > + return false; > + } > + mutex_unlock(&kmod_dup_mutex); > + > + /* We are dealing with a duplicate request now */ > + No blank line needed. > + kfree(new_kmod_req); > + > + /* > + * To fix these try to use try_then_request_module() instead as that > + * will check if the component you are looking for is present or not. > + * You could also just queue a single request to load the module once, > + * instead of having each and everything you need try to request for > + * the module. > + * > + * Duplicate request_module() calls can cause quite a bit of wasted > + * vmalloc() space when racing with userspace. > + */ > + WARN(1, "module-autoload: duplicate request for module %s\n", module_name); Remember, this will trigger syzbot and any system that has panic-on-warn, so be prepared for that mess. Especially the syzbot reports, that's going to cause lots of issues for you if you don't tell them to always disable this option. thanks, greg k-h