From: Lucas De Marchi <lucas.demarchi@intel.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>, <david@redhat.com>,
<patches@lists.linux.dev>, <linux-modules@vger.kernel.org>,
<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<pmladek@suse.com>, <petr.pavlu@suse.com>, <prarit@redhat.com>,
<torvalds@linux-foundation.org>, <rafael@kernel.org>,
<christophe.leroy@csgroup.eu>, <tglx@linutronix.de>,
<peterz@infradead.org>, <song@kernel.org>, <rppt@kernel.org>,
<dave@stgolabs.net>, <willy@infradead.org>, <vbabka@suse.cz>,
<mhocko@suse.com>, <dave.hansen@linux.intel.com>,
<colin.i.king@gmail.com>, <jim.cromie@gmail.com>,
<catalin.marinas@arm.com>, <jbaron@akamai.com>,
<rick.p.edgecombe@intel.com>, <j.granados@samsung.com>
Subject: Re: [PATCH] module: add debugging auto-load duplicate module support
Date: Fri, 21 Apr 2023 09:42:39 -0700 [thread overview]
Message-ID: <bnhskcp6hy6liwlefyjcxumlnvmkmyvhvatkq7ve3kb2zecyxl@c3jq2apjqlcy> (raw)
In-Reply-To: <ZEKn89wPH19r2bM4@kroah.com>
On Fri, Apr 21, 2023 at 05:12:51PM +0200, Greg KH wrote:
>On Thu, Apr 20, 2023 at 02:03:32PM -0700, Luis Chamberlain wrote:
>> On Thu, Apr 20, 2023 at 07:32:10AM +0200, Greg KH wrote:
>> > On Wed, Apr 19, 2023 at 04:32:30PM -0700, Luis Chamberlain wrote:
>> > > > It's not "wasted", as it is returned when the module is determined to be
>> > > > a duplicate. Otherwise everyone will want this enabled as they think it
>> > > > will actually save memory.
>> > >
>> > > I'll change the language to be clear the issue is memory pressure early
>> > > on boot. I'll also add a bit of language to help at least guide people
>> > > to realize that the real value-add for this, ie, I'll have to mention we
>> > > suspect issue is udev and not module auto-loading and that this however
>> > > may still help find a few cases we can optimize for.
>> >
>> > This isn't udev's "problem", all it is doing is what the kernel asked it
>> > to do. The kernel said "Here's a new device I found, load a module for
>> > it please!"
>>
>> If you believe that then the issue is still a kernel issue, and the
>> second part to that sentence "load a module for it" was done without
>> consideration of the implications, or without optimizations in mind.
>> Given the implications were perhaps not well understood it is unfair
>> for us to be hard on ourselves on that. But now we know, ideally if we
>> could we *should* only issue a request for a module *once* during boot.
>
>But there is no mapping between devices and modules other than what is
>exported in the module info and that is up to userspace to handle.
>
>> Where does the kernel actually say "load a module"?
>
>The driver core says "hey a new device is now present!"
>
>Userspace takes that message and calls kmod with the device information
>which then determines what module to load by looking at the device
>aliases.
>
>> Isn't that just an implied gesture?
>
>Yes.
>
>> > And it's the kmod code here, not udev itself doing all of this.
>>
>> Yes, IMHO kmod could and *should* be enhanced to share a loading context
>> during boot so to avoid duplicates too and then udev would have to
>> embrace such functionality. That's going to take time to propagate, as
>> you can imagine.
>
>udev is just the transport to kmod here, it's not in the job of
>filtering duplicate messages.
udev nowadays use *lib*kmod. It's udev who has the
context it can operate on.
Also, those module loads will not use the path this patch is changing
call_modprobe is not the path that triggers udev to load modules.
/me confused
What can be done from userspace in the udev path
1) udev to do the ratelimit'ing. Define a time window,
filter out uevents in systemd/src/udev/udev-builtin-kmod.c
2) libkmod to do the ratelimit'ing with a similar approach, but udev
needs to tell libkmod what is the window it wants to use
3) libkmod to act on the context it has from the *kernel*. It used
to be cheap with the call simply blocking early on the syscall in
a mutex... or we didn't have that many calls. So libkmod
simply calls [f]init_module() again regardless of the module's
state being in a "coming" state. Is this the case here? I haven't
seen this data. This is done to avoid a) the toctou implied and b) to
provide the correct return for that call - libkmod can't know if the
previous call will succeed or fail.
libkmod only skips the call if the module is already in
the live state. It seems systemd-udev also duplicates the check
in src/shared/module-util.c:module_load_and_warn()
Note that libkmod already spares loading the module multiple times from
disk as it uses a memory pool for the modules. It reuses one iff it
comes from the same context (i.e. it's only udev involved and not a
bunch of parallel calls to modprobe).
4) If all the calls are coming from the same context and it is udev...
I'm not sure this is actually the problem - the udev's kmod builtin
handler is single-threaded and will handle one request at a time.
I don't see any data to confirm it's coming from a single source or
multiple sources. Could you get a trace containing [f]init_module and
the trace_module_request(), together with a verbose udev log?
If this is all coming from a synthetic use case with thousands of
modprobe execs, I'm not sure there is much to do on the userspace side.
>
>> > Why not
>> > just rate-limit it in userspace if your system can't handle 10's of
>> > thousands of kmod calls all at once? I think many s390 systems did this
>> > decades ago when they were controlling 10's of thousands of scsi devices
>> > and were hit with "device detection storms" at boot like this.
>>
>> Boot is a special context and in this particular case I agree userspace
>> kmod could/should be extended to avoid duplicate module requests in that
see above
>> context. But likewise the kernel should only have to try to issue a
>> request for a single module once, if it could easily do that.
>
>Are you sure that this is happening at boot in a way that userspace
>didn't just trigger it on its own after init started up? That happens
>as a "coldboot" walk of the device tree and all uevent are regenerated.
>That is userspace asking for this, so there's nothing that the kernel
>can do.
>
>> This does beg the question, why force userspace to rate limit if we
>> can do better in the kernel? Specially if *we're the ones*, as you say,
>> that are hinting to userspace to shoot back loading modules for us and we
>> know we're just going to drop duplicates?
>
>Maybe error out of duplicate module loading earlier? I don't know,
>sorry.
I still don't see what's the source of the problem from the data
available. Is the kernel issuing multiple request_module()? Or is the
kernel sending multiple udev event for userspace to map the alias to the
module and load it? The mapping alias -> module currently belongs in
userspace so if you are de-duplicating, it can't be only on the module
name.
>
>> > What specific devices and bus types are the problem here for these systems?
>>
>> My best assessment of the situation is that each CPU in udev ends up triggering
>> a load of duplicate set of modules, not just one, but *a lot*. Not sure
>> what heuristics udev uses to load a set of modules per CPU.
>
>Again, finding which device and bus is causing the problem is going to
>be key here to try to solve the issue. Are you logging duplicate module
agreed.
If the info I requested above is available on other threads, could you
point me to those?
thanks
Lucas De Marchi
>loads by name as well?
>
>thanks,
>
>greg k-h
next prev parent reply other threads:[~2023-04-21 16:43 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 20:46 Luis Chamberlain
2023-04-19 7:15 ` Greg KH
2023-04-19 23:32 ` Luis Chamberlain
2023-04-20 5:32 ` Greg KH
2023-04-20 21:03 ` Luis Chamberlain
2023-04-21 15:12 ` Greg KH
2023-04-21 16:42 ` Lucas De Marchi [this message]
2023-04-21 17:38 ` Luis Chamberlain
2023-04-21 18:31 ` Lucas De Marchi
2023-04-21 18:45 ` Luis Chamberlain
2023-04-26 10:13 ` Petr Pavlu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bnhskcp6hy6liwlefyjcxumlnvmkmyvhvatkq7ve3kb2zecyxl@c3jq2apjqlcy \
--to=lucas.demarchi@intel.com \
--cc=catalin.marinas@arm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=colin.i.king@gmail.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=j.granados@samsung.com \
--cc=jbaron@akamai.com \
--cc=jim.cromie@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-modules@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=mhocko@suse.com \
--cc=patches@lists.linux.dev \
--cc=peterz@infradead.org \
--cc=petr.pavlu@suse.com \
--cc=pmladek@suse.com \
--cc=prarit@redhat.com \
--cc=rafael@kernel.org \
--cc=rick.p.edgecombe@intel.com \
--cc=rppt@kernel.org \
--cc=song@kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox