linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lucas De Marchi <lucas.demarchi@intel.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>, <david@redhat.com>,
	<patches@lists.linux.dev>, <linux-modules@vger.kernel.org>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<pmladek@suse.com>, <petr.pavlu@suse.com>, <prarit@redhat.com>,
	<torvalds@linux-foundation.org>, <rafael@kernel.org>,
	<christophe.leroy@csgroup.eu>, <tglx@linutronix.de>,
	<peterz@infradead.org>, <song@kernel.org>, <rppt@kernel.org>,
	<dave@stgolabs.net>, <willy@infradead.org>, <vbabka@suse.cz>,
	<mhocko@suse.com>, <dave.hansen@linux.intel.com>,
	<colin.i.king@gmail.com>, <jim.cromie@gmail.com>,
	<catalin.marinas@arm.com>, <jbaron@akamai.com>,
	<rick.p.edgecombe@intel.com>, <j.granados@samsung.com>
Subject: Re: [PATCH] module: add debugging auto-load duplicate module support
Date: Fri, 21 Apr 2023 09:42:39 -0700	[thread overview]
Message-ID: <bnhskcp6hy6liwlefyjcxumlnvmkmyvhvatkq7ve3kb2zecyxl@c3jq2apjqlcy> (raw)
In-Reply-To: <ZEKn89wPH19r2bM4@kroah.com>

On Fri, Apr 21, 2023 at 05:12:51PM +0200, Greg KH wrote:
>On Thu, Apr 20, 2023 at 02:03:32PM -0700, Luis Chamberlain wrote:
>> On Thu, Apr 20, 2023 at 07:32:10AM +0200, Greg KH wrote:
>> > On Wed, Apr 19, 2023 at 04:32:30PM -0700, Luis Chamberlain wrote:
>> > > > It's not "wasted", as it is returned when the module is determined to be
>> > > > a duplicate.  Otherwise everyone will want this enabled as they think it
>> > > > will actually save memory.
>> > >
>> > > I'll change the language to be clear the issue is memory pressure early
>> > > on boot. I'll also add a bit of language to help at least guide people
>> > > to realize that the real value-add for this, ie, I'll have to mention we
>> > > suspect issue is udev and not module auto-loading and that this however
>> > > may still help find a few cases we can optimize for.
>> >
>> > This isn't udev's "problem", all it is doing is what the kernel asked it
>> > to do.  The kernel said "Here's a new device I found, load a module for
>> > it please!"
>>
>> If you believe that then the issue is still a kernel issue, and the
>> second part to that sentence "load a module for it" was done without
>> consideration of the implications, or without optimizations in mind.
>> Given the implications were perhaps not well understood it is unfair
>> for us to be hard on ourselves on that. But now we know, ideally if we
>> could we *should* only issue a request for a module *once* during boot.
>
>But there is no mapping between devices and modules other than what is
>exported in the module info and that is up to userspace to handle.
>
>> Where does the kernel actually say "load a module"?
>
>The driver core says "hey a new device is now present!"
>
>Userspace takes that message and calls kmod with the device information
>which then determines what module to load by looking at the device
>aliases.
>
>> Isn't that just an implied gesture?
>
>Yes.
>
>> > And it's the kmod code here, not udev itself doing all of this.
>>
>> Yes, IMHO kmod could and *should* be enhanced to share a loading context
>> during boot so to avoid duplicates too and then udev would have to
>> embrace such functionality. That's going to take time to propagate, as
>> you can imagine.
>
>udev is just the transport to kmod here, it's not in the job of
>filtering duplicate messages.

udev nowadays use *lib*kmod. It's udev who has the
context it can operate on.

Also, those module loads will not use the path this patch is changing
call_modprobe is not the path that triggers udev to load modules.
/me confused

What can be done from userspace in the udev path

1) udev to do the ratelimit'ing. Define a time window,
filter out uevents in systemd/src/udev/udev-builtin-kmod.c

2) libkmod to do the ratelimit'ing with a similar approach, but udev
needs to tell libkmod what is the window it wants to use

3) libkmod to act on the context it has from the *kernel*. It used
to be cheap with the call simply blocking early on the syscall in
a mutex... or we didn't have that many calls. So libkmod
simply calls [f]init_module() again regardless of the module's
state being in a "coming" state.  Is this the case here? I haven't
seen this data. This is done to avoid a) the toctou implied and b) to
provide the correct return for that call - libkmod can't know if the
previous call will succeed or fail.

libkmod only skips the call if the module is already in
the live state. It seems systemd-udev also duplicates the check
in src/shared/module-util.c:module_load_and_warn()

Note that libkmod already spares loading the module multiple times from
disk as it uses a memory pool for the modules. It reuses one iff it
comes from the same context (i.e. it's only udev involved and not a
bunch of parallel calls to modprobe).

4) If all the calls are coming from the same context and it is udev...
I'm not sure this is actually the problem - the udev's kmod builtin
handler is single-threaded and will handle one request at a time.
I don't see any data to confirm it's coming from a single source or
multiple sources. Could you get a trace containing [f]init_module and
the trace_module_request(), together with a verbose udev log?

If this is all coming from a synthetic use case with thousands of
modprobe execs, I'm not sure there is much to do on the userspace side.

>
>> > Why not
>> > just rate-limit it in userspace if your system can't handle 10's of
>> > thousands of kmod calls all at once? I think many s390 systems did this
>> > decades ago when they were controlling 10's of thousands of scsi devices
>> > and were hit with "device detection storms" at boot like this.
>>
>> Boot is a special context and in this particular case I agree userspace
>> kmod could/should be extended to avoid duplicate module requests in that

see above

>> context. But likewise the kernel should only have to try to issue a
>> request for a single module once, if it could easily do that.
>
>Are you sure that this is happening at boot in a way that userspace
>didn't just trigger it on its own after init started up?  That happens
>as a "coldboot" walk of the device tree and all uevent are regenerated.
>That is userspace asking for this, so there's nothing that the kernel
>can do.
>
>> This does beg the question, why force userspace to rate limit if we
>> can do better in the kernel? Specially if *we're the ones*, as you say,
>> that are hinting to userspace to shoot back loading modules for us and we
>> know we're just going to drop duplicates?
>
>Maybe error out of duplicate module loading earlier?  I don't know,
>sorry.

I still don't see what's the source of the problem from the data
available. Is the kernel issuing multiple request_module()? Or is the
kernel sending multiple udev event for userspace to map the alias to the
module and load it? The mapping alias -> module currently belongs in
userspace so if you are de-duplicating, it can't be only on the module
name.

>
>> > What specific devices and bus types are the problem here for these systems?
>>
>> My best assessment of the situation is that each CPU in udev ends up triggering
>> a load of duplicate set of modules, not just one, but *a lot*. Not sure
>> what heuristics udev uses to load a set of modules per CPU.
>
>Again, finding which device and bus is causing the problem is going to
>be key here to try to solve the issue.  Are you logging duplicate module

agreed.

If the info I requested above is available on other threads, could you
point me to those?

thanks
Lucas De Marchi

>loads by name as well?
>
>thanks,
>
>greg k-h


  reply	other threads:[~2023-04-21 16:43 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-18 20:46 Luis Chamberlain
2023-04-19  7:15 ` Greg KH
2023-04-19 23:32   ` Luis Chamberlain
2023-04-20  5:32     ` Greg KH
2023-04-20 21:03       ` Luis Chamberlain
2023-04-21 15:12         ` Greg KH
2023-04-21 16:42           ` Lucas De Marchi [this message]
2023-04-21 17:38             ` Luis Chamberlain
2023-04-21 18:31               ` Lucas De Marchi
2023-04-21 18:45                 ` Luis Chamberlain
2023-04-26 10:13                 ` Petr Pavlu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bnhskcp6hy6liwlefyjcxumlnvmkmyvhvatkq7ve3kb2zecyxl@c3jq2apjqlcy \
    --to=lucas.demarchi@intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=colin.i.king@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=j.granados@samsung.com \
    --cc=jbaron@akamai.com \
    --cc=jim.cromie@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@suse.com \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=petr.pavlu@suse.com \
    --cc=pmladek@suse.com \
    --cc=prarit@redhat.com \
    --cc=rafael@kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox