From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
Mark Brown <broonie@kernel.org>,
Linus Walleij <linus.walleij@linaro.org>,
Roland Dreier <roland@kernel.org>,
Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>,
ksummit@lists.linux.dev, Daniel Vetter <daniel@ffwll.ch>
Subject: Re: cdev/devm_* issues (was Re: [TECH TOPIC] Rust for Linux)
Date: Thu, 8 Jul 2021 11:23:30 +0300 [thread overview]
Message-ID: <YOa2Au2TEjCpm1xX@pendragon.ideasonboard.com> (raw)
In-Reply-To: <YOagA4bgdGYos5aa@kroah.com>
Hi Greg,
On Thu, Jul 08, 2021 at 08:49:39AM +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 08:17:08PM +0300, Laurent Pinchart wrote:
> > On Wed, Jul 07, 2021 at 02:45:04PM +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 01:38:44PM +0100, James Bottomley wrote:
> > > > On Wed, 2021-07-07 at 14:20 +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 12:34:31PM +0100, James Bottomley wrote:
> > > > > > On Wed, 2021-07-07 at 11:50 +0100, Mark Brown wrote:
> > > > > > > On Tue, Jul 06, 2021 at 10:36:14PM +0200, Linus Walleij wrote:
> > > > > > > > On Tue, Jul 6, 2021 at 10:00 PM Roland Dreier wrote:
> > > > > > > > > "devres" / devm_xxx was an attempt to deal with this in C,
> > > > > > > > > but it only solves some cases of this and has not received a
> > > > > > > > > lot of adoption (we can argue about the reasons).
> > > > > > > >
> > > > > > > > Really? From my point of view that is adopted all over the map.
> > > > > > > > I add new users all the time and use it as much as I can when
> > > > > > > > writing new drivers.
> > > > > > >
> > > > > > > Yes, it's *super* widely used in most of the kernel. Perhaps
> > > > > > > there's some subsystems that reject it for some reason.
> > > > > > >
> > > > > > > > I think it's a formidable success, people just need to learn to
> > > > > > > > do it more.
> > > > > > >
> > > > > > > There *are* issues with people adopting it too enthusiastically -
> > > > > > > as well as the memory lifetime issues that Laurent mentioned it's
> > > > > > > easy for it to cause problems with interrupt handlers that are
> > > > > > > left live longer than they should be and try to use things that
> > > > > > > were already deallocated.
> >
> > (CC'ing Daniel Vetter as the author of the DRM devres-based resource
> > management)
> >
> > I've given this lots of thoughts lately, in the context of V4L2, but it
> > should be roughly the same for character devices in general. Here's what
> > I think should be done for drivers that expose character devices.
> >
> > - Drivers must stop allocating their internal data structure (the one
> > they set as device data with dev_set_drvdata()) with devm_kzalloc().
> > The data structure must instead be allocated with a plain kzalloc()
> > and reference-counted.
> >
> > Most drivers will register a single character device using a
> > subsystem-specific API (e.g. video_register_device() in V4L2). The
> > subsystem needs to provide a .release() callback, called when the
> > last reference to the character device is released. Drivers must
> > implement this, and can simply free their internal data structure at
> > this point.
> >
> > For drivers that register multiple character devices, or in general
> > expose multiple interfaces to userspace or other parts of the kernel,
> > the internal data structure must be properly reference-counted, with a
> > reference released in each .release() callback. There may be ways to
> > simplify this.
> >
> > This can be seen as going back to the pre-devm_kzalloc() era, but it's
> > only about undoing a mistake that was done way too often (to be fair,
> > many drivers used to just kfree() the data in the driver's .remove()
> > operation, so it wasn't always a regression, only enshrining a
> > preexisting bad practice).
> >
> > This is only part of the puzzle though. There's a remove/use race that
> > still needs to be solved.
> >
> > - In .remove(), drivers must inform the character device that new access
> > from userspace are not allowed anymore. This would set a flag in
> > struct cdev that would result in all new calls from userspace through
> > file operations to be rejected (with -ENXIO for instance). This should
> > be wrapped in subsystem-specific functions (e.g.
> > video_device_disconnect() wrapping cdev_disconnect()). From now on, no
> > new calls are possible, but existing calls may be in progress.
> >
> > - Drivers then need to cancel all pending I/O and wait for completion.
> > I/O (either direct memory-mapped I/O or through bus APIs, such as I2C
> > or USB transactions) are not allowed anymore after .remove() returns.
> > This will have the side effect of waking up userspace contexts that
> > are waiting for I/O completion (through a blocking file I/O operation
> > for instance). Existing calls from userspace will start completing.
> >
> > This is also a good place to start shutting down the device, for
> > instance disabling interrupts.
> >
> > - The next step is for drivers to wait until all calls from userspace
> > complete. This should be done with a call count in struct cdev that is
> > updated upon entry and exit of calls from userspace, and a wait queue
> > to wait for that count to go to 0. This should be wrapped in
> > subsystem-specific APIs. As the flag that indicates device removal is
> > set, no new calls are allowed so the counter can only decrease, and as
> > all pending I/O have terminated or have been cancelled, no pending
> > calls should be blocked.
> >
> > - Finally, drivers should unregister the character device, through the
> > appropriate subsystem API.
> >
> > At this point, memory mappings and file handles referencing the device
> > may still exist, but no file operation is in progress. The device is
> > quiescent.
> >
> > Care needs to be taken in drivers and subsystems to not start any I/O
> > operation when handling the file .release() operation or the
> > destruction of memory mappings. Overall I don't expect much issues,
> > but I'm sure some drivers do strange things in those code paths.
> >
> > - When the least memory mapping is gone and the last file handle is
> > closed, the subsystem will call the driver's .release() callback. At
> > this point, the driver will perform the operations listed in the first
> > item of this list.
> >
> >
> > The challenge here will be to make this as easy as possible for drivers,
> > to avoid risk of drivers getting it wrong. The DRM/KMS subsystem has
> > created a devres-based system to handle this, with the devres associated
> > with the drm_device (the abstraction of the cdevs exposed by DRM to
> > userspace), *not* the physical device. It has a drawback though, it
> > assumes that a DRM driver will only ever want to register a drm_device
> > and nothing else, and hardcodes that assumption in the way it releases
> > resources. That's fine for most DRM drivers, but if a driver was to
> > register a drm_device and something else (such as a V4L2 video_device,
> > an input device, ...), the DRM subsystem will get in the way.
> >
> > I have two questions:
> >
> > - Does the above make sense ?
>
> Yes, totally, thank you for taking the time to write this all out. It's
> been in the back of my mind for over a decade that we need to work on
> these issues, but have not had any time to sit down and write it all
> down like you have.
>
> > - Assuming it does, how do we get from the current mess to a situation
> > where writing a driver will be a pleasure, not a punishment ? :-)
>
> That's the real question. Thanks to a lot of cleanups that Christoph
> has recently done to the lower-level cdev code, the lower levels are now
> in a shape where we can work on them better. I'm going to try to carve
> out some time in the next few months to start to work on these things.
> I think that once we get the ideas of what needs to be done, and a
> working core change, I can unleash some interns on doing tree-wide
> cleanups/changes to help bring everything into alignment.
>
> I'm going to save this email off for reference for later. But of
> course, if others want to start to attack this earlier, all the better :)
My too long todo list contains an entry to address this in the V4L2
subsystem to start with (see
https://lore.kernel.org/linux-media/20200816003315.GA13826@roeck-us.net/).
It may be a good prototype to then extend it to cdev. If I get a chance
to work on it, I'll CC you.
> > > > > > Fixing this should not be beyond the wit of humankind, though. If
> > > > > > we delayed deallocation to module release, that would fix the
> > > > > > interrupt issue, wouldn't it? Perhaps all devres memory for
> > > > > > devices should live until then anyway and thus be automatically
> > > > > > deallocated instead of needing an explicit free ... the problem
> > > > > > with that being compiled in devices currently optimize away the
> > > > > > module refcounting, but that should be fixable.
> > > > >
> > > > > module code lifespans are different than device structure lifespans,
> > > > > it's when people get them confused, as here, that we have problems.
> > > >
> > > > I'm not claiming they are. What I am claiming is that module lifetime
> > > > must always encompass the device lifetimes. Therefore, you can never
> > > > be incorrect by using a module lifetime for anything attached to a
> > > > device, just inefficient for using memory longer than potentially
> > > > needed. However, in a lot of use cases, the device is created on
> > > > module init and destroyed on module exit, so the inefficiency is barely
> > > > noticeable.
> > >
> > > In almost no use case is the device created on module init of a driver.
> > > devices are created by busses, look at the USB code as an example. The
> > > usb bus creates the devices and then individual modules bind to that
> > > device as needed. If the device is removed from the system, wonderful,
> > > the device is unbound, but the module is still loaded. So if you really
> > > wanted to, with your change, just do a insert/remove of a USB device a
> > > few zillion times and then memory is all gone :(
> > >
> > > > The question I'm asking is shouldn't we optimize for this?
> > >
> > > No.
> > >
> > > > so let people allocate devm memory safe in the knowledge it will be
> > > > freed on module release?
> > >
> > > No, see above. Modules are never removed in a normal system. devices
> > > are.
> > >
> > > And the drm developers have done great work in unwinding some of these
> > > types of mistakes in their drivers, let's not go backwards please.
--
Regards,
Laurent Pinchart
next prev parent reply other threads:[~2021-07-08 8:24 UTC|newest]
Thread overview: 200+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-25 22:09 [TECH TOPIC] Rust for Linux Miguel Ojeda
2021-07-05 23:51 ` Linus Walleij
2021-07-06 4:30 ` Leon Romanovsky
2021-07-06 9:55 ` Linus Walleij
2021-07-06 10:16 ` Geert Uytterhoeven
2021-07-06 17:59 ` Linus Walleij
2021-07-06 18:36 ` Miguel Ojeda
2021-07-06 19:12 ` Linus Walleij
2021-07-06 21:32 ` Miguel Ojeda
2021-07-07 14:10 ` Arnd Bergmann
2021-07-07 15:28 ` Miguel Ojeda
2021-07-07 15:50 ` Andrew Lunn
2021-07-07 16:34 ` Miguel Ojeda
2021-07-07 16:55 ` Arnd Bergmann
2021-07-07 17:54 ` Miguel Ojeda
2021-07-06 10:22 ` Leon Romanovsky
2021-07-06 14:30 ` Miguel Ojeda
2021-07-06 14:32 ` Miguel Ojeda
2021-07-06 15:03 ` Sasha Levin
2021-07-06 15:33 ` Miguel Ojeda
2021-07-06 15:42 ` Laurent Pinchart
2021-07-06 16:09 ` Mike Rapoport
2021-07-06 18:29 ` Miguel Ojeda
2021-07-06 18:38 ` Laurent Pinchart
2021-07-06 19:45 ` Steven Rostedt
2021-07-06 19:59 ` Miguel Ojeda
2021-07-06 18:53 ` Sasha Levin
2021-07-06 21:50 ` Miguel Ojeda
2021-07-07 4:57 ` Leon Romanovsky
2021-07-07 13:39 ` Alexandre Belloni
2021-07-07 13:50 ` Miguel Ojeda
2021-07-06 18:26 ` Linus Walleij
2021-07-06 19:11 ` Miguel Ojeda
2021-07-06 19:13 ` Johannes Berg
2021-07-06 19:43 ` Miguel Ojeda
2021-07-06 10:20 ` James Bottomley
2021-07-06 14:55 ` Miguel Ojeda
2021-07-06 15:01 ` Sasha Levin
2021-07-06 15:36 ` Miguel Ojeda
2021-07-09 10:02 ` Marco Elver
2021-07-09 16:02 ` Miguel Ojeda
2021-07-06 18:09 ` Linus Walleij
2021-07-06 14:24 ` Miguel Ojeda
2021-07-06 14:33 ` Laurent Pinchart
2021-07-06 14:56 ` Leon Romanovsky
2021-07-06 15:29 ` Miguel Ojeda
2021-07-07 4:38 ` Leon Romanovsky
2021-07-06 20:00 ` Roland Dreier
2021-07-06 20:36 ` Linus Walleij
2021-07-06 22:00 ` Laurent Pinchart
2021-07-07 7:27 ` Julia Lawall
2021-07-07 7:45 ` Greg KH
2021-07-07 7:52 ` James Bottomley
2021-07-07 13:49 ` Miguel Ojeda
2021-07-07 14:08 ` James Bottomley
2021-07-07 15:15 ` Miguel Ojeda
2021-07-07 15:44 ` Greg KH
2021-07-07 17:01 ` Wedson Almeida Filho
2021-07-07 17:20 ` Greg KH
2021-07-07 19:19 ` Wedson Almeida Filho
2021-07-07 20:38 ` Jan Kara
2021-07-07 23:09 ` Wedson Almeida Filho
2021-07-08 6:11 ` Greg KH
2021-07-08 13:36 ` Wedson Almeida Filho
2021-07-08 18:51 ` Greg KH
2021-07-08 19:31 ` Andy Lutomirski
2021-07-08 19:35 ` Geert Uytterhoeven
2021-07-08 21:56 ` Andy Lutomirski
2021-07-08 19:49 ` Linus Walleij
2021-07-08 20:34 ` Miguel Ojeda
2021-07-08 22:13 ` Linus Walleij
2021-07-09 7:24 ` Geert Uytterhoeven
2021-07-19 12:24 ` Wedson Almeida Filho
2021-07-19 13:15 ` Wedson Almeida Filho
2021-07-19 14:02 ` Arnd Bergmann
2021-07-19 14:13 ` Linus Walleij
2021-07-19 21:32 ` Arnd Bergmann
2021-07-19 21:33 ` Arnd Bergmann
2021-07-20 1:46 ` Miguel Ojeda
2021-07-20 6:43 ` Johannes Berg
2021-07-19 14:43 ` Geert Uytterhoeven
2021-07-19 18:24 ` Miguel Ojeda
2021-07-19 18:47 ` Steven Rostedt
2021-07-19 14:54 ` Miguel Ojeda
2021-07-19 17:32 ` Wedson Almeida Filho
2021-07-19 21:31 ` Arnd Bergmann
2021-07-19 17:37 ` Miguel Ojeda
2021-07-19 16:02 ` Vegard Nossum
2021-07-19 17:45 ` Miguel Ojeda
2021-07-19 17:54 ` Miguel Ojeda
2021-07-19 18:06 ` Wedson Almeida Filho
2021-07-19 19:37 ` Laurent Pinchart
2021-07-19 21:09 ` Wedson Almeida Filho
2021-07-20 23:54 ` Laurent Pinchart
2021-07-21 1:33 ` Andy Lutomirski
2021-07-21 1:42 ` Laurent Pinchart
2021-07-21 13:54 ` Linus Walleij
2021-07-21 14:13 ` Wedson Almeida Filho
2021-07-21 14:19 ` Linus Walleij
2021-07-22 11:33 ` Wedson Almeida Filho
2021-07-23 0:45 ` Linus Walleij
2021-07-21 4:39 ` Wedson Almeida Filho
2021-07-23 1:04 ` Laurent Pinchart
2021-07-21 4:23 ` Wedson Almeida Filho
2021-07-23 1:13 ` Laurent Pinchart
2021-07-19 22:57 ` Alexandre Belloni
2021-07-20 7:15 ` Miguel Ojeda
2021-07-20 9:39 ` Alexandre Belloni
2021-07-20 12:10 ` Miguel Ojeda
2021-07-19 13:53 ` Linus Walleij
2021-07-19 14:42 ` Wedson Almeida Filho
2021-07-19 22:16 ` Linus Walleij
2021-07-20 1:20 ` Wedson Almeida Filho
2021-07-20 13:21 ` Andrew Lunn
2021-07-20 13:38 ` Miguel Ojeda
2021-07-20 14:04 ` Andrew Lunn
2021-07-20 13:55 ` Greg KH
2021-07-20 1:21 ` Miguel Ojeda
2021-07-20 16:00 ` Mark Brown
2021-07-20 22:42 ` Linus Walleij
2021-07-19 14:43 ` Miguel Ojeda
2021-07-19 15:15 ` Andrew Lunn
2021-07-19 15:43 ` Miguel Ojeda
2021-07-09 7:03 ` Viresh Kumar
2021-07-09 17:06 ` Mark Brown
2021-07-09 17:43 ` Miguel Ojeda
2021-07-10 9:53 ` Jonathan Cameron
2021-07-10 20:09 ` Kees Cook
2021-07-08 13:55 ` Miguel Ojeda
2021-07-08 14:58 ` Greg KH
2021-07-08 15:02 ` Mark Brown
2021-07-08 16:38 ` Andy Lutomirski
2021-07-08 18:01 ` Greg KH
2021-07-08 18:00 ` Miguel Ojeda
2021-07-08 18:44 ` Greg KH
2021-07-08 23:09 ` Miguel Ojeda
2021-07-08 7:20 ` Geert Uytterhoeven
2021-07-08 13:41 ` Wedson Almeida Filho
2021-07-08 13:43 ` Geert Uytterhoeven
2021-07-08 13:54 ` Wedson Almeida Filho
2021-07-08 14:16 ` Geert Uytterhoeven
2021-07-08 14:24 ` Wedson Almeida Filho
2021-07-09 7:04 ` Jerome Glisse
2021-07-08 14:04 ` Miguel Ojeda
2021-07-08 14:18 ` Geert Uytterhoeven
2021-07-08 14:28 ` Miguel Ojeda
2021-07-08 14:33 ` Geert Uytterhoeven
2021-07-08 14:35 ` Miguel Ojeda
2021-07-09 11:55 ` Geert Uytterhoeven
2021-07-08 16:07 ` Andy Lutomirski
2021-07-07 20:58 ` Miguel Ojeda
2021-07-07 21:47 ` Laurent Pinchart
2021-07-07 22:44 ` Miguel Ojeda
2021-07-07 17:01 ` Miguel Ojeda
2021-07-07 10:50 ` Mark Brown
2021-07-07 10:56 ` Julia Lawall
2021-07-07 11:27 ` James Bottomley
2021-07-07 11:34 ` James Bottomley
2021-07-07 12:20 ` Greg KH
2021-07-07 12:38 ` James Bottomley
2021-07-07 12:45 ` Greg KH
2021-07-07 17:17 ` Laurent Pinchart
2021-07-08 6:49 ` cdev/devm_* issues (was Re: [TECH TOPIC] Rust for Linux) Greg KH
2021-07-08 8:23 ` Laurent Pinchart [this message]
2021-07-08 23:06 ` Linus Walleij
2021-07-09 0:02 ` Dan Williams
2021-07-09 16:53 ` Wedson Almeida Filho
2021-07-13 8:59 ` Linus Walleij
[not found] ` <CAHp75VfW7PxAyU=eYPNWFU_oUY=aStz-4W5gX87KSo402YhMXQ@mail.gmail.com>
2021-07-21 13:46 ` Linus Walleij
2021-07-21 15:49 ` Andy Shevchenko
2021-07-10 7:09 ` Dan Carpenter
2021-07-12 13:42 ` Jason Gunthorpe
2021-07-15 9:54 ` Daniel Vetter
2021-07-21 9:08 ` Dan Carpenter
2021-07-22 9:56 ` Daniel Vetter
2021-07-22 10:09 ` Dan Carpenter
2021-07-08 9:08 ` [TECH TOPIC] Rust for Linux Mauro Carvalho Chehab
2021-07-10 16:42 ` Laurent Pinchart
2021-07-10 17:18 ` Andy Lutomirski
2021-07-07 15:17 ` Mark Brown
2021-07-06 21:45 ` Bart Van Assche
2021-07-06 23:08 ` Stephen Hemminger
2021-07-07 2:41 ` Bart Van Assche
2021-07-07 18:57 ` Linus Torvalds
2021-07-07 20:32 ` Bart Van Assche
2021-07-07 20:39 ` Linus Torvalds
2021-07-07 21:40 ` Laurent Pinchart
2021-07-08 7:22 ` Geert Uytterhoeven
2021-07-07 21:02 ` Laurent Pinchart
2021-07-07 22:11 ` Miguel Ojeda
2021-07-07 22:43 ` Laurent Pinchart
2021-07-07 23:21 ` Miguel Ojeda
2021-07-07 23:40 ` Laurent Pinchart
2021-07-08 0:27 ` Miguel Ojeda
2021-07-08 0:56 ` Laurent Pinchart
2021-07-08 6:26 ` Alexey Dobriyan
2021-07-06 19:05 ` Bart Van Assche
2021-07-06 19:27 ` Miguel Ojeda
2021-07-07 15:48 ` Steven Rostedt
2021-07-07 16:44 ` Miguel Ojeda
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YOa2Au2TEjCpm1xX@pendragon.ideasonboard.com \
--to=laurent.pinchart@ideasonboard.com \
--cc=James.Bottomley@hansenpartnership.com \
--cc=broonie@kernel.org \
--cc=daniel@ffwll.ch \
--cc=greg@kroah.com \
--cc=ksummit@lists.linux.dev \
--cc=linus.walleij@linaro.org \
--cc=miguel.ojeda.sandonis@gmail.com \
--cc=roland@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox