linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Alice Ryhl <aliceryhl@google.com>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Carlos Llamas" <cmllamas@google.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Christian Brauner" <brauner@kernel.org>,
	"Jan Kara" <jack@suse.cz>, "Paul Moore" <paul@paul-moore.com>,
	"James Morris" <jmorris@namei.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Dave Chinner" <david@fromorbit.com>,
	"Qi Zheng" <zhengqi.arch@bytedance.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Muchun Song" <muchun.song@linux.dev>,
	"David Hildenbrand" <david@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Danilo Krummrich" <dakr@kernel.org>,
	kernel-team@android.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, linux-mm@kvack.org,
	rust-for-linux@vger.kernel.org
Subject: Re: [PATCH 1/5] export file_close_fd and task_work_add
Date: Mon, 9 Feb 2026 15:21:58 +0000	[thread overview]
Message-ID: <df876a6e-013c-4566-890d-7c1d662fced3@lucifer.local> (raw)
In-Reply-To: <aYSfBJA4hR4shPfI@google.com>

On Thu, Feb 05, 2026 at 01:45:40PM +0000, Alice Ryhl wrote:
> On Thu, Feb 05, 2026 at 11:53:19AM +0000, Lorenzo Stoakes wrote:
> > On Thu, Feb 05, 2026 at 11:42:46AM +0000, Alice Ryhl wrote:
> > > On Thu, Feb 05, 2026 at 11:20:33AM +0000, Lorenzo Stoakes wrote:
> > > > On Thu, Feb 05, 2026 at 10:51:26AM +0000, Alice Ryhl wrote:
> > > > > This exports the functionality needed by Binder to close file
> > > > > descriptors.
> > > > >
> > > > > When you send a fd over Binder, what happens is this:
> > > > >
> > > > > 1. The sending process turns the fd into a struct file and stores it in
> > > > >    the transaction object.
> > > > > 2. When the receiving process gets the message, the fd is installed as a
> > > > >    fd into the current process.
> > > > > 3. When the receiving process is done handling the message, it tells
> > > > >    Binder to clean up the transaction. As part of this, fds embedded in
> > > > >    the transaction are closed.
> > > > >
> > > > > Note that it was not always implemented like this. Previously the
> > > > > sending process would install the fd directly into the receiving proc in
> > > > > step 1, but as discussed previously [1] this is not ideal and has since
> > > > > been changed so that fd install happens during receive.
> > > > >
> > > > > The functions being exported here are for closing the fd in step 3. They
> > > > > are required because closing a fd from an ioctl is in general not safe.
> > > > > This is to meet the requirements for using fdget(), which is used by the
> > > > > ioctl framework code before calling into the driver's implementation of
> > > > > the ioctl. Binder works around this with this sequence of operations:
> > > > >
> > > > > 1. file_close_fd()
> > > > > 2. get_file()
> > > > > 3. filp_close()
> > > > > 4. task_work_add(current, TWA_RESUME)
> > > > > 5. <binder returns from ioctl>
> > > > > 6. fput()
> > > > >
> > > > > This ensures that when fput() is called in the task work, the fdget()
> > > > > that the ioctl framework code uses has already been fdput(), so if the
> > > > > fd being closed happens to be the same fd, then the fd is not closed
> > > > > in violation of the fdget() rules.
> > > >
> > > > I'm not really familiar with this mechanism but you're already talking about
> > > > this being a workaround so strikes me the correct thing to do is to find a way
> > > > to do this in the kernel sensibly rather than exporting internal implementation
> > > > details and doing it in binder.
> > >
> > > I did previously submit a patch that implemented this logic outside of
> > > Binder, but I was advised to move it into Binder.
> >
> > Right yeah that's just odd to me, we really do not want to be adding internal
> > implementation details to drivers.
> >
> > This is based on bitter experience of bugs being caused by drivers abusing every
> > interface they get, which is basically exactly what always happens, sadly.
> >
> > And out-of-tree is heavily discouraged.
> >
> > Also can we use EXPORT_SYMBOL_FOR_MODULES() for anything we do need to export to
> > make it explicitly only for binder, perhaps?
> >
> > >
> > > But I'm happy to submit a patch to extract this logic into some sort of
> > > close_fd_safe() method that can be called even if said fd is currently
> > > held using fdget().
> >
> > Yup, especially given Christian's view on the kernel task export here I think
> > that's a more sensible approach.
> >
> > But obviously I defer the sensible-ness of this to him as I am but an mm dev :)
>
> Quick sketch of how this would look:
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index adde1e40cccd..6fb7175ff69b 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -64,7 +64,6 @@
>  #include <linux/spinlock.h>
>  #include <linux/ratelimit.h>
>  #include <linux/syscalls.h>
> -#include <linux/task_work.h>
>  #include <linux/sizes.h>
>  #include <linux/ktime.h>
>
> @@ -1962,68 +1961,6 @@ static bool binder_validate_fixup(struct binder_proc *proc,
>  	return (fixup_offset >= last_min_offset);
>  }
>
> -/**
> - * struct binder_task_work_cb - for deferred close
> - *
> - * @twork:                callback_head for task work
> - * @file:                 file to close
> - *
> - * Structure to pass task work to be handled after
> - * returning from binder_ioctl() via task_work_add().
> - */
> -struct binder_task_work_cb {
> -	struct callback_head twork;
> -	struct file *file;
> -};
> -
> -/**
> - * binder_do_fd_close() - close list of file descriptors
> - * @twork:	callback head for task work
> - *
> - * It is not safe to call ksys_close() during the binder_ioctl()
> - * function if there is a chance that binder's own file descriptor
> - * might be closed. This is to meet the requirements for using
> - * fdget() (see comments for __fget_light()). Therefore use
> - * task_work_add() to schedule the close operation once we have
> - * returned from binder_ioctl(). This function is a callback
> - * for that mechanism and does the actual ksys_close() on the
> - * given file descriptor.
> - */
> -static void binder_do_fd_close(struct callback_head *twork)
> -{
> -	struct binder_task_work_cb *twcb = container_of(twork,
> -			struct binder_task_work_cb, twork);
> -
> -	fput(twcb->file);
> -	kfree(twcb);
> -}
> -
> -/**
> - * binder_deferred_fd_close() - schedule a close for the given file-descriptor
> - * @fd:		file-descriptor to close
> - *
> - * See comments in binder_do_fd_close(). This function is used to schedule
> - * a file-descriptor to be closed after returning from binder_ioctl().
> - */
> -static void binder_deferred_fd_close(int fd)
> -{
> -	struct binder_task_work_cb *twcb;
> -
> -	twcb = kzalloc(sizeof(*twcb), GFP_KERNEL);
> -	if (!twcb)
> -		return;
> -	init_task_work(&twcb->twork, binder_do_fd_close);
> -	twcb->file = file_close_fd(fd);
> -	if (twcb->file) {
> -		// pin it until binder_do_fd_close(); see comments there
> -		get_file(twcb->file);
> -		filp_close(twcb->file, current->files);
> -		task_work_add(current, &twcb->twork, TWA_RESUME);
> -	} else {
> -		kfree(twcb);
> -	}
> -}
> -
>  static void binder_transaction_buffer_release(struct binder_proc *proc,
>  					      struct binder_thread *thread,
>  					      struct binder_buffer *buffer,
> @@ -2183,7 +2120,10 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
>  						offset, sizeof(fd));
>  				WARN_ON(err);
>  				if (!err) {
> -					binder_deferred_fd_close(fd);
> +					/*
> +					 * Intentionally ignore EBADF errors here.
> +					 */
> +					close_fd_safe(fd, GFP_KERNEL | __GFP_NOFAIL);
>  					/*
>  					 * Need to make sure the thread goes
>  					 * back to userspace to complete the
> diff --git a/fs/file.c b/fs/file.c
> index 0a4f3bdb2dec..58e3825e846c 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -21,6 +21,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/close_range.h>
>  #include <linux/file_ref.h>
> +#include <linux/task_work.h>
>  #include <net/sock.h>
>  #include <linux/init_task.h>
>
> @@ -1525,3 +1526,47 @@ int iterate_fd(struct files_struct *files, unsigned n,
>  	return res;
>  }
>  EXPORT_SYMBOL(iterate_fd);
> +
> +struct close_fd_safe_task_work {
> +	struct callback_head twork;
> +	struct file *file;
> +};
> +
> +static void close_fd_safe_callback(struct callback_head *twork)
> +{
> +	struct close_fd_safe_task_work *twcb = container_of(twork,
> +			struct close_fd_safe_task_work, twork);
> +
> +	fput(twcb->file);
> +	kfree(twcb);
> +}
> +
> +/**
> + * close_fd_safe - close the given fd
> + * @fd: file descriptor to close
> + * @flags: gfp flags for allocation of task work
> + *
> + * This closes an fd. Unlike close_fd(), this may be used even if the fd is
> + * currently held with fdget().
> + *
> + * Returns: 0 or an error code
> + */
> +int close_fd_safe(unsigned int fd, gfp_t flags)
> +{
> +	struct close_fd_safe_task_work *twcb;
> +
> +	twcb = kzalloc(sizeof(*twcb), flags);
> +	if (!twcb)
> +		return -ENOMEM;
> +	init_task_work(&twcb->twork, close_fd_safe_callback);
> +	twcb->file = file_close_fd(fd);
> +	if (!twcb->file) {
> +		kfree(twcb);
> +		return -EBADF;
> +	}
> +
> +	get_file(twcb->file);
> +	filp_close(twcb->file, current->files);
> +	task_work_add(current, &twcb->twork, TWA_RESUME);
> +	return 0;
> +}

Would need an EXPORT_SYMBOL_FOR_MODULES(...) here right?

> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index c45306a9f007..1c99a56c0cdf 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -111,6 +111,7 @@ int iterate_fd(struct files_struct *, unsigned,
>  		const void *);
>
>  extern int close_fd(unsigned int fd);
> +extern int close_fd_safe(unsigned int fd, gfp_t flags);

One nit, generally well in mm anyway we avoid the 'extern' and remove them as we
go. Not sure about vfs actually though?

>  extern struct file *file_close_fd(unsigned int fd);
>
>  extern struct kmem_cache *files_cachep;

I mean this is essentially taking what's in binder and making it a general
thing, so needs Christian's input on whether this is sensible I think :)

Cheers, Lorenzo


  reply	other threads:[~2026-02-09 15:22 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05 10:51 [PATCH 0/5] Make Rust Binder build as a module Alice Ryhl
2026-02-05 10:51 ` [PATCH 1/5] export file_close_fd and task_work_add Alice Ryhl
2026-02-05 11:20   ` Lorenzo Stoakes
2026-02-05 11:42     ` Alice Ryhl
2026-02-05 11:53       ` Lorenzo Stoakes
2026-02-05 13:45         ` Alice Ryhl
2026-02-09 15:21           ` Lorenzo Stoakes [this message]
2026-02-10  8:47             ` Alice Ryhl
2026-02-05 11:38   ` Christian Brauner
2026-02-05 11:52     ` Jan Kara
2026-02-05 12:07       ` Alice Ryhl
2026-02-05 10:51 ` [PATCH 2/5] security: export binder symbols Alice Ryhl
2026-02-20  0:00   ` Paul Moore
2026-02-05 10:51 ` [PATCH 3/5] mm: export zap_page_range_single and list_lru_add/del Alice Ryhl
2026-02-05 10:59   ` David Hildenbrand (arm)
2026-02-05 11:04     ` Alice Ryhl
2026-02-05 11:12       ` David Hildenbrand (arm)
2026-02-05 11:18         ` Alice Ryhl
2026-02-05 11:30           ` David Hildenbrand (arm)
2026-02-05 11:29   ` Lorenzo Stoakes
2026-02-05 11:43     ` David Hildenbrand (arm)
2026-02-05 11:57       ` David Hildenbrand (arm)
2026-02-05 12:01         ` Lorenzo Stoakes
2026-02-05 12:06           ` David Hildenbrand (arm)
2026-02-05 12:07             ` Lorenzo Stoakes
2026-02-05 11:57       ` Lorenzo Stoakes
2026-02-05 12:03         ` David Hildenbrand (arm)
2026-02-05 12:12           ` Lorenzo Stoakes
2026-02-05 12:24         ` Miguel Ojeda
2026-02-05 12:28           ` Lorenzo Stoakes
2026-02-05 11:58       ` Alice Ryhl
2026-02-05 12:10         ` Lorenzo Stoakes
2026-02-05 12:13           ` David Hildenbrand (arm)
2026-02-05 12:19             ` Alice Ryhl
2026-02-05 12:24               ` Lorenzo Stoakes
2026-02-05 12:30                 ` David Hildenbrand (Arm)
2026-02-09 15:22                   ` Lorenzo Stoakes
2026-02-05 12:16           ` Alice Ryhl
2026-02-05 12:07     ` Alice Ryhl
2026-02-05 12:18       ` Lorenzo Stoakes
2026-02-05 10:51 ` [PATCH 4/5] ipc: export init_ipc_ns and put_ipc_ns Alice Ryhl
2026-02-05 10:51 ` [PATCH 5/5] rust_binder: mark ANDROID_BINDER_IPC_RUST tristate Alice Ryhl
2026-02-05 13:21   ` Gary Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df876a6e-013c-4566-890d-7c1d662fced3@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=a.hindborg@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=aliceryhl@google.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=cmllamas@google.com \
    --cc=dakr@kernel.org \
    --cc=david@fromorbit.com \
    --cc=david@kernel.org \
    --cc=gary@garyguo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=jmorris@namei.org \
    --cc=kernel-team@android.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=lossin@kernel.org \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=ojeda@kernel.org \
    --cc=paul@paul-moore.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=serge@hallyn.com \
    --cc=surenb@google.com \
    --cc=tmgross@umich.edu \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox