* [PATCH v2 1/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL)
2023-09-08 17:57 [PATCH v2 0/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL) Michal Clapinski
@ 2023-09-08 17:57 ` Michal Clapinski
2023-09-08 20:02 ` kernel test robot
2023-09-08 17:57 ` [PATCH v2 2/2] selftests: test ioctl(MEMFD_CHECK_IF_ORIGINAL) Michal Clapinski
2023-09-08 20:34 ` [PATCH v2 0/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL) Jonathan Corbet
2 siblings, 1 reply; 7+ messages in thread
From: Michal Clapinski @ 2023-09-08 17:57 UTC (permalink / raw)
To: Jonathan Corbet, Mike Kravetz, Muchun Song, Andrew Morton,
Hugh Dickins, Shuah Khan, Greg Kroah-Hartman, Arnd Bergmann,
Yi Liu, Dominik Brodowski, Hans Verkuil, Steve French, Simon Ser,
Jason Gunthorpe, Marc Dionne, Jiri Slaby, David Howells,
Luca Vizzarro, Jeff Xu, Aleksa Sarai, Kees Cook, Daniel Verkamp,
linux-doc, linux-kernel, linux-mm, linux-kselftest
Cc: Michal Clapinski
Add a way to check if an fd points to the memfd's original open fd
(the one created by memfd_create).
Useful because only the original open fd can be both writable and
executable.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
---
Documentation/userspace-api/ioctl/ioctl-number.rst | 1 +
fs/hugetlbfs/inode.c | 9 +++++++++
include/linux/memfd.h | 12 ++++++++++++
mm/memfd.c | 9 +++++++++
mm/shmem.c | 9 +++++++++
5 files changed, 40 insertions(+)
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 4ea5b837399a..9a0782116ac2 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -355,6 +355,7 @@ Code Seq# Include File Comments
0xB6 all linux/fpga-dfl.h
0xB7 all uapi/linux/remoteproc_cdev.h <mailto:linux-remoteproc@vger.kernel.org>
0xB7 all uapi/linux/nsfs.h <mailto:Andrei Vagin <avagin@openvz.org>>
+0xB8 00 linux/memfd.h
0xC0 00-0F linux/usb/iowarrior.h
0xCA 00-0F uapi/misc/cxl.h
0xCA 10-2F uapi/misc/ocxl.h
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 316c4cebd3f3..89ff46f7ac54 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -35,6 +35,7 @@
#include <linux/magic.h>
#include <linux/migrate.h>
#include <linux/uio.h>
+#include <linux/memfd.h>
#include <linux/uaccess.h>
#include <linux/sched/mm.h>
@@ -1324,6 +1325,12 @@ static void init_once(void *foo)
inode_init_once(&ei->vfs_inode);
}
+static long hugetlbfs_file_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ return memfd_ioctl(file, cmd, arg);
+}
+
const struct file_operations hugetlbfs_file_operations = {
.read_iter = hugetlbfs_read_iter,
.mmap = hugetlbfs_file_mmap,
@@ -1331,6 +1338,8 @@ const struct file_operations hugetlbfs_file_operations = {
.get_unmapped_area = hugetlb_get_unmapped_area,
.llseek = default_llseek,
.fallocate = hugetlbfs_fallocate,
+ .unlocked_ioctl = hugetlbfs_file_ioctl,
+ .compat_ioctl = hugetlbfs_file_ioctl,
};
static const struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index e7abf6fa4c52..50f512624c92 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -3,14 +3,26 @@
#define __LINUX_MEMFD_H
#include <linux/file.h>
+#include <linux/ioctl.h>
#ifdef CONFIG_MEMFD_CREATE
extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg);
+extern long memfd_ioctl(struct file *file, unsigned int cmd, unsigned int arg);
#else
static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned int a)
{
return -EINVAL;
}
+static inline long memfd_ioctl(struct file *f, unsigned int c, unsigned int a)
+{
+ return -EINVAL;
+}
#endif
+/*
+ * Return 1 if the memfd is original (i.e. was created by memfd_create,
+ * not reopened), 0 otherwise.
+ */
+#define MEMFD_CHECK_IF_ORIGINAL _IOR(0xB8, 0, int)
+
#endif /* __LINUX_MEMFD_H */
diff --git a/mm/memfd.c b/mm/memfd.c
index 1cad1904fc26..06bcb970c387 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -262,6 +262,15 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
return error;
}
+long memfd_ioctl(struct file *file, unsigned int cmd, unsigned int arg)
+{
+ if (cmd == MEMFD_CHECK_IF_ORIGINAL)
+ return (file->f_mode & FMODE_WRITE) &&
+ !(file->f_mode & FMODE_WRITER);
+
+ return -EINVAL;
+}
+
#define MFD_NAME_PREFIX "memfd:"
#define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
#define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
diff --git a/mm/shmem.c b/mm/shmem.c
index 02e62fccc80d..347fcba15fb7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -79,6 +79,7 @@ static struct vfsmount *shm_mnt;
#include <linux/rmap.h>
#include <linux/uuid.h>
#include <linux/quotaops.h>
+#include <linux/memfd.h>
#include <linux/uaccess.h>
@@ -4459,6 +4460,12 @@ const struct address_space_operations shmem_aops = {
};
EXPORT_SYMBOL(shmem_aops);
+static long shmem_file_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ return memfd_ioctl(file, cmd, arg);
+}
+
static const struct file_operations shmem_file_operations = {
.mmap = shmem_mmap,
.open = shmem_file_open,
@@ -4471,6 +4478,8 @@ static const struct file_operations shmem_file_operations = {
.splice_read = shmem_file_splice_read,
.splice_write = iter_file_splice_write,
.fallocate = shmem_fallocate,
+ .unlocked_ioctl = shmem_file_ioctl,
+ .compat_ioctl = shmem_file_ioctl,
#endif
};
--
2.42.0.283.g2d96d420d3-goog
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v2 2/2] selftests: test ioctl(MEMFD_CHECK_IF_ORIGINAL)
2023-09-08 17:57 [PATCH v2 0/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL) Michal Clapinski
2023-09-08 17:57 ` [PATCH v2 1/2] " Michal Clapinski
@ 2023-09-08 17:57 ` Michal Clapinski
2023-09-08 20:34 ` [PATCH v2 0/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL) Jonathan Corbet
2 siblings, 0 replies; 7+ messages in thread
From: Michal Clapinski @ 2023-09-08 17:57 UTC (permalink / raw)
To: Jonathan Corbet, Mike Kravetz, Muchun Song, Andrew Morton,
Hugh Dickins, Shuah Khan, Greg Kroah-Hartman, Arnd Bergmann,
Yi Liu, Dominik Brodowski, Hans Verkuil, Steve French, Simon Ser,
Jason Gunthorpe, Marc Dionne, Jiri Slaby, David Howells,
Luca Vizzarro, Jeff Xu, Aleksa Sarai, Kees Cook, Daniel Verkamp,
linux-doc, linux-kernel, linux-mm, linux-kselftest
Cc: Michal Clapinski
Signed-off-by: Michal Clapinski <mclapinski@google.com>
---
tools/testing/selftests/memfd/memfd_test.c | 32 ++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 3df008677239..1a702af6e01a 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -13,6 +13,7 @@
#include <stdlib.h>
#include <signal.h>
#include <string.h>
+#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/syscall.h>
@@ -39,6 +40,10 @@
#define MFD_NOEXEC_SEAL 0x0008U
+#ifndef MEMFD_CHECK_IF_ORIGINAL
+#define MEMFD_CHECK_IF_ORIGINAL _IOR(0xB8, 0, int)
+#endif
+
/*
* Default is not to test hugetlbfs
*/
@@ -1567,6 +1572,31 @@ static void test_share_fork(char *banner, char *b_suffix)
close(fd);
}
+static void test_ioctl_check_original(void)
+{
+ int fd, fd2;
+
+ printf("%s IOCTL-CHECK-ORIGINAL\n", memfd_str);
+ fd = sys_memfd_create("kern_memfd_check_original", 0);
+ if (fd < 0) {
+ printf("memfd_create failed: %m\n");
+ abort();
+ }
+ if (ioctl(fd, MEMFD_CHECK_IF_ORIGINAL) != 1) {
+ printf("ioctl(MEMFD_CHECK_IF_ORIGINAL) failed\n");
+ abort();
+ }
+
+ fd2 = mfd_assert_reopen_fd(fd);
+ if (ioctl(fd2, MEMFD_CHECK_IF_ORIGINAL) != 0) {
+ printf("ioctl(MEMFD_CHECK_IF_ORIGINAL) failed\n");
+ abort();
+ }
+
+ close(fd);
+ close(fd2);
+}
+
int main(int argc, char **argv)
{
pid_t pid;
@@ -1609,6 +1639,8 @@ int main(int argc, char **argv)
test_share_open("SHARE-OPEN", "");
test_share_fork("SHARE-FORK", "");
+ test_ioctl_check_original();
+
/* Run test-suite in a multi-threaded environment with a shared
* file-table. */
pid = spawn_idle_thread(CLONE_FILES | CLONE_FS | CLONE_VM);
--
2.42.0.283.g2d96d420d3-goog
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v2 0/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL)
2023-09-08 17:57 [PATCH v2 0/2] mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL) Michal Clapinski
2023-09-08 17:57 ` [PATCH v2 1/2] " Michal Clapinski
2023-09-08 17:57 ` [PATCH v2 2/2] selftests: test ioctl(MEMFD_CHECK_IF_ORIGINAL) Michal Clapinski
@ 2023-09-08 20:34 ` Jonathan Corbet
2023-09-08 21:55 ` Michał Cłapiński
2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Corbet @ 2023-09-08 20:34 UTC (permalink / raw)
To: Michal Clapinski, Mike Kravetz, Muchun Song, Andrew Morton,
Hugh Dickins, Shuah Khan, Greg Kroah-Hartman, Arnd Bergmann,
Yi Liu, Dominik Brodowski, Hans Verkuil, Steve French, Simon Ser,
Jason Gunthorpe, Marc Dionne, Jiri Slaby, David Howells,
Luca Vizzarro, Jeff Xu, Aleksa Sarai, Kees Cook, Daniel Verkamp,
linux-doc, linux-kernel, linux-mm, linux-kselftest
Cc: Michal Clapinski
Michal Clapinski <mclapinski@google.com> writes:
> This change introduces a way to check if an fd points to a memfd's
> original open fd (the one created by memfd_create).
>
> We encountered an issue with migrating memfds in CRIU (checkpoint
> restore in userspace - it migrates running processes between
> machines). Imagine a scenario:
> 1. Create a memfd. By default it's open with O_RDWR and yet one can
> exec() to it (unlike with regular files, where one would get ETXTBSY).
> 2. Reopen that memfd with O_RDWR via /proc/self/fd/<fd>.
>
> Now those 2 fds are indistinguishable from userspace. You can't exec()
> to either of them (since the reopen incremented inode->i_writecount)
> and their /proc/self/fdinfo/ are exactly the same. Unfortunately they
> are not the same. If you close the second one, the first one becomes
> exec()able again. If you close the first one, the other doesn't become
> exec()able. Therefore during migration it does matter which is recreated
> first and which is reopened but there is no way for CRIU to tell which
> was first.
So please bear with me...I'll confess that I don't fully understand the
situation here, so this is probably a dumb question.
It seems like you are adding this "original open" test as a way of
working around a quirk with the behavior of subsequent opens. I don't
*think* that this is part of the intended, documented behavior of
memfds, it's just something that happens. You're exposing an artifact
of the current implementation.
Given that the two file descriptors are otherwise indistinguishable,
might a better fix be to make them indistinguishable in this regard as
well? Is there a good reason why the second fd doesn't become
exec()able in this scenario and, if not, perhaps that behavior could be
changed instead?
Thanks,
jon
^ permalink raw reply [flat|nested] 7+ messages in thread