linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] symlink length caching
@ 2024-11-19  9:45 Mateusz Guzik
  2024-11-19  9:45 ` [PATCH v2 1/3] vfs: support caching symlink lengths in inodes Mateusz Guzik
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-19  9:45 UTC (permalink / raw)
  To: brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm, Mateusz Guzik

quote:
    When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
    speed up when issuing readlink on /initrd.img on ext4.

Benchmark code at the bottom.

ext4 and tmpfs are patched, other filesystems can also get there with
some more work.

Arguably the current get_link API should be patched to let the fs return
the size, but that's not churn I'm interested in diving into.
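
To illustrate, the mechanism boils down to the two snippets below (a
condensed sketch of what patch 1 adds, see its diff for the real thing):

/* fs side: opt in while creating a symlink inode */
inode_set_cached_link(inode, link, linklen);

/* vfs_readlink() side: skip ->get_link()/strlen() when the length is cached */
if (inode->i_opflags & IOP_CACHED_LINK)
	return readlink_copy(buffer, buflen, inode->i_link, inode->i_linklen);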

On my v1, Jan remarked that 1.5% is not a particularly high win, questioning
whether doing this makes sense. I noted the value is only this small
because of other slowdowns.

To elaborate here are highlights while benching on Sapphire Rapids:
1. putname using atomics (over 3.5% on my profile)

It sounds like Al has plans to do something here(?); I'm not touching it if
it can be helped. The atomic definitely does not have to be there in the
common case.

2. kmem_cache_alloc_noprof/kmem_cache_free (over 7% combined) 

They are both dog slow due to cmpxchg16b. A patchset was posted which
adds a different allocation/free fast path and should whack the majority
of the problem, see: https://lore.kernel.org/linux-mm/20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz/

If this lands I'll definitely try to make the pathname allocs use it,
should drop about 5-6 percentage points on this sucker.

3. __legitimize_mnt issues a full fence (again over 3%)

As far as avoiding the fence is concerned, waiting on an RCU grace period on
unmount should do the trick. However, I found there is a bunch of
complexity there to sort out before doing this will be feasible (notably,
multiple mounts are freed in one go, so this needs to be batched).
There may be other issues which I missed and which make this not worth
it, but the fence is definitely avoidable in principle and I would be
surprised if there was no way to sensibly get there. No ETA, for now I'm
just pointing this out.

There is also the idea to speculatively elide lockref, but when I tried
that last time I ran into significant LSM-related trouble.

All that aside there is also quite a bit of branching and func calling
which does not need to be there (example: make_vfsuid()/make_vfsgid() could be
combined into one routine, etc.).

Ultimately there is single-threaded perf left on the table in various
spots.

v2:
- add a dedicated field, flag and a helper instead of using i_size
- constify i_link

v1 can be found here:
https://lore.kernel.org/linux-fsdevel/20241118085357.494178-1-mjguzik@gmail.com/

benchmark:
plug into will-it-scale into tests/readlink1.c:

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
#include <string.h>

char *testcase_description = "readlink /initrd.img";

void testcase(unsigned long long *iterations, unsigned long nr)
{
        /* nr identifies the worker instance in will-it-scale; unused here */
        char *tmplink = "/initrd.img";
        char buf[1024];

        while (1) {
                int error = readlink(tmplink, buf, sizeof(buf));
                assert(error > 0);

                /* the harness samples this counter to report throughput */
                (*iterations)++;
        }
}
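
(For reference, will-it-scale's stock build should turn every file in
tests/ into the usual <name>_processes and <name>_threads binaries, so
the above ends up as readlink1_processes / readlink1_threads.)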


Mateusz Guzik (3):
  vfs: support caching symlink lengths in inodes
  ext4: use inode_set_cached_link()
  tmpfs: use inode_set_cached_link()

 fs/ext4/inode.c                |  3 ++-
 fs/ext4/namei.c                |  4 +++-
 fs/namei.c                     | 34 +++++++++++++++++++---------------
 fs/proc/namespaces.c           |  2 +-
 include/linux/fs.h             | 12 ++++++++++--
 mm/shmem.c                     |  6 ++++--
 security/apparmor/apparmorfs.c |  2 +-
 7 files changed, 40 insertions(+), 23 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] vfs: support caching symlink lengths in inodes
  2024-11-19  9:45 [PATCH v2 0/3] symlink length caching Mateusz Guzik
@ 2024-11-19  9:45 ` Mateusz Guzik
  2024-11-20  4:15   ` wangjianjian (C)
  2024-11-19  9:45 ` [PATCH v2 2/3] ext4: use inode_set_cached_link() Mateusz Guzik
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-19  9:45 UTC (permalink / raw)
  To: brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm, Mateusz Guzik

When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
speed up when issuing readlink on /initrd.img on ext4.

Filesystems opt in by calling inode_set_cached_link() when creating an
inode.

The size is stored in what used to be a 4-byte hole. If necessary the
field can be made smaller and converted into a union with something not
used with symlinks.

Churn-wise the current readlink_copy() helper is patched to accept the
size instead of calculating it.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 fs/namei.c                     | 34 +++++++++++++++++++---------------
 fs/proc/namespaces.c           |  2 +-
 include/linux/fs.h             | 12 ++++++++++--
 security/apparmor/apparmorfs.c |  2 +-
 4 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9d30c7aa9aa6..e56c29a22d26 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5272,19 +5272,16 @@ SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newna
 				getname(newname), 0);
 }
 
-int readlink_copy(char __user *buffer, int buflen, const char *link)
+int readlink_copy(char __user *buffer, int buflen, const char *link, int linklen)
 {
-	int len = PTR_ERR(link);
-	if (IS_ERR(link))
-		goto out;
+	int copylen;
 
-	len = strlen(link);
-	if (len > (unsigned) buflen)
-		len = buflen;
-	if (copy_to_user(buffer, link, len))
-		len = -EFAULT;
-out:
-	return len;
+	copylen = linklen;
+	if (unlikely(copylen > (unsigned) buflen))
+		copylen = buflen;
+	if (copy_to_user(buffer, link, copylen))
+		copylen = -EFAULT;
+	return copylen;
 }
 
 /**
@@ -5304,6 +5301,9 @@ int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	const char *link;
 	int res;
 
+	if (inode->i_opflags & IOP_CACHED_LINK)
+		return readlink_copy(buffer, buflen, inode->i_link, inode->i_linklen);
+
 	if (unlikely(!(inode->i_opflags & IOP_DEFAULT_READLINK))) {
 		if (unlikely(inode->i_op->readlink))
 			return inode->i_op->readlink(dentry, buffer, buflen);
@@ -5322,7 +5322,7 @@ int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 		if (IS_ERR(link))
 			return PTR_ERR(link);
 	}
-	res = readlink_copy(buffer, buflen, link);
+	res = readlink_copy(buffer, buflen, link, strlen(link));
 	do_delayed_call(&done);
 	return res;
 }
@@ -5391,10 +5391,14 @@ EXPORT_SYMBOL(page_put_link);
 
 int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
+	const char *link;
+	int res;
+
 	DEFINE_DELAYED_CALL(done);
-	int res = readlink_copy(buffer, buflen,
-				page_get_link(dentry, d_inode(dentry),
-					      &done));
+	link = page_get_link(dentry, d_inode(dentry), &done);
+	res = PTR_ERR(link);
+	if (!IS_ERR(link))
+		res = readlink_copy(buffer, buflen, link, strlen(link));
 	do_delayed_call(&done);
 	return res;
 }
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 8e159fc78c0a..c610224faf10 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -83,7 +83,7 @@ static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int bufl
 	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
 		res = ns_get_name(name, sizeof(name), task, ns_ops);
 		if (res >= 0)
-			res = readlink_copy(buffer, buflen, name);
+			res = readlink_copy(buffer, buflen, name, strlen(name));
 	}
 	put_task_struct(task);
 	return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 972147da71f9..30e332fb399d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -626,6 +626,7 @@ is_uncached_acl(struct posix_acl *acl)
 #define IOP_XATTR	0x0008
 #define IOP_DEFAULT_READLINK	0x0010
 #define IOP_MGTIME	0x0020
+#define IOP_CACHED_LINK	0x0040
 
 /*
  * Keep mostly read-only and often accessed (especially for
@@ -686,7 +687,7 @@ struct inode {
 
 	/* Misc */
 	u32			i_state;
-	/* 32-bit hole */
+	int			i_linklen;	/* for symlinks */
 	struct rw_semaphore	i_rwsem;
 
 	unsigned long		dirtied_when;	/* jiffies of first dirtying */
@@ -749,6 +750,13 @@ struct inode {
 	void			*i_private; /* fs or device private pointer */
 } __randomize_layout;
 
+static inline void inode_set_cached_link(struct inode *inode, char *link, int linklen)
+{
+	inode->i_link = link;
+	inode->i_linklen = linklen;
+	inode->i_opflags |= IOP_CACHED_LINK;
+}
+
 /*
  * Get bit address from inode->i_state to use with wait_var_event()
  * infrastructre.
@@ -3351,7 +3359,7 @@ extern const struct file_operations generic_ro_fops;
 
 #define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m))
 
-extern int readlink_copy(char __user *, int, const char *);
+extern int readlink_copy(char __user *, int, const char *, int);
 extern int page_readlink(struct dentry *, char __user *, int);
 extern const char *page_get_link(struct dentry *, struct inode *,
 				 struct delayed_call *);
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 01b923d97a44..60959cfba672 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2611,7 +2611,7 @@ static int policy_readlink(struct dentry *dentry, char __user *buffer,
 	res = snprintf(name, sizeof(name), "%s:[%lu]", AAFS_NAME,
 		       d_inode(dentry)->i_ino);
 	if (res > 0 && res < sizeof(name))
-		res = readlink_copy(buffer, buflen, name);
+		res = readlink_copy(buffer, buflen, name, strlen(name));
 	else
 		res = -ENOENT;
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] ext4: use inode_set_cached_link()
  2024-11-19  9:45 [PATCH v2 0/3] symlink length caching Mateusz Guzik
  2024-11-19  9:45 ` [PATCH v2 1/3] vfs: support caching symlink lengths in inodes Mateusz Guzik
@ 2024-11-19  9:45 ` Mateusz Guzik
  2024-11-19  9:45 ` [PATCH v2 3/3] tmpfs: " Mateusz Guzik
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-19  9:45 UTC (permalink / raw)
  To: brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm, Mateusz Guzik

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 fs/ext4/inode.c | 3 ++-
 fs/ext4/namei.c | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 89aade6f45f6..7c54ae5fcbd4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5006,10 +5006,11 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 		if (IS_ENCRYPTED(inode)) {
 			inode->i_op = &ext4_encrypted_symlink_inode_operations;
 		} else if (ext4_inode_is_fast_symlink(inode)) {
-			inode->i_link = (char *)ei->i_data;
 			inode->i_op = &ext4_fast_symlink_inode_operations;
 			nd_terminate_link(ei->i_data, inode->i_size,
 				sizeof(ei->i_data) - 1);
+			inode_set_cached_link(inode, (char *)ei->i_data,
+					      inode->i_size);
 		} else {
 			inode->i_op = &ext4_symlink_inode_operations;
 		}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index bcf2737078b8..536d56d15072 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3418,7 +3418,6 @@ static int ext4_symlink(struct mnt_idmap *idmap, struct inode *dir,
 			inode->i_op = &ext4_symlink_inode_operations;
 		} else {
 			inode->i_op = &ext4_fast_symlink_inode_operations;
-			inode->i_link = (char *)&EXT4_I(inode)->i_data;
 		}
 	}
 
@@ -3434,6 +3433,9 @@ static int ext4_symlink(struct mnt_idmap *idmap, struct inode *dir,
 		       disk_link.len);
 		inode->i_size = disk_link.len - 1;
 		EXT4_I(inode)->i_disksize = inode->i_size;
+		if (!IS_ENCRYPTED(inode))
+			inode_set_cached_link(inode, (char *)&EXT4_I(inode)->i_data,
+					      inode->i_size);
 	}
 	err = ext4_add_nondir(handle, dentry, &inode);
 	if (handle)
-- 
2.43.0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] tmpfs: use inode_set_cached_link()
  2024-11-19  9:45 [PATCH v2 0/3] symlink length caching Mateusz Guzik
  2024-11-19  9:45 ` [PATCH v2 1/3] vfs: support caching symlink lengths in inodes Mateusz Guzik
  2024-11-19  9:45 ` [PATCH v2 2/3] ext4: use inode_set_cached_link() Mateusz Guzik
@ 2024-11-19  9:45 ` Mateusz Guzik
  2024-11-19 17:53 ` [PATCH v2 0/3] symlink length caching Theodore Ts'o
  2024-11-20 10:33 ` Christian Brauner
  4 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-19  9:45 UTC (permalink / raw)
  To: brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm, Mateusz Guzik

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 mm/shmem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3d17753afd94..698a4bbdc21d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3870,6 +3870,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	int len;
 	struct inode *inode;
 	struct folio *folio;
+	char *link;
 
 	len = strlen(symname) + 1;
 	if (len > PAGE_SIZE)
@@ -3891,12 +3892,13 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 
 	inode->i_size = len-1;
 	if (len <= SHORT_SYMLINK_LEN) {
-		inode->i_link = kmemdup(symname, len, GFP_KERNEL);
-		if (!inode->i_link) {
+		link = kmemdup(symname, len, GFP_KERNEL);
+		if (!link) {
 			error = -ENOMEM;
 			goto out_remove_offset;
 		}
 		inode->i_op = &shmem_short_symlink_operations;
+		inode_set_cached_link(inode, link, len - 1);
 	} else {
 		inode_nohighmem(inode);
 		inode->i_mapping->a_ops = &shmem_aops;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] symlink length caching
  2024-11-19  9:45 [PATCH v2 0/3] symlink length caching Mateusz Guzik
                   ` (2 preceding siblings ...)
  2024-11-19  9:45 ` [PATCH v2 3/3] tmpfs: " Mateusz Guzik
@ 2024-11-19 17:53 ` Theodore Ts'o
  2024-11-19 18:07   ` Mateusz Guzik
  2024-11-20 10:33 ` Christian Brauner
  4 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2024-11-19 17:53 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: brauner, viro, jack, linux-kernel, linux-fsdevel, hughd,
	linux-ext4, linux-mm

On Tue, Nov 19, 2024 at 10:45:52AM +0100, Mateusz Guzik wrote:
> 
> On my v1, Jan remarked that 1.5% is not a particularly high win, questioning
> whether doing this makes sense. I noted the value is only this small
> because of other slowdowns.

Do you have a workload in mind which calls readlink() at sufficiently
high numbers such that this would be noticeable in
non-micro-benchmarks?  What motivated you to do this work?

Thanks,

					- Ted


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] symlink length caching
  2024-11-19 17:53 ` [PATCH v2 0/3] symlink length caching Theodore Ts'o
@ 2024-11-19 18:07   ` Mateusz Guzik
  0 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-19 18:07 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: brauner, viro, jack, linux-kernel, linux-fsdevel, hughd,
	linux-ext4, linux-mm

On Tue, Nov 19, 2024 at 6:53 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Tue, Nov 19, 2024 at 10:45:52AM +0100, Mateusz Guzik wrote:
> >
> > On my v1, Jan remarked that 1.5% is not a particularly high win, questioning
> > whether doing this makes sense. I noted the value is only this small
> > because of other slowdowns.
>
> Do you have a workload in mind which calls readlink() at sufficiently
> high numbers such that this would be noticeable in
> non-micro-benchmarks?  What motivated you to do this work?
>

I'm just messing about here. Given the triviality of the patch I'm not
sure where the objection is coming from. I can point a finger at
changes made by other people for supposed perf gains which are
virtually guaranteed to be invisible in isolation, just like this one.

Anyhow, I landed here from strlen -- the sucker is operating one byte
at a time which I was looking to sort out, but then I noticed that one
of the more commonly executing consumers does not even need to call
it.
-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vfs: support caching symlink lengths in inodes
  2024-11-19  9:45 ` [PATCH v2 1/3] vfs: support caching symlink lengths in inodes Mateusz Guzik
@ 2024-11-20  4:15   ` wangjianjian (C)
  2024-11-20  5:01     ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: wangjianjian (C) @ 2024-11-20  4:15 UTC (permalink / raw)
  To: Mateusz Guzik, brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm

On 2024/11/19 17:45, Mateusz Guzik wrote:
> When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
> speed up when issuing readlink on /initrd.img on ext4.
> 
> Filesystems opt in by calling inode_set_cached_link() when creating an
> inode.
> 
> The size is stored in what used to be a 4-byte hole. If necessary the
> field can be made smaller and converted into a union with something not
> used with symlinks.
> 
> Churn-wise the current readlink_copy() helper is patched to accept the
> size instead of calculating it.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
>   fs/namei.c                     | 34 +++++++++++++++++++---------------
>   fs/proc/namespaces.c           |  2 +-
>   include/linux/fs.h             | 12 ++++++++++--
>   security/apparmor/apparmorfs.c |  2 +-
>   4 files changed, 31 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 9d30c7aa9aa6..e56c29a22d26 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -5272,19 +5272,16 @@ SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newna
>   				getname(newname), 0);
>   }
>   
> -int readlink_copy(char __user *buffer, int buflen, const char *link)
> +int readlink_copy(char __user *buffer, int buflen, const char *link, int linklen)
>   {
> -	int len = PTR_ERR(link);
> -	if (IS_ERR(link))
> -		goto out;
> +	int copylen;
>   
> -	len = strlen(link);
> -	if (len > (unsigned) buflen)
> -		len = buflen;
> -	if (copy_to_user(buffer, link, len))
> -		len = -EFAULT;
> -out:
> -	return len;
> +	copylen = linklen;
> +	if (unlikely(copylen > (unsigned) buflen))
> +		copylen = buflen;
> +	if (copy_to_user(buffer, link, copylen))
> +		copylen = -EFAULT;
> +	return copylen;
>   }
>   
>   /**
> @@ -5304,6 +5301,9 @@ int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
>   	const char *link;
>   	int res;
>   
> +	if (inode->i_opflags & IOP_CACHED_LINK)
> +		return readlink_copy(buffer, buflen, inode->i_link, inode->i_linklen);
> +
>   	if (unlikely(!(inode->i_opflags & IOP_DEFAULT_READLINK))) {
>   		if (unlikely(inode->i_op->readlink))
>   			return inode->i_op->readlink(dentry, buffer, buflen);
> @@ -5322,7 +5322,7 @@ int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
>   		if (IS_ERR(link))
>   			return PTR_ERR(link);
>   	}
> -	res = readlink_copy(buffer, buflen, link);
> +	res = readlink_copy(buffer, buflen, link, strlen(link));
>   	do_delayed_call(&done);
>   	return res;
>   }
> @@ -5391,10 +5391,14 @@ EXPORT_SYMBOL(page_put_link);
>   
>   int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
>   {
> +	const char *link;
> +	int res;
> +
>   	DEFINE_DELAYED_CALL(done);
> -	int res = readlink_copy(buffer, buflen,
> -				page_get_link(dentry, d_inode(dentry),
> -					      &done));
> +	link = page_get_link(dentry, d_inode(dentry), &done);
> +	res = PTR_ERR(link);
> +	if (!IS_ERR(link))
> +		res = readlink_copy(buffer, buflen, link, strlen(link));
>   	do_delayed_call(&done);
>   	return res;
>   }
> diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
> index 8e159fc78c0a..c610224faf10 100644
> --- a/fs/proc/namespaces.c
> +++ b/fs/proc/namespaces.c
> @@ -83,7 +83,7 @@ static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int bufl
>   	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
>   		res = ns_get_name(name, sizeof(name), task, ns_ops);
>   		if (res >= 0)
> -			res = readlink_copy(buffer, buflen, name);
> +			res = readlink_copy(buffer, buflen, name, strlen(name));
>   	}
>   	put_task_struct(task);
>   	return res;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 972147da71f9..30e332fb399d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -626,6 +626,7 @@ is_uncached_acl(struct posix_acl *acl)
>   #define IOP_XATTR	0x0008
>   #define IOP_DEFAULT_READLINK	0x0010
>   #define IOP_MGTIME	0x0020
> +#define IOP_CACHED_LINK	0x0040
>   
>   /*
>    * Keep mostly read-only and often accessed (especially for
> @@ -686,7 +687,7 @@ struct inode {
>   
>   	/* Misc */
>   	u32			i_state;
> -	/* 32-bit hole */
> +	int			i_linklen;	/* for symlinks */
>   	struct rw_semaphore	i_rwsem;
>   
>   	unsigned long		dirtied_when;	/* jiffies of first dirtying */
> @@ -749,6 +750,13 @@ struct inode {
>   	void			*i_private; /* fs or device private pointer */
>   } __randomize_layout;
>   
> +static inline void inode_set_cached_link(struct inode *inode, char *link, int linklen)
> +{
> +	inode->i_link = link;
> +	inode->i_linklen = linklen;
Just curious, is this linklen equal to the inode size? If it is, why not
use it?
> +	inode->i_opflags |= IOP_CACHED_LINK;
> +}
> +
>   /*
>    * Get bit address from inode->i_state to use with wait_var_event()
>    * infrastructre.
> @@ -3351,7 +3359,7 @@ extern const struct file_operations generic_ro_fops;
>   
>   #define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m))
>   
> -extern int readlink_copy(char __user *, int, const char *);
> +extern int readlink_copy(char __user *, int, const char *, int);
>   extern int page_readlink(struct dentry *, char __user *, int);
>   extern const char *page_get_link(struct dentry *, struct inode *,
>   				 struct delayed_call *);
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 01b923d97a44..60959cfba672 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -2611,7 +2611,7 @@ static int policy_readlink(struct dentry *dentry, char __user *buffer,
>   	res = snprintf(name, sizeof(name), "%s:[%lu]", AAFS_NAME,
>   		       d_inode(dentry)->i_ino);
>   	if (res > 0 && res < sizeof(name))
> -		res = readlink_copy(buffer, buflen, name);
> +		res = readlink_copy(buffer, buflen, name, strlen(name));
>   	else
>   		res = -ENOENT;
>   
-- 
Regards



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vfs: support caching symlink lengths in inodes
  2024-11-20  4:15   ` wangjianjian (C)
@ 2024-11-20  5:01     ` Matthew Wilcox
  2024-11-20  5:10       ` wangjianjian (C)
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2024-11-20  5:01 UTC (permalink / raw)
  To: wangjianjian (C)
  Cc: Mateusz Guzik, brauner, viro, jack, linux-kernel, linux-fsdevel,
	hughd, linux-ext4, tytso, linux-mm

On Wed, Nov 20, 2024 at 12:15:18PM +0800, wangjianjian (C) wrote:
> > +{
> > +	inode->i_link = link;
> > +	inode->i_linklen = linklen;
> Just curious, is this linklen equal to the inode size? If it is, why not use
> it?

Maybe you should read v1 of the patch series where Jan explained where
that's not true.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vfs: support caching symlink lengths in inodes
  2024-11-20  5:01     ` Matthew Wilcox
@ 2024-11-20  5:10       ` wangjianjian (C)
  0 siblings, 0 replies; 13+ messages in thread
From: wangjianjian (C) @ 2024-11-20  5:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mateusz Guzik, brauner, viro, jack, linux-kernel, linux-fsdevel,
	hughd, linux-ext4, tytso, linux-mm

On 2024/11/20 13:01, Matthew Wilcox wrote:
> On Wed, Nov 20, 2024 at 12:15:18PM +0800, wangjianjian (C) wrote:
>>> +{
>>> +	inode->i_link = link;
>>> +	inode->i_linklen = linklen;
>> Just curious, is this linklen equal to the inode size? If it is, why not use
>> it?
> 
> Maybe you should read v1 of the patch series where Jan explained where
> that's not true.
okay, I see, thanks for this.
> 
-- 
Regards



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] symlink length caching
  2024-11-19  9:45 [PATCH v2 0/3] symlink length caching Mateusz Guzik
                   ` (3 preceding siblings ...)
  2024-11-19 17:53 ` [PATCH v2 0/3] symlink length caching Theodore Ts'o
@ 2024-11-20 10:33 ` Christian Brauner
  2024-11-20 10:42   ` Mateusz Guzik
  4 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2024-11-20 10:33 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm

On Tue, Nov 19, 2024 at 10:45:52AM +0100, Mateusz Guzik wrote:
> quote:
>     When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
>     speed up when issuing readlink on /initrd.img on ext4.
> 
> Benchmark code at the bottom.
> 
> ext4 and tmpfs are patched, other filesystems can also get there with
> some more work.
> 
> Arguably the current get_link API should be patched to let the fs return
> the size, but that's not churn I'm interested in diving into.
> 
> On my v1, Jan remarked that 1.5% is not a particularly high win, questioning
> whether doing this makes sense. I noted the value is only this small
> because of other slowdowns.

The thing is that you're stealing one of the holes I just put into struct
inode a cycle ago or so. The general idea has been to shrink struct
inode if we can and I'm not sure that caching the link length is
actually worth losing that hole. Otherwise I wouldn't object.

> All that aside there is also quite a bit of branching and func calling
> which does not need to be there (example: make_vfsuid()/make_vfsgid() could be
> combined into one routine, etc.).

They should probably also be made inline functions and likely/unlikely
sprinkled in there.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] symlink length caching
  2024-11-20 10:33 ` Christian Brauner
@ 2024-11-20 10:42   ` Mateusz Guzik
  2024-11-20 11:12     ` Christian Brauner
  0 siblings, 1 reply; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-20 10:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm

On Wed, Nov 20, 2024 at 11:33 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Nov 19, 2024 at 10:45:52AM +0100, Mateusz Guzik wrote:
> > quote:
> >     When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
> >     speed up when issuing readlink on /initrd.img on ext4.
> >
> > Benchmark code at the bottom.
> >
> > ext4 and tmpfs are patched, other filesystems can also get there with
> > some more work.
> >
> > Arguably the current get_link API should be patched to let the fs return
> > the size, but that's not churn I'm interested in diving into.
> >
> > On my v1, Jan remarked that 1.5% is not a particularly high win, questioning
> > whether doing this makes sense. I noted the value is only this small
> > because of other slowdowns.
>
> The thing is that you're stealing one of the holes I just put into struct
> inode a cycle ago or so. The general idea has been to shrink struct
> inode if we can and I'm not sure that caching the link length is
> actually worth losing that hole. Otherwise I wouldn't object.
>

Per the patch description this can be a union with something not used
for symlinks. I'll find a nice field.

> > All that aside there is also quite a bit of branching and func calling
> > which does not need to be there (example: make_vfsuid()/make_vfsgid() could be
> > combined into one routine, etc.).
>
> They should probably also be made inline functions and likely/unlikely
> sprinkled in there.

someone(tm) should at least do a sweep through in-vfs code. for
example LOOKUP_IS_SCOPED is sometimes marked as unlikely and other
times has no annotations whatsoever, even though ultimately it all
executes in the same setting

Interestingly even __read_seqcount_begin (used *twice* in path_init())
is missing one. I sent a patch to fix it long time ago but the
recipient did not respond, maybe I should resend with more people in
cc (who though?), see:
https://lore.kernel.org/all/20230727180355.813995-1-mjguzik@gmail.com/

-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] symlink length caching
  2024-11-20 10:42   ` Mateusz Guzik
@ 2024-11-20 11:12     ` Christian Brauner
  2024-11-20 19:47       ` Mateusz Guzik
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2024-11-20 11:12 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm

On Wed, Nov 20, 2024 at 11:42:33AM +0100, Mateusz Guzik wrote:
> On Wed, Nov 20, 2024 at 11:33 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Nov 19, 2024 at 10:45:52AM +0100, Mateusz Guzik wrote:
> > > quote:
> > >     When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
> > >     speed up when issuing readlink on /initrd.img on ext4.
> > >
> > > Benchmark code at the bottom.
> > >
> > > ext4 and tmpfs are patched, other filesystems can also get there with
> > > some more work.
> > >
> > > Arguably the current get_link API should be patched to let the fs return
> > > the size, but that's not churn I'm interested in diving into.
> > >
> > > On my v1, Jan remarked that 1.5% is not a particularly high win, questioning
> > > whether doing this makes sense. I noted the value is only this small
> > > because of other slowdowns.
> >
> > The thing is that you're stealing one of the holes I just put into struct
> > inode a cycle ago or so. The general idea has been to shrink struct
> > inode if we can and I'm not sure that caching the link length is
> > actually worth losing that hole. Otherwise I wouldn't object.
> >
> 
> Per the patch description this can be a union with something not used
> for symlinks. I'll find a nice field.

Ok!

> 
> > > All that aside there is also quite a bit of branching and func calling
> > > which does not need to be there (example: make_vfsuid()/make_vfsgid() could be
> > > combined into one routine, etc.).
> >
> > They should probably also be made inline functions and likely/unlikely
> > sprinkled in there.
> 
> someone(tm) should at least do a sweep through in-vfs code. for

Yeah, in this case I was specifically talking about make_vfs{g,u}id().
They should be inlines and they should contain likely/unlikely.

> example LOOKUP_IS_SCOPED is sometimes marked as unlikely and other
> times has no annotations whatsoever, even though ultimately it all
> executes in the same setting
> 
> Interestingly even __read_seqcount_begin (used *twice* in path_init())
> > is missing one. I sent a patch to fix it a long time ago but the
> recipient did not respond

I snatched it.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] symlink length caching
  2024-11-20 11:12     ` Christian Brauner
@ 2024-11-20 19:47       ` Mateusz Guzik
  0 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2024-11-20 19:47 UTC (permalink / raw)
  To: Christian Brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, hughd, linux-ext4,
	tytso, linux-mm

On Wed, Nov 20, 2024 at 12:13 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Nov 20, 2024 at 11:42:33AM +0100, Mateusz Guzik wrote:
> > Interestingly even __read_seqcount_begin (used *twice* in path_init())
> > is missing one. I sent a patch to fix it long time ago but the
> > recipient did not respond
>
> I snatched it.

Thanks.

But I have to say having *two* counters to check for each lookup is
bothering me and making me wonder if they could be unified (or another
counter added to cover both of those?). No clue about feasibility;
is there a known showstopper?

Both are defined like so:
__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);

Suppose nothing can be done to only look at one counter on lookup.

In that case how about combining the suckers into one cacheline at
least? Sure, this will result in new bounces for threads modifying
these, but this is relatively infrequent compared to how often lookups
performed and with these slapped together there will be only one line
spent on it, instead of two.
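
Roughly what I mean, as a sketch (illustration only, not a patch; the
struct and its name are made up here):

#include <linux/cache.h>
#include <linux/seqlock.h>

/*
 * Illustration: both counters validated on lookup share one cache line
 * instead of two. Existing mount_lock/rename_lock users would have to
 * be switched over.
 */
static struct {
	seqlock_t mount_lock;
	seqlock_t rename_lock;
} lookup_seq __cacheline_aligned_in_smp = {
	.mount_lock  = __SEQLOCK_UNLOCKED(lookup_seq.mount_lock),
	.rename_lock = __SEQLOCK_UNLOCKED(lookup_seq.rename_lock),
};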

Just RFC'ing it here.
-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-11-20 19:48 UTC | newest]

Thread overview: 13+ messages
2024-11-19  9:45 [PATCH v2 0/3] symlink length caching Mateusz Guzik
2024-11-19  9:45 ` [PATCH v2 1/3] vfs: support caching symlink lengths in inodes Mateusz Guzik
2024-11-20  4:15   ` wangjianjian (C)
2024-11-20  5:01     ` Matthew Wilcox
2024-11-20  5:10       ` wangjianjian (C)
2024-11-19  9:45 ` [PATCH v2 2/3] ext4: use inode_set_cached_link() Mateusz Guzik
2024-11-19  9:45 ` [PATCH v2 3/3] tmpfs: " Mateusz Guzik
2024-11-19 17:53 ` [PATCH v2 0/3] symlink length caching Theodore Ts'o
2024-11-19 18:07   ` Mateusz Guzik
2024-11-20 10:33 ` Christian Brauner
2024-11-20 10:42   ` Mateusz Guzik
2024-11-20 11:12     ` Christian Brauner
2024-11-20 19:47       ` Mateusz Guzik
