* [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short [not found] <CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com> @ 2025-03-06 11:39 ` K Prateek Nayak 2025-03-06 11:39 ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw) To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm Cc: Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak Here is an attempt at converting pipe->{head,tail} to unsigned short members. All local variables storing the head and the tail have been modified to unsigned short too the best of my knowledge) pipe_resize_ring() has added a check to make sure nr_slots can be contained within the limits of the pipe->{head,tail}. Building on that, pipe->{max_usage,ring_size} were also converted to unsigned short to catch any cases of incorrect unsigned arithmetic. This has been tested for a few hours with anon pipes on a 5th Generation AMD EPYC System and on a dual socket Intel Granite Rapids system without experiencing any obvious issues. pipe_write() was tagged with a debug trace_printk() on one of the test machines to make sure the head has indeed wrapped around behind the tail to ensure the wraparound scenarios are indeed happening. Few pipe_occupancy() and pipe->max_usage based checks have been converted to use unsigned short based arithmetic in fs/fuse/dev.c, fs/splice.c, mm/filemap.c, and mm/filemap.c. Few of the observations from Rasmus on a parallel thread [1] has been folded into Patch 3 (thanks a ton for chasing them). More eyes and testing is greatly appreciated. If my tests run into any issues, I'll report back on this thread. Series was tested with: hackbench -g 16 -f 20 --threads --pipe -l 10000000 -s 100 # Warp around stress-ng --oom-pipe 128 --oom-pipe-ops 100000 -t 600s # pipe resize stress-ng --splice 128 --splice-ops 100000000 -t 600s # splice stress-ng --vm-splice 128 --vm-splice-ops 100000000 -t 600s # splice stress-ng --tee 128 --tee-ops 100000000 -t 600s stress-ng --zlib 128 --zlib-ops 1000000 -t 600s stress-ng --sigpipe 128 -t 60s stress-ng did not report any failure in my testing. [1] https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/ -- K Prateek Nayak (3): fs/pipe: Limit the slots in pipe_resize_ring() fs/splice: Atomically read pipe->{head,tail} in opipe_prep() treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short fs/fuse/dev.c | 4 +++- fs/pipe.c | 33 +++++++++++++++----------- fs/splice.c | 50 ++++++++++++++++++++------------------- include/linux/pipe_fs_i.h | 39 ++++++++++-------------------- kernel/watch_queue.c | 3 ++- mm/filemap.c | 5 ++-- mm/shmem.c | 5 ++-- 7 files changed, 69 insertions(+), 70 deletions(-) base-commit: 848e076317446f9c663771ddec142d7c2eb4cb43 -- 2.43.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() 2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak @ 2025-03-06 11:39 ` K Prateek Nayak 2025-03-06 12:28 ` Oleg Nesterov 2025-03-06 11:39 ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak 2025-03-06 11:39 ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak 2 siblings, 1 reply; 11+ messages in thread From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw) To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm Cc: Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak Limit the number of slots in pipe_resize_ring() to the maximum value representable by pipe->{head,tail}. Values beyond the max limit can lead to incorrect pipe_occupancy() calculations where the pipe will never appear full. Since nr_slots is always a power of 2 and the maximum size of pipe_index_t is 32 bits, BIT() is sufficient to represent the maximum value possible for nr_slots. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> --- fs/pipe.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/pipe.c b/fs/pipe.c index e8e6698f3698..3ca3103e1de7 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1272,6 +1272,10 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) struct pipe_buffer *bufs; unsigned int head, tail, mask, n; + /* nr_slots larger than limits of pipe->{head,tail} */ + if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1))) + return -EINVAL; + bufs = kcalloc(nr_slots, sizeof(*bufs), GFP_KERNEL_ACCOUNT | __GFP_NOWARN); if (unlikely(!bufs)) base-commit: 848e076317446f9c663771ddec142d7c2eb4cb43 -- 2.43.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() 2025-03-06 11:39 ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak @ 2025-03-06 12:28 ` Oleg Nesterov 2025-03-06 15:26 ` K Prateek Nayak 0 siblings, 1 reply; 11+ messages in thread From: Oleg Nesterov @ 2025-03-06 12:28 UTC (permalink / raw) To: K Prateek Nayak Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal On 03/06, K Prateek Nayak wrote: > > @@ -1272,6 +1272,10 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) > struct pipe_buffer *bufs; > unsigned int head, tail, mask, n; > > + /* nr_slots larger than limits of pipe->{head,tail} */ > + if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1))) Hmm, perhaps if (nr_slots > (pipe_index_t)-1u) is more clear? Oleg. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() 2025-03-06 12:28 ` Oleg Nesterov @ 2025-03-06 15:26 ` K Prateek Nayak 0 siblings, 0 replies; 11+ messages in thread From: K Prateek Nayak @ 2025-03-06 15:26 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal Hello Oleg, On 3/6/2025 5:58 PM, Oleg Nesterov wrote: > On 03/06, K Prateek Nayak wrote: >> >> @@ -1272,6 +1272,10 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) >> struct pipe_buffer *bufs; >> unsigned int head, tail, mask, n; >> >> + /* nr_slots larger than limits of pipe->{head,tail} */ >> + if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1))) > > Hmm, perhaps > > if (nr_slots > (pipe_index_t)-1u) > > is more clear? Indeed it is. I didn't even know we could do that! Thank you for pointing it out. > > Oleg. > -- Thanks and Regards, Prateek ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() 2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak 2025-03-06 11:39 ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak @ 2025-03-06 11:39 ` K Prateek Nayak 2025-03-06 11:39 ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak 2 siblings, 0 replies; 11+ messages in thread From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw) To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm Cc: Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak opipe_prep() checks pipe_full() before taking the "pipe->mutex". Use the newly introduced "pipe->head_tail" member to read the head and the tail atomically and not miss any updates between the reads. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> --- fs/splice.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/splice.c b/fs/splice.c index 28cfa63aa236..e51f33aca032 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1682,13 +1682,14 @@ static int ipipe_prep(struct pipe_inode_info *pipe, unsigned int flags) */ static int opipe_prep(struct pipe_inode_info *pipe, unsigned int flags) { + union pipe_index idx = { .head_tail = READ_ONCE(pipe->head_tail) }; int ret; /* * Check pipe occupancy without the inode lock first. This function * is speculative anyways, so missing one is ok. */ - if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) + if (!pipe_full(idx.head, idx.tail, READ_ONCE(pipe->max_usage))) return 0; ret = 0; -- 2.43.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short 2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak 2025-03-06 11:39 ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak 2025-03-06 11:39 ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak @ 2025-03-06 11:39 ` K Prateek Nayak 2025-03-06 12:32 ` Oleg Nesterov 2 siblings, 1 reply; 11+ messages in thread From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw) To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm Cc: Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak Use 16-bit head and tail to track the pipe buffer production and consumption. Since "pipe->max_usage" and "pipe->ring_size" must fall between the head and the tail limits, convert them to unsigned short as well to catch any cases of unsigned arithmetic going wrong. Part of fs/fuse/dev.c, fs/splice.c, mm/filemap.c, and mm/shmem.c were touched to accommodate the "unsigned short" based calculations of pipe_occupancy(). pipe->tail is incremented always with both "pipe->mutex" and "pipe->rd_wait.lock" held for pipes with watch queue. pipe_write() exits early if pipe has a watch queue but otherwise takes the "pipe->muxtex" before updating pipe->head. post_one_notification() holds the "pipe->rd_wait.lock" when updating pipe->head. Updates to "pipe->head" and "pipe->tail" are always mutually exclusive, either guarded by "pipe->mutex" or by "pipe->rd_wait.lock". Even a RMW updates to the 16-bits fields should be safe because of those synchronization primitives on architectures that cannot do an atomic 16-bit store. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> --- fs/fuse/dev.c | 7 +++--- fs/pipe.c | 31 +++++++++++++------------- fs/splice.c | 47 ++++++++++++++++++++------------------- include/linux/pipe_fs_i.h | 39 +++++++++++--------------------- kernel/watch_queue.c | 3 ++- mm/filemap.c | 5 +++-- mm/shmem.c | 5 +++-- 7 files changed, 65 insertions(+), 72 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 2b2d1b755544..993e6dc24de1 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1440,6 +1440,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, int page_nr = 0; struct pipe_buffer *bufs; struct fuse_copy_state cs; + unsigned short free_slots; struct fuse_dev *fud = fuse_get_dev(in); if (!fud) @@ -1457,7 +1458,8 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, if (ret < 0) goto out; - if (pipe_occupancy(pipe->head, pipe->tail) + cs.nr_segs > pipe->max_usage) { + free_slots = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail); + if (free_slots < cs.nr_segs) { ret = -EIO; goto out; } @@ -2107,9 +2109,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags) { - unsigned int head, tail, mask, count; + unsigned short head, tail, mask, count, idx; unsigned nbuf; - unsigned idx; struct pipe_buffer *bufs; struct fuse_copy_state cs; struct fuse_dev *fud; diff --git a/fs/pipe.c b/fs/pipe.c index 3ca3103e1de7..b8d87eabff79 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -216,9 +216,9 @@ static inline bool pipe_readable(const struct pipe_inode_info *pipe) return !pipe_empty(idx.head, idx.tail) || !writers; } -static inline unsigned int pipe_update_tail(struct pipe_inode_info *pipe, - struct pipe_buffer *buf, - unsigned int tail) +static inline unsigned short pipe_update_tail(struct pipe_inode_info *pipe, + struct pipe_buffer *buf, + unsigned short tail) { pipe_buf_release(pipe, buf); @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) */ for (;;) { /* Read ->head with a barrier vs post_one_notification() */ - unsigned int head = smp_load_acquire(&pipe->head); - unsigned int tail = pipe->tail; - unsigned int mask = pipe->ring_size - 1; + unsigned short head = smp_load_acquire(&pipe->head); + unsigned short tail = pipe->tail; + unsigned short mask = pipe->ring_size - 1; #ifdef CONFIG_WATCH_QUEUE if (pipe->note_loss) { @@ -417,7 +417,7 @@ static inline int is_packetized(struct file *file) static inline bool pipe_writable(const struct pipe_inode_info *pipe) { union pipe_index idx = { .head_tail = READ_ONCE(pipe->head_tail) }; - unsigned int max_usage = READ_ONCE(pipe->max_usage); + unsigned short max_usage = READ_ONCE(pipe->max_usage); return !pipe_full(idx.head, idx.tail, max_usage) || !READ_ONCE(pipe->readers); @@ -428,7 +428,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) { struct file *filp = iocb->ki_filp; struct pipe_inode_info *pipe = filp->private_data; - unsigned int head; + unsigned short head; ssize_t ret = 0; size_t total_len = iov_iter_count(from); ssize_t chars; @@ -471,7 +471,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) was_empty = pipe_empty(head, pipe->tail); chars = total_len & (PAGE_SIZE-1); if (chars && !was_empty) { - unsigned int mask = pipe->ring_size - 1; + unsigned short mask = pipe->ring_size - 1; struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask]; int offset = buf->offset + buf->len; @@ -614,7 +614,8 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct pipe_inode_info *pipe = filp->private_data; - unsigned int count, head, tail, mask; + unsigned short head, tail, mask; + unsigned int count; switch (cmd) { case FIONREAD: @@ -1270,10 +1271,10 @@ unsigned int round_pipe_size(unsigned int size) int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) { struct pipe_buffer *bufs; - unsigned int head, tail, mask, n; + unsigned short head, tail, mask, n; /* nr_slots larger than limits of pipe->{head,tail} */ - if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1))) + if (unlikely(nr_slots > USHRT_MAX)) return -EINVAL; bufs = kcalloc(nr_slots, sizeof(*bufs), @@ -1298,13 +1299,13 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) * and adjust the indices. */ if (n > 0) { - unsigned int h = head & mask; - unsigned int t = tail & mask; + unsigned short h = head & mask; + unsigned short t = tail & mask; if (h > t) { memcpy(bufs, pipe->bufs + t, n * sizeof(struct pipe_buffer)); } else { - unsigned int tsize = pipe->ring_size - t; + unsigned short tsize = pipe->ring_size - t; if (h > 0) memcpy(bufs + tsize, pipe->bufs, h * sizeof(struct pipe_buffer)); diff --git a/fs/splice.c b/fs/splice.c index e51f33aca032..891a7cf9fb55 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -198,9 +198,9 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, struct splice_pipe_desc *spd) { unsigned int spd_pages = spd->nr_pages; - unsigned int tail = pipe->tail; - unsigned int head = pipe->head; - unsigned int mask = pipe->ring_size - 1; + unsigned short tail = pipe->tail; + unsigned short head = pipe->head; + unsigned short mask = pipe->ring_size - 1; ssize_t ret = 0; int page_nr = 0; @@ -245,9 +245,9 @@ EXPORT_SYMBOL_GPL(splice_to_pipe); ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { - unsigned int head = pipe->head; - unsigned int tail = pipe->tail; - unsigned int mask = pipe->ring_size - 1; + unsigned short head = pipe->head; + unsigned short tail = pipe->tail; + unsigned short mask = pipe->ring_size - 1; int ret; if (unlikely(!pipe->readers)) { @@ -271,7 +271,7 @@ EXPORT_SYMBOL(add_to_pipe); */ int splice_grow_spd(const struct pipe_inode_info *pipe, struct splice_pipe_desc *spd) { - unsigned int max_usage = READ_ONCE(pipe->max_usage); + unsigned short max_usage = READ_ONCE(pipe->max_usage); spd->nr_pages_max = max_usage; if (max_usage <= PIPE_DEF_BUFFERS) @@ -327,12 +327,13 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos, struct kiocb kiocb; struct page **pages; ssize_t ret; - size_t used, npages, chunk, remain, keep = 0; + size_t npages, chunk, remain, keep = 0; + unsigned short used; int i; /* Work out how much data we can actually add into the pipe */ used = pipe_occupancy(pipe->head, pipe->tail); - npages = max_t(ssize_t, pipe->max_usage - used, 0); + npages = max_t(unsigned short, pipe->max_usage - used, 0); len = min_t(size_t, len, npages * PAGE_SIZE); npages = DIV_ROUND_UP(len, PAGE_SIZE); @@ -445,9 +446,9 @@ static void wakeup_pipe_writers(struct pipe_inode_info *pipe) static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_desc *sd, splice_actor *actor) { - unsigned int head = pipe->head; - unsigned int tail = pipe->tail; - unsigned int mask = pipe->ring_size - 1; + unsigned short head = pipe->head; + unsigned short tail = pipe->tail; + unsigned short mask = pipe->ring_size - 1; int ret; while (!pipe_empty(head, tail)) { @@ -494,8 +495,8 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des /* We know we have a pipe buffer, but maybe it's empty? */ static inline bool eat_empty_buffer(struct pipe_inode_info *pipe) { - unsigned int tail = pipe->tail; - unsigned int mask = pipe->ring_size - 1; + unsigned short tail = pipe->tail; + unsigned short mask = pipe->ring_size - 1; struct pipe_buffer *buf = &pipe->bufs[tail & mask]; if (unlikely(!buf->len)) { @@ -690,7 +691,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, while (sd.total_len) { struct kiocb kiocb; struct iov_iter from; - unsigned int head, tail, mask; + unsigned short head, tail, mask; size_t left; int n; @@ -809,7 +810,7 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, pipe_lock(pipe); while (len > 0) { - unsigned int head, tail, mask, bc = 0; + unsigned short head, tail, mask, bc = 0; size_t remain = len; /* @@ -960,7 +961,7 @@ static ssize_t do_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { - unsigned int p_space; + unsigned short p_space; if (unlikely(!(in->f_mode & FMODE_READ))) return -EBADF; @@ -1724,9 +1725,9 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, size_t len, unsigned int flags) { struct pipe_buffer *ibuf, *obuf; - unsigned int i_head, o_head; - unsigned int i_tail, o_tail; - unsigned int i_mask, o_mask; + unsigned short i_head, o_head; + unsigned short i_tail, o_tail; + unsigned short i_mask, o_mask; int ret = 0; bool input_wakeup = false; @@ -1861,9 +1862,9 @@ static ssize_t link_pipe(struct pipe_inode_info *ipipe, size_t len, unsigned int flags) { struct pipe_buffer *ibuf, *obuf; - unsigned int i_head, o_head; - unsigned int i_tail, o_tail; - unsigned int i_mask, o_mask; + unsigned short i_head, o_head; + unsigned short i_tail, o_tail; + unsigned short i_mask, o_mask; ssize_t ret = 0; /* diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index e572e6fc4f81..0997c028548c 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -31,19 +31,6 @@ struct pipe_buffer { unsigned long private; }; -/* - * Really only alpha needs 32-bit fields, but - * might as well do it for 64-bit architectures - * since that's what we've historically done, - * and it makes 'head_tail' always be a simple - * 'unsigned long'. - */ -#ifdef CONFIG_64BIT -typedef unsigned int pipe_index_t; -#else -typedef unsigned short pipe_index_t; -#endif - /* * We have to declare this outside 'struct pipe_inode_info', * but then we can't use 'union pipe_index' for an anonymous @@ -51,10 +38,10 @@ typedef unsigned short pipe_index_t; * below. Annoying. */ union pipe_index { - unsigned long head_tail; + unsigned int head_tail; struct { - pipe_index_t head; - pipe_index_t tail; + unsigned short head; + unsigned short tail; }; }; @@ -89,15 +76,15 @@ struct pipe_inode_info { /* This has to match the 'union pipe_index' above */ union { - unsigned long head_tail; + unsigned int head_tail; struct { - pipe_index_t head; - pipe_index_t tail; + unsigned short head; + unsigned short tail; }; }; - unsigned int max_usage; - unsigned int ring_size; + unsigned short max_usage; + unsigned short ring_size; unsigned int nr_accounted; unsigned int readers; unsigned int writers; @@ -181,7 +168,7 @@ static inline bool pipe_has_watch_queue(const struct pipe_inode_info *pipe) * @head: The pipe ring head pointer * @tail: The pipe ring tail pointer */ -static inline bool pipe_empty(unsigned int head, unsigned int tail) +static inline bool pipe_empty(unsigned short head, unsigned short tail) { return head == tail; } @@ -191,9 +178,9 @@ static inline bool pipe_empty(unsigned int head, unsigned int tail) * @head: The pipe ring head pointer * @tail: The pipe ring tail pointer */ -static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail) +static inline unsigned short pipe_occupancy(unsigned short head, unsigned short tail) { - return (pipe_index_t)(head - tail); + return head - tail; } /** @@ -202,8 +189,8 @@ static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail) * @tail: The pipe ring tail pointer * @limit: The maximum amount of slots available. */ -static inline bool pipe_full(unsigned int head, unsigned int tail, - unsigned int limit) +static inline bool pipe_full(unsigned short head, unsigned short tail, + unsigned short limit) { return pipe_occupancy(head, tail) >= limit; } diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c index 5267adeaa403..c76cfebf46c8 100644 --- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -101,7 +101,8 @@ static bool post_one_notification(struct watch_queue *wqueue, struct pipe_inode_info *pipe = wqueue->pipe; struct pipe_buffer *buf; struct page *page; - unsigned int head, tail, mask, note, offset, len; + unsigned short head, tail, mask; + unsigned int note, offset, len; bool done = false; spin_lock_irq(&pipe->rd_wait.lock); diff --git a/mm/filemap.c b/mm/filemap.c index d4564a79eb35..6007b2403471 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2943,9 +2943,10 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos, { struct folio_batch fbatch; struct kiocb iocb; - size_t total_spliced = 0, used, npages; + size_t total_spliced = 0, npages; loff_t isize, end_offset; bool writably_mapped; + unsigned short used; int i, error = 0; if (unlikely(*ppos >= in->f_mapping->host->i_sb->s_maxbytes)) @@ -2956,7 +2957,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos, /* Work out how much data we can actually add into the pipe */ used = pipe_occupancy(pipe->head, pipe->tail); - npages = max_t(ssize_t, pipe->max_usage - used, 0); + npages = max_t(unsigned short, pipe->max_usage - used, 0); len = min_t(size_t, len, npages * PAGE_SIZE); folio_batch_init(&fbatch); diff --git a/mm/shmem.c b/mm/shmem.c index 4ea6109a8043..339084e5a8a1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3509,13 +3509,14 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos, struct inode *inode = file_inode(in); struct address_space *mapping = inode->i_mapping; struct folio *folio = NULL; - size_t total_spliced = 0, used, npages, n, part; + size_t total_spliced = 0, npages, n, part; + unsigned short used; loff_t isize; int error = 0; /* Work out how much data we can actually add into the pipe */ used = pipe_occupancy(pipe->head, pipe->tail); - npages = max_t(ssize_t, pipe->max_usage - used, 0); + npages = max_t(unsigned short, pipe->max_usage - used, 0); len = min_t(size_t, len, npages * PAGE_SIZE); do { -- 2.43.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short 2025-03-06 11:39 ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak @ 2025-03-06 12:32 ` Oleg Nesterov 2025-03-06 12:41 ` Oleg Nesterov 2025-03-06 14:27 ` Rasmus Villemoes 0 siblings, 2 replies; 11+ messages in thread From: Oleg Nesterov @ 2025-03-06 12:32 UTC (permalink / raw) To: K Prateek Nayak Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal On 03/06, K Prateek Nayak wrote: > > @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) > */ > for (;;) { > /* Read ->head with a barrier vs post_one_notification() */ > - unsigned int head = smp_load_acquire(&pipe->head); > - unsigned int tail = pipe->tail; > - unsigned int mask = pipe->ring_size - 1; > + unsigned short head = smp_load_acquire(&pipe->head); > + unsigned short tail = pipe->tail; > + unsigned short mask = pipe->ring_size - 1; I dunno... but if we do this, perhaps we should s/unsigned int/pipe_index_t instead? At least this would be more grep friendly. Oleg. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short 2025-03-06 12:32 ` Oleg Nesterov @ 2025-03-06 12:41 ` Oleg Nesterov 2025-03-06 15:33 ` K Prateek Nayak 2025-03-06 14:27 ` Rasmus Villemoes 1 sibling, 1 reply; 11+ messages in thread From: Oleg Nesterov @ 2025-03-06 12:41 UTC (permalink / raw) To: K Prateek Nayak Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal On 03/06, Oleg Nesterov wrote: > > On 03/06, K Prateek Nayak wrote: > > > > @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) > > */ > > for (;;) { > > /* Read ->head with a barrier vs post_one_notification() */ > > - unsigned int head = smp_load_acquire(&pipe->head); > > - unsigned int tail = pipe->tail; > > - unsigned int mask = pipe->ring_size - 1; > > + unsigned short head = smp_load_acquire(&pipe->head); > > + unsigned short tail = pipe->tail; > > + unsigned short mask = pipe->ring_size - 1; > > I dunno... but if we do this, perhaps we should > s/unsigned int/pipe_index_t instead? > > At least this would be more grep friendly. in any case, I think another cleanup before this change makes sense... pipe->ring_size is overused. pipe_read(), pipe_write() and much more users do not need "unsigned int mask", they can use pipe_buf(buf, slot) instead. Oleg. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short 2025-03-06 12:41 ` Oleg Nesterov @ 2025-03-06 15:33 ` K Prateek Nayak 2025-03-06 18:04 ` Linus Torvalds 0 siblings, 1 reply; 11+ messages in thread From: K Prateek Nayak @ 2025-03-06 15:33 UTC (permalink / raw) To: Oleg Nesterov, Rasmus Villemoes Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal Hello Oleg, Rasmus, On 3/6/2025 6:11 PM, Oleg Nesterov wrote: > On 03/06, Oleg Nesterov wrote: >> >> On 03/06, K Prateek Nayak wrote: >>> >>> @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) >>> */ >>> for (;;) { >>> /* Read ->head with a barrier vs post_one_notification() */ >>> - unsigned int head = smp_load_acquire(&pipe->head); >>> - unsigned int tail = pipe->tail; >>> - unsigned int mask = pipe->ring_size - 1; >>> + unsigned short head = smp_load_acquire(&pipe->head); >>> + unsigned short tail = pipe->tail; >>> + unsigned short mask = pipe->ring_size - 1; >> >> I dunno... but if we do this, perhaps we should >> s/unsigned int/pipe_index_t instead? >> >> At least this would be more grep friendly. Ack. I'll leave the typedef untouched and convert these to use pipe_index_t. This was an experiment so see if anything breaks with u16 conversion just to get more testing on that scenario. As Rasmus mentioned, leaving the head and tail as u32 on 64bit will lead to better code generation. > > in any case, I think another cleanup before this change makes sense... > pipe->ring_size is overused. pipe_read(), pipe_write() and much more > users do not need "unsigned int mask", they can use pipe_buf(buf, slot) > instead. Ack. I'll add a cleanup patch ahead of this conversion. Thank you both for taking a look. > > Oleg. > -- Thanks and Regards, Prateek ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short 2025-03-06 15:33 ` K Prateek Nayak @ 2025-03-06 18:04 ` Linus Torvalds 0 siblings, 0 replies; 11+ messages in thread From: Linus Torvalds @ 2025-03-06 18:04 UTC (permalink / raw) To: K Prateek Nayak Cc: Oleg Nesterov, Rasmus Villemoes, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal On Thu, 6 Mar 2025 at 05:33, K Prateek Nayak <kprateek.nayak@amd.com> wrote: > >> > >> I dunno... but if we do this, perhaps we should > >> s/unsigned int/pipe_index_t instead? > >> > >> At least this would be more grep friendly. > > Ack. I'll leave the typedef untouched and convert these to use > pipe_index_t. This was an experiment so see if anything breaks with u16 > conversion just to get more testing on that scenario. As Rasmus > mentioned, leaving the head and tail as u32 on 64bit will lead to > better code generation. Yes, I was going to say the same - please don't change to 'unsigned short'. Judicious use of 'pipe_index_t' may be a good idea, but as I fixed some issues Rasmus found, I was also looking at the generated code, and on at least x86 where 16-bit generates extra instructions and prefixes, it seems marginally better to treat the values as 32-bit, and then only do the compares in 16 bits. That only causes a few "movzwl" instructions (at load time), and then the occasional "cmpw" (empty check) and "movw" (store) etc. But I only did a very quick "let's look at a few cases of x86-64 also using a 16-bit pipe_index_t". So for testing purposes your patch looks fine, but not as something to apply. If anything, I think we should actively try to remove as many direct accesses to these pipe fields as humanly possible. As Oleg said, a lot of them should just be cleaned up to use the helpers we already have. Rasmus found a few cases of that already, like that FIONREAD case where it was just doing a lot of open-coding of things that shouldn't be open-coded. I've fixed the two cases he pointed at up as obvious bugs, but it would be good to see where else issues like this might lurk. Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short 2025-03-06 12:32 ` Oleg Nesterov 2025-03-06 12:41 ` Oleg Nesterov @ 2025-03-06 14:27 ` Rasmus Villemoes 1 sibling, 0 replies; 11+ messages in thread From: Rasmus Villemoes @ 2025-03-06 14:27 UTC (permalink / raw) To: Oleg Nesterov Cc: K Prateek Nayak, Linus Torvalds, Miklos Szeredi, Alexander Viro, Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel, linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle), Mateusz Guzik, Gautham R. Shenoy, Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal On Thu, Mar 06 2025, Oleg Nesterov <oleg@redhat.com> wrote: > On 03/06, K Prateek Nayak wrote: >> >> @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) >> */ >> for (;;) { >> /* Read ->head with a barrier vs post_one_notification() */ >> - unsigned int head = smp_load_acquire(&pipe->head); >> - unsigned int tail = pipe->tail; >> - unsigned int mask = pipe->ring_size - 1; >> + unsigned short head = smp_load_acquire(&pipe->head); >> + unsigned short tail = pipe->tail; >> + unsigned short mask = pipe->ring_size - 1; > > I dunno... but if we do this, perhaps we should > s/unsigned int/pipe_index_t instead? > > At least this would be more grep friendly. Agreed. Also, while using u16 on all arches may be good for now to make sure everything is updated, it may also be that it ends up causing suboptimal code gen for 64 bit architectures, so even if we do change pipe_index_t now, perhaps we'd want to change it back to "half a ulong" at some point in the future. Rasmus ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-03-06 18:04 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com>
2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak
2025-03-06 11:39 ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
2025-03-06 12:28 ` Oleg Nesterov
2025-03-06 15:26 ` K Prateek Nayak
2025-03-06 11:39 ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak
2025-03-06 11:39 ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak
2025-03-06 12:32 ` Oleg Nesterov
2025-03-06 12:41 ` Oleg Nesterov
2025-03-06 15:33 ` K Prateek Nayak
2025-03-06 18:04 ` Linus Torvalds
2025-03-06 14:27 ` Rasmus Villemoes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox