linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short
       [not found] <CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com>
@ 2025-03-06 11:39 ` K Prateek Nayak
  2025-03-06 11:39   ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw)
  To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm
  Cc: Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak

Here is an attempt at converting pipe->{head,tail} to unsigned short
members. All local variables storing the head and the tail have been
modified to unsigned short too the best of my knowledge)

pipe_resize_ring() has added a check to make sure nr_slots can be
contained within the limits of the pipe->{head,tail}. Building on that,
pipe->{max_usage,ring_size} were also converted to unsigned short to
catch any cases of incorrect unsigned arithmetic.

This has been tested for a few hours with anon pipes on a 5th Generation
AMD EPYC System and on a dual socket Intel Granite Rapids system without
experiencing any obvious issues.

pipe_write() was tagged with a debug trace_printk() on one of the test
machines to make sure the head has indeed wrapped around behind the tail
to ensure the wraparound scenarios are indeed happening.

Few pipe_occupancy() and pipe->max_usage based checks have been
converted to use unsigned short based arithmetic in fs/fuse/dev.c,
fs/splice.c, mm/filemap.c, and mm/filemap.c. Few of the observations
from Rasmus on a parallel thread [1] has been folded into Patch 3
(thanks a ton for chasing them).

More eyes and testing is greatly appreciated. If my tests run into any
issues, I'll report back on this thread. Series was tested with:

  hackbench -g 16 -f 20 --threads --pipe -l 10000000 -s 100 # Warp around
  stress-ng --oom-pipe 128 --oom-pipe-ops 100000 -t 600s # pipe resize
  stress-ng --splice 128 --splice-ops 100000000 -t 600s # splice
  stress-ng --vm-splice 128 --vm-splice-ops 100000000 -t 600s # splice

  stress-ng --tee 128 --tee-ops 100000000 -t 600s
  stress-ng --zlib 128 --zlib-ops 1000000 -t 600s
  stress-ng --sigpipe 128 -t 60s

stress-ng did not report any failure in my testing.

[1] https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/
--
K Prateek Nayak (3):
  fs/pipe: Limit the slots in pipe_resize_ring()
  fs/splice: Atomically read pipe->{head,tail} in opipe_prep()
  treewide: pipe: Convert all references to
    pipe->{head,tail,max_usage,ring_size} to unsigned short

 fs/fuse/dev.c             |  4 +++-
 fs/pipe.c                 | 33 +++++++++++++++-----------
 fs/splice.c               | 50 ++++++++++++++++++++-------------------
 include/linux/pipe_fs_i.h | 39 ++++++++++--------------------
 kernel/watch_queue.c      |  3 ++-
 mm/filemap.c              |  5 ++--
 mm/shmem.c                |  5 ++--
 7 files changed, 69 insertions(+), 70 deletions(-)


base-commit: 848e076317446f9c663771ddec142d7c2eb4cb43
-- 
2.43.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring()
  2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak
@ 2025-03-06 11:39   ` K Prateek Nayak
  2025-03-06 12:28     ` Oleg Nesterov
  2025-03-06 11:39   ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak
  2025-03-06 11:39   ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak
  2 siblings, 1 reply; 11+ messages in thread
From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw)
  To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm
  Cc: Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak

Limit the number of slots in pipe_resize_ring() to the maximum value
representable by pipe->{head,tail}. Values beyond the max limit can
lead to incorrect pipe_occupancy() calculations where the pipe will
never appear full.

Since nr_slots is always a power of 2 and the maximum size of
pipe_index_t is 32 bits, BIT() is sufficient to represent the maximum
value possible for nr_slots.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 fs/pipe.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index e8e6698f3698..3ca3103e1de7 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1272,6 +1272,10 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
 	struct pipe_buffer *bufs;
 	unsigned int head, tail, mask, n;
 
+	/* nr_slots larger than limits of pipe->{head,tail} */
+	if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1)))
+		return -EINVAL;
+
 	bufs = kcalloc(nr_slots, sizeof(*bufs),
 		       GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
 	if (unlikely(!bufs))

base-commit: 848e076317446f9c663771ddec142d7c2eb4cb43
-- 
2.43.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep()
  2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak
  2025-03-06 11:39   ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
@ 2025-03-06 11:39   ` K Prateek Nayak
  2025-03-06 11:39   ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak
  2 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw)
  To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm
  Cc: Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak

opipe_prep() checks pipe_full() before taking the "pipe->mutex". Use the
newly introduced "pipe->head_tail" member to read the head and the tail
atomically and not miss any updates between the reads.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 fs/splice.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 28cfa63aa236..e51f33aca032 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1682,13 +1682,14 @@ static int ipipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
  */
 static int opipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
 {
+	union pipe_index idx = { .head_tail = READ_ONCE(pipe->head_tail) };
 	int ret;
 
 	/*
 	 * Check pipe occupancy without the inode lock first. This function
 	 * is speculative anyways, so missing one is ok.
 	 */
-	if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+	if (!pipe_full(idx.head, idx.tail, READ_ONCE(pipe->max_usage)))
 		return 0;
 
 	ret = 0;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short
  2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak
  2025-03-06 11:39   ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
  2025-03-06 11:39   ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak
@ 2025-03-06 11:39   ` K Prateek Nayak
  2025-03-06 12:32     ` Oleg Nesterov
  2 siblings, 1 reply; 11+ messages in thread
From: K Prateek Nayak @ 2025-03-06 11:39 UTC (permalink / raw)
  To: Linus Torvalds, Oleg Nesterov, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm
  Cc: Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal, K Prateek Nayak

Use 16-bit head and tail to track the pipe buffer production and
consumption.

Since "pipe->max_usage" and "pipe->ring_size" must fall between the head
and the tail limits, convert them to unsigned short as well to catch any
cases of unsigned arithmetic going wrong.

Part of fs/fuse/dev.c, fs/splice.c, mm/filemap.c, and mm/shmem.c were
touched to accommodate the "unsigned short" based calculations of
pipe_occupancy().

pipe->tail is incremented always with both "pipe->mutex" and
"pipe->rd_wait.lock" held for pipes with watch queue. pipe_write() exits
early if pipe has a watch queue but otherwise takes the "pipe->muxtex"
before updating pipe->head. post_one_notification() holds the
"pipe->rd_wait.lock" when updating pipe->head.

Updates to "pipe->head" and "pipe->tail" are always mutually exclusive,
either guarded by "pipe->mutex" or by "pipe->rd_wait.lock". Even a RMW
updates to the 16-bits fields should be safe because of those
synchronization primitives on architectures that cannot do an atomic
16-bit store.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 fs/fuse/dev.c             |  7 +++---
 fs/pipe.c                 | 31 +++++++++++++-------------
 fs/splice.c               | 47 ++++++++++++++++++++-------------------
 include/linux/pipe_fs_i.h | 39 +++++++++++---------------------
 kernel/watch_queue.c      |  3 ++-
 mm/filemap.c              |  5 +++--
 mm/shmem.c                |  5 +++--
 7 files changed, 65 insertions(+), 72 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2b2d1b755544..993e6dc24de1 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1440,6 +1440,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
 	int page_nr = 0;
 	struct pipe_buffer *bufs;
 	struct fuse_copy_state cs;
+	unsigned short free_slots;
 	struct fuse_dev *fud = fuse_get_dev(in);
 
 	if (!fud)
@@ -1457,7 +1458,8 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
 	if (ret < 0)
 		goto out;
 
-	if (pipe_occupancy(pipe->head, pipe->tail) + cs.nr_segs > pipe->max_usage) {
+	free_slots = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
+	if (free_slots < cs.nr_segs) {
 		ret = -EIO;
 		goto out;
 	}
@@ -2107,9 +2109,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 				     struct file *out, loff_t *ppos,
 				     size_t len, unsigned int flags)
 {
-	unsigned int head, tail, mask, count;
+	unsigned short head, tail, mask, count, idx;
 	unsigned nbuf;
-	unsigned idx;
 	struct pipe_buffer *bufs;
 	struct fuse_copy_state cs;
 	struct fuse_dev *fud;
diff --git a/fs/pipe.c b/fs/pipe.c
index 3ca3103e1de7..b8d87eabff79 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -216,9 +216,9 @@ static inline bool pipe_readable(const struct pipe_inode_info *pipe)
 	return !pipe_empty(idx.head, idx.tail) || !writers;
 }
 
-static inline unsigned int pipe_update_tail(struct pipe_inode_info *pipe,
-					    struct pipe_buffer *buf,
-					    unsigned int tail)
+static inline unsigned short pipe_update_tail(struct pipe_inode_info *pipe,
+					      struct pipe_buffer *buf,
+					      unsigned short tail)
 {
 	pipe_buf_release(pipe, buf);
 
@@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 	 */
 	for (;;) {
 		/* Read ->head with a barrier vs post_one_notification() */
-		unsigned int head = smp_load_acquire(&pipe->head);
-		unsigned int tail = pipe->tail;
-		unsigned int mask = pipe->ring_size - 1;
+		unsigned short head = smp_load_acquire(&pipe->head);
+		unsigned short tail = pipe->tail;
+		unsigned short mask = pipe->ring_size - 1;
 
 #ifdef CONFIG_WATCH_QUEUE
 		if (pipe->note_loss) {
@@ -417,7 +417,7 @@ static inline int is_packetized(struct file *file)
 static inline bool pipe_writable(const struct pipe_inode_info *pipe)
 {
 	union pipe_index idx = { .head_tail = READ_ONCE(pipe->head_tail) };
-	unsigned int max_usage = READ_ONCE(pipe->max_usage);
+	unsigned short max_usage = READ_ONCE(pipe->max_usage);
 
 	return !pipe_full(idx.head, idx.tail, max_usage) ||
 		!READ_ONCE(pipe->readers);
@@ -428,7 +428,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *filp = iocb->ki_filp;
 	struct pipe_inode_info *pipe = filp->private_data;
-	unsigned int head;
+	unsigned short head;
 	ssize_t ret = 0;
 	size_t total_len = iov_iter_count(from);
 	ssize_t chars;
@@ -471,7 +471,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 	was_empty = pipe_empty(head, pipe->tail);
 	chars = total_len & (PAGE_SIZE-1);
 	if (chars && !was_empty) {
-		unsigned int mask = pipe->ring_size - 1;
+		unsigned short mask = pipe->ring_size - 1;
 		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
 		int offset = buf->offset + buf->len;
 
@@ -614,7 +614,8 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct pipe_inode_info *pipe = filp->private_data;
-	unsigned int count, head, tail, mask;
+	unsigned short head, tail, mask;
+	unsigned int count;
 
 	switch (cmd) {
 	case FIONREAD:
@@ -1270,10 +1271,10 @@ unsigned int round_pipe_size(unsigned int size)
 int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
 {
 	struct pipe_buffer *bufs;
-	unsigned int head, tail, mask, n;
+	unsigned short head, tail, mask, n;
 
 	/* nr_slots larger than limits of pipe->{head,tail} */
-	if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1)))
+	if (unlikely(nr_slots > USHRT_MAX))
 		return -EINVAL;
 
 	bufs = kcalloc(nr_slots, sizeof(*bufs),
@@ -1298,13 +1299,13 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
 	 * and adjust the indices.
 	 */
 	if (n > 0) {
-		unsigned int h = head & mask;
-		unsigned int t = tail & mask;
+		unsigned short h = head & mask;
+		unsigned short t = tail & mask;
 		if (h > t) {
 			memcpy(bufs, pipe->bufs + t,
 			       n * sizeof(struct pipe_buffer));
 		} else {
-			unsigned int tsize = pipe->ring_size - t;
+			unsigned short tsize = pipe->ring_size - t;
 			if (h > 0)
 				memcpy(bufs + tsize, pipe->bufs,
 				       h * sizeof(struct pipe_buffer));
diff --git a/fs/splice.c b/fs/splice.c
index e51f33aca032..891a7cf9fb55 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -198,9 +198,9 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
 		       struct splice_pipe_desc *spd)
 {
 	unsigned int spd_pages = spd->nr_pages;
-	unsigned int tail = pipe->tail;
-	unsigned int head = pipe->head;
-	unsigned int mask = pipe->ring_size - 1;
+	unsigned short tail = pipe->tail;
+	unsigned short head = pipe->head;
+	unsigned short mask = pipe->ring_size - 1;
 	ssize_t ret = 0;
 	int page_nr = 0;
 
@@ -245,9 +245,9 @@ EXPORT_SYMBOL_GPL(splice_to_pipe);
 
 ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
 {
-	unsigned int head = pipe->head;
-	unsigned int tail = pipe->tail;
-	unsigned int mask = pipe->ring_size - 1;
+	unsigned short head = pipe->head;
+	unsigned short tail = pipe->tail;
+	unsigned short mask = pipe->ring_size - 1;
 	int ret;
 
 	if (unlikely(!pipe->readers)) {
@@ -271,7 +271,7 @@ EXPORT_SYMBOL(add_to_pipe);
  */
 int splice_grow_spd(const struct pipe_inode_info *pipe, struct splice_pipe_desc *spd)
 {
-	unsigned int max_usage = READ_ONCE(pipe->max_usage);
+	unsigned short max_usage = READ_ONCE(pipe->max_usage);
 
 	spd->nr_pages_max = max_usage;
 	if (max_usage <= PIPE_DEF_BUFFERS)
@@ -327,12 +327,13 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 	struct kiocb kiocb;
 	struct page **pages;
 	ssize_t ret;
-	size_t used, npages, chunk, remain, keep = 0;
+	size_t npages, chunk, remain, keep = 0;
+	unsigned short used;
 	int i;
 
 	/* Work out how much data we can actually add into the pipe */
 	used = pipe_occupancy(pipe->head, pipe->tail);
-	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	npages = max_t(unsigned short, pipe->max_usage - used, 0);
 	len = min_t(size_t, len, npages * PAGE_SIZE);
 	npages = DIV_ROUND_UP(len, PAGE_SIZE);
 
@@ -445,9 +446,9 @@ static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
 static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_desc *sd,
 			  splice_actor *actor)
 {
-	unsigned int head = pipe->head;
-	unsigned int tail = pipe->tail;
-	unsigned int mask = pipe->ring_size - 1;
+	unsigned short head = pipe->head;
+	unsigned short tail = pipe->tail;
+	unsigned short mask = pipe->ring_size - 1;
 	int ret;
 
 	while (!pipe_empty(head, tail)) {
@@ -494,8 +495,8 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des
 /* We know we have a pipe buffer, but maybe it's empty? */
 static inline bool eat_empty_buffer(struct pipe_inode_info *pipe)
 {
-	unsigned int tail = pipe->tail;
-	unsigned int mask = pipe->ring_size - 1;
+	unsigned short tail = pipe->tail;
+	unsigned short mask = pipe->ring_size - 1;
 	struct pipe_buffer *buf = &pipe->bufs[tail & mask];
 
 	if (unlikely(!buf->len)) {
@@ -690,7 +691,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	while (sd.total_len) {
 		struct kiocb kiocb;
 		struct iov_iter from;
-		unsigned int head, tail, mask;
+		unsigned short head, tail, mask;
 		size_t left;
 		int n;
 
@@ -809,7 +810,7 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
 	pipe_lock(pipe);
 
 	while (len > 0) {
-		unsigned int head, tail, mask, bc = 0;
+		unsigned short head, tail, mask, bc = 0;
 		size_t remain = len;
 
 		/*
@@ -960,7 +961,7 @@ static ssize_t do_splice_read(struct file *in, loff_t *ppos,
 			      struct pipe_inode_info *pipe, size_t len,
 			      unsigned int flags)
 {
-	unsigned int p_space;
+	unsigned short p_space;
 
 	if (unlikely(!(in->f_mode & FMODE_READ)))
 		return -EBADF;
@@ -1724,9 +1725,9 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       size_t len, unsigned int flags)
 {
 	struct pipe_buffer *ibuf, *obuf;
-	unsigned int i_head, o_head;
-	unsigned int i_tail, o_tail;
-	unsigned int i_mask, o_mask;
+	unsigned short i_head, o_head;
+	unsigned short i_tail, o_tail;
+	unsigned short i_mask, o_mask;
 	int ret = 0;
 	bool input_wakeup = false;
 
@@ -1861,9 +1862,9 @@ static ssize_t link_pipe(struct pipe_inode_info *ipipe,
 			 size_t len, unsigned int flags)
 {
 	struct pipe_buffer *ibuf, *obuf;
-	unsigned int i_head, o_head;
-	unsigned int i_tail, o_tail;
-	unsigned int i_mask, o_mask;
+	unsigned short i_head, o_head;
+	unsigned short i_tail, o_tail;
+	unsigned short i_mask, o_mask;
 	ssize_t ret = 0;
 
 	/*
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index e572e6fc4f81..0997c028548c 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -31,19 +31,6 @@ struct pipe_buffer {
 	unsigned long private;
 };
 
-/*
- * Really only alpha needs 32-bit fields, but
- * might as well do it for 64-bit architectures
- * since that's what we've historically done,
- * and it makes 'head_tail' always be a simple
- * 'unsigned long'.
- */
-#ifdef CONFIG_64BIT
-typedef unsigned int pipe_index_t;
-#else
-typedef unsigned short pipe_index_t;
-#endif
-
 /*
  * We have to declare this outside 'struct pipe_inode_info',
  * but then we can't use 'union pipe_index' for an anonymous
@@ -51,10 +38,10 @@ typedef unsigned short pipe_index_t;
  * below. Annoying.
  */
 union pipe_index {
-	unsigned long head_tail;
+	unsigned int head_tail;
 	struct {
-		pipe_index_t head;
-		pipe_index_t tail;
+		unsigned short head;
+		unsigned short tail;
 	};
 };
 
@@ -89,15 +76,15 @@ struct pipe_inode_info {
 
 	/* This has to match the 'union pipe_index' above */
 	union {
-		unsigned long head_tail;
+		unsigned int head_tail;
 		struct {
-			pipe_index_t head;
-			pipe_index_t tail;
+			unsigned short head;
+			unsigned short tail;
 		};
 	};
 
-	unsigned int max_usage;
-	unsigned int ring_size;
+	unsigned short max_usage;
+	unsigned short ring_size;
 	unsigned int nr_accounted;
 	unsigned int readers;
 	unsigned int writers;
@@ -181,7 +168,7 @@ static inline bool pipe_has_watch_queue(const struct pipe_inode_info *pipe)
  * @head: The pipe ring head pointer
  * @tail: The pipe ring tail pointer
  */
-static inline bool pipe_empty(unsigned int head, unsigned int tail)
+static inline bool pipe_empty(unsigned short head, unsigned short tail)
 {
 	return head == tail;
 }
@@ -191,9 +178,9 @@ static inline bool pipe_empty(unsigned int head, unsigned int tail)
  * @head: The pipe ring head pointer
  * @tail: The pipe ring tail pointer
  */
-static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail)
+static inline unsigned short pipe_occupancy(unsigned short head, unsigned short tail)
 {
-	return (pipe_index_t)(head - tail);
+	return head - tail;
 }
 
 /**
@@ -202,8 +189,8 @@ static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail)
  * @tail: The pipe ring tail pointer
  * @limit: The maximum amount of slots available.
  */
-static inline bool pipe_full(unsigned int head, unsigned int tail,
-			     unsigned int limit)
+static inline bool pipe_full(unsigned short head, unsigned short tail,
+			     unsigned short limit)
 {
 	return pipe_occupancy(head, tail) >= limit;
 }
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 5267adeaa403..c76cfebf46c8 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -101,7 +101,8 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	struct pipe_inode_info *pipe = wqueue->pipe;
 	struct pipe_buffer *buf;
 	struct page *page;
-	unsigned int head, tail, mask, note, offset, len;
+	unsigned short head, tail, mask;
+	unsigned int note, offset, len;
 	bool done = false;
 
 	spin_lock_irq(&pipe->rd_wait.lock);
diff --git a/mm/filemap.c b/mm/filemap.c
index d4564a79eb35..6007b2403471 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2943,9 +2943,10 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 {
 	struct folio_batch fbatch;
 	struct kiocb iocb;
-	size_t total_spliced = 0, used, npages;
+	size_t total_spliced = 0, npages;
 	loff_t isize, end_offset;
 	bool writably_mapped;
+	unsigned short used;
 	int i, error = 0;
 
 	if (unlikely(*ppos >= in->f_mapping->host->i_sb->s_maxbytes))
@@ -2956,7 +2957,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 	/* Work out how much data we can actually add into the pipe */
 	used = pipe_occupancy(pipe->head, pipe->tail);
-	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	npages = max_t(unsigned short, pipe->max_usage - used, 0);
 	len = min_t(size_t, len, npages * PAGE_SIZE);
 
 	folio_batch_init(&fbatch);
diff --git a/mm/shmem.c b/mm/shmem.c
index 4ea6109a8043..339084e5a8a1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3509,13 +3509,14 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 	struct inode *inode = file_inode(in);
 	struct address_space *mapping = inode->i_mapping;
 	struct folio *folio = NULL;
-	size_t total_spliced = 0, used, npages, n, part;
+	size_t total_spliced = 0, npages, n, part;
+	unsigned short used;
 	loff_t isize;
 	int error = 0;
 
 	/* Work out how much data we can actually add into the pipe */
 	used = pipe_occupancy(pipe->head, pipe->tail);
-	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	npages = max_t(unsigned short, pipe->max_usage - used, 0);
 	len = min_t(size_t, len, npages * PAGE_SIZE);
 
 	do {
-- 
2.43.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring()
  2025-03-06 11:39   ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
@ 2025-03-06 12:28     ` Oleg Nesterov
  2025-03-06 15:26       ` K Prateek Nayak
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2025-03-06 12:28 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal

On 03/06, K Prateek Nayak wrote:
>
> @@ -1272,6 +1272,10 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
>  	struct pipe_buffer *bufs;
>  	unsigned int head, tail, mask, n;
>
> +	/* nr_slots larger than limits of pipe->{head,tail} */
> +	if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1)))

Hmm, perhaps

	if (nr_slots > (pipe_index_t)-1u)

is more clear?

Oleg.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short
  2025-03-06 11:39   ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak
@ 2025-03-06 12:32     ` Oleg Nesterov
  2025-03-06 12:41       ` Oleg Nesterov
  2025-03-06 14:27       ` Rasmus Villemoes
  0 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2025-03-06 12:32 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal

On 03/06, K Prateek Nayak wrote:
>
> @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>  	 */
>  	for (;;) {
>  		/* Read ->head with a barrier vs post_one_notification() */
> -		unsigned int head = smp_load_acquire(&pipe->head);
> -		unsigned int tail = pipe->tail;
> -		unsigned int mask = pipe->ring_size - 1;
> +		unsigned short head = smp_load_acquire(&pipe->head);
> +		unsigned short tail = pipe->tail;
> +		unsigned short mask = pipe->ring_size - 1;

I dunno... but if we do this, perhaps we should
s/unsigned int/pipe_index_t instead?

At least this would be more grep friendly.

Oleg.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short
  2025-03-06 12:32     ` Oleg Nesterov
@ 2025-03-06 12:41       ` Oleg Nesterov
  2025-03-06 15:33         ` K Prateek Nayak
  2025-03-06 14:27       ` Rasmus Villemoes
  1 sibling, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2025-03-06 12:41 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal

On 03/06, Oleg Nesterov wrote:
>
> On 03/06, K Prateek Nayak wrote:
> >
> > @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> >  	 */
> >  	for (;;) {
> >  		/* Read ->head with a barrier vs post_one_notification() */
> > -		unsigned int head = smp_load_acquire(&pipe->head);
> > -		unsigned int tail = pipe->tail;
> > -		unsigned int mask = pipe->ring_size - 1;
> > +		unsigned short head = smp_load_acquire(&pipe->head);
> > +		unsigned short tail = pipe->tail;
> > +		unsigned short mask = pipe->ring_size - 1;
>
> I dunno... but if we do this, perhaps we should
> s/unsigned int/pipe_index_t instead?
>
> At least this would be more grep friendly.

in any case, I think another cleanup before this change makes sense...
pipe->ring_size is overused. pipe_read(), pipe_write() and much more
users do not need "unsigned int mask", they can use pipe_buf(buf, slot)
instead.

Oleg.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short
  2025-03-06 12:32     ` Oleg Nesterov
  2025-03-06 12:41       ` Oleg Nesterov
@ 2025-03-06 14:27       ` Rasmus Villemoes
  1 sibling, 0 replies; 11+ messages in thread
From: Rasmus Villemoes @ 2025-03-06 14:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: K Prateek Nayak, Linus Torvalds, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Neeraj.Upadhyay,
	Ananth.narayan, Swapnil Sapkal

On Thu, Mar 06 2025, Oleg Nesterov <oleg@redhat.com> wrote:

> On 03/06, K Prateek Nayak wrote:
>>
>> @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>>  	 */
>>  	for (;;) {
>>  		/* Read ->head with a barrier vs post_one_notification() */
>> -		unsigned int head = smp_load_acquire(&pipe->head);
>> -		unsigned int tail = pipe->tail;
>> -		unsigned int mask = pipe->ring_size - 1;
>> +		unsigned short head = smp_load_acquire(&pipe->head);
>> +		unsigned short tail = pipe->tail;
>> +		unsigned short mask = pipe->ring_size - 1;
>
> I dunno... but if we do this, perhaps we should
> s/unsigned int/pipe_index_t instead?
>
> At least this would be more grep friendly.

Agreed. Also, while using u16 on all arches may be good for now to make
sure everything is updated, it may also be that it ends up causing
suboptimal code gen for 64 bit architectures, so even if we do change
pipe_index_t now, perhaps we'd want to change it back to "half a ulong"
at some point in the future.

Rasmus


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring()
  2025-03-06 12:28     ` Oleg Nesterov
@ 2025-03-06 15:26       ` K Prateek Nayak
  0 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2025-03-06 15:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Rasmus Villemoes,
	Neeraj.Upadhyay, Ananth.narayan, Swapnil Sapkal

Hello Oleg,

On 3/6/2025 5:58 PM, Oleg Nesterov wrote:
> On 03/06, K Prateek Nayak wrote:
>>
>> @@ -1272,6 +1272,10 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
>>   	struct pipe_buffer *bufs;
>>   	unsigned int head, tail, mask, n;
>>
>> +	/* nr_slots larger than limits of pipe->{head,tail} */
>> +	if (unlikely(nr_slots > BIT(BITS_PER_TYPE(pipe_index_t) - 1)))
> 
> Hmm, perhaps
> 
> 	if (nr_slots > (pipe_index_t)-1u)
> 
> is more clear?

Indeed it is. I didn't even know we could do that! Thank you for
pointing it out.

> 
> Oleg.
> 

-- 
Thanks and Regards,
Prateek



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short
  2025-03-06 12:41       ` Oleg Nesterov
@ 2025-03-06 15:33         ` K Prateek Nayak
  2025-03-06 18:04           ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: K Prateek Nayak @ 2025-03-06 15:33 UTC (permalink / raw)
  To: Oleg Nesterov, Rasmus Villemoes
  Cc: Linus Torvalds, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Neeraj.Upadhyay,
	Ananth.narayan, Swapnil Sapkal

Hello Oleg, Rasmus,

On 3/6/2025 6:11 PM, Oleg Nesterov wrote:
> On 03/06, Oleg Nesterov wrote:
>>
>> On 03/06, K Prateek Nayak wrote:
>>>
>>> @@ -272,9 +272,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>>>   	 */
>>>   	for (;;) {
>>>   		/* Read ->head with a barrier vs post_one_notification() */
>>> -		unsigned int head = smp_load_acquire(&pipe->head);
>>> -		unsigned int tail = pipe->tail;
>>> -		unsigned int mask = pipe->ring_size - 1;
>>> +		unsigned short head = smp_load_acquire(&pipe->head);
>>> +		unsigned short tail = pipe->tail;
>>> +		unsigned short mask = pipe->ring_size - 1;
>>
>> I dunno... but if we do this, perhaps we should
>> s/unsigned int/pipe_index_t instead?
>>
>> At least this would be more grep friendly.

Ack. I'll leave the typedef untouched and convert these to use
pipe_index_t. This was an experiment so see if anything breaks with u16
conversion just to get more testing on that scenario. As Rasmus
mentioned, leaving the head and tail as u32 on 64bit will lead to
better code generation.

> 
> in any case, I think another cleanup before this change makes sense...
> pipe->ring_size is overused. pipe_read(), pipe_write() and much more
> users do not need "unsigned int mask", they can use pipe_buf(buf, slot)
> instead.

Ack. I'll add a cleanup patch ahead of this conversion. Thank you both
for taking a look.

> 
> Oleg.
> 

-- 
Thanks and Regards,
Prateek



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short
  2025-03-06 15:33         ` K Prateek Nayak
@ 2025-03-06 18:04           ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2025-03-06 18:04 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Oleg Nesterov, Rasmus Villemoes, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Andrew Morton, Hugh Dickins, linux-fsdevel,
	linux-kernel, linux-mm, Jan Kara, Matthew Wilcox (Oracle),
	Mateusz Guzik, Gautham R. Shenoy, Neeraj.Upadhyay,
	Ananth.narayan, Swapnil Sapkal

On Thu, 6 Mar 2025 at 05:33, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >>
> >> I dunno... but if we do this, perhaps we should
> >> s/unsigned int/pipe_index_t instead?
> >>
> >> At least this would be more grep friendly.
>
> Ack. I'll leave the typedef untouched and convert these to use
> pipe_index_t. This was an experiment so see if anything breaks with u16
> conversion just to get more testing on that scenario. As Rasmus
> mentioned, leaving the head and tail as u32 on 64bit will lead to
> better code generation.

Yes, I was going to say the same - please don't change to 'unsigned short'.

Judicious use of 'pipe_index_t' may be a good idea, but as I fixed
some issues Rasmus found, I was also looking at the generated code,
and on at least x86 where 16-bit generates extra instructions and
prefixes, it seems marginally better to treat the values as 32-bit,
and then only do the compares in 16 bits.

That only causes a few "movzwl" instructions (at load time), and then
the occasional "cmpw" (empty check) and "movw" (store) etc.

But I only did a very quick "let's look at a few cases of x86-64 also
using a 16-bit pipe_index_t".

So for testing purposes your patch looks fine, but not as something to apply.

If anything, I think we should actively try to remove as many direct
accesses to these pipe fields as humanly possible. As Oleg said, a lot
of them should just be cleaned up to use the helpers we already have.

Rasmus found a few cases of that already, like that FIONREAD case
where it was just doing a lot of open-coding of things that shouldn't
be open-coded.

I've fixed the two cases he pointed at up as obvious bugs, but it
would be good to see where else issues like this might lurk.

                 Linus


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-03-06 18:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com>
2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak
2025-03-06 11:39   ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
2025-03-06 12:28     ` Oleg Nesterov
2025-03-06 15:26       ` K Prateek Nayak
2025-03-06 11:39   ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak
2025-03-06 11:39   ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak
2025-03-06 12:32     ` Oleg Nesterov
2025-03-06 12:41       ` Oleg Nesterov
2025-03-06 15:33         ` K Prateek Nayak
2025-03-06 18:04           ` Linus Torvalds
2025-03-06 14:27       ` Rasmus Villemoes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox