put the xfs xfile abstraction on a diet

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* put the xfs xfile abstraction on a diet
@ 2024-01-03  8:41 Christoph Hellwig
  2024-01-03  8:41 ` [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
                   ` (15 more replies)
  0 siblings, 16 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Hi all,

this series refactors and simplifies the code in the xfs xfile
abstraction, which is a thing layer on a kernel-use shmem file.

Do do this is needs a slighly lower level export from shmem.c,
which I combined with improving an assert and documentation there.

One thing I don't really like yet is that xfile is still based on
folios and not pages.  The main stumbling block for that is the
mess around the hwpoison flag - that one still is per-file and not
per-folio, and shmem checks it weirdly often and not really in
at the abstractions levels where I'd expect it and feels very
different from the normal page cache code in filemap.c.  Maybe
I'm just failing to understand why that is done, but especially
without comments explaining it it feels like it could use some
real attention first.

Diffstat:
 Documentation/filesystems/xfs/xfs-online-fsck-design.rst |   10 
 fs/xfs/scrub/trace.h                                     |   38 -
 fs/xfs/scrub/xfarray.c                                   |   60 --
 fs/xfs/scrub/xfarray.h                                   |    3 
 fs/xfs/scrub/xfile.c                                     |  311 +++------------
 fs/xfs/scrub/xfile.h                                     |   62 --
 mm/shmem.c                                               |   24 +
 7 files changed, 143 insertions(+), 365 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-03 23:32   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 02/15] shmem: export shmem_get_folio Christoph Hellwig
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Move the check that the inode really is a shmemfs one from
shmem_read_folio_gfp to shmem_get_folio_gfp given that shmem_get_folio
can also be called from outside of shmem.c.  Also turn it into a
WARN_ON_ONCE and error return instead of BUG_ON to be less severe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/shmem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 91e2620148b2f6..3349df6d4e0360 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1951,6 +1951,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	int error;
 	bool alloced;
 
+	if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
+		return -EINVAL;
+
 	if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
 		return -EFBIG;
 repeat:
@@ -4895,7 +4898,6 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 	struct folio *folio;
 	int error;
 
-	BUG_ON(!shmem_mapping(mapping));
 	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
 				    gfp, NULL, NULL);
 	if (error)
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 02/15] shmem: export shmem_get_folio
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
  2024-01-03  8:41 ` [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-03  8:41 ` [PATCH 03/15] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Export shmem_get_folio as a slightly lower-level variant of
shmem_read_folio_gfp.  This will be useful for XFS xfile use cases
that want to pass SGP_NOALLOC or get a locked page, which the thin
shmem_read_folio_gfp wrapper can't provide.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/shmem.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3349df6d4e0360..328eb3dbea9f1c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2116,12 +2116,27 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	return error;
 }
 
+/**
+ * shmem_get_folio - find and get a reference to a shmem folio.
+ * @inode:	inode to search
+ * @index:	the page index.
+ * @foliop:	pointer to the found folio if one was found
+ * @sgp:	SGP_* flags to control behavior
+ *
+ * Looks up the page cache entry at @inode & @index.
+ *
+ * If this function returns a folio, it is returned with an increased refcount.
+ *
+ * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
+ * and no folio was found at @index, or an ERR_PTR() otherwise.
+ */
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 		enum sgp_type sgp)
 {
 	return shmem_get_folio_gfp(inode, index, foliop, sgp,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
 }
+EXPORT_SYMBOL_GPL(shmem_get_folio);
 
 /*
  * This is like autoremove_wake_function, but it removes the wait queue
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 03/15] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
  2024-01-03  8:41 ` [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
  2024-01-03  8:41 ` [PATCH 02/15] shmem: export shmem_get_folio Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:21   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 04/15] xfs: remove xfile_stat Christoph Hellwig
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Add a blurb that simply dirtying the folio will persist data for in-kernel
shmem files.  This is what most of the callers already do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/shmem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 328eb3dbea9f1c..235fac6dc53a0b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2129,6 +2129,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
  *
  * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
  * and no folio was found at @index, or an ERR_PTR() otherwise.
+ *
+ * If the caller modifies data in the returned folio, it must call
+ * folio_mark_dirty() on the locked folio before dropping the reference to
+ * ensure the folio is not reclaimed.  Unlike for normal file systems there is
+ * no need to reserve space for users of shmem_*file_setup().
  */
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 		enum sgp_type sgp)
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 04/15] xfs: remove xfile_stat
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 03/15] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-03 23:45   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

vfs_getattr is needed to query inode attributes for unknown underlying
file systems.  But shmemfs is well known for users of shmem_file_setup
and shmem_read_mapping_page_gfp that rely on it not needing specific
inode revalidation and having a normal mapping.  Remove the detour
through the getattr method and an extra wrapper, and just read the
inode size and i_bytes directly in the scrub tracing code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/trace.h | 34 ++++++++++------------------------
 fs/xfs/scrub/xfile.c | 19 -------------------
 fs/xfs/scrub/xfile.h |  7 -------
 3 files changed, 10 insertions(+), 50 deletions(-)

diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 6bbb4e8639dca6..ed9e044f6d603c 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -861,18 +861,11 @@ TRACE_EVENT(xfile_destroy,
 		__field(loff_t, size)
 	),
 	TP_fast_assign(
-		struct xfile_stat	statbuf;
-		int			ret;
-
-		ret = xfile_stat(xf, &statbuf);
-		if (!ret) {
-			__entry->bytes = statbuf.bytes;
-			__entry->size = statbuf.size;
-		} else {
-			__entry->bytes = -1;
-			__entry->size = -1;
-		}
-		__entry->ino = file_inode(xf->file)->i_ino;
+		struct inode		*inode = file_inode(xf->file);
+
+		__entry->ino = inode->i_ino;
+		__entry->bytes = inode->i_bytes;
+		__entry->size = i_size_read(inode);
 	),
 	TP_printk("xfino 0x%lx mem_bytes 0x%llx isize 0x%llx",
 		  __entry->ino,
@@ -891,19 +884,12 @@ DECLARE_EVENT_CLASS(xfile_class,
 		__field(unsigned long long, bytecount)
 	),
 	TP_fast_assign(
-		struct xfile_stat	statbuf;
-		int			ret;
-
-		ret = xfile_stat(xf, &statbuf);
-		if (!ret) {
-			__entry->bytes_used = statbuf.bytes;
-			__entry->size = statbuf.size;
-		} else {
-			__entry->bytes_used = -1;
-			__entry->size = -1;
-		}
-		__entry->ino = file_inode(xf->file)->i_ino;
+		struct inode		*inode = file_inode(xf->file);
+
+		__entry->ino = inode->i_ino;
+		__entry->bytes_used = inode->i_bytes;
 		__entry->pos = pos;
+		__entry->size = i_size_read(inode);
 		__entry->bytecount = bytecount;
 	),
 	TP_printk("xfino 0x%lx mem_bytes 0x%llx pos 0x%llx bytecount 0x%llx isize 0x%llx",
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 090c3ead43fdf1..87654cdd5ac6f9 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -291,25 +291,6 @@ xfile_seek_data(
 	return ret;
 }
 
-/* Query stat information for an xfile. */
-int
-xfile_stat(
-	struct xfile		*xf,
-	struct xfile_stat	*statbuf)
-{
-	struct kstat		ks;
-	int			error;
-
-	error = vfs_getattr_nosec(&xf->file->f_path, &ks,
-			STATX_SIZE | STATX_BLOCKS, AT_STATX_DONT_SYNC);
-	if (error)
-		return error;
-
-	statbuf->size = ks.size;
-	statbuf->bytes = ks.blocks << SECTOR_SHIFT;
-	return 0;
-}
-
 /*
  * Grab the (locked) page for a memory object.  The object cannot span a page
  * boundary.  Returns 0 (and a locked page) if successful, -ENOTBLK if we
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index d56643b0f429e1..c602d11560d8ee 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -63,13 +63,6 @@ xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
 
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
-struct xfile_stat {
-	loff_t			size;
-	unsigned long long	bytes;
-};
-
-int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
-
 int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
 		struct xfile_page *xbuf);
 int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 04/15] xfs: remove xfile_stat Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-03 23:48   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

All current and pending xfile users use the xfile_obj_load
and xfile_obj_store API, so make those the actual implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 .../xfs/xfs-online-fsck-design.rst            | 10 +---
 fs/xfs/scrub/trace.h                          |  4 +-
 fs/xfs/scrub/xfile.c                          | 54 +++++++++----------
 fs/xfs/scrub/xfile.h                          | 32 +----------
 4 files changed, 30 insertions(+), 70 deletions(-)

diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 352516feef6ffe..324d5ec921e8e5 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -1915,19 +1915,13 @@ four of those five higher level data structures.
 The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case
 study.
 
-The most general storage interface supported by the xfile enables the reading
-and writing of arbitrary quantities of data at arbitrary offsets in the xfile.
-This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions,
-which behave similarly to their userspace counterparts.
 XFS is very record-based, which suggests that the ability to load and store
 complete records is important.
 To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
-functions are provided to read and persist objects into an xfile.
-They are internally the same as pread and pwrite, except that they treat any
-error as an out of memory error.
+functions are provided to read and persist objects into an xfile that unlike
+the pread and pwrite system calls treat any error as an out of memory error.
 For online repair, squashing error conditions in this manner is an acceptable
 behavior because the only reaction is to abort the operation back to userspace.
-All five xfile usecases can be serviced by these four functions.
 
 However, no discussion of file access idioms is complete without answering the
 question, "But what about mmap?"
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index ed9e044f6d603c..e6156d000fc615 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -903,8 +903,8 @@ DECLARE_EVENT_CLASS(xfile_class,
 DEFINE_EVENT(xfile_class, name, \
 	TP_PROTO(struct xfile *xf, loff_t pos, unsigned long long bytecount), \
 	TP_ARGS(xf, pos, bytecount))
-DEFINE_XFILE_EVENT(xfile_pread);
-DEFINE_XFILE_EVENT(xfile_pwrite);
+DEFINE_XFILE_EVENT(xfile_obj_load);
+DEFINE_XFILE_EVENT(xfile_obj_store);
 DEFINE_XFILE_EVENT(xfile_seek_data);
 DEFINE_XFILE_EVENT(xfile_get_page);
 DEFINE_XFILE_EVENT(xfile_put_page);
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 87654cdd5ac6f9..9e25ecf3dc2fec 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -118,13 +118,11 @@ xfile_destroy(
 }
 
 /*
- * Read a memory object directly from the xfile's page cache.  Unlike regular
- * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
- * high an offset, instead of truncating the read.  Otherwise, we return
- * bytes read or an error code, like regular pread.
+ * Load an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
  */
-ssize_t
-xfile_pread(
+int
+xfile_obj_load(
 	struct xfile		*xf,
 	void			*buf,
 	size_t			count,
@@ -133,16 +131,15 @@ xfile_pread(
 	struct inode		*inode = file_inode(xf->file);
 	struct address_space	*mapping = inode->i_mapping;
 	struct page		*page = NULL;
-	ssize_t			read = 0;
 	unsigned int		pflags;
 	int			error = 0;
 
 	if (count > MAX_RW_COUNT)
-		return -E2BIG;
+		return -ENOMEM;
 	if (inode->i_sb->s_maxbytes - pos < count)
-		return -EFBIG;
+		return -ENOMEM;
 
-	trace_xfile_pread(xf, pos, count);
+	trace_xfile_obj_load(xf, pos, count);
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
@@ -160,8 +157,10 @@ xfile_pread(
 				__GFP_NOWARN);
 		if (IS_ERR(page)) {
 			error = PTR_ERR(page);
-			if (error != -ENOMEM)
+			if (error != -ENOMEM) {
+				error = -ENOMEM;
 				break;
+			}
 
 			memset(buf, 0, len);
 			goto advance;
@@ -185,23 +184,18 @@ xfile_pread(
 		count -= len;
 		pos += len;
 		buf += len;
-		read += len;
 	}
 	memalloc_nofs_restore(pflags);
 
-	if (read > 0)
-		return read;
 	return error;
 }
 
 /*
- * Write a memory object directly to the xfile's page cache.  Unlike regular
- * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too
- * high an offset, instead of truncating the write.  Otherwise, we return
- * bytes written or an error code, like regular pwrite.
+ * Store an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
  */
-ssize_t
-xfile_pwrite(
+int
+xfile_obj_store(
 	struct xfile		*xf,
 	const void		*buf,
 	size_t			count,
@@ -211,16 +205,15 @@ xfile_pwrite(
 	struct address_space	*mapping = inode->i_mapping;
 	const struct address_space_operations *aops = mapping->a_ops;
 	struct page		*page = NULL;
-	ssize_t			written = 0;
 	unsigned int		pflags;
 	int			error = 0;
 
 	if (count > MAX_RW_COUNT)
-		return -E2BIG;
+		return -ENOMEM;
 	if (inode->i_sb->s_maxbytes - pos < count)
-		return -EFBIG;
+		return -ENOMEM;
 
-	trace_xfile_pwrite(xf, pos, count);
+	trace_xfile_obj_store(xf, pos, count);
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
@@ -239,8 +232,10 @@ xfile_pwrite(
 		 */
 		error = aops->write_begin(NULL, mapping, pos, len, &page,
 				&fsdata);
-		if (error)
+		if (error) {
+			error = -ENOMEM;
 			break;
+		}
 
 		/*
 		 * xfile pages must never be mapped into userspace, so we skip
@@ -259,13 +254,14 @@ xfile_pwrite(
 		ret = aops->write_end(NULL, mapping, pos, len, len, page,
 				fsdata);
 		if (ret < 0) {
-			error = ret;
+			error = -ENOMEM;
 			break;
 		}
 
-		written += ret;
-		if (ret != len)
+		if (ret != len) {
+			error = -ENOMEM;
 			break;
+		}
 
 		count -= ret;
 		pos += ret;
@@ -273,8 +269,6 @@ xfile_pwrite(
 	}
 	memalloc_nofs_restore(pflags);
 
-	if (written > 0)
-		return written;
 	return error;
 }
 
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index c602d11560d8ee..f0e11febf216a7 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -29,38 +29,10 @@ struct xfile {
 int xfile_create(const char *description, loff_t isize, struct xfile **xfilep);
 void xfile_destroy(struct xfile *xf);
 
-ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
-ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count,
+int xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos);
+int xfile_obj_store(struct xfile *xf, const void *buf, size_t count,
 		loff_t pos);
 
-/*
- * Load an object.  Since we're treating this file as "memory", any error or
- * short IO is treated as a failure to allocate memory.
- */
-static inline int
-xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
-{
-	ssize_t	ret = xfile_pread(xf, buf, count, pos);
-
-	if (ret < 0 || ret != count)
-		return -ENOMEM;
-	return 0;
-}
-
-/*
- * Store an object.  Since we're treating this file as "memory", any error or
- * short IO is treated as a failure to allocate memory.
- */
-static inline int
-xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
-{
-	ssize_t	ret = xfile_pwrite(xf, buf, count, pos);
-
-	if (ret < 0 || ret != count)
-		return -ENOMEM;
-	return 0;
-}
-
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
 int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-03 23:55   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 07/15] xfs: shmem_file_setup can't return NULL Christoph Hellwig
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_read_mapping_page_gfp always returns an uptodate page or an
ERR_PTR.  Remove the code that tries to handle a non-uptodate page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfile.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 9e25ecf3dc2fec..46f4a06029cd4b 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -166,18 +166,14 @@ xfile_obj_load(
 			goto advance;
 		}
 
-		if (PageUptodate(page)) {
-			/*
-			 * xfile pages must never be mapped into userspace, so
-			 * we skip the dcache flush.
-			 */
-			kaddr = kmap_local_page(page);
-			p = kaddr + offset_in_page(pos);
-			memcpy(buf, p, len);
-			kunmap_local(kaddr);
-		} else {
-			memset(buf, 0, len);
-		}
+		/*
+		 * xfile pages must never be mapped into userspace, so
+		 * we skip the dcache flush.
+		 */
+		kaddr = kmap_local_page(page);
+		p = kaddr + offset_in_page(pos);
+		memcpy(buf, p, len);
+		kunmap_local(kaddr);
 		put_page(page);
 
 advance:
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 07/15] xfs: shmem_file_setup can't return NULL
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-03 23:56   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 08/15] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_file_setup always returns a struct file pointer or an ERR_PTR,
so remove the code to check for a NULL return.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfile.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 46f4a06029cd4b..ec1be08937977a 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -62,15 +62,13 @@ xfile_create(
 {
 	struct inode		*inode;
 	struct xfile		*xf;
-	int			error = -ENOMEM;
+	int			error;
 
 	xf = kmalloc(sizeof(struct xfile), XCHK_GFP_FLAGS);
 	if (!xf)
 		return -ENOMEM;
 
 	xf->file = shmem_file_setup(description, isize, 0);
-	if (!xf->file)
-		goto out_xfile;
 	if (IS_ERR(xf->file)) {
 		error = PTR_ERR(xf->file);
 		goto out_xfile;
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 08/15] xfs: don't modify file and inode flags for shmem files
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 07/15] xfs: shmem_file_setup can't return NULL Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:01   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 09/15] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_file_setup is explicitly intended for a file that can be
fully read and written by kernel users without restrictions.  Don't
poke into internals to change random flags in the file or inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfile.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index ec1be08937977a..e872f4f0263f59 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -74,22 +74,7 @@ xfile_create(
 		goto out_xfile;
 	}
 
-	/*
-	 * We want a large sparse file that we can pread, pwrite, and seek.
-	 * xfile users are responsible for keeping the xfile hidden away from
-	 * all other callers, so we skip timestamp updates and security checks.
-	 * Make the inode only accessible by root, just in case the xfile ever
-	 * escapes.
-	 */
-	xf->file->f_mode |= FMODE_PREAD | FMODE_PWRITE | FMODE_NOCMTIME |
-			    FMODE_LSEEK;
-	xf->file->f_flags |= O_RDWR | O_LARGEFILE | O_NOATIME;
 	inode = file_inode(xf->file);
-	inode->i_flags |= S_PRIVATE | S_NOCMTIME | S_NOATIME;
-	inode->i_mode &= ~0177;
-	inode->i_uid = GLOBAL_ROOT_UID;
-	inode->i_gid = GLOBAL_ROOT_GID;
-
 	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
 
 	trace_xfile_create(xf);
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 09/15] xfs: don't allow highmem pages in xfile mappings
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 08/15] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
       [not found]   ` <20240104000324.GC361584@frogsfrogsfrogs>
  2024-01-03  8:41 ` [PATCH 10/15] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

XFS is generally used on 64-bit, non-highmem platforms and xfile
mappings are accessed all the time.  Reduce our pain by not allowing
any highmem mappings in the xfile page cache and remove all the kmap
calls for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfarray.c |  3 +--
 fs/xfs/scrub/xfile.c   | 21 +++++++++------------
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index f0f532c10a5acc..3a44700037924b 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -580,7 +580,7 @@ xfarray_sort_get_page(
 	 * xfile pages must never be mapped into userspace, so we skip the
 	 * dcache flush when mapping the page.
 	 */
-	si->page_kaddr = kmap_local_page(si->xfpage.page);
+	si->page_kaddr = page_address(si->xfpage.page);
 	return 0;
 }
 
@@ -592,7 +592,6 @@ xfarray_sort_put_page(
 	if (!si->page_kaddr)
 		return 0;
 
-	kunmap_local(si->page_kaddr);
 	si->page_kaddr = NULL;
 
 	return xfile_put_page(si->array->xfile, &si->xfpage);
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index e872f4f0263f59..afbd205289e9b0 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -77,6 +77,12 @@ xfile_create(
 	inode = file_inode(xf->file);
 	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
 
+	/*
+	 * We don't want to bother with kmapping data during repair, so don't
+	 * allow highmem pages to back this mapping.
+	 */
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+
 	trace_xfile_create(xf);
 
 	*xfilep = xf;
@@ -126,7 +132,6 @@ xfile_obj_load(
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
-		void		*p, *kaddr;
 		unsigned int	len;
 
 		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
@@ -153,10 +158,7 @@ xfile_obj_load(
 		 * xfile pages must never be mapped into userspace, so
 		 * we skip the dcache flush.
 		 */
-		kaddr = kmap_local_page(page);
-		p = kaddr + offset_in_page(pos);
-		memcpy(buf, p, len);
-		kunmap_local(kaddr);
+		memcpy(buf, page_address(page) + offset_in_page(pos), len);
 		put_page(page);
 
 advance:
@@ -221,14 +223,13 @@ xfile_obj_store(
 		 * the dcache flush.  If the page is not uptodate, zero it
 		 * before writing data.
 		 */
-		kaddr = kmap_local_page(page);
+		kaddr = page_address(page);
 		if (!PageUptodate(page)) {
 			memset(kaddr, 0, PAGE_SIZE);
 			SetPageUptodate(page);
 		}
 		p = kaddr + offset_in_page(pos);
 		memcpy(p, buf, len);
-		kunmap_local(kaddr);
 
 		ret = aops->write_end(NULL, mapping, pos, len, len, page,
 				fsdata);
@@ -314,11 +315,7 @@ xfile_get_page(
 	 * to the caller and make sure the backing store will hold on to them.
 	 */
 	if (!PageUptodate(page)) {
-		void	*kaddr;
-
-		kaddr = kmap_local_page(page);
-		memset(kaddr, 0, PAGE_SIZE);
-		kunmap_local(kaddr);
+		memset(page_address(page), 0, PAGE_SIZE);
 		SetPageUptodate(page);
 	}
 
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 10/15] xfs: remove xfarray_sortinfo.page_kaddr
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 09/15] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:04   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page Christoph Hellwig
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Now that xfile pages don't need kmapping, there is no need to cache
the kernel virtual address for them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfarray.c | 22 ++++------------------
 fs/xfs/scrub/xfarray.h |  1 -
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 3a44700037924b..c29a240d4e25f4 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -570,18 +570,7 @@ xfarray_sort_get_page(
 	loff_t			pos,
 	uint64_t		len)
 {
-	int			error;
-
-	error = xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
-	if (error)
-		return error;
-
-	/*
-	 * xfile pages must never be mapped into userspace, so we skip the
-	 * dcache flush when mapping the page.
-	 */
-	si->page_kaddr = page_address(si->xfpage.page);
-	return 0;
+	return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
 }
 
 /* Release a page we grabbed for sorting records. */
@@ -589,11 +578,8 @@ static inline int
 xfarray_sort_put_page(
 	struct xfarray_sortinfo	*si)
 {
-	if (!si->page_kaddr)
+	if (!xfile_page_cached(&si->xfpage))
 		return 0;
-
-	si->page_kaddr = NULL;
-
 	return xfile_put_page(si->array->xfile, &si->xfpage);
 }
 
@@ -636,7 +622,7 @@ xfarray_pagesort(
 		return error;
 
 	xfarray_sort_bump_heapsorts(si);
-	startp = si->page_kaddr + offset_in_page(lo_pos);
+	startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
 	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
 
 	xfarray_sort_bump_stores(si);
@@ -883,7 +869,7 @@ xfarray_sort_load_cached(
 			return error;
 	}
 
-	memcpy(ptr, si->page_kaddr + offset_in_page(idx_pos),
+	memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
 			si->array->obj_size);
 	return 0;
 }
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 62b9c506fdd1b7..6f2862054e194d 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -107,7 +107,6 @@ struct xfarray_sortinfo {
 
 	/* Cache a page here for faster access. */
 	struct xfile_page	xfpage;
-	void			*page_kaddr;
 
 #ifdef DEBUG
 	/* Performance statistics. */
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (9 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 10/15] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:12   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 12/15] xfs: remove struct xfile_page Christoph Hellwig
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Switch to using shmem_get_folio and manually dirtying the page instead
of abusing aops->write_begin and aops->write_end in xfile_get_page.

This simplified the code by not doing indirect calls of not actually
exported interfaces that don't really fit the use case very well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfarray.c | 32 +++++-----------
 fs/xfs/scrub/xfile.c   | 84 +++++++++++++-----------------------------
 fs/xfs/scrub/xfile.h   |  2 +-
 3 files changed, 37 insertions(+), 81 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index c29a240d4e25f4..c6e62c119148a1 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -574,13 +574,12 @@ xfarray_sort_get_page(
 }
 
 /* Release a page we grabbed for sorting records. */
-static inline int
+static inline void
 xfarray_sort_put_page(
 	struct xfarray_sortinfo	*si)
 {
-	if (!xfile_page_cached(&si->xfpage))
-		return 0;
-	return xfile_put_page(si->array->xfile, &si->xfpage);
+	if (xfile_page_cached(&si->xfpage))
+		xfile_put_page(si->array->xfile, &si->xfpage);
 }
 
 /* Decide if these records are eligible for in-page sorting. */
@@ -626,7 +625,8 @@ xfarray_pagesort(
 	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
 
 	xfarray_sort_bump_stores(si);
-	return xfarray_sort_put_page(si);
+	xfarray_sort_put_page(si);
+	return 0;
 }
 
 /* Return a pointer to the xfarray pivot record within the sortinfo struct. */
@@ -836,10 +836,7 @@ xfarray_sort_load_cached(
 	startpage = idx_pos >> PAGE_SHIFT;
 	endpage = (idx_pos + si->array->obj_size - 1) >> PAGE_SHIFT;
 	if (startpage != endpage) {
-		error = xfarray_sort_put_page(si);
-		if (error)
-			return error;
-
+		xfarray_sort_put_page(si);
 		if (xfarray_sort_terminated(si, &error))
 			return error;
 
@@ -849,11 +846,8 @@ xfarray_sort_load_cached(
 
 	/* If the cached page is not the one we want, release it. */
 	if (xfile_page_cached(&si->xfpage) &&
-	    xfile_page_index(&si->xfpage) != startpage) {
-		error = xfarray_sort_put_page(si);
-		if (error)
-			return error;
-	}
+	    xfile_page_index(&si->xfpage) != startpage)
+		xfarray_sort_put_page(si);
 
 	/*
 	 * If we don't have a cached page (and we know the load is contained
@@ -995,10 +989,7 @@ xfarray_sort(
 				if (error)
 					goto out_free;
 			}
-			error = xfarray_sort_put_page(si);
-			if (error)
-				goto out_free;
-
+			xfarray_sort_put_page(si);
 			if (xfarray_sort_terminated(si, &error))
 				goto out_free;
 
@@ -1024,10 +1015,7 @@ xfarray_sort(
 				if (error)
 					goto out_free;
 			}
-			error = xfarray_sort_put_page(si);
-			if (error)
-				goto out_free;
-
+			xfarray_sort_put_page(si);
 			if (xfarray_sort_terminated(si, &error))
 				goto out_free;
 
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index afbd205289e9b0..2b4b0c4e8d2fb6 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -278,11 +278,8 @@ xfile_get_page(
 	struct xfile_page	*xfpage)
 {
 	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	const struct address_space_operations *aops = mapping->a_ops;
-	struct page		*page = NULL;
-	void			*fsdata = NULL;
-	loff_t			key = round_down(pos, PAGE_SIZE);
+	struct folio		*folio = NULL;
+	struct page		*page;
 	unsigned int		pflags;
 	int			error;
 
@@ -293,78 +290,49 @@ xfile_get_page(
 
 	trace_xfile_get_page(xf, pos, len);
 
-	pflags = memalloc_nofs_save();
-
 	/*
-	 * We call write_begin directly here to avoid all the freezer
-	 * protection lock-taking that happens in the normal path.  shmem
-	 * doesn't support fs freeze, but lockdep doesn't know that and will
-	 * trip over that.
+	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
+	 * actually allocates a folio instead of erroring out.
 	 */
-	error = aops->write_begin(NULL, mapping, key, PAGE_SIZE, &page,
-			&fsdata);
-	if (error)
-		goto out_pflags;
-
-	/* We got the page, so make sure we push out EOF. */
-	if (i_size_read(inode) < pos + len)
+	if (pos + len > i_size_read(inode))
 		i_size_write(inode, pos + len);
 
-	/*
-	 * If the page isn't up to date, fill it with zeroes before we hand it
-	 * to the caller and make sure the backing store will hold on to them.
-	 */
-	if (!PageUptodate(page)) {
-		memset(page_address(page), 0, PAGE_SIZE);
-		SetPageUptodate(page);
+	pflags = memalloc_nofs_save();
+	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
+	memalloc_nofs_restore(pflags);
+	if (error)
+		return error;
+
+	page = folio_file_page(folio, pos >> PAGE_SHIFT);
+	if (PageHWPoison(page)) {
+		folio_put(folio);
+		return -EIO;
 	}
 
 	/*
-	 * Mark each page dirty so that the contents are written to some
-	 * backing store when we drop this buffer, and take an extra reference
-	 * to prevent the xfile page from being swapped or removed from the
-	 * page cache by reclaim if the caller unlocks the page.
+	 * Mark the page dirty so that it won't be reclaimed once we drop the
+	 * (potentially last) reference in xfile_put_page.
 	 */
 	set_page_dirty(page);
-	get_page(page);
 
 	xfpage->page = page;
-	xfpage->fsdata = fsdata;
-	xfpage->pos = key;
-out_pflags:
-	memalloc_nofs_restore(pflags);
-	return error;
+	xfpage->fsdata = NULL;
+	xfpage->pos = round_down(pos, PAGE_SIZE);
+	return 0;
 }
 
 /*
- * Release the (locked) page for a memory object.  Returns 0 or a negative
- * errno.
+ * Release the (locked) page for a memory object.
  */
-int
+void
 xfile_put_page(
 	struct xfile		*xf,
 	struct xfile_page	*xfpage)
 {
-	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	const struct address_space_operations *aops = mapping->a_ops;
-	unsigned int		pflags;
-	int			ret;
-
-	trace_xfile_put_page(xf, xfpage->pos, PAGE_SIZE);
+	struct page		*page = xfpage->page;
 
-	/* Give back the reference that we took in xfile_get_page. */
-	put_page(xfpage->page);
+	trace_xfile_put_page(xf, page->index << PAGE_SHIFT, PAGE_SIZE);
 
-	pflags = memalloc_nofs_save();
-	ret = aops->write_end(NULL, mapping, xfpage->pos, PAGE_SIZE, PAGE_SIZE,
-			xfpage->page, xfpage->fsdata);
-	memalloc_nofs_restore(pflags);
-	memset(xfpage, 0, sizeof(struct xfile_page));
-
-	if (ret < 0)
-		return ret;
-	if (ret != PAGE_SIZE)
-		return -EIO;
-	return 0;
+	unlock_page(page);
+	put_page(page);
 }
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index f0e11febf216a7..2f46b7d694ce99 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -37,6 +37,6 @@ loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
 int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
 		struct xfile_page *xbuf);
-int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
+void xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
 
 #endif /* __XFS_SCRUB_XFILE_H__ */
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 12/15] xfs: remove struct xfile_page
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (10 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-10 22:42   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 13/15] xfs: don't unconditionally allocate a new page in xfile_get_page Christoph Hellwig
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Return the shmem page directly from xfile_page_get and pass it back
to xfile_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfarray.c | 23 +++++++++++++++--------
 fs/xfs/scrub/xfarray.h |  2 +-
 fs/xfs/scrub/xfile.c   | 27 ++++++++++-----------------
 fs/xfs/scrub/xfile.h   | 21 ++-------------------
 4 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index c6e62c119148a1..4f396462186793 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -570,7 +570,13 @@ xfarray_sort_get_page(
 	loff_t			pos,
 	uint64_t		len)
 {
-	return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
+	struct page		*page;
+
+	page = xfile_get_page(si->array->xfile, pos, len);
+	if (IS_ERR(page))
+		return PTR_ERR(page);
+	si->page = page;
+	return 0;
 }
 
 /* Release a page we grabbed for sorting records. */
@@ -578,8 +584,10 @@ static inline void
 xfarray_sort_put_page(
 	struct xfarray_sortinfo	*si)
 {
-	if (xfile_page_cached(&si->xfpage))
-		xfile_put_page(si->array->xfile, &si->xfpage);
+	if (si->page) {
+		xfile_put_page(si->array->xfile, si->page);
+		si->page = NULL;
+	}
 }
 
 /* Decide if these records are eligible for in-page sorting. */
@@ -621,7 +629,7 @@ xfarray_pagesort(
 		return error;
 
 	xfarray_sort_bump_heapsorts(si);
-	startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
+	startp = page_address(si->page) + offset_in_page(lo_pos);
 	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
 
 	xfarray_sort_bump_stores(si);
@@ -845,15 +853,14 @@ xfarray_sort_load_cached(
 	}
 
 	/* If the cached page is not the one we want, release it. */
-	if (xfile_page_cached(&si->xfpage) &&
-	    xfile_page_index(&si->xfpage) != startpage)
+	if (si->page && si->page->index != startpage)
 		xfarray_sort_put_page(si);
 
 	/*
 	 * If we don't have a cached page (and we know the load is contained
 	 * in a single page) then grab it.
 	 */
-	if (!xfile_page_cached(&si->xfpage)) {
+	if (!si->page) {
 		if (xfarray_sort_terminated(si, &error))
 			return error;
 
@@ -863,7 +870,7 @@ xfarray_sort_load_cached(
 			return error;
 	}
 
-	memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
+	memcpy(ptr, page_address(si->page) + offset_in_page(idx_pos),
 			si->array->obj_size);
 	return 0;
 }
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 6f2862054e194d..5765f2ad30d885 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -106,7 +106,7 @@ struct xfarray_sortinfo {
 	unsigned int		flags;
 
 	/* Cache a page here for faster access. */
-	struct xfile_page	xfpage;
+	struct page		*page;
 
 #ifdef DEBUG
 	/* Performance statistics. */
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 2b4b0c4e8d2fb6..715c4d10b67c14 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -267,15 +267,14 @@ xfile_seek_data(
 
 /*
  * Grab the (locked) page for a memory object.  The object cannot span a page
- * boundary.  Returns 0 (and a locked page) if successful, -ENOTBLK if we
- * cannot grab the page, or the usual negative errno.
+ * boundary.  Returns 0 the locked page if successful, or an ERR_PTR on
+ * failure.
  */
-int
+struct page *
 xfile_get_page(
 	struct xfile		*xf,
 	loff_t			pos,
-	unsigned int		len,
-	struct xfile_page	*xfpage)
+	unsigned int		len)
 {
 	struct inode		*inode = file_inode(xf->file);
 	struct folio		*folio = NULL;
@@ -284,9 +283,9 @@ xfile_get_page(
 	int			error;
 
 	if (inode->i_sb->s_maxbytes - pos < len)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	if (len > PAGE_SIZE - offset_in_page(pos))
-		return -ENOTBLK;
+		return ERR_PTR(-ENOTBLK);
 
 	trace_xfile_get_page(xf, pos, len);
 
@@ -301,12 +300,12 @@ xfile_get_page(
 	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
 	memalloc_nofs_restore(pflags);
 	if (error)
-		return error;
+		return ERR_PTR(error);
 
 	page = folio_file_page(folio, pos >> PAGE_SHIFT);
 	if (PageHWPoison(page)) {
 		folio_put(folio);
-		return -EIO;
+		return ERR_PTR(-EIO);
 	}
 
 	/*
@@ -314,11 +313,7 @@ xfile_get_page(
 	 * (potentially last) reference in xfile_put_page.
 	 */
 	set_page_dirty(page);
-
-	xfpage->page = page;
-	xfpage->fsdata = NULL;
-	xfpage->pos = round_down(pos, PAGE_SIZE);
-	return 0;
+	return page;
 }
 
 /*
@@ -327,10 +322,8 @@ xfile_get_page(
 void
 xfile_put_page(
 	struct xfile		*xf,
-	struct xfile_page	*xfpage)
+	struct page		*page)
 {
-	struct page		*page = xfpage->page;
-
 	trace_xfile_put_page(xf, page->index << PAGE_SHIFT, PAGE_SIZE);
 
 	unlock_page(page);
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index 2f46b7d694ce99..993368b37b4b7c 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -6,22 +6,6 @@
 #ifndef __XFS_SCRUB_XFILE_H__
 #define __XFS_SCRUB_XFILE_H__
 
-struct xfile_page {
-	struct page		*page;
-	void			*fsdata;
-	loff_t			pos;
-};
-
-static inline bool xfile_page_cached(const struct xfile_page *xfpage)
-{
-	return xfpage->page != NULL;
-}
-
-static inline pgoff_t xfile_page_index(const struct xfile_page *xfpage)
-{
-	return xfpage->page->index;
-}
-
 struct xfile {
 	struct file		*file;
 };
@@ -35,8 +19,7 @@ int xfile_obj_store(struct xfile *xf, const void *buf, size_t count,
 
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
-int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
-		struct xfile_page *xbuf);
-void xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
+struct page *xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len);
+void xfile_put_page(struct xfile *xf, struct page *page);
 
 #endif /* __XFS_SCRUB_XFILE_H__ */
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 13/15] xfs: don't unconditionally allocate a new page in xfile_get_page
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (11 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 12/15] xfs: remove struct xfile_page Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:18   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store Christoph Hellwig
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Pass a flags argument to xfile_get_page, and only allocate a new page
if the XFILE_ALLOC flag is passed.  This allows to also use
xfile_get_page for pure readers that do not want to allocate a new
page or dirty the existing one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfarray.c |  2 +-
 fs/xfs/scrub/xfile.c   | 14 ++++++++++----
 fs/xfs/scrub/xfile.h   |  4 +++-
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 4f396462186793..8543067d46366d 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -572,7 +572,7 @@ xfarray_sort_get_page(
 {
 	struct page		*page;
 
-	page = xfile_get_page(si->array->xfile, pos, len);
+	page = xfile_get_page(si->array->xfile, pos, len, XFILE_ALLOC);
 	if (IS_ERR(page))
 		return PTR_ERR(page);
 	si->page = page;
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 715c4d10b67c14..3ed7fb82a4497b 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -274,7 +274,8 @@ struct page *
 xfile_get_page(
 	struct xfile		*xf,
 	loff_t			pos,
-	unsigned int		len)
+	unsigned int		len,
+	unsigned int		flags)
 {
 	struct inode		*inode = file_inode(xf->file);
 	struct folio		*folio = NULL;
@@ -293,15 +294,19 @@ xfile_get_page(
 	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
 	 * actually allocates a folio instead of erroring out.
 	 */
-	if (pos + len > i_size_read(inode))
+	if ((flags & XFILE_ALLOC) && pos + len > i_size_read(inode))
 		i_size_write(inode, pos + len);
 
 	pflags = memalloc_nofs_save();
-	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
+	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
 	memalloc_nofs_restore(pflags);
 	if (error)
 		return ERR_PTR(error);
 
+	if (!folio)
+		return NULL;
+
 	page = folio_file_page(folio, pos >> PAGE_SHIFT);
 	if (PageHWPoison(page)) {
 		folio_put(folio);
@@ -312,7 +317,8 @@ xfile_get_page(
 	 * Mark the page dirty so that it won't be reclaimed once we drop the
 	 * (potentially last) reference in xfile_put_page.
 	 */
-	set_page_dirty(page);
+	if (flags & XFILE_ALLOC)
+		set_page_dirty(page);
 	return page;
 }
 
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index 993368b37b4b7c..f0403ea869e4d0 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -19,7 +19,9 @@ int xfile_obj_store(struct xfile *xf, const void *buf, size_t count,
 
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
-struct page *xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len);
+#define XFILE_ALLOC		(1 << 0) /* allocate page if not present */
+struct page *xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
+		unsigned int flags);
 void xfile_put_page(struct xfile *xf, struct page *page);
 
 #endif /* __XFS_SCRUB_XFILE_H__ */
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (12 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 13/15] xfs: don't unconditionally allocate a new page in xfile_get_page Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:20   ` Darrick J. Wong
  2024-01-03  8:41 ` [PATCH 15/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_load Christoph Hellwig
  2024-01-04  1:35 ` put the xfs xfile abstraction on a diet Darrick J. Wong
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Rewrite xfile_obj_store to use xfile_get_page and xfile_put_page to
access the data in the shmem page cache instead of abusing the
shmem write_begin and write_end aops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfile.c | 66 ++++++++------------------------------------
 1 file changed, 11 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 3ed7fb82a4497b..987b03df241b02 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -182,74 +182,30 @@ xfile_obj_store(
 	size_t			count,
 	loff_t			pos)
 {
-	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	const struct address_space_operations *aops = mapping->a_ops;
-	struct page		*page = NULL;
-	unsigned int		pflags;
-	int			error = 0;
-
 	if (count > MAX_RW_COUNT)
 		return -ENOMEM;
-	if (inode->i_sb->s_maxbytes - pos < count)
+	if (file_inode(xf->file)->i_sb->s_maxbytes - pos < count)
 		return -ENOMEM;
 
 	trace_xfile_obj_store(xf, pos, count);
 
-	pflags = memalloc_nofs_save();
 	while (count > 0) {
-		void		*fsdata = NULL;
-		void		*p, *kaddr;
+		struct page	*page;
 		unsigned int	len;
-		int		ret;
 
 		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
+		page = xfile_get_page(xf, pos, len, XFILE_ALLOC);
+		if (IS_ERR(page))
+			return -ENOMEM;
+		memcpy(page_address(page) + offset_in_page(pos), buf, len);
+		xfile_put_page(xf, page);
 
-		/*
-		 * We call write_begin directly here to avoid all the freezer
-		 * protection lock-taking that happens in the normal path.
-		 * shmem doesn't support fs freeze, but lockdep doesn't know
-		 * that and will trip over that.
-		 */
-		error = aops->write_begin(NULL, mapping, pos, len, &page,
-				&fsdata);
-		if (error) {
-			error = -ENOMEM;
-			break;
-		}
-
-		/*
-		 * xfile pages must never be mapped into userspace, so we skip
-		 * the dcache flush.  If the page is not uptodate, zero it
-		 * before writing data.
-		 */
-		kaddr = page_address(page);
-		if (!PageUptodate(page)) {
-			memset(kaddr, 0, PAGE_SIZE);
-			SetPageUptodate(page);
-		}
-		p = kaddr + offset_in_page(pos);
-		memcpy(p, buf, len);
-
-		ret = aops->write_end(NULL, mapping, pos, len, len, page,
-				fsdata);
-		if (ret < 0) {
-			error = -ENOMEM;
-			break;
-		}
-
-		if (ret != len) {
-			error = -ENOMEM;
-			break;
-		}
-
-		count -= ret;
-		pos += ret;
-		buf += ret;
+		count -= len;
+		pos += len;
+		buf += len;
 	}
-	memalloc_nofs_restore(pflags);
 
-	return error;
+	return 0;
 }
 
 /* Find the next written area in the xfile data for a given offset. */
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 15/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_load
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (13 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store Christoph Hellwig
@ 2024-01-03  8:41 ` Christoph Hellwig
  2024-01-04  0:21   ` Darrick J. Wong
  2024-01-04  1:35 ` put the xfs xfile abstraction on a diet Darrick J. Wong
  15 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-03  8:41 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Switch xfile_obj_load to use xfile_get_page and xfile_put_page to access
the shmem page cache.  The former uses shmem_get_folio(..., SGP_READ),
which returns a NULL folio for a hole instead of allocating one to
optimize the case where the caller is reading from a whole and doesn't
want to a zeroed folio to the page cache.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfile.c | 51 +++++++++++---------------------------------
 1 file changed, 12 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 987b03df241b02..3f9e416376d2f7 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -34,13 +34,6 @@
  * xfiles assume that the caller will handle all required concurrency
  * management; standard vfs locks (freezer and inode) are not taken.  Reads
  * and writes are satisfied directly from the page cache.
- *
- * NOTE: The current shmemfs implementation has a quirk that in-kernel reads
- * of a hole cause a page to be mapped into the file.  If you are going to
- * create a sparse xfile, please be careful about reading from uninitialized
- * parts of the file.  These pages are !Uptodate and will eventually be
- * reclaimed if not written, but in the short term this boosts memory
- * consumption.
  */
 
 /*
@@ -117,58 +110,38 @@ xfile_obj_load(
 	size_t			count,
 	loff_t			pos)
 {
-	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	struct page		*page = NULL;
-	unsigned int		pflags;
-	int			error = 0;
-
 	if (count > MAX_RW_COUNT)
 		return -ENOMEM;
-	if (inode->i_sb->s_maxbytes - pos < count)
+	if (file_inode(xf->file)->i_sb->s_maxbytes - pos < count)
 		return -ENOMEM;
 
 	trace_xfile_obj_load(xf, pos, count);
 
-	pflags = memalloc_nofs_save();
 	while (count > 0) {
+		struct page	*page;
 		unsigned int	len;
 
 		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
-
-		/*
-		 * In-kernel reads of a shmem file cause it to allocate a page
-		 * if the mapping shows a hole.  Therefore, if we hit ENOMEM
-		 * we can continue by zeroing the caller's buffer.
-		 */
-		page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
-				__GFP_NOWARN);
-		if (IS_ERR(page)) {
-			error = PTR_ERR(page);
-			if (error != -ENOMEM) {
-				error = -ENOMEM;
-				break;
-			}
-
+		page = xfile_get_page(xf, pos, len, 0);
+		if (IS_ERR(page))
+			return -ENOMEM;
+		if (!page) {
+			/*
+			 * No data stored at this offset, just zero the output
+			 * buffer.
+			 */
 			memset(buf, 0, len);
 			goto advance;
 		}
-
-		/*
-		 * xfile pages must never be mapped into userspace, so
-		 * we skip the dcache flush.
-		 */
 		memcpy(buf, page_address(page) + offset_in_page(pos), len);
-		put_page(page);
-
+		xfile_put_page(xf, page);
 advance:
 		count -= len;
 		pos += len;
 		buf += len;
 	}
-	memalloc_nofs_restore(pflags);
 
-	return error;
+	return 0;
 }
 
 /*
-- 
2.39.2



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp
  2024-01-03  8:41 ` [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
@ 2024-01-03 23:32   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-03 23:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:12AM +0000, Christoph Hellwig wrote:
> Move the check that the inode really is a shmemfs one from
> shmem_read_folio_gfp to shmem_get_folio_gfp given that shmem_get_folio
> can also be called from outside of shmem.c.  Also turn it into a
> WARN_ON_ONCE and error return instead of BUG_ON to be less severe.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

No complaints from me about converting a BUGON to an error return...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  mm/shmem.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 91e2620148b2f6..3349df6d4e0360 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1951,6 +1951,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  	int error;
>  	bool alloced;
>  
> +	if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
> +		return -EINVAL;
> +
>  	if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
>  		return -EFBIG;
>  repeat:
> @@ -4895,7 +4898,6 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
>  	struct folio *folio;
>  	int error;
>  
> -	BUG_ON(!shmem_mapping(mapping));
>  	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
>  				    gfp, NULL, NULL);
>  	if (error)
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/15] xfs: remove xfile_stat
  2024-01-03  8:41 ` [PATCH 04/15] xfs: remove xfile_stat Christoph Hellwig
@ 2024-01-03 23:45   ` Darrick J. Wong
  2024-01-04  6:14     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-03 23:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:15AM +0000, Christoph Hellwig wrote:
> vfs_getattr is needed to query inode attributes for unknown underlying
> file systems.  But shmemfs is well known for users of shmem_file_setup
> and shmem_read_mapping_page_gfp that rely on it not needing specific
> inode revalidation and having a normal mapping.  Remove the detour
> through the getattr method and an extra wrapper, and just read the
> inode size and i_bytes directly in the scrub tracing code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/scrub/trace.h | 34 ++++++++++------------------------
>  fs/xfs/scrub/xfile.c | 19 -------------------
>  fs/xfs/scrub/xfile.h |  7 -------
>  3 files changed, 10 insertions(+), 50 deletions(-)
> 
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index 6bbb4e8639dca6..ed9e044f6d603c 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -861,18 +861,11 @@ TRACE_EVENT(xfile_destroy,
>  		__field(loff_t, size)
>  	),
>  	TP_fast_assign(
> -		struct xfile_stat	statbuf;
> -		int			ret;
> -
> -		ret = xfile_stat(xf, &statbuf);
> -		if (!ret) {
> -			__entry->bytes = statbuf.bytes;
> -			__entry->size = statbuf.size;
> -		} else {
> -			__entry->bytes = -1;
> -			__entry->size = -1;
> -		}
> -		__entry->ino = file_inode(xf->file)->i_ino;
> +		struct inode		*inode = file_inode(xf->file);
> +
> +		__entry->ino = inode->i_ino;
> +		__entry->bytes = inode->i_bytes;

Shouldn't this be (i_blocks << 9) + i_bytes?

> +		__entry->size = i_size_read(inode);
>  	),
>  	TP_printk("xfino 0x%lx mem_bytes 0x%llx isize 0x%llx",
>  		  __entry->ino,
> @@ -891,19 +884,12 @@ DECLARE_EVENT_CLASS(xfile_class,
>  		__field(unsigned long long, bytecount)
>  	),
>  	TP_fast_assign(
> -		struct xfile_stat	statbuf;
> -		int			ret;
> -
> -		ret = xfile_stat(xf, &statbuf);
> -		if (!ret) {
> -			__entry->bytes_used = statbuf.bytes;
> -			__entry->size = statbuf.size;
> -		} else {
> -			__entry->bytes_used = -1;
> -			__entry->size = -1;
> -		}
> -		__entry->ino = file_inode(xf->file)->i_ino;
> +		struct inode		*inode = file_inode(xf->file);
> +
> +		__entry->ino = inode->i_ino;
> +		__entry->bytes_used = inode->i_bytes;

Here too.

>  		__entry->pos = pos;
> +		__entry->size = i_size_read(inode);
>  		__entry->bytecount = bytecount;
>  	),
>  	TP_printk("xfino 0x%lx mem_bytes 0x%llx pos 0x%llx bytecount 0x%llx isize 0x%llx",
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 090c3ead43fdf1..87654cdd5ac6f9 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -291,25 +291,6 @@ xfile_seek_data(
>  	return ret;
>  }
>  
> -/* Query stat information for an xfile. */
> -int
> -xfile_stat(
> -	struct xfile		*xf,
> -	struct xfile_stat	*statbuf)
> -{
> -	struct kstat		ks;
> -	int			error;
> -
> -	error = vfs_getattr_nosec(&xf->file->f_path, &ks,
> -			STATX_SIZE | STATX_BLOCKS, AT_STATX_DONT_SYNC);
> -	if (error)
> -		return error;
> -
> -	statbuf->size = ks.size;
> -	statbuf->bytes = ks.blocks << SECTOR_SHIFT;
> -	return 0;
> -}
> -
>  /*
>   * Grab the (locked) page for a memory object.  The object cannot span a page
>   * boundary.  Returns 0 (and a locked page) if successful, -ENOTBLK if we
> diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> index d56643b0f429e1..c602d11560d8ee 100644
> --- a/fs/xfs/scrub/xfile.h
> +++ b/fs/xfs/scrub/xfile.h
> @@ -63,13 +63,6 @@ xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
>  
>  loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
>  
> -struct xfile_stat {
> -	loff_t			size;
> -	unsigned long long	bytes;
> -};
> -
> -int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);

Removing this function will put some distance between the kernel and
xfsprogs xfile implementations, since userspace can't do failure-free
fstat.  OTOH I guess if xfile_stat fails in userspace, our xfile is
screwed and we probably have to abort the whole program anyway.

For the kernel I like getting rid of this clutter, modulo the question
above.

--D

> -
>  int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
>  		struct xfile_page *xbuf);
>  int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs
  2024-01-03  8:41 ` [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
@ 2024-01-03 23:48   ` Darrick J. Wong
  2024-01-04  6:15     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-03 23:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:16AM +0000, Christoph Hellwig wrote:
> All current and pending xfile users use the xfile_obj_load
> and xfile_obj_store API, so make those the actual implementation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  .../xfs/xfs-online-fsck-design.rst            | 10 +---
>  fs/xfs/scrub/trace.h                          |  4 +-
>  fs/xfs/scrub/xfile.c                          | 54 +++++++++----------
>  fs/xfs/scrub/xfile.h                          | 32 +----------
>  4 files changed, 30 insertions(+), 70 deletions(-)
> 
> diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> index 352516feef6ffe..324d5ec921e8e5 100644
> --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> @@ -1915,19 +1915,13 @@ four of those five higher level data structures.
>  The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case
>  study.
>  
> -The most general storage interface supported by the xfile enables the reading
> -and writing of arbitrary quantities of data at arbitrary offsets in the xfile.
> -This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions,
> -which behave similarly to their userspace counterparts.
>  XFS is very record-based, which suggests that the ability to load and store
>  complete records is important.
>  To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
> -functions are provided to read and persist objects into an xfile.
> -They are internally the same as pread and pwrite, except that they treat any
> -error as an out of memory error.
> +functions are provided to read and persist objects into an xfile that unlike
> +the pread and pwrite system calls treat any error as an out of memory error.

I don't think we need to mention pread and pwrite anymore.

"To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
functions are provided to read and persist objects into an xfile.  An errors
encountered here are treated as an out of memory error."

>  For online repair, squashing error conditions in this manner is an acceptable
>  behavior because the only reaction is to abort the operation back to userspace.
> -All five xfile usecases can be serviced by these four functions.
>  
>  However, no discussion of file access idioms is complete without answering the
>  question, "But what about mmap?"
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index ed9e044f6d603c..e6156d000fc615 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -903,8 +903,8 @@ DECLARE_EVENT_CLASS(xfile_class,
>  DEFINE_EVENT(xfile_class, name, \
>  	TP_PROTO(struct xfile *xf, loff_t pos, unsigned long long bytecount), \
>  	TP_ARGS(xf, pos, bytecount))
> -DEFINE_XFILE_EVENT(xfile_pread);
> -DEFINE_XFILE_EVENT(xfile_pwrite);
> +DEFINE_XFILE_EVENT(xfile_obj_load);
> +DEFINE_XFILE_EVENT(xfile_obj_store);

Want to shorten the names to xfile_load and xfile_store?  That's really
what they're doing anyway.

With those changes,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


>  DEFINE_XFILE_EVENT(xfile_seek_data);
>  DEFINE_XFILE_EVENT(xfile_get_page);
>  DEFINE_XFILE_EVENT(xfile_put_page);
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 87654cdd5ac6f9..9e25ecf3dc2fec 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -118,13 +118,11 @@ xfile_destroy(
>  }
>  
>  /*
> - * Read a memory object directly from the xfile's page cache.  Unlike regular
> - * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
> - * high an offset, instead of truncating the read.  Otherwise, we return
> - * bytes read or an error code, like regular pread.
> + * Load an object.  Since we're treating this file as "memory", any error or
> + * short IO is treated as a failure to allocate memory.
>   */
> -ssize_t
> -xfile_pread(
> +int
> +xfile_obj_load(
>  	struct xfile		*xf,
>  	void			*buf,
>  	size_t			count,
> @@ -133,16 +131,15 @@ xfile_pread(
>  	struct inode		*inode = file_inode(xf->file);
>  	struct address_space	*mapping = inode->i_mapping;
>  	struct page		*page = NULL;
> -	ssize_t			read = 0;
>  	unsigned int		pflags;
>  	int			error = 0;
>  
>  	if (count > MAX_RW_COUNT)
> -		return -E2BIG;
> +		return -ENOMEM;
>  	if (inode->i_sb->s_maxbytes - pos < count)
> -		return -EFBIG;
> +		return -ENOMEM;
>  
> -	trace_xfile_pread(xf, pos, count);
> +	trace_xfile_obj_load(xf, pos, count);
>  
>  	pflags = memalloc_nofs_save();
>  	while (count > 0) {
> @@ -160,8 +157,10 @@ xfile_pread(
>  				__GFP_NOWARN);
>  		if (IS_ERR(page)) {
>  			error = PTR_ERR(page);
> -			if (error != -ENOMEM)
> +			if (error != -ENOMEM) {
> +				error = -ENOMEM;
>  				break;
> +			}
>  
>  			memset(buf, 0, len);
>  			goto advance;
> @@ -185,23 +184,18 @@ xfile_pread(
>  		count -= len;
>  		pos += len;
>  		buf += len;
> -		read += len;
>  	}
>  	memalloc_nofs_restore(pflags);
>  
> -	if (read > 0)
> -		return read;
>  	return error;
>  }
>  
>  /*
> - * Write a memory object directly to the xfile's page cache.  Unlike regular
> - * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too
> - * high an offset, instead of truncating the write.  Otherwise, we return
> - * bytes written or an error code, like regular pwrite.
> + * Store an object.  Since we're treating this file as "memory", any error or
> + * short IO is treated as a failure to allocate memory.
>   */
> -ssize_t
> -xfile_pwrite(
> +int
> +xfile_obj_store(
>  	struct xfile		*xf,
>  	const void		*buf,
>  	size_t			count,
> @@ -211,16 +205,15 @@ xfile_pwrite(
>  	struct address_space	*mapping = inode->i_mapping;
>  	const struct address_space_operations *aops = mapping->a_ops;
>  	struct page		*page = NULL;
> -	ssize_t			written = 0;
>  	unsigned int		pflags;
>  	int			error = 0;
>  
>  	if (count > MAX_RW_COUNT)
> -		return -E2BIG;
> +		return -ENOMEM;
>  	if (inode->i_sb->s_maxbytes - pos < count)
> -		return -EFBIG;
> +		return -ENOMEM;
>  
> -	trace_xfile_pwrite(xf, pos, count);
> +	trace_xfile_obj_store(xf, pos, count);
>  
>  	pflags = memalloc_nofs_save();
>  	while (count > 0) {
> @@ -239,8 +232,10 @@ xfile_pwrite(
>  		 */
>  		error = aops->write_begin(NULL, mapping, pos, len, &page,
>  				&fsdata);
> -		if (error)
> +		if (error) {
> +			error = -ENOMEM;
>  			break;
> +		}
>  
>  		/*
>  		 * xfile pages must never be mapped into userspace, so we skip
> @@ -259,13 +254,14 @@ xfile_pwrite(
>  		ret = aops->write_end(NULL, mapping, pos, len, len, page,
>  				fsdata);
>  		if (ret < 0) {
> -			error = ret;
> +			error = -ENOMEM;
>  			break;
>  		}
>  
> -		written += ret;
> -		if (ret != len)
> +		if (ret != len) {
> +			error = -ENOMEM;
>  			break;
> +		}
>  
>  		count -= ret;
>  		pos += ret;
> @@ -273,8 +269,6 @@ xfile_pwrite(
>  	}
>  	memalloc_nofs_restore(pflags);
>  
> -	if (written > 0)
> -		return written;
>  	return error;
>  }
>  
> diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> index c602d11560d8ee..f0e11febf216a7 100644
> --- a/fs/xfs/scrub/xfile.h
> +++ b/fs/xfs/scrub/xfile.h
> @@ -29,38 +29,10 @@ struct xfile {
>  int xfile_create(const char *description, loff_t isize, struct xfile **xfilep);
>  void xfile_destroy(struct xfile *xf);
>  
> -ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
> -ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count,
> +int xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos);
> +int xfile_obj_store(struct xfile *xf, const void *buf, size_t count,
>  		loff_t pos);
>  
> -/*
> - * Load an object.  Since we're treating this file as "memory", any error or
> - * short IO is treated as a failure to allocate memory.
> - */
> -static inline int
> -xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
> -{
> -	ssize_t	ret = xfile_pread(xf, buf, count, pos);
> -
> -	if (ret < 0 || ret != count)
> -		return -ENOMEM;
> -	return 0;
> -}
> -
> -/*
> - * Store an object.  Since we're treating this file as "memory", any error or
> - * short IO is treated as a failure to allocate memory.
> - */
> -static inline int
> -xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
> -{
> -	ssize_t	ret = xfile_pwrite(xf, buf, count, pos);
> -
> -	if (ret < 0 || ret != count)
> -		return -ENOMEM;
> -	return 0;
> -}
> -
>  loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
>  
>  int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load
  2024-01-03  8:41 ` [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
@ 2024-01-03 23:55   ` Darrick J. Wong
  2024-01-04  6:21     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-03 23:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:17AM +0000, Christoph Hellwig wrote:
> shmem_read_mapping_page_gfp always returns an uptodate page or an
> ERR_PTR.  Remove the code that tries to handle a non-uptodate page.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hmm.  xfile_pread calls shmem_read_mapping_page_gfp ->
shmem_read_folio_gfp -> shmem_get_folio_gfp(..., SGP_CACHE), right?

Therefore, if the page is !uptodate then the "clear:" code will mark it
uptodate, right?  And that's why xfile.c doesn't need to check uptodate?

If that's correct, then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> ---
>  fs/xfs/scrub/xfile.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 9e25ecf3dc2fec..46f4a06029cd4b 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -166,18 +166,14 @@ xfile_obj_load(
>  			goto advance;
>  		}
>  
> -		if (PageUptodate(page)) {
> -			/*
> -			 * xfile pages must never be mapped into userspace, so
> -			 * we skip the dcache flush.
> -			 */
> -			kaddr = kmap_local_page(page);
> -			p = kaddr + offset_in_page(pos);
> -			memcpy(buf, p, len);
> -			kunmap_local(kaddr);
> -		} else {
> -			memset(buf, 0, len);
> -		}
> +		/*
> +		 * xfile pages must never be mapped into userspace, so
> +		 * we skip the dcache flush.
> +		 */
> +		kaddr = kmap_local_page(page);
> +		p = kaddr + offset_in_page(pos);
> +		memcpy(buf, p, len);
> +		kunmap_local(kaddr);
>  		put_page(page);
>  
>  advance:
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 07/15] xfs: shmem_file_setup can't return NULL
  2024-01-03  8:41 ` [PATCH 07/15] xfs: shmem_file_setup can't return NULL Christoph Hellwig
@ 2024-01-03 23:56   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-03 23:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:18AM +0000, Christoph Hellwig wrote:
> shmem_file_setup always returns a struct file pointer or an ERR_PTR,
> so remove the code to check for a NULL return.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I guess the bots will stop hassling me about this if I say
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/scrub/xfile.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 46f4a06029cd4b..ec1be08937977a 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -62,15 +62,13 @@ xfile_create(
>  {
>  	struct inode		*inode;
>  	struct xfile		*xf;
> -	int			error = -ENOMEM;
> +	int			error;
>  
>  	xf = kmalloc(sizeof(struct xfile), XCHK_GFP_FLAGS);
>  	if (!xf)
>  		return -ENOMEM;
>  
>  	xf->file = shmem_file_setup(description, isize, 0);
> -	if (!xf->file)
> -		goto out_xfile;
>  	if (IS_ERR(xf->file)) {
>  		error = PTR_ERR(xf->file);
>  		goto out_xfile;
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 08/15] xfs: don't modify file and inode flags for shmem files
  2024-01-03  8:41 ` [PATCH 08/15] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
@ 2024-01-04  0:01   ` Darrick J. Wong
  2024-01-04  6:23     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:19AM +0000, Christoph Hellwig wrote:
> shmem_file_setup is explicitly intended for a file that can be
> fully read and written by kernel users without restrictions.  Don't
> poke into internals to change random flags in the file or inode.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/scrub/xfile.c | 15 ---------------
>  1 file changed, 15 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index ec1be08937977a..e872f4f0263f59 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -74,22 +74,7 @@ xfile_create(
>  		goto out_xfile;
>  	}
>  
> -	/*
> -	 * We want a large sparse file that we can pread, pwrite, and seek.
> -	 * xfile users are responsible for keeping the xfile hidden away from
> -	 * all other callers, so we skip timestamp updates and security checks.
> -	 * Make the inode only accessible by root, just in case the xfile ever
> -	 * escapes.
> -	 */
> -	xf->file->f_mode |= FMODE_PREAD | FMODE_PWRITE | FMODE_NOCMTIME |
> -			    FMODE_LSEEK;
> -	xf->file->f_flags |= O_RDWR | O_LARGEFILE | O_NOATIME;
>  	inode = file_inode(xf->file);
> -	inode->i_flags |= S_PRIVATE | S_NOCMTIME | S_NOATIME;

I actually want S_PRIVATE here to avoid interference from all the
security hooks and whatnot when scrub is using an xfile to stash a
large amount of data.  Shouldn't this patch change xfile_create to call
shmem_kernel_file_setup instead?

> -	inode->i_mode &= ~0177;
> -	inode->i_uid = GLOBAL_ROOT_UID;
> -	inode->i_gid = GLOBAL_ROOT_GID;

Also, I don't know if it matters that the default uid/gid are now going
to be whatever the defaults would be for a new file instead of root
only.  That seems like it could invite problems, but otoh xfiles are
never installed in the fd table so userspace should never get access
anyway.

--D

> -
>  	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
>  
>  	trace_xfile_create(xf);
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/15] xfs: remove xfarray_sortinfo.page_kaddr
  2024-01-03  8:41 ` [PATCH 10/15] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
@ 2024-01-04  0:04   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:21AM +0000, Christoph Hellwig wrote:
> Now that xfile pages don't need kmapping, there is no need to cache
> the kernel virtual address for them.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/scrub/xfarray.c | 22 ++++------------------
>  fs/xfs/scrub/xfarray.h |  1 -
>  2 files changed, 4 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
> index 3a44700037924b..c29a240d4e25f4 100644
> --- a/fs/xfs/scrub/xfarray.c
> +++ b/fs/xfs/scrub/xfarray.c
> @@ -570,18 +570,7 @@ xfarray_sort_get_page(
>  	loff_t			pos,
>  	uint64_t		len)
>  {
> -	int			error;
> -
> -	error = xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
> -	if (error)
> -		return error;
> -
> -	/*
> -	 * xfile pages must never be mapped into userspace, so we skip the
> -	 * dcache flush when mapping the page.
> -	 */
> -	si->page_kaddr = page_address(si->xfpage.page);
> -	return 0;
> +	return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
>  }
>  
>  /* Release a page we grabbed for sorting records. */
> @@ -589,11 +578,8 @@ static inline int
>  xfarray_sort_put_page(
>  	struct xfarray_sortinfo	*si)
>  {
> -	if (!si->page_kaddr)
> +	if (!xfile_page_cached(&si->xfpage))
>  		return 0;
> -
> -	si->page_kaddr = NULL;
> -
>  	return xfile_put_page(si->array->xfile, &si->xfpage);
>  }
>  
> @@ -636,7 +622,7 @@ xfarray_pagesort(
>  		return error;
>  
>  	xfarray_sort_bump_heapsorts(si);
> -	startp = si->page_kaddr + offset_in_page(lo_pos);
> +	startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
>  	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
>  
>  	xfarray_sort_bump_stores(si);
> @@ -883,7 +869,7 @@ xfarray_sort_load_cached(
>  			return error;
>  	}
>  
> -	memcpy(ptr, si->page_kaddr + offset_in_page(idx_pos),
> +	memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
>  			si->array->obj_size);
>  	return 0;
>  }
> diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
> index 62b9c506fdd1b7..6f2862054e194d 100644
> --- a/fs/xfs/scrub/xfarray.h
> +++ b/fs/xfs/scrub/xfarray.h
> @@ -107,7 +107,6 @@ struct xfarray_sortinfo {
>  
>  	/* Cache a page here for faster access. */
>  	struct xfile_page	xfpage;
> -	void			*page_kaddr;
>  
>  #ifdef DEBUG
>  	/* Performance statistics. */
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page
  2024-01-03  8:41 ` [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page Christoph Hellwig
@ 2024-01-04  0:12   ` Darrick J. Wong
  2024-01-04  6:25     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:22AM +0000, Christoph Hellwig wrote:
> Switch to using shmem_get_folio and manually dirtying the page instead
> of abusing aops->write_begin and aops->write_end in xfile_get_page.
> 
> This simplified the code by not doing indirect calls of not actually
> exported interfaces that don't really fit the use case very well.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/scrub/xfarray.c | 32 +++++-----------
>  fs/xfs/scrub/xfile.c   | 84 +++++++++++++-----------------------------
>  fs/xfs/scrub/xfile.h   |  2 +-
>  3 files changed, 37 insertions(+), 81 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
> index c29a240d4e25f4..c6e62c119148a1 100644
> --- a/fs/xfs/scrub/xfarray.c
> +++ b/fs/xfs/scrub/xfarray.c
> @@ -574,13 +574,12 @@ xfarray_sort_get_page(
>  }
>  
>  /* Release a page we grabbed for sorting records. */
> -static inline int
> +static inline void
>  xfarray_sort_put_page(
>  	struct xfarray_sortinfo	*si)
>  {
> -	if (!xfile_page_cached(&si->xfpage))
> -		return 0;
> -	return xfile_put_page(si->array->xfile, &si->xfpage);
> +	if (xfile_page_cached(&si->xfpage))
> +		xfile_put_page(si->array->xfile, &si->xfpage);
>  }
>  
>  /* Decide if these records are eligible for in-page sorting. */
> @@ -626,7 +625,8 @@ xfarray_pagesort(
>  	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
>  
>  	xfarray_sort_bump_stores(si);
> -	return xfarray_sort_put_page(si);
> +	xfarray_sort_put_page(si);
> +	return 0;
>  }
>  
>  /* Return a pointer to the xfarray pivot record within the sortinfo struct. */
> @@ -836,10 +836,7 @@ xfarray_sort_load_cached(
>  	startpage = idx_pos >> PAGE_SHIFT;
>  	endpage = (idx_pos + si->array->obj_size - 1) >> PAGE_SHIFT;
>  	if (startpage != endpage) {
> -		error = xfarray_sort_put_page(si);
> -		if (error)
> -			return error;
> -
> +		xfarray_sort_put_page(si);
>  		if (xfarray_sort_terminated(si, &error))
>  			return error;
>  
> @@ -849,11 +846,8 @@ xfarray_sort_load_cached(
>  
>  	/* If the cached page is not the one we want, release it. */
>  	if (xfile_page_cached(&si->xfpage) &&
> -	    xfile_page_index(&si->xfpage) != startpage) {
> -		error = xfarray_sort_put_page(si);
> -		if (error)
> -			return error;
> -	}
> +	    xfile_page_index(&si->xfpage) != startpage)
> +		xfarray_sort_put_page(si);
>  
>  	/*
>  	 * If we don't have a cached page (and we know the load is contained
> @@ -995,10 +989,7 @@ xfarray_sort(
>  				if (error)
>  					goto out_free;
>  			}
> -			error = xfarray_sort_put_page(si);
> -			if (error)
> -				goto out_free;
> -
> +			xfarray_sort_put_page(si);
>  			if (xfarray_sort_terminated(si, &error))
>  				goto out_free;
>  
> @@ -1024,10 +1015,7 @@ xfarray_sort(
>  				if (error)
>  					goto out_free;
>  			}
> -			error = xfarray_sort_put_page(si);
> -			if (error)
> -				goto out_free;
> -
> +			xfarray_sort_put_page(si);
>  			if (xfarray_sort_terminated(si, &error))
>  				goto out_free;
>  
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index afbd205289e9b0..2b4b0c4e8d2fb6 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -278,11 +278,8 @@ xfile_get_page(
>  	struct xfile_page	*xfpage)
>  {
>  	struct inode		*inode = file_inode(xf->file);
> -	struct address_space	*mapping = inode->i_mapping;
> -	const struct address_space_operations *aops = mapping->a_ops;
> -	struct page		*page = NULL;
> -	void			*fsdata = NULL;
> -	loff_t			key = round_down(pos, PAGE_SIZE);
> +	struct folio		*folio = NULL;
> +	struct page		*page;
>  	unsigned int		pflags;
>  	int			error;
>  
> @@ -293,78 +290,49 @@ xfile_get_page(
>  
>  	trace_xfile_get_page(xf, pos, len);
>  
> -	pflags = memalloc_nofs_save();
> -
>  	/*
> -	 * We call write_begin directly here to avoid all the freezer
> -	 * protection lock-taking that happens in the normal path.  shmem
> -	 * doesn't support fs freeze, but lockdep doesn't know that and will
> -	 * trip over that.
> +	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
> +	 * actually allocates a folio instead of erroring out.
>  	 */
> -	error = aops->write_begin(NULL, mapping, key, PAGE_SIZE, &page,
> -			&fsdata);
> -	if (error)
> -		goto out_pflags;
> -
> -	/* We got the page, so make sure we push out EOF. */
> -	if (i_size_read(inode) < pos + len)
> +	if (pos + len > i_size_read(inode))
>  		i_size_write(inode, pos + len);
>  
> -	/*
> -	 * If the page isn't up to date, fill it with zeroes before we hand it
> -	 * to the caller and make sure the backing store will hold on to them.
> -	 */
> -	if (!PageUptodate(page)) {
> -		memset(page_address(page), 0, PAGE_SIZE);
> -		SetPageUptodate(page);
> +	pflags = memalloc_nofs_save();
> +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
> +	memalloc_nofs_restore(pflags);
> +	if (error)
> +		return error;
> +
> +	page = folio_file_page(folio, pos >> PAGE_SHIFT);
> +	if (PageHWPoison(page)) {
> +		folio_put(folio);

We need to unlock the folio here, right?

The rest looks reasonable to me.

--D

> +		return -EIO;
>  	}
>  
>  	/*
> -	 * Mark each page dirty so that the contents are written to some
> -	 * backing store when we drop this buffer, and take an extra reference
> -	 * to prevent the xfile page from being swapped or removed from the
> -	 * page cache by reclaim if the caller unlocks the page.
> +	 * Mark the page dirty so that it won't be reclaimed once we drop the
> +	 * (potentially last) reference in xfile_put_page.
>  	 */
>  	set_page_dirty(page);
> -	get_page(page);
>  
>  	xfpage->page = page;
> -	xfpage->fsdata = fsdata;
> -	xfpage->pos = key;
> -out_pflags:
> -	memalloc_nofs_restore(pflags);
> -	return error;
> +	xfpage->fsdata = NULL;
> +	xfpage->pos = round_down(pos, PAGE_SIZE);
> +	return 0;
>  }
>  
>  /*
> - * Release the (locked) page for a memory object.  Returns 0 or a negative
> - * errno.
> + * Release the (locked) page for a memory object.
>   */
> -int
> +void
>  xfile_put_page(
>  	struct xfile		*xf,
>  	struct xfile_page	*xfpage)
>  {
> -	struct inode		*inode = file_inode(xf->file);
> -	struct address_space	*mapping = inode->i_mapping;
> -	const struct address_space_operations *aops = mapping->a_ops;
> -	unsigned int		pflags;
> -	int			ret;
> -
> -	trace_xfile_put_page(xf, xfpage->pos, PAGE_SIZE);
> +	struct page		*page = xfpage->page;
>  
> -	/* Give back the reference that we took in xfile_get_page. */
> -	put_page(xfpage->page);
> +	trace_xfile_put_page(xf, page->index << PAGE_SHIFT, PAGE_SIZE);
>  
> -	pflags = memalloc_nofs_save();
> -	ret = aops->write_end(NULL, mapping, xfpage->pos, PAGE_SIZE, PAGE_SIZE,
> -			xfpage->page, xfpage->fsdata);
> -	memalloc_nofs_restore(pflags);
> -	memset(xfpage, 0, sizeof(struct xfile_page));
> -
> -	if (ret < 0)
> -		return ret;
> -	if (ret != PAGE_SIZE)
> -		return -EIO;
> -	return 0;
> +	unlock_page(page);
> +	put_page(page);
>  }
> diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> index f0e11febf216a7..2f46b7d694ce99 100644
> --- a/fs/xfs/scrub/xfile.h
> +++ b/fs/xfs/scrub/xfile.h
> @@ -37,6 +37,6 @@ loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
>  
>  int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
>  		struct xfile_page *xbuf);
> -int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
> +void xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
>  
>  #endif /* __XFS_SCRUB_XFILE_H__ */
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 13/15] xfs: don't unconditionally allocate a new page in xfile_get_page
  2024-01-03  8:41 ` [PATCH 13/15] xfs: don't unconditionally allocate a new page in xfile_get_page Christoph Hellwig
@ 2024-01-04  0:18   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:24AM +0000, Christoph Hellwig wrote:
> Pass a flags argument to xfile_get_page, and only allocate a new page
> if the XFILE_ALLOC flag is passed.  This allows to also use
> xfile_get_page for pure readers that do not want to allocate a new
> page or dirty the existing one.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks correct to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/scrub/xfarray.c |  2 +-
>  fs/xfs/scrub/xfile.c   | 14 ++++++++++----
>  fs/xfs/scrub/xfile.h   |  4 +++-
>  3 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
> index 4f396462186793..8543067d46366d 100644
> --- a/fs/xfs/scrub/xfarray.c
> +++ b/fs/xfs/scrub/xfarray.c
> @@ -572,7 +572,7 @@ xfarray_sort_get_page(
>  {
>  	struct page		*page;
>  
> -	page = xfile_get_page(si->array->xfile, pos, len);
> +	page = xfile_get_page(si->array->xfile, pos, len, XFILE_ALLOC);
>  	if (IS_ERR(page))
>  		return PTR_ERR(page);
>  	si->page = page;
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 715c4d10b67c14..3ed7fb82a4497b 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -274,7 +274,8 @@ struct page *
>  xfile_get_page(
>  	struct xfile		*xf,
>  	loff_t			pos,
> -	unsigned int		len)
> +	unsigned int		len,
> +	unsigned int		flags)
>  {
>  	struct inode		*inode = file_inode(xf->file);
>  	struct folio		*folio = NULL;
> @@ -293,15 +294,19 @@ xfile_get_page(
>  	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
>  	 * actually allocates a folio instead of erroring out.
>  	 */
> -	if (pos + len > i_size_read(inode))
> +	if ((flags & XFILE_ALLOC) && pos + len > i_size_read(inode))
>  		i_size_write(inode, pos + len);
>  
>  	pflags = memalloc_nofs_save();
> -	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
> +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> +			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
>  	memalloc_nofs_restore(pflags);
>  	if (error)
>  		return ERR_PTR(error);
>  
> +	if (!folio)
> +		return NULL;
> +
>  	page = folio_file_page(folio, pos >> PAGE_SHIFT);
>  	if (PageHWPoison(page)) {
>  		folio_put(folio);
> @@ -312,7 +317,8 @@ xfile_get_page(
>  	 * Mark the page dirty so that it won't be reclaimed once we drop the
>  	 * (potentially last) reference in xfile_put_page.
>  	 */
> -	set_page_dirty(page);
> +	if (flags & XFILE_ALLOC)
> +		set_page_dirty(page);
>  	return page;
>  }
>  
> diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> index 993368b37b4b7c..f0403ea869e4d0 100644
> --- a/fs/xfs/scrub/xfile.h
> +++ b/fs/xfs/scrub/xfile.h
> @@ -19,7 +19,9 @@ int xfile_obj_store(struct xfile *xf, const void *buf, size_t count,
>  
>  loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
>  
> -struct page *xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len);
> +#define XFILE_ALLOC		(1 << 0) /* allocate page if not present */
> +struct page *xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
> +		unsigned int flags);
>  void xfile_put_page(struct xfile *xf, struct page *page);
>  
>  #endif /* __XFS_SCRUB_XFILE_H__ */
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store
  2024-01-03  8:41 ` [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store Christoph Hellwig
@ 2024-01-04  0:20   ` Darrick J. Wong
  2024-01-04  6:26     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:25AM +0000, Christoph Hellwig wrote:
> Rewrite xfile_obj_store to use xfile_get_page and xfile_put_page to
> access the data in the shmem page cache instead of abusing the
> shmem write_begin and write_end aops.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Much simpler, though I wonder if willy is going to have something to say
about xfile.c continuing to pass pages around instead of folios.  I
/think/ that's ok since we actually need the physical base page for
doing IO, right?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/scrub/xfile.c | 66 ++++++++------------------------------------
>  1 file changed, 11 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 3ed7fb82a4497b..987b03df241b02 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -182,74 +182,30 @@ xfile_obj_store(
>  	size_t			count,
>  	loff_t			pos)
>  {
> -	struct inode		*inode = file_inode(xf->file);
> -	struct address_space	*mapping = inode->i_mapping;
> -	const struct address_space_operations *aops = mapping->a_ops;
> -	struct page		*page = NULL;
> -	unsigned int		pflags;
> -	int			error = 0;
> -
>  	if (count > MAX_RW_COUNT)
>  		return -ENOMEM;
> -	if (inode->i_sb->s_maxbytes - pos < count)
> +	if (file_inode(xf->file)->i_sb->s_maxbytes - pos < count)
>  		return -ENOMEM;
>  
>  	trace_xfile_obj_store(xf, pos, count);
>  
> -	pflags = memalloc_nofs_save();
>  	while (count > 0) {
> -		void		*fsdata = NULL;
> -		void		*p, *kaddr;
> +		struct page	*page;
>  		unsigned int	len;
> -		int		ret;
>  
>  		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> +		page = xfile_get_page(xf, pos, len, XFILE_ALLOC);
> +		if (IS_ERR(page))
> +			return -ENOMEM;
> +		memcpy(page_address(page) + offset_in_page(pos), buf, len);
> +		xfile_put_page(xf, page);
>  
> -		/*
> -		 * We call write_begin directly here to avoid all the freezer
> -		 * protection lock-taking that happens in the normal path.
> -		 * shmem doesn't support fs freeze, but lockdep doesn't know
> -		 * that and will trip over that.
> -		 */
> -		error = aops->write_begin(NULL, mapping, pos, len, &page,
> -				&fsdata);
> -		if (error) {
> -			error = -ENOMEM;
> -			break;
> -		}
> -
> -		/*
> -		 * xfile pages must never be mapped into userspace, so we skip
> -		 * the dcache flush.  If the page is not uptodate, zero it
> -		 * before writing data.
> -		 */
> -		kaddr = page_address(page);
> -		if (!PageUptodate(page)) {
> -			memset(kaddr, 0, PAGE_SIZE);
> -			SetPageUptodate(page);
> -		}
> -		p = kaddr + offset_in_page(pos);
> -		memcpy(p, buf, len);
> -
> -		ret = aops->write_end(NULL, mapping, pos, len, len, page,
> -				fsdata);
> -		if (ret < 0) {
> -			error = -ENOMEM;
> -			break;
> -		}
> -
> -		if (ret != len) {
> -			error = -ENOMEM;
> -			break;
> -		}
> -
> -		count -= ret;
> -		pos += ret;
> -		buf += ret;
> +		count -= len;
> +		pos += len;
> +		buf += len;
>  	}
> -	memalloc_nofs_restore(pflags);
>  
> -	return error;
> +	return 0;
>  }
>  
>  /* Find the next written area in the xfile data for a given offset. */
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 15/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_load
  2024-01-03  8:41 ` [PATCH 15/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_load Christoph Hellwig
@ 2024-01-04  0:21   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:26AM +0000, Christoph Hellwig wrote:
> Switch xfile_obj_load to use xfile_get_page and xfile_put_page to access
> the shmem page cache.  The former uses shmem_get_folio(..., SGP_READ),
> which returns a NULL folio for a hole instead of allocating one to
> optimize the case where the caller is reading from a whole and doesn't
> want to a zeroed folio to the page cache.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/scrub/xfile.c | 51 +++++++++++---------------------------------
>  1 file changed, 12 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 987b03df241b02..3f9e416376d2f7 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -34,13 +34,6 @@
>   * xfiles assume that the caller will handle all required concurrency
>   * management; standard vfs locks (freezer and inode) are not taken.  Reads
>   * and writes are satisfied directly from the page cache.
> - *
> - * NOTE: The current shmemfs implementation has a quirk that in-kernel reads
> - * of a hole cause a page to be mapped into the file.  If you are going to
> - * create a sparse xfile, please be careful about reading from uninitialized
> - * parts of the file.  These pages are !Uptodate and will eventually be
> - * reclaimed if not written, but in the short term this boosts memory
> - * consumption.

Awright, this goes away finally!

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>   */
>  
>  /*
> @@ -117,58 +110,38 @@ xfile_obj_load(
>  	size_t			count,
>  	loff_t			pos)
>  {
> -	struct inode		*inode = file_inode(xf->file);
> -	struct address_space	*mapping = inode->i_mapping;
> -	struct page		*page = NULL;
> -	unsigned int		pflags;
> -	int			error = 0;
> -
>  	if (count > MAX_RW_COUNT)
>  		return -ENOMEM;
> -	if (inode->i_sb->s_maxbytes - pos < count)
> +	if (file_inode(xf->file)->i_sb->s_maxbytes - pos < count)
>  		return -ENOMEM;
>  
>  	trace_xfile_obj_load(xf, pos, count);
>  
> -	pflags = memalloc_nofs_save();
>  	while (count > 0) {
> +		struct page	*page;
>  		unsigned int	len;
>  
>  		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> -
> -		/*
> -		 * In-kernel reads of a shmem file cause it to allocate a page
> -		 * if the mapping shows a hole.  Therefore, if we hit ENOMEM
> -		 * we can continue by zeroing the caller's buffer.
> -		 */
> -		page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
> -				__GFP_NOWARN);
> -		if (IS_ERR(page)) {
> -			error = PTR_ERR(page);
> -			if (error != -ENOMEM) {
> -				error = -ENOMEM;
> -				break;
> -			}
> -
> +		page = xfile_get_page(xf, pos, len, 0);
> +		if (IS_ERR(page))
> +			return -ENOMEM;
> +		if (!page) {
> +			/*
> +			 * No data stored at this offset, just zero the output
> +			 * buffer.
> +			 */
>  			memset(buf, 0, len);
>  			goto advance;
>  		}
> -
> -		/*
> -		 * xfile pages must never be mapped into userspace, so
> -		 * we skip the dcache flush.
> -		 */
>  		memcpy(buf, page_address(page) + offset_in_page(pos), len);
> -		put_page(page);
> -
> +		xfile_put_page(xf, page);
>  advance:
>  		count -= len;
>  		pos += len;
>  		buf += len;
>  	}
> -	memalloc_nofs_restore(pflags);
>  
> -	return error;
> +	return 0;
>  }
>  
>  /*
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/15] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-03  8:41 ` [PATCH 03/15] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
@ 2024-01-04  0:21   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  0:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:14AM +0000, Christoph Hellwig wrote:
> Add a blurb that simply dirtying the folio will persist data for in-kernel
> shmem files.  This is what most of the callers already do.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/shmem.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 328eb3dbea9f1c..235fac6dc53a0b 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2129,6 +2129,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>   *
>   * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
>   * and no folio was found at @index, or an ERR_PTR() otherwise.
> + *
> + * If the caller modifies data in the returned folio, it must call
> + * folio_mark_dirty() on the locked folio before dropping the reference to
> + * ensure the folio is not reclaimed.  Unlike for normal file systems there is
> + * no need to reserve space for users of shmem_*file_setup().

/me notes that this matches how /I/ think this is supposed to work, but
I think someone more familiar with tmpfs should review this...

--D

>   */
>  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
>  		enum sgp_type sgp)
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: put the xfs xfile abstraction on a diet
  2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
                   ` (14 preceding siblings ...)
  2024-01-03  8:41 ` [PATCH 15/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_load Christoph Hellwig
@ 2024-01-04  1:35 ` Darrick J. Wong
  2024-01-04  6:26   ` Christoph Hellwig
  15 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  1:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:11AM +0000, Christoph Hellwig wrote:
> Hi all,
> 
> this series refactors and simplifies the code in the xfs xfile
> abstraction, which is a thing layer on a kernel-use shmem file.
> 
> Do do this is needs a slighly lower level export from shmem.c,
> which I combined with improving an assert and documentation there.

What's the base for this series?  Is it xfs-linux for-next?  Or
djwong-wtf?

--D

> One thing I don't really like yet is that xfile is still based on
> folios and not pages.  The main stumbling block for that is the
> mess around the hwpoison flag - that one still is per-file and not
> per-folio, and shmem checks it weirdly often and not really in
> at the abstractions levels where I'd expect it and feels very
> different from the normal page cache code in filemap.c.  Maybe
> I'm just failing to understand why that is done, but especially
> without comments explaining it it feels like it could use some
> real attention first.
> 
> Diffstat:
>  Documentation/filesystems/xfs/xfs-online-fsck-design.rst |   10 
>  fs/xfs/scrub/trace.h                                     |   38 -
>  fs/xfs/scrub/xfarray.c                                   |   60 --
>  fs/xfs/scrub/xfarray.h                                   |    3 
>  fs/xfs/scrub/xfile.c                                     |  311 +++------------
>  fs/xfs/scrub/xfile.h                                     |   62 --
>  mm/shmem.c                                               |   24 +
>  7 files changed, 143 insertions(+), 365 deletions(-)
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/15] xfs: remove xfile_stat
  2024-01-03 23:45   ` Darrick J. Wong
@ 2024-01-04  6:14     ` Christoph Hellwig
  2024-01-04  6:55       ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 03:45:33PM -0800, Darrick J. Wong wrote:
> > +		__entry->bytes = inode->i_bytes;
> 
> Shouldn't this be (i_blocks << 9) + i_bytes?

Actually this should just be doing:

	__entry->bytes = inode->i_blocks << SECTOR_SHIFT;

The bytes name here really confused me.  Or we could change the trace
point to just report i_block directly and not rename it to bytes and
change the unit?



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs
  2024-01-03 23:48   ` Darrick J. Wong
@ 2024-01-04  6:15     ` Christoph Hellwig
  2024-01-04  6:58       ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 03:48:49PM -0800, Darrick J. Wong wrote:
> "To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
> functions are provided to read and persist objects into an xfile.  An errors
> encountered here are treated as an out of memory error."

Ok.

> > -DEFINE_XFILE_EVENT(xfile_pwrite);
> > +DEFINE_XFILE_EVENT(xfile_obj_load);
> > +DEFINE_XFILE_EVENT(xfile_obj_store);
> 
> Want to shorten the names to xfile_load and xfile_store?  That's really
> what they're doing anyway.

Fine with me.  Just for the trace points or also for the functions?

Also - returning ENOMEM for the API misuse cases (too large object,
too large total size) always seemed weird to me.  Is there a really
strong case for it or should we go for actually useful errors for those?



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load
  2024-01-03 23:55   ` Darrick J. Wong
@ 2024-01-04  6:21     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:21 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 03:55:38PM -0800, Darrick J. Wong wrote:
> On Wed, Jan 03, 2024 at 08:41:17AM +0000, Christoph Hellwig wrote:
> > shmem_read_mapping_page_gfp always returns an uptodate page or an
> > ERR_PTR.  Remove the code that tries to handle a non-uptodate page.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Hmm.  xfile_pread calls shmem_read_mapping_page_gfp ->
> shmem_read_folio_gfp -> shmem_get_folio_gfp(..., SGP_CACHE), right?
> 
> Therefore, if the page is !uptodate then the "clear:" code will mark it
> uptodate, right?  And that's why xfile.c doesn't need to check uptodate?

Yes.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 08/15] xfs: don't modify file and inode flags for shmem files
  2024-01-04  0:01   ` Darrick J. Wong
@ 2024-01-04  6:23     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:23 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 04:01:45PM -0800, Darrick J. Wong wrote:
> I actually want S_PRIVATE here to avoid interference from all the
> security hooks and whatnot when scrub is using an xfile to stash a
> large amount of data.  Shouldn't this patch change xfile_create to call
> shmem_kernel_file_setup instead?

Yes, and it used to do that before I reshuffled it..

> > -	inode->i_mode &= ~0177;
> > -	inode->i_uid = GLOBAL_ROOT_UID;
> > -	inode->i_gid = GLOBAL_ROOT_GID;
> 
> Also, I don't know if it matters that the default uid/gid are now going
> to be whatever the defaults would be for a new file instead of root
> only.  That seems like it could invite problems, but otoh xfiles are
> never installed in the fd table so userspace should never get access
> anyway.

In-kernel shm files are created on shm_mnt, which is owned by the global
root, so this will do the right thing.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 09/15] xfs: don't allow highmem pages in xfile mappings
       [not found]   ` <20240104000324.GC361584@frogsfrogsfrogs>
@ 2024-01-04  6:24     ` Christoph Hellwig
  2024-01-04  7:01       ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:24 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 04:03:24PM -0800, Darrick J. Wong wrote:
> > +	/*
> > +	 * We don't want to bother with kmapping data during repair, so don't
> > +	 * allow highmem pages to back this mapping.
> > +	 */
> > +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> 
> Gonna be fun to see what happens on 32-bit. ;)

32-bit with highmem, yes.  I suspect we should just not allow online
repair and scrub on that.  I've in fact been tempted to see who would
scream if we'd disallow XFS on 32-bit entirel, as that would simplify
a lot of things.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page
  2024-01-04  0:12   ` Darrick J. Wong
@ 2024-01-04  6:25     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:25 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 04:12:51PM -0800, Darrick J. Wong wrote:
> > +	if (error)
> > +		return error;
> > +
> > +	page = folio_file_page(folio, pos >> PAGE_SHIFT);
> > +	if (PageHWPoison(page)) {
> > +		folio_put(folio);
> 
> We need to unlock the folio here, right?

On the error return?  Yes.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store
  2024-01-04  0:20   ` Darrick J. Wong
@ 2024-01-04  6:26     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 04:20:24PM -0800, Darrick J. Wong wrote:
> On Wed, Jan 03, 2024 at 08:41:25AM +0000, Christoph Hellwig wrote:
> > Rewrite xfile_obj_store to use xfile_get_page and xfile_put_page to
> > access the data in the shmem page cache instead of abusing the
> > shmem write_begin and write_end aops.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Much simpler, though I wonder if willy is going to have something to say
> about xfile.c continuing to pass pages around instead of folios.  I
> /think/ that's ok since we actually need the physical base page for
> doing IO, right?

Well, as mentioned in the cover letter I'd much prefer to return a folio
here, but we'd first need to sort out the whole hwpoison flag mess for
that first.  There's also a few issues with the xfs-internal xfiles
interface that need attention, but they're solvable.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: put the xfs xfile abstraction on a diet
  2024-01-04  1:35 ` put the xfs xfile abstraction on a diet Darrick J. Wong
@ 2024-01-04  6:26   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2024-01-04  6:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 05:35:02PM -0800, Darrick J. Wong wrote:
> On Wed, Jan 03, 2024 at 08:41:11AM +0000, Christoph Hellwig wrote:
> > Hi all,
> > 
> > this series refactors and simplifies the code in the xfs xfile
> > abstraction, which is a thing layer on a kernel-use shmem file.
> > 
> > Do do this is needs a slighly lower level export from shmem.c,
> > which I combined with improving an assert and documentation there.
> 
> What's the base for this series?  Is it xfs-linux for-next?  Or
> djwong-wtf?

xfs-linux for-next, sorry.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/15] xfs: remove xfile_stat
  2024-01-04  6:14     ` Christoph Hellwig
@ 2024-01-04  6:55       ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  6:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Thu, Jan 04, 2024 at 07:14:15AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 03, 2024 at 03:45:33PM -0800, Darrick J. Wong wrote:
> > > +		__entry->bytes = inode->i_bytes;
> > 
> > Shouldn't this be (i_blocks << 9) + i_bytes?
> 
> Actually this should just be doing:
> 
> 	__entry->bytes = inode->i_blocks << SECTOR_SHIFT;
> 
> The bytes name here really confused me.

Me too.  It looks like some weird way to encode the bytes used by the
file using a u64 sector count and a u16 byte count for ... some reason?
XFS (and thankfully tmpfs) seem to ignore all that entirely.

> Or we could change the trace
> point to just report i_block directly and not rename it to bytes and
> change the unit?

I prefer to keep the tracepoint in bytes because that's a little easier
than rshifting by 9 in my head.

--D


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs
  2024-01-04  6:15     ` Christoph Hellwig
@ 2024-01-04  6:58       ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  6:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Thu, Jan 04, 2024 at 07:15:42AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 03, 2024 at 03:48:49PM -0800, Darrick J. Wong wrote:
> > "To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
> > functions are provided to read and persist objects into an xfile.  An errors
> > encountered here are treated as an out of memory error."
> 
> Ok.
> 
> > > -DEFINE_XFILE_EVENT(xfile_pwrite);
> > > +DEFINE_XFILE_EVENT(xfile_obj_load);
> > > +DEFINE_XFILE_EVENT(xfile_obj_store);
> > 
> > Want to shorten the names to xfile_load and xfile_store?  That's really
> > what they're doing anyway.
> 
> Fine with me.  Just for the trace points or also for the functions?

Might as well do them both, I don't think anyone really depends on those
exact names.  I don't. :)

> Also - returning ENOMEM for the API misuse cases (too large object,
> too large total size) always seemed weird to me.  Is there a really
> strong case for it or should we go for actually useful errors for those?

The errors returned by the xfile APIs can float out to userspace, so I'd
rather have them all turn into:

$ xfs_io -c 'scrub <fubar>' /
XFS_IOC_SCRUB_METADATA: Cannot allocate memory.

vs.

$ xfs_io -c 'scrub <fubar>' /
XFS_IOC_SCRUB_METADATA: File is too large.

So that users won't think that the root directory is too big or something.

--D


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 09/15] xfs: don't allow highmem pages in xfile mappings
  2024-01-04  6:24     ` Christoph Hellwig
@ 2024-01-04  7:01       ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-04  7:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Thu, Jan 04, 2024 at 07:24:28AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 03, 2024 at 04:03:24PM -0800, Darrick J. Wong wrote:
> > > +	/*
> > > +	 * We don't want to bother with kmapping data during repair, so don't
> > > +	 * allow highmem pages to back this mapping.
> > > +	 */
> > > +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> > 
> > Gonna be fun to see what happens on 32-bit. ;)
> 
> 32-bit with highmem, yes.  I suspect we should just not allow online
> repair and scrub on that.  I've in fact been tempted to see who would
> scream if we'd disallow XFS on 32-bit entirel, as that would simplify
> a lot of things.

You and Dave both. :)

I guess we could propose deprecating it like V4 and see if people come
out of the woodwork.

--D


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/15] xfs: remove struct xfile_page
  2024-01-03  8:41 ` [PATCH 12/15] xfs: remove struct xfile_page Christoph Hellwig
@ 2024-01-10 22:42   ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2024-01-10 22:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm

On Wed, Jan 03, 2024 at 08:41:23AM +0000, Christoph Hellwig wrote:
> Return the shmem page directly from xfile_page_get and pass it back
> to xfile_page.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/scrub/xfarray.c | 23 +++++++++++++++--------
>  fs/xfs/scrub/xfarray.h |  2 +-
>  fs/xfs/scrub/xfile.c   | 27 ++++++++++-----------------
>  fs/xfs/scrub/xfile.h   | 21 ++-------------------
>  4 files changed, 28 insertions(+), 45 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
> index c6e62c119148a1..4f396462186793 100644
> --- a/fs/xfs/scrub/xfarray.c
> +++ b/fs/xfs/scrub/xfarray.c
> @@ -570,7 +570,13 @@ xfarray_sort_get_page(
>  	loff_t			pos,
>  	uint64_t		len)
>  {
> -	return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
> +	struct page		*page;
> +
> +	page = xfile_get_page(si->array->xfile, pos, len);
> +	if (IS_ERR(page))
> +		return PTR_ERR(page);
> +	si->page = page;
> +	return 0;
>  }
>  
>  /* Release a page we grabbed for sorting records. */
> @@ -578,8 +584,10 @@ static inline void
>  xfarray_sort_put_page(
>  	struct xfarray_sortinfo	*si)
>  {
> -	if (xfile_page_cached(&si->xfpage))
> -		xfile_put_page(si->array->xfile, &si->xfpage);
> +	if (si->page) {
> +		xfile_put_page(si->array->xfile, si->page);
> +		si->page = NULL;
> +	}
>  }
>  
>  /* Decide if these records are eligible for in-page sorting. */
> @@ -621,7 +629,7 @@ xfarray_pagesort(
>  		return error;
>  
>  	xfarray_sort_bump_heapsorts(si);
> -	startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
> +	startp = page_address(si->page) + offset_in_page(lo_pos);
>  	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
>  
>  	xfarray_sort_bump_stores(si);
> @@ -845,15 +853,14 @@ xfarray_sort_load_cached(
>  	}
>  
>  	/* If the cached page is not the one we want, release it. */
> -	if (xfile_page_cached(&si->xfpage) &&
> -	    xfile_page_index(&si->xfpage) != startpage)
> +	if (si->page && si->page->index != startpage)

Aha!  With the xfile diet series applied, I think we actually /can/
support THPs backing xfiles, because you removed all the places where
the xfile code was randomly trying to bring a page uptodate with its own
opencoded zeroing and replaced that with letting tmpfs do it for us.
Everything works pretty nicely!

With this one exception.

Before this change, the xfile_page caching in the xfarray sort routines
tracked the pos that we used to get the page.  This patch changes that
to accessing si->page->index, but doesn't account for the fact that
page->index is set only on the head page of a THP.  The non-head pages
appear to have zeroes or random values?  AFAICT the same applies to
large folios in iomap.

Therefore, the invalidation logic here breaks because the index is
nonsense.  Tracking the pos in xfarray_sortinfo fixes the data
corruption issues in the sorting routine.

I'll fix this up in my tree, having pulled in the diet series this
morning.

--D

>  		xfarray_sort_put_page(si);
>  
>  	/*
>  	 * If we don't have a cached page (and we know the load is contained
>  	 * in a single page) then grab it.
>  	 */
> -	if (!xfile_page_cached(&si->xfpage)) {
> +	if (!si->page) {
>  		if (xfarray_sort_terminated(si, &error))
>  			return error;
>  
> @@ -863,7 +870,7 @@ xfarray_sort_load_cached(
>  			return error;
>  	}
>  
> -	memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
> +	memcpy(ptr, page_address(si->page) + offset_in_page(idx_pos),
>  			si->array->obj_size);
>  	return 0;
>  }
> diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
> index 6f2862054e194d..5765f2ad30d885 100644
> --- a/fs/xfs/scrub/xfarray.h
> +++ b/fs/xfs/scrub/xfarray.h
> @@ -106,7 +106,7 @@ struct xfarray_sortinfo {
>  	unsigned int		flags;
>  
>  	/* Cache a page here for faster access. */
> -	struct xfile_page	xfpage;
> +	struct page		*page;
>  
>  #ifdef DEBUG
>  	/* Performance statistics. */
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 2b4b0c4e8d2fb6..715c4d10b67c14 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -267,15 +267,14 @@ xfile_seek_data(
>  
>  /*
>   * Grab the (locked) page for a memory object.  The object cannot span a page
> - * boundary.  Returns 0 (and a locked page) if successful, -ENOTBLK if we
> - * cannot grab the page, or the usual negative errno.
> + * boundary.  Returns 0 the locked page if successful, or an ERR_PTR on
> + * failure.
>   */
> -int
> +struct page *
>  xfile_get_page(
>  	struct xfile		*xf,
>  	loff_t			pos,
> -	unsigned int		len,
> -	struct xfile_page	*xfpage)
> +	unsigned int		len)
>  {
>  	struct inode		*inode = file_inode(xf->file);
>  	struct folio		*folio = NULL;
> @@ -284,9 +283,9 @@ xfile_get_page(
>  	int			error;
>  
>  	if (inode->i_sb->s_maxbytes - pos < len)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	if (len > PAGE_SIZE - offset_in_page(pos))
> -		return -ENOTBLK;
> +		return ERR_PTR(-ENOTBLK);
>  
>  	trace_xfile_get_page(xf, pos, len);
>  
> @@ -301,12 +300,12 @@ xfile_get_page(
>  	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
>  	memalloc_nofs_restore(pflags);
>  	if (error)
> -		return error;
> +		return ERR_PTR(error);
>  
>  	page = folio_file_page(folio, pos >> PAGE_SHIFT);
>  	if (PageHWPoison(page)) {
>  		folio_put(folio);
> -		return -EIO;
> +		return ERR_PTR(-EIO);
>  	}
>  
>  	/*
> @@ -314,11 +313,7 @@ xfile_get_page(
>  	 * (potentially last) reference in xfile_put_page.
>  	 */
>  	set_page_dirty(page);
> -
> -	xfpage->page = page;
> -	xfpage->fsdata = NULL;
> -	xfpage->pos = round_down(pos, PAGE_SIZE);
> -	return 0;
> +	return page;
>  }
>  
>  /*
> @@ -327,10 +322,8 @@ xfile_get_page(
>  void
>  xfile_put_page(
>  	struct xfile		*xf,
> -	struct xfile_page	*xfpage)
> +	struct page		*page)
>  {
> -	struct page		*page = xfpage->page;
> -
>  	trace_xfile_put_page(xf, page->index << PAGE_SHIFT, PAGE_SIZE);
>  
>  	unlock_page(page);
> diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> index 2f46b7d694ce99..993368b37b4b7c 100644
> --- a/fs/xfs/scrub/xfile.h
> +++ b/fs/xfs/scrub/xfile.h
> @@ -6,22 +6,6 @@
>  #ifndef __XFS_SCRUB_XFILE_H__
>  #define __XFS_SCRUB_XFILE_H__
>  
> -struct xfile_page {
> -	struct page		*page;
> -	void			*fsdata;
> -	loff_t			pos;
> -};
> -
> -static inline bool xfile_page_cached(const struct xfile_page *xfpage)
> -{
> -	return xfpage->page != NULL;
> -}
> -
> -static inline pgoff_t xfile_page_index(const struct xfile_page *xfpage)
> -{
> -	return xfpage->page->index;
> -}
> -
>  struct xfile {
>  	struct file		*file;
>  };
> @@ -35,8 +19,7 @@ int xfile_obj_store(struct xfile *xf, const void *buf, size_t count,
>  
>  loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
>  
> -int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
> -		struct xfile_page *xbuf);
> -void xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
> +struct page *xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len);
> +void xfile_put_page(struct xfile *xf, struct page *page);
>  
>  #endif /* __XFS_SCRUB_XFILE_H__ */
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2024-01-10 22:43 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-03  8:41 put the xfs xfile abstraction on a diet Christoph Hellwig
2024-01-03  8:41 ` [PATCH 01/15] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
2024-01-03 23:32   ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 02/15] shmem: export shmem_get_folio Christoph Hellwig
2024-01-03  8:41 ` [PATCH 03/15] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
2024-01-04  0:21   ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 04/15] xfs: remove xfile_stat Christoph Hellwig
2024-01-03 23:45   ` Darrick J. Wong
2024-01-04  6:14     ` Christoph Hellwig
2024-01-04  6:55       ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 05/15] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
2024-01-03 23:48   ` Darrick J. Wong
2024-01-04  6:15     ` Christoph Hellwig
2024-01-04  6:58       ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 06/15] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
2024-01-03 23:55   ` Darrick J. Wong
2024-01-04  6:21     ` Christoph Hellwig
2024-01-03  8:41 ` [PATCH 07/15] xfs: shmem_file_setup can't return NULL Christoph Hellwig
2024-01-03 23:56   ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 08/15] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
2024-01-04  0:01   ` Darrick J. Wong
2024-01-04  6:23     ` Christoph Hellwig
2024-01-03  8:41 ` [PATCH 09/15] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
     [not found]   ` <20240104000324.GC361584@frogsfrogsfrogs>
2024-01-04  6:24     ` Christoph Hellwig
2024-01-04  7:01       ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 10/15] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
2024-01-04  0:04   ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 11/15] xfs: use shmem_get_folio in xfile_get_page Christoph Hellwig
2024-01-04  0:12   ` Darrick J. Wong
2024-01-04  6:25     ` Christoph Hellwig
2024-01-03  8:41 ` [PATCH 12/15] xfs: remove struct xfile_page Christoph Hellwig
2024-01-10 22:42   ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 13/15] xfs: don't unconditionally allocate a new page in xfile_get_page Christoph Hellwig
2024-01-04  0:18   ` Darrick J. Wong
2024-01-03  8:41 ` [PATCH 14/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_store Christoph Hellwig
2024-01-04  0:20   ` Darrick J. Wong
2024-01-04  6:26     ` Christoph Hellwig
2024-01-03  8:41 ` [PATCH 15/15] xfs: use xfile_get_page and xfile_put_page in xfile_obj_load Christoph Hellwig
2024-01-04  0:21   ` Darrick J. Wong
2024-01-04  1:35 ` put the xfs xfile abstraction on a diet Darrick J. Wong
2024-01-04  6:26   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox