linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset()
@ 2024-05-27 14:14 David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].

Unfortunately, we cannot stop using the page_type field in zsmalloc code
completely for its own purposes. All other fields in "struct page" are
used one way or the other. Could we simply store a 2-byte offset value
at the beginning of each page? Likely, but that will require a bit more
work; and once we have memdesc we might want to move the offset in there
(struct zsdesc?) again.

... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will always successfully
indicate such zsmalloc pages, and such zsmalloc pages only.

We lose zsmalloc support for PAGE_SIZE > 64KB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea for now.
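
To make the packing concrete, here is a minimal user-space sketch of the
scheme (constants taken from patches #2 and #3; the set/clear direction,
where setting a type *clears* its bit in the all-ones field, follows from
the PageType() test in patch #2):

#include <assert.h>

#define PAGE_TYPE_BASE	0xf0000000
#define PG_zsmalloc	0x00400000	/* value from patch #3 */
#define OFFSET_MASK	0x0000ffff	/* the 16 reusable bits */

int main(void)
{
	unsigned int page_type = -1;	/* _mapcount == -1: no type set */

	page_type &= ~PG_zsmalloc;	/* __SetPageZsmalloc() */
	page_type = (page_type & ~OFFSET_MASK) | 0x1234;	/* store offset */

	/* the PageType() test only looks above the reused 16 bits */
	assert((page_type & (PAGE_TYPE_BASE | PG_zsmalloc)) == PAGE_TYPE_BASE);
	assert((page_type & OFFSET_MASK) == 0x1234);	/* data intact */

	page_type |= OFFSET_MASK;	/* reset the 16 bits to -1 ... */
	page_type |= PG_zsmalloc;	/* ... then __ClearPageZsmalloc() */
	assert(page_type == (unsigned int)-1);
	return 0;
}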

So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().

Survived a couple of days with the build bots and my cross-compile
attempts. Briefly tested with zram on x86-64.

[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>

v1 -> v2:
 * Rebased to v6.10-rc1
 * "mm: update _mapcount and page_type documentation"
  -> Exchange members and fixup doc as suggested by Mike
 * "mm: allow reuse of the lower 16bit of the page type with an actual
    type"
  -> Remove "highest bit" comment, fixup PG_buddy, extend description
 * "mm/zsmalloc: use a proper page type"
  -> Add and use HAVE_ZSMALLOC to fixup compilation
  -> Fixup BUILD_BUG_ON
  -> Add some VM_WARN_ON_ONCE(!PageZsmalloc(page));
 * "mm/mm_init: initialize page->_mapcount directly
    in __init_single_page()"
  -> Fixup patch subject

David Hildenbrand (6):
  mm: update _mapcount and page_type documentation
  mm: allow reuse of the lower 16 bit of the page type with an actual
    type
  mm/zsmalloc: use a proper page type
  mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
  mm/filemap: reinitialize folio->_mapcount directly
  mm/mm_init: initialize page->_mapcount directly in
    __init_single_page()

 drivers/block/zram/Kconfig |  1 +
 include/linux/mm.h         | 10 ----------
 include/linux/mm_types.h   | 33 ++++++++++++++++++++++-----------
 include/linux/page-flags.h | 23 +++++++++++++++--------
 mm/Kconfig                 | 10 ++++++++--
 mm/filemap.c               |  2 +-
 mm/mm_init.c               |  2 +-
 mm/page_alloc.c            |  6 ++++--
 mm/zsmalloc.c              | 29 +++++++++++++++++++++++++----
 9 files changed, 77 insertions(+), 39 deletions(-)


base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 1/6] mm: update _mapcount and page_type documentation
  2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
@ 2024-05-27 14:14 ` David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's make it clearer that _mapcount must no longer be used for own
purposes, and how _mapcount and page_type behave nowadays (also in the
context of hugetlb folios, which are typed folios that will be mapped
to user space).

Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next. Move "page_type" before "mapcount", to make
it clearer what typed folios are.
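
For reference, the "-1" convention as a minimal user-space sketch; the two
helpers are rough approximations of the kernel functions named in the moved
documentation, with atomic_int standing in for atomic_t:

#include <stdatomic.h>
#include <stdbool.h>

static bool atomic_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;	/* true if result is 0 */
}

static bool atomic_add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;	/* true if result is < 0 */
}

int main(void)
{
	atomic_int mapcount = -1;	/* initial, unmapped state */

	/* only the first mapping sees the -1 -> 0 transition */
	bool first = atomic_inc_and_test(&mapcount);
	/* only removing the last mapping sees 0 -> -1 */
	bool last = atomic_add_negative(-1, &mapcount);

	return first && last ? 0 : 1;
}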

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h       |  5 -----
 include/linux/mm_types.h | 28 +++++++++++++++++-----------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..018e7c0265ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
 	return atomic_read(&folio->_entire_mapcount) + 1;
 }
 
-/*
- * The atomic page->_mapcount, starts from -1: so that transitions
- * both from it and to it can be tracked, using atomic_inc_and_test
- * and atomic_add_negative(-1).
- */
 static inline void page_mapcount_reset(struct page *page)
 {
 	atomic_set(&(page)->_mapcount, -1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 24323c7d0bd4..6b2aeba792c4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -46,9 +46,7 @@ struct mem_cgroup;
  * which is guaranteed to be aligned.  If you use the same storage as
  * page->mapping, you must restore it to NULL before freeing the page.
  *
- * If your page will not be mapped to userspace, you can also use the four
- * bytes in the mapcount union, but you must call page_mapcount_reset()
- * before freeing it.
+ * The mapcount field must not be used for own purposes.
  *
  * If you want to use the refcount field, it must be used in such a way
  * that other CPUs temporarily incrementing and then decrementing the
@@ -152,18 +150,26 @@ struct page {
 
 	union {		/* This union is 4 bytes in size. */
 		/*
-		 * If the page can be mapped to userspace, encodes the number
-		 * of times this page is referenced by a page table.
+		 * For head pages of typed folios, the value stored here
+		 * allows for determining what this page is used for. The
+		 * tail pages of typed folios will not store a type
+		 * (mapcount == -1).
+		 *
+		 * See page-flags.h for a list of page types which are currently
+		 * stored here.
 		 */
-		atomic_t _mapcount;
+		unsigned int page_type;
 
 		/*
-		 * If the page is neither PageSlab nor mappable to userspace,
-		 * the value stored here may help determine what this page
-		 * is used for.  See page-flags.h for a list of page types
-		 * which are currently stored here.
+		 * For pages that are part of non-typed folios for which mappings
+		 * are tracked via the RMAP, encodes the number of times this page
+		 * is directly referenced by a page table.
+		 *
+		 * Note that the mapcount is always initialized to -1, so that
+		 * transitions both from it and to it can be tracked, using
+		 * atomic_inc_and_test() and atomic_add_negative(-1).
 		 */
-		unsigned int page_type;
+		atomic_t _mapcount;
 	};
 
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
@ 2024-05-27 14:14 ` David Hildenbrand
  2024-05-27 15:26   ` Matthew Wilcox
  2024-05-29 15:55   ` [PATCH v1 26] mm " wang wei
  2024-05-27 14:14 ` [PATCH v1 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

As long as the owner sets a page type first, we can allow reuse of the
lower 18 bit: sufficient to store an offset into a 64 KiB page, which
is the maximum base page size in *common* configurations (ignoring the
256 KiB variant). Restrict it to the head page.

We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information (offset into a base page) that
cannot go elsewhere for now.

Fear of running out of bits for storing the actual type? Actually, we
don't need one bit per type, we could store a single value instead.
Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
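
As a sanity sketch of why a 16-bit reserve still catches underflows: the
values below come from the diff, while the signed-comparison shape of the
check is an assumption for illustration only:

#include <assert.h>

#define PAGE_MAPCOUNT_RESERVE	-65536	/* 0xffff0000 as unsigned */

/* assumed shape: anything in the reserved range counts as "no type" */
static int page_type_has_type(unsigned int page_type)
{
	return (int)page_type < PAGE_MAPCOUNT_RESERVE;
}

int main(void)
{
	assert(!page_type_has_type(-1));	/* freshly initialized */
	assert(!page_type_has_type(-2));	/* one underflow */
	assert(!page_type_has_type(-65536));	/* 65535 underflows: caught */
	/* a real type (base bits plus PG_buddy) is still recognized */
	assert(page_type_has_type(0xf0000000 | 0x00010000));
	return 0;
}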

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm_types.h   |  5 +++++
 include/linux/page-flags.h | 20 ++++++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6b2aeba792c4..598cfedbbfa0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -157,6 +157,11 @@ struct page {
 		 *
 		 * See page-flags.h for a list of page types which are currently
 		 * stored here.
+		 *
+		 * Owners of typed folios may reuse the lower 16 bit of the
+		 * head page page_type field after setting the page type,
+		 * but must reset these 16 bit to -1 before clearing the
+		 * page type.
 		 */
 		unsigned int page_type;
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 104078afe0b1..b43e380ffa0b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
  */
 
 #define PAGE_TYPE_BASE	0xf0000000
-/* Reserve		0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE	-128
-#define PG_buddy	0x00000080
-#define PG_offline	0x00000100
-#define PG_table	0x00000200
-#define PG_guard	0x00000400
-#define PG_hugetlb	0x00000800
-#define PG_slab		0x00001000
+/*
+ * Reserve		0x0000ffff to catch underflows of _mapcount and
+ * allow owners that set a type to reuse the lower 16 bit for their own
+ * purposes.
+ */
+#define PAGE_MAPCOUNT_RESERVE	-65536
+#define PG_buddy	0x00010000
+#define PG_offline	0x00020000
+#define PG_table	0x00040000
+#define PG_guard	0x00080000
+#define PG_hugetlb	0x00100800
+#define PG_slab		0x00200000
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 3/6] mm/zsmalloc: use a proper page type
  2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
@ 2024-05-27 14:14 ` David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's clean it up: use a proper page type and store our data (offset
into a page) in the lower 16 bit as documented.

We won't be able to support 256 KiB base pages, which is acceptable.
Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.
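
(The arithmetic behind that: the largest first-object offset in a 64 KiB
page is 0xffff, exactly what 16 bits can hold; a 256 KiB page would need
offsets up to 0x3ffff, i.e. 18 bits. The coarser PAGE_SIZE_LESS_THAN_256KB
gate is backed up by a BUILD_BUG_ON(PAGE_SIZE > SZ_64K) in the code.)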

Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].

This removes the last _mapcount/page_type offender.

[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/

Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/block/zram/Kconfig |  1 +
 include/linux/page-flags.h |  3 +++
 mm/Kconfig                 | 10 ++++++++--
 mm/zsmalloc.c              | 29 +++++++++++++++++++++++++----
 4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 7b29cce60ab2..39c04419bf87 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -3,6 +3,7 @@ config ZRAM
 	tristate "Compressed RAM block device support"
 	depends on BLOCK && SYSFS && MMU
 	depends on CRYPTO_LZO || CRYPTO_ZSTD || CRYPTO_LZ4 || CRYPTO_LZ4HC || CRYPTO_842
+	depends on HAVE_ZSMALLOC
 	select ZSMALLOC
 	help
 	  Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b43e380ffa0b..36d9ded4462d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -957,6 +957,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #define PG_guard	0x00080000
 #define PG_hugetlb	0x00100800
 #define PG_slab		0x00200000
+#define PG_zsmalloc	0x00400000
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1071,6 +1072,8 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
 FOLIO_TEST_FLAG_FALSE(hugetlb)
 #endif
 
+PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
+
 /**
  * PageHuge - Determine if the page belongs to hugetlbfs
  * @page: The page to test.
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..67dc18c94448 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -128,7 +128,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
 choice
 	prompt "Default allocator"
 	depends on ZSWAP
-	default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if MMU
+	default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if HAVE_ZSMALLOC
 	default ZSWAP_ZPOOL_DEFAULT_ZBUD
 	help
 	  Selects the default allocator for the compressed cache for
@@ -154,6 +154,7 @@ config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
 
 config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
 	bool "zsmalloc"
+	depends on HAVE_ZSMALLOC
 	select ZSMALLOC
 	help
 	  Use the zsmalloc allocator as the default allocator.
@@ -186,10 +187,15 @@ config Z3FOLD
 	  page. It is a ZBUD derivative so the simplicity and determinism are
 	  still there.
 
+config HAVE_ZSMALLOC
+	def_bool y
+	depends on MMU
+	depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
+
 config ZSMALLOC
 	tristate
 	prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
-	depends on MMU
+	depends on HAVE_ZSMALLOC
 	help
 	  zsmalloc is a slab-based memory allocator designed to store
 	  pages of various compression levels efficiently. It achieves
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b42d3545ca85..1a6af454520e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,8 @@
  *	page->index: links together all component pages of a zspage
  *		For the huge page, this is always 0, so we use this field
  *		to store handle.
- *	page->page_type: first object offset in a subpage of zspage
+ *	page->page_type: PG_zsmalloc, lower 16 bit locate the first object
+ *		offset in a subpage of a zspage
  *
  * Usage of struct page flags:
  *	PG_private: identifies the first component page
@@ -450,14 +451,28 @@ static inline struct page *get_first_page(struct zspage *zspage)
 	return first_page;
 }
 
+#define FIRST_OBJ_PAGE_TYPE_MASK	0xffff
+
+static inline void reset_first_obj_offset(struct page *page)
+{
+	VM_WARN_ON_ONCE(!PageZsmalloc(page));
+	page->page_type |= FIRST_OBJ_PAGE_TYPE_MASK;
+}
+
 static inline unsigned int get_first_obj_offset(struct page *page)
 {
-	return page->page_type;
+	VM_WARN_ON_ONCE(!PageZsmalloc(page));
+	return page->page_type & FIRST_OBJ_PAGE_TYPE_MASK;
 }
 
 static inline void set_first_obj_offset(struct page *page, unsigned int offset)
 {
-	page->page_type = offset;
+	/* With 16 bit available, we can support offsets into 64 KiB pages. */
+	BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
+	VM_WARN_ON_ONCE(!PageZsmalloc(page));
+	VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK);
+	page->page_type &= ~FIRST_OBJ_PAGE_TYPE_MASK;
+	page->page_type |= offset & FIRST_OBJ_PAGE_TYPE_MASK;
 }
 
 static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -791,8 +806,9 @@ static void reset_page(struct page *page)
 	__ClearPageMovable(page);
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
-	page_mapcount_reset(page);
 	page->index = 0;
+	reset_first_obj_offset(page);
+	__ClearPageZsmalloc(page);
 }
 
 static int trylock_zspage(struct zspage *zspage)
@@ -965,11 +981,13 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		if (!page) {
 			while (--i >= 0) {
 				dec_zone_page_state(pages[i], NR_ZSPAGES);
+				__ClearPageZsmalloc(pages[i]);
 				__free_page(pages[i]);
 			}
 			cache_free_zspage(pool, zspage);
 			return NULL;
 		}
+		__SetPageZsmalloc(page);
 
 		inc_zone_page_state(page, NR_ZSPAGES);
 		pages[i] = page;
@@ -1762,6 +1780,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 
 	VM_BUG_ON_PAGE(!PageIsolated(page), page);
 
+	/* We're committed, tell the world that this is a Zsmalloc page. */
+	__SetPageZsmalloc(newpage);
+
 	/* The page is locked, so this pointer must remain valid */
 	zspage = get_zspage(page);
 	pool = zspage->pool;
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
  2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (2 preceding siblings ...)
  2024-05-27 14:14 ` [PATCH v1 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
@ 2024-05-27 14:14 ` David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
  5 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's stop using page_mapcount_reset() and clear PageBuddy using
__ClearPageBuddy() instead.
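
A sketch of the behavioural difference, assuming the usual PAGE_TYPE_OPS
semantics (setting a type clears its bit in the all-ones field, clearing a
type sets the bit again):

	/* before: page_mapcount_reset(page)
	 *	page_type becomes 0xffffffff, silently wiping *any*
	 *	page type, not just PG_buddy
	 * after: if (PageBuddy(page)) __ClearPageBuddy(page)
	 *	page->page_type |= PG_buddy, i.e. only the buddy type is
	 *	cleared, and only if it was actually set
	 */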

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e22ce5675ca..b595342e73c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -498,7 +498,8 @@ static void bad_page(struct page *page, const char *reason)
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-	page_mapcount_reset(page); /* remove PageBuddy */
+	if (PageBuddy(page))
+		__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
 
@@ -1346,7 +1347,8 @@ static void check_new_page_bad(struct page *page)
 {
 	if (unlikely(page->flags & __PG_HWPOISON)) {
 		/* Don't complain about hwpoisoned pages */
-		page_mapcount_reset(page); /* remove PageBuddy */
+		if (PageBuddy(page))
+			__ClearPageBuddy(page);
 		return;
 	}
 
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 5/6] mm/filemap: reinitialize folio->_mapcount directly
  2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (3 preceding siblings ...)
  2024-05-27 14:14 ` [PATCH v1 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
@ 2024-05-27 14:14 ` David Hildenbrand
  2024-05-27 14:14 ` [PATCH v1 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
  5 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's get rid of the page_mapcount_reset() call and simply reinitialize
folio->_mapcount directly.
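
(atomic_set(&folio->_mapcount, -1) is the open-coded body of
page_mapcount_reset(); the helper itself is removed in patch #6.)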

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 382c3d06bfb1..c4ac7289e88a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -177,7 +177,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
 				 * and we'd rather not leak it: if we're wrong,
 				 * another bad page check should catch it later.
 				 */
-				page_mapcount_reset(&folio->page);
+				atomic_set(&folio->_mapcount, -1);
 				folio_ref_sub(folio, mapcount);
 			}
 		}
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page()
  2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (4 preceding siblings ...)
  2024-05-27 14:14 ` [PATCH v1 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
@ 2024-05-27 14:14 ` David Hildenbrand
  5 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's simply reinitialize the page->_mapcount directly. We can now
get rid of page_mapcount_reset().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h | 5 -----
 mm/mm_init.c       | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 018e7c0265ca..3e1d3b0d545e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
 	return atomic_read(&folio->_entire_mapcount) + 1;
 }
 
-static inline void page_mapcount_reset(struct page *page)
-{
-	atomic_set(&(page)->_mapcount, -1);
-}
-
 /**
  * page_mapcount() - Number of times this precise page is mapped.
  * @page: The page.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f72b852bd5b8..b4916751edce 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -568,7 +568,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	mm_zero_struct_page(page);
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
-	page_mapcount_reset(page);
+	atomic_set(&page->_mapcount, -1);
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
 
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-27 14:14 ` [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
@ 2024-05-27 15:26   ` Matthew Wilcox
  2024-05-27 18:49     ` David Hildenbrand
  2024-05-29 15:55   ` [PATCH v1 26] mm " wang wei
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2024-05-27 15:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Rapoport,
	Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

On Mon, May 27, 2024 at 04:14:50PM +0200, David Hildenbrand wrote:
> As long as the owner sets a page type first, we can allow reuse of the
> lower 18 bit: sufficient to store an offset into a 64 KiB page, which

You say 18 here and 16 below.

> is the maximum base page size in *common* configurations (ignoring the
> 256 KiB variant). Restrict it to the head page.
> 
> We'll use that for zsmalloc next, to set a proper type while still
> reusing that field to store information (offset into a base page) that
> cannot go elsewhere for now.
> 
> Fear of running out of bits for storing the actual type? Actually, we
> don't need one bit per type, we could store a single value instead.
> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.

We could, but it's more instructions to check.

> +++ b/include/linux/page-flags.h
> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>   */
>  
>  #define PAGE_TYPE_BASE	0xf0000000
> -/* Reserve		0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE	-128
> -#define PG_buddy	0x00000080
> -#define PG_offline	0x00000100
> -#define PG_table	0x00000200
> -#define PG_guard	0x00000400
> -#define PG_hugetlb	0x00000800
> -#define PG_slab		0x00001000
> +/*
> + * Reserve		0x0000ffff to catch underflows of _mapcount and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PAGE_MAPCOUNT_RESERVE	-65536

I think my original comment was misleading.  This should be:

 * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflow.

How about we start at the top end and let people extend down?  ie:

#define PAGE_TYPE_BASE	0x80000000
#define PG_buddy	0x40000000
#define PG_offline	0x20000000
#define PG_table	0x10000000
#define PG_guard	0x08000000
#define PG_hugetlb	0x04000000
#define PG_slab		0x02000000
#define PAGE_MAPCOUNT_RESERVE	(~0x0000ffff)

Now we can see that we have 9 flags remaining, which should last until
we can have proper memdesc typing.
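
(Counting them: the free type bits are 0x00010000 through 0x01000000, bits
16 to 24, sitting between the reusable low 16 bits and PG_slab at
0x02000000.)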


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-27 15:26   ` Matthew Wilcox
@ 2024-05-27 18:49     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-27 18:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Rapoport,
	Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

On 27.05.24 at 17:26, Matthew Wilcox wrote:
> On Mon, May 27, 2024 at 04:14:50PM +0200, David Hildenbrand wrote:
>> As long as the owner sets a page type first, we can allow reuse of the
>> lower 18 bit: sufficient to store an offset into a 64 KiB page, which
> 
> You say 18 here and 16 below.

Thanks, I missed fixing up one instance after going back and forth.

> 
>> is the maximum base page size in *common* configurations (ignoring the
>> 256 KiB variant). Restrict it to the head page.
>>
>> We'll use that for zsmalloc next, to set a proper type while still
>> reusing that field to store information (offset into a base page) that
>> cannot go elsewhere for now.
>>
>> Fear of running out of bits for storing the actual type? Actually, we
>> don't need one bit per type, we could store a single value instead.
>> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
> 
> We could, but it's more instructions to check.

Maybe, and maybe not sufficiently more that we'd care.

I was thinking of something like the following (probably broken but you should 
get the idea):

/*
  * If the _mapcount is negative, we might store a page type. The
  * page_type field corresponds to the most significant byte of the
  * _mapcount field. As the mapcount is initialized to -1, we have no
  * type as defaults. We have plenty of room to underflow the mapcount
  * before we would end up indicating a valid page_type.
  */
#define PAGE_TYPE_BASE	0x80
enum page_type {
	PT_buddy = PAGE_TYPE_BASE,
	PT_offline,
	PT_table,
	PT_guard,
	PT_hugetlb,
	PT_slab,
	/* we must forbid page_type == -1 */
	PT_unusable = 0xff
};

In struct page:

union {
	atomic_t _mapcount;

#if __BYTE_ORDER == __BIG_ENDIAN
	struct {
		uint16_t page_type_data;
		uint8_t page_type_reserved;
		uint8_t page_type;
	};
#else
	struct {
		uint8_t page_type;
		uint8_t page_type_reserved;
		uint16_t page_type_data;
	};
#endif
};

#define PageType(page, type) (page->page_type == type)

One could maybe also change page_has_type() to simply work on the
fact that the highest bit must be set and any other bit of the type must be clear:

static inline int page_has_type(const struct page *page)
{
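	/* with _mapcount == -1, this byte reads back as 0xff */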
	return (page->page_type & PAGE_TYPE_BASE) &&
		page->page_type != 0xff;
}

But just some thought.

> 
>> +++ b/include/linux/page-flags.h
>> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>>    */
>>   
>>   #define PAGE_TYPE_BASE	0xf0000000
>> -/* Reserve		0x0000007f to catch underflows of _mapcount */
>> -#define PAGE_MAPCOUNT_RESERVE	-128
>> -#define PG_buddy	0x00000080
>> -#define PG_offline	0x00000100
>> -#define PG_table	0x00000200
>> -#define PG_guard	0x00000400
>> -#define PG_hugetlb	0x00000800
>> -#define PG_slab		0x00001000
>> +/*
>> + * Reserve		0x0000ffff to catch underflows of _mapcount and
>> + * allow owners that set a type to reuse the lower 16 bit for their own
>> + * purposes.
>> + */
>> +#define PAGE_MAPCOUNT_RESERVE	-65536
> 
> I think my original comment was misleading.  This should be:
> 
>   * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflow.

Makes sense.

> 
> How about we start at the top end and let people extend down?  ie:
> 
> #define PAGE_TYPE_BASE	0x80000000
> #define PG_buddy	0x40000000
> #define PG_offline	0x20000000
> #define PG_table	0x10000000
> #define PG_guard	0x08000000
> #define PG_hugetlb	0x04000000
> #define PG_slab		0x02000000
> #define PAGE_MAPCOUNT_RESERVE	(~0x0000ffff)
> 
> Now we can see that we have 9 flags remaining, which should last until
> we can have proper memdesc typing.

Also works for me.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 26] mm allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-27 14:14 ` [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
  2024-05-27 15:26   ` Matthew Wilcox
@ 2024-05-29 15:55   ` wang wei
  2024-05-29 15:58     ` David Hildenbrand
  1 sibling, 1 reply; 11+ messages in thread
From: wang wei @ 2024-05-29 15:55 UTC (permalink / raw)
  To: david
  Cc: 42.hyeyoo, akpm, linux-kernel, linux-mm, minchan, rppt,
	senozhatsky, willy, wang wei

---
> As long as the owner sets a page type first, we can allow reuse of the
> lower 18 bit: sufficient to store an offset into a 64 KiB page, which
> is the maximum base page size in *common* configurations (ignoring the
> 256 KiB variant). Restrict it to the head page.
> 
> We'll use that for zsmalloc next, to set a proper type while still
> reusing that field to store information (offset into a base page) that
> cannot go elsewhere for now.
> 
> Fear of running out of bits for storing the actual type? Actually, we
> don't need one bit per type, we could store a single value instead.
> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/mm_types.h   |  5 +++++
>  include/linux/page-flags.h | 20 ++++++++++++--------
>  2 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6b2aeba792c4..598cfedbbfa0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -157,6 +157,11 @@ struct page {
>  		 *
>  		 * See page-flags.h for a list of page types which are currently
>  		 * stored here.
> +		 *
> +		 * Owners of typed folios may reuse the lower 16 bit of the
> +		 * head page page_type field after setting the page type,
> +		 * but must reset these 16 bit to -1 before clearing the
> +		 * page type.
>  		 */
>  		unsigned int page_type;
>  
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 104078afe0b1..b43e380ffa0b 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>   */
>  
>  #define PAGE_TYPE_BASE	0xf0000000
> -/* Reserve		0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE	-128
> -#define PG_buddy	0x00000080
> -#define PG_offline	0x00000100
> -#define PG_table	0x00000200
> -#define PG_guard	0x00000400
> -#define PG_hugetlb	0x00000800
> -#define PG_slab		0x00001000
> +/*
> + * Reserve		0x0000ffff to catch underflows of _mapcount and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PAGE_MAPCOUNT_RESERVE	-65536
> +#define PG_buddy	0x00010000
> +#define PG_offline	0x00020000
> +#define PG_table	0x00040000
> +#define PG_guard	0x00080000
> +#define PG_hugetlb	0x00100800

Every PG_XX occupies one bit in my understanding. But why does PG_hugetlb occupy two bits?

> +#define PG_slab		0x00200000
>  
>  #define PageType(page, flag)						\
>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 26] mm allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-29 15:55   ` [PATCH v1 26] mm " wang wei
@ 2024-05-29 15:58     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-05-29 15:58 UTC (permalink / raw)
  To: wang wei
  Cc: 42.hyeyoo, akpm, linux-kernel, linux-mm, minchan, rppt,
	senozhatsky, willy

On 29.05.24 17:55, wang wei wrote:
> ---
>> As long as the owner sets a page type first, we can allow reuse of the
>> lower 18 bit: sufficient to store an offset into a 64 KiB page, which
>> is the maximum base page size in *common* configurations (ignoring the
>> 256 KiB variant). Restrict it to the head page.
>>
>> We'll use that for zsmalloc next, to set a proper type while still
>> reusing that field to store information (offset into a base page) that
>> cannot go elsewhere for now.
>>
>> Fear of running out of bits for storing the actual type? Actually, we
>> don't need one bit per type, we could store a single value instead.
>> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   include/linux/mm_types.h   |  5 +++++
>>   include/linux/page-flags.h | 20 ++++++++++++--------
>>   2 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 6b2aeba792c4..598cfedbbfa0 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -157,6 +157,11 @@ struct page {
>>   		 *
>>   		 * See page-flags.h for a list of page types which are currently
>>   		 * stored here.
>> +		 *
>> +		 * Owners of typed folios may reuse the lower 16 bit of the
>> +		 * head page page_type field after setting the page type,
>> +		 * but must reset these 16 bit to -1 before clearing the
>> +		 * page type.
>>   		 */
>>   		unsigned int page_type;
>>   
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 104078afe0b1..b43e380ffa0b 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>>    */
>>   
>>   #define PAGE_TYPE_BASE	0xf0000000
>> -/* Reserve		0x0000007f to catch underflows of _mapcount */
>> -#define PAGE_MAPCOUNT_RESERVE	-128
>> -#define PG_buddy	0x00000080
>> -#define PG_offline	0x00000100
>> -#define PG_table	0x00000200
>> -#define PG_guard	0x00000400
>> -#define PG_hugetlb	0x00000800
>> -#define PG_slab		0x00001000
>> +/*
>> + * Reserve		0x0000ffff to catch underflows of _mapcount and
>> + * allow owners that set a type to reuse the lower 16 bit for their own
>> + * purposes.
>> + */
>> +#define PAGE_MAPCOUNT_RESERVE	-65536
>> +#define PG_buddy	0x00010000
>> +#define PG_offline	0x00020000
>> +#define PG_table	0x00040000
>> +#define PG_guard	0x00080000
>> +#define PG_hugetlb	0x00100800
> 
> Every PG_XX occupies one bit in my understanding. But why does PG_hugetlb occupy two bits?

Because it's wrong (although not harmful). Same issue in v2, fat fingers.
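
(Concretely: 0x00100800 is the intended 0x00100000 with the old, pre-shift
value 0x00000800 still OR-ed in, i.e. bits 20 and 11 both set.)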

Thanks for pointing that out!

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-05-29 15:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-27 14:14 [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
2024-05-27 14:14 ` [PATCH v1 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
2024-05-27 14:14 ` [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
2024-05-27 15:26   ` Matthew Wilcox
2024-05-27 18:49     ` David Hildenbrand
2024-05-29 15:55   ` [PATCH v1 26] mm " wang wei
2024-05-29 15:58     ` David Hildenbrand
2024-05-27 14:14 ` [PATCH v1 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
2024-05-27 14:14 ` [PATCH v1 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
2024-05-27 14:14 ` [PATCH v1 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
2024-05-27 14:14 ` [PATCH v1 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox