* [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset()
@ 2024-05-29 11:18 David Hildenbrand
  2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].

Unfortunately, we cannot completely stop using the page_type field in
zsmalloc code for its own purposes: all other fields in "struct page" are
used one way or the other. Could we simply store a 2-byte offset value
at the beginning of each page? Likely, but that would require a bit more
work; and once we have memdesc we might want to move the offset in there
(struct zsdesc?) again.

... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will always successfully
indicate such zsmalloc pages, and such zsmalloc pages only.
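
For illustration, the resulting page_type layout (as implemented by
patch #2 below) roughly looks like this; note that the per-type bits
are stored inverted (clearing a bit sets that type):

  bit  31     : PAGE_TYPE_BASE, must be set for a valid page type
  bits 30..16 : one bit per type (PG_buddy, PG_offline, ...,
                PG_zsmalloc), handed out from highest to lowest
  bits 15..0  : reserved to catch _mapcount underflows; owners that
                set a type may reuse them (zsmalloc: first object
                offset)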

We lose zsmalloc support for PAGE_SIZE > 64 KiB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea for now.

So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().

Briefly tested with zram on x86-64.

[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>

v1 -> v2:
 * Rebased to mm/mm-unstable
 * "mm: update _mapcount and page_type documentation"
  -> Minor comment change
 * "mm: allow reuse of the lower 16 bit of the page type with an actual
    type"
  -> Fixup 18 vs 16 in description
  -> Reduce PAGE_TYPE_BASE to a single bit and hand-out bits from highest
     to lowest
  -> Adjust description

RFC -> v1:
 * Rebased to v6.10-rc1
 * "mm: update _mapcount and page_type documentation"
  -> Exchange members and fixup doc as suggested by Mike
 * "mm: allow reuse of the lower 16bit of the page type with an actual
    type"
  -> Remove "highest bit" comment, fixup PG_buddy, extend description
 * "mm/zsmalloc: use a proper page type"
  -> Add and use HAVE_ZSMALLOC to fixup compilation
  -> Fixup BUILD_BUG_ON
  -> Add some VM_WARN_ON_ONCE(!PageZsmalloc(page));
 * "mm/mm_init: initialize page->_mapcount directly
    in __init_single_page()"
  -> Fixup patch subject

David Hildenbrand (6):
  mm: update _mapcount and page_type documentation
  mm: allow reuse of the lower 16 bit of the page type with an actual
    type
  mm/zsmalloc: use a proper page type
  mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
  mm/filemap: reinitialize folio->_mapcount directly
  mm/mm_init: initialize page->_mapcount directly in
    __init_single_page()

 drivers/block/zram/Kconfig |  1 +
 include/linux/mm.h         | 10 ----------
 include/linux/mm_types.h   | 33 ++++++++++++++++++++++-----------
 include/linux/page-flags.h | 25 ++++++++++++++++---------
 mm/Kconfig                 | 10 ++++++++--
 mm/filemap.c               |  2 +-
 mm/mm_init.c               |  2 +-
 mm/page_alloc.c            |  6 ++++--
 mm/zsmalloc.c              | 29 +++++++++++++++++++++++++----
 9 files changed, 78 insertions(+), 40 deletions(-)

-- 
2.45.1




* [PATCH v2 1/6] mm: update _mapcount and page_type documentation
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
@ 2024-05-29 11:18 ` David Hildenbrand
  2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's make it clearer that _mapcount must no longer be used by page
owners for their own purposes, and how _mapcount and page_type behave
nowadays (also in the context of hugetlb folios, which are typed folios
that will be mapped to user space).

Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next. Move "page_type" before "_mapcount", to make
it clearer what typed folios are.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h       |  5 -----
 include/linux/mm_types.h | 28 +++++++++++++++++-----------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3aa1b6889bccf..eebfce8f58bca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
 	return atomic_read(&folio->_entire_mapcount) + 1;
 }
 
-/*
- * The atomic page->_mapcount, starts from -1: so that transitions
- * both from it and to it can be tracked, using atomic_inc_and_test
- * and atomic_add_negative(-1).
- */
 static inline void page_mapcount_reset(struct page *page)
 {
 	atomic_set(&(page)->_mapcount, -1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 24323c7d0bd48..dd2ce1b3ec80e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -46,9 +46,7 @@ struct mem_cgroup;
  * which is guaranteed to be aligned.  If you use the same storage as
  * page->mapping, you must restore it to NULL before freeing the page.
  *
- * If your page will not be mapped to userspace, you can also use the four
- * bytes in the mapcount union, but you must call page_mapcount_reset()
- * before freeing it.
+ * The mapcount field must not be used for own purposes.
  *
  * If you want to use the refcount field, it must be used in such a way
  * that other CPUs temporarily incrementing and then decrementing the
@@ -152,18 +150,26 @@ struct page {
 
 	union {		/* This union is 4 bytes in size. */
 		/*
-		 * If the page can be mapped to userspace, encodes the number
-		 * of times this page is referenced by a page table.
+		 * For head pages of typed folios, the value stored here
+		 * allows for determining what this page is used for. The
+		 * tail pages of typed folios will not store a type
+		 * (page_type == _mapcount == -1).
+		 *
+		 * See page-flags.h for a list of page types which are currently
+		 * stored here.
 		 */
-		atomic_t _mapcount;
+		unsigned int page_type;
 
 		/*
-		 * If the page is neither PageSlab nor mappable to userspace,
-		 * the value stored here may help determine what this page
-		 * is used for.  See page-flags.h for a list of page types
-		 * which are currently stored here.
+		 * For pages that are part of non-typed folios for which mappings
+		 * are tracked via the RMAP, encodes the number of times this page
+		 * is directly referenced by a page table.
+		 *
+		 * Note that the mapcount is always initialized to -1, so that
+		 * transitions both from it and to it can be tracked, using
+		 * atomic_inc_and_test() and atomic_add_negative(-1).
 		 */
-		unsigned int page_type;
+		atomic_t _mapcount;
 	};
 
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
-- 
2.45.1




* [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
  2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
  2024-05-29 16:00   ` David Hildenbrand
  2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

As long as the owner sets a page type first, we can allow reuse of the
lower 16 bit: sufficient to store an offset into a 64 KiB page, which
is the maximum base page size in *common* configurations (ignoring the
256 KiB variant). Restrict it to the head page.

We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information (offset into a base page) that
cannot go elsewhere for now.

Let's reserve the lower 16 bit for that purpose and for catching
mapcount underflows, and let's reduce PAGE_TYPE_BASE to a single bit.

Note that the mapcount would still have to underflow quite a lot before
we would actually indicate a valid page type.
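
To illustrate with concrete numbers (a sketch; assuming the
page_type_has_type() check of "(int)page_type < PAGE_MAPCOUNT_RESERVE"):

	/* no type set: _mapcount == -1, the field reads as 0xffffffff */
	/* one underflow: -2 == 0xfffffffe, still >= (int)PAGE_MAPCOUNT_RESERVE
	 * (0xffff0000), so page_has_type() keeps reporting "no type" */
	/* only after ~64K bogus unmaps would the value drop below 0xffff0000
	 * and be mistaken for a valid page type */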

Start handing out the type bits from highest to lowest, to make it
clearer how many bits for types we have left. Out of the 15 bits we can
use for types, we currently use 6. If we run out of bits before we have
better typing (e.g., memdesc), we can always investigate storing a value
instead [1].

[1] https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm_types.h   |  5 +++++
 include/linux/page-flags.h | 22 +++++++++++++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index dd2ce1b3ec80e..791afaf1b1ec3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -157,6 +157,11 @@ struct page {
 		 *
 		 * See page-flags.h for a list of page types which are currently
 		 * stored here.
+		 *
+		 * Owners of typed folios may reuse the lower 16 bit of the
+		 * head page page_type field after setting the page type,
+		 * but must reset these 16 bit to -1 before clearing the
+		 * page type.
 		 */
 		unsigned int page_type;
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d1bdbaaccc964..f060db808102c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -945,15 +945,19 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
  * mistaken for a page type value.
  */
 
-#define PAGE_TYPE_BASE	0xf0000000
-/* Reserve		0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE	-128
-#define PG_buddy	0x00000080
-#define PG_offline	0x00000100
-#define PG_table	0x00000200
-#define PG_guard	0x00000400
-#define PG_hugetlb	0x00000800
-#define PG_slab		0x00001000
+#define PAGE_TYPE_BASE	0x80000000
+/*
+ * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflows and
+ * allow owners that set a type to reuse the lower 16 bit for their own
+ * purposes.
+ */
+#define PG_buddy	0x40000000
+#define PG_offline	0x20000000
+#define PG_table	0x10000000
+#define PG_guard	0x08000000
+#define PG_hugetlb	0x04008000
+#define PG_slab		0x02000000
+#define PAGE_MAPCOUNT_RESERVE	(~0x0000ffff)
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
-- 
2.45.1




* [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
  2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
  2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
  2024-05-30  5:01   ` Sergey Senozhatsky
  2024-05-29 11:19 ` [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's clean it up: use a proper page type and store our data (offset
into a page) in the lower 16 bit as documented.

We won't be able to support 256 KiB base pages, which is acceptable.
Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.

Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].

This removes the last _mapcount/page_type offender.
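
For reference, a rough sketch of what PAGE_TYPE_OPS(Zsmalloc, zsmalloc,
zsmalloc) in the diff below provides, following the existing inverted
page type convention (simplified; the real macro in
include/linux/page-flags.h also emits folio variants):

	static __always_inline int PageZsmalloc(const struct page *page)
	{
		return PageType(page, PG_zsmalloc);
	}

	static __always_inline void __SetPageZsmalloc(struct page *page)
	{
		VM_BUG_ON_PAGE(!PageType(page, 0), page);	/* must be untyped */
		page->page_type &= ~PG_zsmalloc;	/* clearing the bit sets the type */
	}

	static __always_inline void __ClearPageZsmalloc(struct page *page)
	{
		VM_BUG_ON_PAGE(!PageZsmalloc(page), page);
		page->page_type |= PG_zsmalloc;
	}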

[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/

Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/block/zram/Kconfig |  1 +
 include/linux/page-flags.h |  3 +++
 mm/Kconfig                 | 10 ++++++++--
 mm/zsmalloc.c              | 29 +++++++++++++++++++++++++----
 4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 6aea609b795c2..40e035468de22 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -2,6 +2,7 @@
 config ZRAM
 	tristate "Compressed RAM block device support"
 	depends on BLOCK && SYSFS && MMU
+	depends on HAVE_ZSMALLOC
 	select ZSMALLOC
 	help
 	  Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f060db808102c..3afcbfbb379ea 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -957,6 +957,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #define PG_guard	0x08000000
 #define PG_hugetlb	0x04008000
 #define PG_slab		0x02000000
+#define PG_zsmalloc	0x01000000
 #define PAGE_MAPCOUNT_RESERVE	(~0x0000ffff)
 
 #define PageType(page, flag)						\
@@ -1072,6 +1073,8 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
 FOLIO_TEST_FLAG_FALSE(hugetlb)
 #endif
 
+PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
+
 /**
  * PageHuge - Determine if the page belongs to hugetlbfs
  * @page: The page to test.
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a541..67dc18c94448d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -128,7 +128,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
 choice
 	prompt "Default allocator"
 	depends on ZSWAP
-	default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if MMU
+	default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if HAVE_ZSMALLOC
 	default ZSWAP_ZPOOL_DEFAULT_ZBUD
 	help
 	  Selects the default allocator for the compressed cache for
@@ -154,6 +154,7 @@ config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
 
 config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
 	bool "zsmalloc"
+	depends on HAVE_ZSMALLOC
 	select ZSMALLOC
 	help
 	  Use the zsmalloc allocator as the default allocator.
@@ -186,10 +187,15 @@ config Z3FOLD
 	  page. It is a ZBUD derivative so the simplicity and determinism are
 	  still there.
 
+config HAVE_ZSMALLOC
+	def_bool y
+	depends on MMU
+	depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
+
 config ZSMALLOC
 	tristate
 	prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
-	depends on MMU
+	depends on HAVE_ZSMALLOC
 	help
 	  zsmalloc is a slab-based memory allocator designed to store
 	  pages of various compression levels efficiently. It achieves
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a2a5866473bb8..44e0171d60036 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,8 @@
  *	page->index: links together all component pages of a zspage
  *		For the huge page, this is always 0, so we use this field
  *		to store handle.
- *	page->page_type: first object offset in a subpage of zspage
+ *	page->page_type: PG_zsmalloc, lower 16 bit locate the first object
+ *		offset in a subpage of a zspage
  *
  * Usage of struct page flags:
  *	PG_private: identifies the first component page
@@ -450,14 +451,28 @@ static inline struct page *get_first_page(struct zspage *zspage)
 	return first_page;
 }
 
+#define FIRST_OBJ_PAGE_TYPE_MASK	0xffff
+
+static inline void reset_first_obj_offset(struct page *page)
+{
+	VM_WARN_ON_ONCE(!PageZsmalloc(page));
+	page->page_type |= FIRST_OBJ_PAGE_TYPE_MASK;
+}
+
 static inline unsigned int get_first_obj_offset(struct page *page)
 {
-	return page->page_type;
+	VM_WARN_ON_ONCE(!PageZsmalloc(page));
+	return page->page_type & FIRST_OBJ_PAGE_TYPE_MASK;
 }
 
 static inline void set_first_obj_offset(struct page *page, unsigned int offset)
 {
-	page->page_type = offset;
+	/* With 16 bit available, we can support offsets into 64 KiB pages. */
+	BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
+	VM_WARN_ON_ONCE(!PageZsmalloc(page));
+	VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK);
+	page->page_type &= ~FIRST_OBJ_PAGE_TYPE_MASK;
+	page->page_type |= offset & FIRST_OBJ_PAGE_TYPE_MASK;
 }
 
 static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -791,8 +806,9 @@ static void reset_page(struct page *page)
 	__ClearPageMovable(page);
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
-	page_mapcount_reset(page);
 	page->index = 0;
+	reset_first_obj_offset(page);
+	__ClearPageZsmalloc(page);
 }
 
 static int trylock_zspage(struct zspage *zspage)
@@ -965,11 +981,13 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		if (!page) {
 			while (--i >= 0) {
 				dec_zone_page_state(pages[i], NR_ZSPAGES);
+				__ClearPageZsmalloc(pages[i]);
 				__free_page(pages[i]);
 			}
 			cache_free_zspage(pool, zspage);
 			return NULL;
 		}
+		__SetPageZsmalloc(page);
 
 		inc_zone_page_state(page, NR_ZSPAGES);
 		pages[i] = page;
@@ -1754,6 +1772,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 
 	VM_BUG_ON_PAGE(!PageIsolated(page), page);
 
+	/* We're committed, tell the world that this is a Zsmalloc page. */
+	__SetPageZsmalloc(newpage);
+
 	/* The page is locked, so this pointer must remain valid */
 	zspage = get_zspage(page);
 	pool = zspage->pool;
-- 
2.45.1




* [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (2 preceding siblings ...)
  2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
  2024-05-29 11:19 ` [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's stop using page_mapcount_reset() and clear PageBuddy using
__ClearPageBuddy() instead.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b1e3eb5787de1..591d28b9f3e48 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -498,7 +498,8 @@ static void bad_page(struct page *page, const char *reason)
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-	page_mapcount_reset(page); /* remove PageBuddy */
+	if (PageBuddy(page))
+		__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
 
@@ -1346,7 +1347,8 @@ static void check_new_page_bad(struct page *page)
 {
 	if (unlikely(page->flags & __PG_HWPOISON)) {
 		/* Don't complain about hwpoisoned pages */
-		page_mapcount_reset(page); /* remove PageBuddy */
+		if (PageBuddy(page))
+			__ClearPageBuddy(page);
 		return;
 	}
 
-- 
2.45.1




* [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (3 preceding siblings ...)
  2024-05-29 11:19 ` [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
  2024-05-29 11:19 ` [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
  2024-05-30  5:02 ` [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() Sergey Senozhatsky
  6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's get rid of the page_mapcount_reset() call and simply reinitialize
folio->_mapcount directly.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ba06237b942d6..9fe5c02ae92e7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -177,7 +177,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
 				 * and we'd rather not leak it: if we're wrong,
 				 * another bad page check should catch it later.
 				 */
-				page_mapcount_reset(&folio->page);
+				atomic_set(&folio->_mapcount, -1);
 				folio_ref_sub(folio, mapcount);
 			}
 		}
-- 
2.45.1




* [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page()
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (4 preceding siblings ...)
  2024-05-29 11:19 ` [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
  2024-05-30  5:02 ` [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() Sergey Senozhatsky
  6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

Let's simply reinitialize the page->_mapcount directly. We can now
get rid of page_mapcount_reset().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h | 5 -----
 mm/mm_init.c       | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index eebfce8f58bca..c41c82bcbec2f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
 	return atomic_read(&folio->_entire_mapcount) + 1;
 }
 
-static inline void page_mapcount_reset(struct page *page)
-{
-	atomic_set(&(page)->_mapcount, -1);
-}
-
 /**
  * page_mapcount() - Number of times this precise page is mapped.
  * @page: The page.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index e0023aa685556..426314eeecec3 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -568,7 +568,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	mm_zero_struct_page(page);
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
-	page_mapcount_reset(page);
+	atomic_set(&page->_mapcount, -1);
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
 
-- 
2.45.1




* Re: [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
  2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
@ 2024-05-29 16:00   ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 16:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo,
	wang wei

>   
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d1bdbaaccc964..f060db808102c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -945,15 +945,19 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>    * mistaken for a page type value.
>    */
>   
> -#define PAGE_TYPE_BASE	0xf0000000
> -/* Reserve		0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE	-128
> -#define PG_buddy	0x00000080
> -#define PG_offline	0x00000100
> -#define PG_table	0x00000200
> -#define PG_guard	0x00000400
> -#define PG_hugetlb	0x00000800
> -#define PG_slab		0x00001000
> +#define PAGE_TYPE_BASE	0x80000000
> +/*
> + * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflows and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PG_buddy	0x40000000
> +#define PG_offline	0x20000000
> +#define PG_table	0x10000000
> +#define PG_guard	0x08000000
> +#define PG_hugetlb	0x04008000

As Wang Wei points out, the 0 and 8 look too similar on my screen ;)

This should be

#define PG_hugetlb	0x04000000

-- 
Cheers,

David / dhildenb




* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
@ 2024-05-30  5:01   ` Sergey Senozhatsky
  2024-05-31 14:27     ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2024-05-30  5:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

On (24/05/29 13:19), David Hildenbrand wrote:
> We won't be able to support 256 KiB base pages, which is acceptable.
[..]
> +config HAVE_ZSMALLOC
> +	def_bool y
> +	depends on MMU
> +	depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB

Can't really say that I'm happy with this, but if mm-folks are
fine then okay.

FWIW
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>



* Re: [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset()
  2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
                   ` (5 preceding siblings ...)
  2024-05-29 11:19 ` [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
@ 2024-05-30  5:02 ` Sergey Senozhatsky
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2024-05-30  5:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
	Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo

On (24/05/29 13:18), David Hildenbrand wrote:
> Wanting to remove the remaining abuser of _mapcount/page_type along with
> page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
> converted away from "struct page" [1].
> 
> Unfortunately, we cannot completely stop using the page_type field in
> zsmalloc code for its own purposes: all other fields in "struct page" are
> used one way or the other. Could we simply store a 2-byte offset value
> at the beginning of each page? Likely, but that would require a bit more
> work; and once we have memdesc we might want to move the offset in there
> (struct zsdesc?) again.
> 
> ... but we can limit the abuse to 16 bit, glue it to a page type that
> must be set, and document it. page_has_type() will always successfully
> indicate such zsmalloc pages, and such zsmalloc pages only.
> 
> We lose zsmalloc support for PAGE_SIZE > 64 KiB, which should be tolerable.
> We could use more bits from the page type, but 16 bit sounds like a good
> idea for now.
> 
> So clarify the _mapcount/page_type documentation, use a proper page_type
> for zsmalloc, and remove page_mapcount_reset().
> 
> Briefly tested with zram on x86-64.
> 
> [1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org> # zram/zsmalloc workloads



* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-05-30  5:01   ` Sergey Senozhatsky
@ 2024-05-31 14:27     ` Matthew Wilcox
  2024-05-31 14:32       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2024-05-31 14:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Mike Rapoport, Minchan Kim, Hyeonggon Yoo

On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> On (24/05/29 13:19), David Hildenbrand wrote:
> > We won't be able to support 256 KiB base pages, which is acceptable.
> [..]
> > +config HAVE_ZSMALLOC
> > +	def_bool y
> > +	depends on MMU
> > +	depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
> 
> Can't really say that I'm happy with this, but if mm-folks are
> fine then okay.

I have an idea ...

We use 6 of the bits in the top byte of the page_type to enumerate
a type (ie value 0x80-0xbf) and then the remaining 24 bits are
available.  It's actually more efficient:

$ ./scripts/bloat-o-meter prev.o .build-debian/mm/filemap.o 
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
Function                                     old     new   delta
__filemap_add_folio                         1102    1098      -4
filemap_unaccount_folio                      455     446      -9
replace_page_cache_folio                     474     447     -27
Total: Before=41258, After=41218, chg -0.10%

(that's all from PG_hugetlb)

before:
    1406:       8b 46 30                mov    0x30(%rsi),%eax
                mapcount = atomic_read(&folio->_mapcount) + 1;
    1409:       83 c0 01                add    $0x1,%eax
                if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
    140c:       83 f8 81                cmp    $0xffffff81,%eax
    140f:       7d 6c                   jge    147d <filemap_unaccount_folio+0x8d>
    1411:       8b 43 30                mov    0x30(%rbx),%eax
    1414:       25 00 08 00 f0          and    $0xf0000800,%eax
    1419:       3d 00 00 00 f0          cmp    $0xf0000000,%eax
    141e:       74 4e                   je     146e <filemap_unaccount_folio+0x7e>

after:
    1406:       8b 46 30                mov    0x30(%rsi),%eax
                mapcount = atomic_read(&folio->_mapcount) + 1;
    1409:       83 c0 01                add    $0x1,%eax
                if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
    140c:       83 f8 81                cmp    $0xffffff81,%eax
    140f:       7d 63                   jge    1474 <filemap_unaccount_folio+0x84>
        if (folio_test_hugetlb(folio))
    1411:       80 7b 33 84             cmpb   $0x84,0x33(%rbx)
    1415:       74 4e                   je     1465 <filemap_unaccount_folio+0x75>

so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
(With the whole type in the top byte, the compiler can test just that byte;
on little-endian that is offset 0x33, the MSB of page_type at 0x30.)

Anyway, not tested but this is the patch I used to generate the above.
More for comment than application.

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5265b3434b9e..4129d04ac812 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,24 +942,24 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
  * mistaken for a page type value.
  */
 
-#define PAGE_TYPE_BASE	0xf0000000
-/* Reserve		0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE	-128
-#define PG_buddy	0x00000080
-#define PG_offline	0x00000100
-#define PG_table	0x00000200
-#define PG_guard	0x00000400
-#define PG_hugetlb	0x00000800
-#define PG_slab		0x00001000
-
-#define PageType(page, flag)						\
-	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
-#define folio_test_type(folio, flag)					\
-	((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+/* Reserve             0x0000007f to catch underflows of _mapcount */
+#define PAGE_MAPCOUNT_RESERVE  -128
+
+#define PG_buddy	0x80
+#define PG_offline	0x81
+#define PG_table	0x82
+#define PG_guard	0x83
+#define PG_hugetlb	0x84
+#define PG_slab		0x85
+
+#define PageType(page, type)						\
+	(((page)->page_type >> 24) == type)
+#define folio_test_type(folio, type)					\
+	(((folio)->page.page_type >> 24) == type)
 
 static inline int page_type_has_type(unsigned int page_type)
 {
-	return (int)page_type < PAGE_MAPCOUNT_RESERVE;
+	return ((int)page_type < 0) && (page_type < 0xc0000000);
 }
 
 static inline int page_has_type(const struct page *page)
@@ -975,12 +975,12 @@ static __always_inline bool folio_test_##fname(const struct folio *folio)\
 static __always_inline void __folio_set_##fname(struct folio *folio)	\
 {									\
 	VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio);		\
-	folio->page.page_type &= ~PG_##lname;				\
+	folio->page.page_type = PG_##lname << 24;			\
 }									\
 static __always_inline void __folio_clear_##fname(struct folio *folio)	\
 {									\
 	VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);		\
-	folio->page.page_type |= PG_##lname;				\
+	folio->page.page_type = 0xffffffff;				\
 }
 
 #define PAGE_TYPE_OPS(uname, lname, fname)				\
@@ -992,12 +992,12 @@ static __always_inline int Page##uname(const struct page *page)		\
 static __always_inline void __SetPage##uname(struct page *page)		\
 {									\
 	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\
-	page->page_type &= ~PG_##lname;					\
+	page->page_type = PG_##lname << 24;				\
 }									\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 {									\
 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
-	page->page_type |= PG_##lname;					\
+	page->page_type = 0xffffffff;					\
 }
 
 /*



* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-05-31 14:27     ` Matthew Wilcox
@ 2024-05-31 14:32       ` David Hildenbrand
  2024-06-25 22:33         ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-31 14:32 UTC (permalink / raw)
  To: Matthew Wilcox, Sergey Senozhatsky
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Rapoport,
	Minchan Kim, Hyeonggon Yoo

On 31.05.24 16:27, Matthew Wilcox wrote:
> On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
>> On (24/05/29 13:19), David Hildenbrand wrote:
>>> We won't be able to support 256 KiB base pages, which is acceptable.
>> [..]
>>> +config HAVE_ZSMALLOC
>>> +	def_bool y
>>> +	depends on MMU
>>> +	depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
>>
>> Can't really say that I'm happy with this, but if mm-folks are
>> fine then okay.
> 
> I have an idea ...
> 
> We use 6 of the bits in the top byte of the page_type to enumerate
> a type (ie value 0x80-0xbf) and then the remaining 24 bits are
> available.  It's actually more efficient:
> 
> $ ./scripts/bloat-o-meter prev.o .build-debian/mm/filemap.o
> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
> Function                                     old     new   delta
> __filemap_add_folio                         1102    1098      -4
> filemap_unaccount_folio                      455     446      -9
> replace_page_cache_folio                     474     447     -27
> Total: Before=41258, After=41218, chg -0.10%
> 
> (that's all from PG_hugetlb)
> 
> before:
>      1406:       8b 46 30                mov    0x30(%rsi),%eax
>                  mapcount = atomic_read(&folio->_mapcount) + 1;
>      1409:       83 c0 01                add    $0x1,%eax
>                  if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
>      140c:       83 f8 81                cmp    $0xffffff81,%eax
>      140f:       7d 6c                   jge    147d <filemap_unaccount_folio+0x8d>
>      1411:       8b 43 30                mov    0x30(%rbx),%eax
>      1414:       25 00 08 00 f0          and    $0xf0000800,%eax
>      1419:       3d 00 00 00 f0          cmp    $0xf0000000,%eax
>      141e:       74 4e                   je     146e <filemap_unaccount_folio+0x7e>
> 
> after:
>      1406:       8b 46 30                mov    0x30(%rsi),%eax
>                  mapcount = atomic_read(&folio->_mapcount) + 1;
>      1409:       83 c0 01                add    $0x1,%eax
>                  if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
>      140c:       83 f8 81                cmp    $0xffffff81,%eax
>      140f:       7d 63                   jge    1474 <filemap_unaccount_folio+0x84>
>          if (folio_test_hugetlb(folio))
>      1411:       80 7b 33 84             cmpb   $0x84,0x33(%rbx)
>      1415:       74 4e                   je     1465 <filemap_unaccount_folio+0x75>
> 
> so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
> 
> Anyway, not tested but this is the patch I used to generate the above.
> More for comment than application.

Right, it's likely very similar to my previous proposal to use 8 bit 
(uint8_t) for the type.

https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/

I would prefer if we would do that separately; unless someone is able to 
raise why we care about zram + 256KiB that much right now. (claim: we don't)

> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5265b3434b9e..4129d04ac812 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -942,24 +942,24 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>    * mistaken for a page type value.
>    */
>   
> -#define PAGE_TYPE_BASE	0xf0000000
> -/* Reserve		0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE	-128
> -#define PG_buddy	0x00000080
> -#define PG_offline	0x00000100
> -#define PG_table	0x00000200
> -#define PG_guard	0x00000400
> -#define PG_hugetlb	0x00000800
> -#define PG_slab		0x00001000
> -
> -#define PageType(page, flag)						\
> -	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> -#define folio_test_type(folio, flag)					\
> -	((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +/* Reserve             0x0000007f to catch underflows of _mapcount */
> +#define PAGE_MAPCOUNT_RESERVE  -128
> +
> +#define PG_buddy	0x80
> +#define PG_offline	0x81
> +#define PG_table	0x82
> +#define PG_guard	0x83
> +#define PG_hugetlb	0x84
> +#define PG_slab		0x85

Hoping we can stop calling that PG_ ...




-- 
Cheers,

David / dhildenb




* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-05-31 14:32       ` David Hildenbrand
@ 2024-06-25 22:33         ` Andrew Morton
  2024-06-26  4:41           ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2024-06-25 22:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, Sergey Senozhatsky, linux-kernel, linux-mm,
	Mike Rapoport, Minchan Kim, Hyeonggon Yoo

On Fri, 31 May 2024 16:32:04 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 31.05.24 16:27, Matthew Wilcox wrote:
> > On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> >      1409:       83 c0 01                add    $0x1,%eax
> >                  if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> >      140c:       83 f8 81                cmp    $0xffffff81,%eax
> >      140f:       7d 63                   jge    1474 <filemap_unaccount_folio+0x84>
> >          if (folio_test_hugetlb(folio))
> >      1411:       80 7b 33 84             cmpb   $0x84,0x33(%rbx)
> >      1415:       74 4e                   je     1465 <filemap_unaccount_folio+0x75>
> > 
> > so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> > be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
> > 
> > Anyway, not tested but this is the patch I used to generate the above.
> > More for comment than application.
> 
> Right, it's likely very similar to my previous proposal to use 8 bit 
> (uint8_t) for the type.
> 
> https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
> 
> I would prefer if we would do that separately; unless someone is able to 
> raise why we care about zram + 256KiB that much right now. (claim: we don't)
> 

iow, "this is ok for now", yes?

I'd like to push this into mm-"stable" later this week.  Speak now or
forever hold your pieces.



* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-06-25 22:33         ` Andrew Morton
@ 2024-06-26  4:41           ` Sergey Senozhatsky
  2024-06-26  5:08             ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2024-06-26  4:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Matthew Wilcox, Sergey Senozhatsky,
	linux-kernel, linux-mm, Mike Rapoport, Minchan Kim,
	Hyeonggon Yoo

On (24/06/25 15:33), Andrew Morton wrote:
> On Fri, 31 May 2024 16:32:04 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
> > On 31.05.24 16:27, Matthew Wilcox wrote:
> > > On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> > >      1409:       83 c0 01                add    $0x1,%eax
> > >                  if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> > >      140c:       83 f8 81                cmp    $0xffffff81,%eax
> > >      140f:       7d 63                   jge    1474 <filemap_unaccount_folio+0x84>
> > >          if (folio_test_hugetlb(folio))
> > >      1411:       80 7b 33 84             cmpb   $0x84,0x33(%rbx)
> > >      1415:       74 4e                   je     1465 <filemap_unaccount_folio+0x75>
> > > 
> > > so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> > > be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
> > > 
> > > Anyway, not tested but this is the patch I used to generate the above.
> > > More for comment than application.
> > 
> > Right, it's likely very similar to my previous proposal to use 8 bit 
> > (uint8_t) for the type.
> > 
> > https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
> > 
> > I would prefer if we would do that separately; unless someone is able to 
> > raise why we care about zram + 256KiB that much right now. (claim: we don't)
> > 
> 
> iow, "this is ok for now", yes?

Perhaps.  I'm not in a position to claim that zram + 256KiB PAGE_SIZE is
irrelevant, but I'm also not in a position to claim the opposite.

Matthew and David have ideas/proposals/patches to fix it should 256KiB
PAGE_SIZE become an issue.



* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
  2024-06-26  4:41           ` Sergey Senozhatsky
@ 2024-06-26  5:08             ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-06-26  5:08 UTC (permalink / raw)
  To: Sergey Senozhatsky, Andrew Morton
  Cc: Matthew Wilcox, linux-kernel, linux-mm, Mike Rapoport,
	Minchan Kim, Hyeonggon Yoo

On 26.06.24 06:41, Sergey Senozhatsky wrote:
> On (24/06/25 15:33), Andrew Morton wrote:
>> On Fri, 31 May 2024 16:32:04 +0200 David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 31.05.24 16:27, Matthew Wilcox wrote:
>>>> On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
>>>>       1409:       83 c0 01                add    $0x1,%eax
>>>>                   if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
>>>>       140c:       83 f8 81                cmp    $0xffffff81,%eax
>>>>       140f:       7d 63                   jge    1474 <filemap_unaccount_folio+0x84>
>>>>           if (folio_test_hugetlb(folio))
>>>>       1411:       80 7b 33 84             cmpb   $0x84,0x33(%rbx)
>>>>       1415:       74 4e                   je     1465 <filemap_unaccount_folio+0x75>
>>>>
>>>> so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
>>>> be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
>>>>
>>>> Anyway, not tested but this is the patch I used to generate the above.
>>>> More for comment than application.
>>>
>>> Right, it's likely very similar to my previous proposal to use 8 bit
>>> (uint8_t) for the type.
>>>
>>> https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
>>>
>>> I would prefer if we would do that separately; unless someone is able to
>>> raise why we care about zram + 256KiB that much right now. (claim: we don't)
>>>
>>
>> iow, "this is ok for now", yes?
> 
> Perhaps.  I'm not in a position to claim that zram + 256KiB PAGE_SIZE is
> irrelevant, but I'm also not in a position to claim the opposite.
> 
> Matthew and David have ideas/proposals/patches to fix it should 256KiB
> PAGE_SIZE become an issue.

Yes, let's keep it simple for now. There are various ways to handle that
if there is really a need to. 256KiB is not particularly common (quite
the opposite, I would claim), and a simple fix would be dedicating 18
instead of 16 bit.

Long-term, we should handle it more cleanly though, and there are also 
various ways forward (store offset in page, separate allocation like 
memdesc for metadata, etc.).

Messing with turning page types from flags into values should be a separate
effort, because it requires more care (e.g., PAGE_SLAB_MAPCOUNT_VALUE).

-- 
Cheers,

David / dhildenb




