* [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset()
@ 2024-05-29 11:18 David Hildenbrand
2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:18 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].
Unfortunately, we cannot stop using the page_type field in zsmalloc code
completely for its own purposes. All other fields in "struct page" are
used one way or the other. Could we simply store a 2-byte offset value
at the beginning of each page? Likely, but that will require a bit more
work; and once we have memdesc we might want to move the offset in there
(struct zsdesc?) again.
... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will always successfully
indicate such zsmalloc pages, and such zsmalloc pages only.
We lose zsmalloc support for PAGE_SIZE > 64 KiB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea for now.
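For readers skimming the series, the resulting encoding can be sketched
as follows (an illustration only; the authoritative definitions live in
include/linux/page-flags.h once patches #2 and #3 are applied):

/*
 * page_type / _mapcount (they share the same 4 bytes):
 *
 *   bit  31     PAGE_TYPE_BASE: set while the field holds -1 ("no type,
 *               no mappings") or a type; any real mapcount (>= 0) has
 *               this bit cleared, so mapped pages can never be mistaken
 *               for typed ones
 *   bits 30..16 one bit per page type, handed out from the top
 *               (PG_buddy, PG_offline, PG_table, PG_guard, PG_hugetlb,
 *               PG_slab, PG_zsmalloc); setting a type *clears* its bit
 *               in the initial all-ones pattern
 *   bits 15..0  reserved to catch _mapcount underflows; once a type is
 *               set, its owner may reuse these bits (zsmalloc stores the
 *               first object offset here) but must restore all-ones
 *               before clearing the type
 */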
So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().
Briefly tested with zram on x86-64.
[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
v1 -> v2:
* Rebased to mm/mm-unstable
* "mm: update _mapcount and page_type documentation"
-> Minor comment change
* "mm: allow reuse of the lower 16 bit of the page type with an actual
type"
-> Fixup 18 vs 16 in description
-> Reduce PAGE_TYPE_BASE to a single bit and hand out bits from highest
to lowest
-> Adjust description
RFC -> v1:
* Rebased to v6.10-rc1
* "mm: update _mapcount and page_type documentation"
-> Exchange members and fixup doc as suggested by Mike
* "mm: allow reuse of the lower 16bit of the page type with an actual
type"
-> Remove "highest bit" comment, fixup PG_buddy, extend description
* "mm/zsmalloc: use a proper page type"
-> Add and use HAVE_ZSMALLOC to fixup compilation
-> Fixup BUILD_BUG_ON
-> Add some VM_WARN_ON_ONCE(!PageZsmalloc(page));
* "mm/mm_init: initialize page->_mapcount directly
in __init_single_page()"
-> Fixup patch subject
David Hildenbrand (6):
mm: update _mapcount and page_type documentation
mm: allow reuse of the lower 16 bit of the page type with an actual
type
mm/zsmalloc: use a proper page type
mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
mm/filemap: reinitialize folio->_mapcount directly
mm/mm_init: initialize page->_mapcount directly in
__init_single_page()
drivers/block/zram/Kconfig | 1 +
include/linux/mm.h | 10 ----------
include/linux/mm_types.h | 33 ++++++++++++++++++++++-----------
include/linux/page-flags.h | 25 ++++++++++++++++---------
mm/Kconfig | 10 ++++++++--
mm/filemap.c | 2 +-
mm/mm_init.c | 2 +-
mm/page_alloc.c | 6 ++++--
mm/zsmalloc.c | 29 +++++++++++++++++++++++++----
9 files changed, 78 insertions(+), 40 deletions(-)
--
2.45.1
* [PATCH v2 1/6] mm: update _mapcount and page_type documentation
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
@ 2024-05-29 11:18 ` David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
` (5 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:18 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
Let's make it clearer that _mapcount must no longer be used by subsystems
for their own purposes, and how _mapcount and page_type behave nowadays
(also in the
context of hugetlb folios, which are typed folios that will be mapped
to user space).
Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next. Move "page_type" before "mapcount", to make
it clearer what typed folios are.
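A minimal sketch of the rule the updated documentation spells out
(illustration only, using existing folio helpers):

/*
 * page_type and _mapcount share the same storage: the per-page
 * _mapcount of a typed folio must not be touched directly.
 */
if (folio_test_hugetlb(folio))
	/* typed folio: mappings are tracked via _entire_mapcount */
	mapcount = atomic_read(&folio->_entire_mapcount) + 1;
else
	mapcount = atomic_read(&folio->_mapcount) + 1;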
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 28 +++++++++++++++++-----------
2 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3aa1b6889bccf..eebfce8f58bca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}
-/*
- * The atomic page->_mapcount, starts from -1: so that transitions
- * both from it and to it can be tracked, using atomic_inc_and_test
- * and atomic_add_negative(-1).
- */
static inline void page_mapcount_reset(struct page *page)
{
atomic_set(&(page)->_mapcount, -1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 24323c7d0bd48..dd2ce1b3ec80e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -46,9 +46,7 @@ struct mem_cgroup;
* which is guaranteed to be aligned. If you use the same storage as
* page->mapping, you must restore it to NULL before freeing the page.
*
- * If your page will not be mapped to userspace, you can also use the four
- * bytes in the mapcount union, but you must call page_mapcount_reset()
- * before freeing it.
+ * The mapcount field must not be used for own purposes.
*
* If you want to use the refcount field, it must be used in such a way
* that other CPUs temporarily incrementing and then decrementing the
@@ -152,18 +150,26 @@ struct page {
union { /* This union is 4 bytes in size. */
/*
- * If the page can be mapped to userspace, encodes the number
- * of times this page is referenced by a page table.
+ * For head pages of typed folios, the value stored here
+ * allows for determining what this page is used for. The
+ * tail pages of typed folios will not store a type
+ * (page_type == _mapcount == -1).
+ *
+ * See page-flags.h for a list of page types which are currently
+ * stored here.
*/
- atomic_t _mapcount;
+ unsigned int page_type;
/*
- * If the page is neither PageSlab nor mappable to userspace,
- * the value stored here may help determine what this page
- * is used for. See page-flags.h for a list of page types
- * which are currently stored here.
+ * For pages that are part of non-typed folios for which mappings
+ * are tracked via the RMAP, encodes the number of times this page
+ * is directly referenced by a page table.
+ *
+ * Note that the mapcount is always initialized to -1, so that
+ * transitions both from it and to it can be tracked, using
+ * atomic_inc_and_test() and atomic_add_negative(-1).
*/
- unsigned int page_type;
+ atomic_t _mapcount;
};
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
--
2.45.1
* [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
2024-05-29 16:00 ` David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
` (4 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
As long as the owner sets a page type first, we can allow reuse of the
lower 16 bit: sufficient to store an offset into a 64 KiB page, which
is the maximum base page size in *common* configurations (ignoring the
256 KiB variant). Restrict it to the head page.
We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information (offset into a base page) that
cannot go elsewhere for now.
Let's reserve the lower 16 bit for that purpose and for catching
mapcount underflows, and let's reduce PAGE_TYPE_BASE to a single bit.
Note that the mapcount would still have to underflow quite a lot before
the value could actually indicate a valid page type.
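To put a number on "quite a lot", a worked example with the constants
introduced below:

/*
 * _mapcount starts at -1, i.e. the field reads 0xffffffff:
 *
 *   -2      -> 0xfffffffe  \  caught by the reserved range
 *   ...                     > 0xffff0000 - 0xfffffffe
 *   -65536  -> 0xffff0000  /
 *   -65537  -> 0xfffeffff  first value that page_has_type() would
 *                          misinterpret as a page type
 *
 * So roughly 64K consecutive underflows are needed before a corrupted
 * mapcount can masquerade as a type.
 */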
Start handing out the type bits from highest to lowest, to make it
clearer how many bits for types we have left. Of the 15 bits we can use
for types, we currently use 6. If we run out of bits before we have
better typing (e.g., memdesc), we can always investigate storing a value
instead [1].
[1] https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 22 +++++++++++++---------
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index dd2ce1b3ec80e..791afaf1b1ec3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -157,6 +157,11 @@ struct page {
*
* See page-flags.h for a list of page types which are currently
* stored here.
+ *
+ * Owners of typed folios may reuse the lower 16 bit of the
+ * head page page_type field after setting the page type,
+ * but must reset these 16 bit to -1 before clearing the
+ * page type.
*/
unsigned int page_type;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d1bdbaaccc964..f060db808102c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -945,15 +945,19 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
* mistaken for a page type value.
*/
-#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
+#define PAGE_TYPE_BASE 0x80000000
+/*
+ * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflows and
+ * allow owners that set a type to reuse the lower 16 bit for their own
+ * purposes.
+ */
+#define PG_buddy 0x40000000
+#define PG_offline 0x20000000
+#define PG_table 0x10000000
+#define PG_guard 0x08000000
+#define PG_hugetlb 0x04008000
+#define PG_slab 0x02000000
+#define PAGE_MAPCOUNT_RESERVE (~0x0000ffff)
#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
--
2.45.1
* [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
2024-05-30 5:01 ` Sergey Senozhatsky
2024-05-29 11:19 ` [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
` (3 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
Let's clean it up: use a proper page type and store our data (offset
into a page) in the lower 16 bit as documented.
We won't be able to support 256 KiB base pages, which is acceptable.
Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.
Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].
This removes the last _mapcount/page_type offender.
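To make the new encoding concrete, a worked example (illustration only,
using the constants and helpers introduced below):

/*
 * page->page_type of a zsmalloc page whose first object starts at
 * offset 0x30:
 *
 *   fresh page:                      0xffffffff  (_mapcount == -1)
 *   __SetPageZsmalloc():             0xfeffffff  (PG_zsmalloc bit cleared)
 *   set_first_obj_offset(page, 0x30): 0xfeff0030
 *   PageZsmalloc():                  (0xfeff0030 & 0x81000000)
 *                                       == 0x80000000 -> true
 *   get_first_obj_offset():          0xfeff0030 & 0xffff -> 0x30
 *   reset_first_obj_offset():        0xfeffffff
 *   __ClearPageZsmalloc():           0xffffffff
 */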
[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/block/zram/Kconfig | 1 +
include/linux/page-flags.h | 3 +++
mm/Kconfig | 10 ++++++++--
mm/zsmalloc.c | 29 +++++++++++++++++++++++++----
4 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 6aea609b795c2..40e035468de22 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -2,6 +2,7 @@
config ZRAM
tristate "Compressed RAM block device support"
depends on BLOCK && SYSFS && MMU
+ depends on HAVE_ZSMALLOC
select ZSMALLOC
help
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f060db808102c..3afcbfbb379ea 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -957,6 +957,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
#define PG_guard 0x08000000
#define PG_hugetlb 0x04008000
#define PG_slab 0x02000000
+#define PG_zsmalloc 0x01000000
#define PAGE_MAPCOUNT_RESERVE (~0x0000ffff)
#define PageType(page, flag) \
@@ -1072,6 +1073,8 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
FOLIO_TEST_FLAG_FALSE(hugetlb)
#endif
+PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a541..67dc18c94448d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -128,7 +128,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
choice
prompt "Default allocator"
depends on ZSWAP
- default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if MMU
+ default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if HAVE_ZSMALLOC
default ZSWAP_ZPOOL_DEFAULT_ZBUD
help
Selects the default allocator for the compressed cache for
@@ -154,6 +154,7 @@ config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
bool "zsmalloc"
+ depends on HAVE_ZSMALLOC
select ZSMALLOC
help
Use the zsmalloc allocator as the default allocator.
@@ -186,10 +187,15 @@ config Z3FOLD
page. It is a ZBUD derivative so the simplicity and determinism are
still there.
+config HAVE_ZSMALLOC
+ def_bool y
+ depends on MMU
+ depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
+
config ZSMALLOC
tristate
prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
- depends on MMU
+ depends on HAVE_ZSMALLOC
help
zsmalloc is a slab-based memory allocator designed to store
pages of various compression levels efficiently. It achieves
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a2a5866473bb8..44e0171d60036 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,8 @@
* page->index: links together all component pages of a zspage
* For the huge page, this is always 0, so we use this field
* to store handle.
- * page->page_type: first object offset in a subpage of zspage
+ * page->page_type: PG_zsmalloc, lower 16 bit locate the first object
+ * offset in a subpage of a zspage
*
* Usage of struct page flags:
* PG_private: identifies the first component page
@@ -450,14 +451,28 @@ static inline struct page *get_first_page(struct zspage *zspage)
return first_page;
}
+#define FIRST_OBJ_PAGE_TYPE_MASK 0xffff
+
+static inline void reset_first_obj_offset(struct page *page)
+{
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ page->page_type |= FIRST_OBJ_PAGE_TYPE_MASK;
+}
+
static inline unsigned int get_first_obj_offset(struct page *page)
{
- return page->page_type;
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ return page->page_type & FIRST_OBJ_PAGE_TYPE_MASK;
}
static inline void set_first_obj_offset(struct page *page, unsigned int offset)
{
- page->page_type = offset;
+ /* With 16 bit available, we can support offsets into 64 KiB pages. */
+ BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK);
+ page->page_type &= ~FIRST_OBJ_PAGE_TYPE_MASK;
+ page->page_type |= offset & FIRST_OBJ_PAGE_TYPE_MASK;
}
static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -791,8 +806,9 @@ static void reset_page(struct page *page)
__ClearPageMovable(page);
ClearPagePrivate(page);
set_page_private(page, 0);
- page_mapcount_reset(page);
page->index = 0;
+ reset_first_obj_offset(page);
+ __ClearPageZsmalloc(page);
}
static int trylock_zspage(struct zspage *zspage)
@@ -965,11 +981,13 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
if (!page) {
while (--i >= 0) {
dec_zone_page_state(pages[i], NR_ZSPAGES);
+ __ClearPageZsmalloc(pages[i]);
__free_page(pages[i]);
}
cache_free_zspage(pool, zspage);
return NULL;
}
+ __SetPageZsmalloc(page);
inc_zone_page_state(page, NR_ZSPAGES);
pages[i] = page;
@@ -1754,6 +1772,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
VM_BUG_ON_PAGE(!PageIsolated(page), page);
+ /* We're committed, tell the world that this is a Zsmalloc page. */
+ __SetPageZsmalloc(newpage);
+
/* The page is locked, so this pointer must remain valid */
zspage = get_zspage(page);
pool = zspage->pool;
--
2.45.1
* [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
` (2 preceding siblings ...)
2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
` (2 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
Let's stop using page_mapcount_reset() and clear PageBuddy using
__ClearPageBuddy() instead.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b1e3eb5787de1..591d28b9f3e48 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -498,7 +498,8 @@ static void bad_page(struct page *page, const char *reason)
dump_stack();
out:
/* Leave bad fields for debug, except PageBuddy could make trouble */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
}
@@ -1346,7 +1347,8 @@ static void check_new_page_bad(struct page *page)
{
if (unlikely(page->flags & __PG_HWPOISON)) {
/* Don't complain about hwpoisoned pages */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
return;
}
--
2.45.1
* [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
` (3 preceding siblings ...)
2024-05-29 11:19 ` [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
2024-05-30 5:02 ` [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() Sergey Senozhatsky
6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
Let's get rid of the page_mapcount_reset() call and simply reinitialize
folio->_mapcount directly.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index ba06237b942d6..9fe5c02ae92e7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -177,7 +177,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
* and we'd rather not leak it: if we're wrong,
* another bad page check should catch it later.
*/
- page_mapcount_reset(&folio->page);
+ atomic_set(&folio->_mapcount, -1);
folio_ref_sub(folio, mapcount);
}
}
--
2.45.1
* [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page()
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
` (4 preceding siblings ...)
2024-05-29 11:19 ` [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
@ 2024-05-29 11:19 ` David Hildenbrand
2024-05-30 5:02 ` [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() Sergey Senozhatsky
6 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 11:19 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
Let's simply reinitialize the page->_mapcount directly. We can now
get rid of page_mapcount_reset().
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 5 -----
mm/mm_init.c | 2 +-
2 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index eebfce8f58bca..c41c82bcbec2f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}
-static inline void page_mapcount_reset(struct page *page)
-{
- atomic_set(&(page)->_mapcount, -1);
-}
-
/**
* page_mapcount() - Number of times this precise page is mapped.
* @page: The page.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index e0023aa685556..426314eeecec3 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -568,7 +568,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
- page_mapcount_reset(page);
+ atomic_set(&page->_mapcount, -1);
page_cpupid_reset_last(page);
page_kasan_tag_reset(page);
--
2.45.1
* Re: [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type
2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
@ 2024-05-29 16:00 ` David Hildenbrand
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-29 16:00 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo,
wang wei
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d1bdbaaccc964..f060db808102c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -945,15 +945,19 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
> * mistaken for a page type value.
> */
>
> -#define PAGE_TYPE_BASE 0xf0000000
> -/* Reserve 0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE -128
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> -#define PG_hugetlb 0x00000800
> -#define PG_slab 0x00001000
> +#define PAGE_TYPE_BASE 0x80000000
> +/*
> + * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflows and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PG_buddy 0x40000000
> +#define PG_offline 0x20000000
> +#define PG_table 0x10000000
> +#define PG_guard 0x08000000
> +#define PG_hugetlb 0x04008000
As Wang Wei points out, the 0 and 8 look too similar on my screen ;)
This should be
#define PG_hugetlb 0x04000000
--
Cheers,
David / dhildenb
* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
@ 2024-05-30 5:01 ` Sergey Senozhatsky
2024-05-31 14:27 ` Matthew Wilcox
0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2024-05-30 5:01 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
On (24/05/29 13:19), David Hildenbrand wrote:
> We won't be able to support 256 KiB base pages, which is acceptable.
[..]
> +config HAVE_ZSMALLOC
> + def_bool y
> + depends on MMU
> + depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
Can't really say that I'm happy with this, but if mm-folks are
fine then okay.
FWIW
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
* Re: [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset()
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
` (5 preceding siblings ...)
2024-05-29 11:19 ` [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
@ 2024-05-30 5:02 ` Sergey Senozhatsky
6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2024-05-30 5:02 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
Mike Rapoport, Minchan Kim, Sergey Senozhatsky, Hyeonggon Yoo
On (24/05/29 13:18), David Hildenbrand wrote:
> Wanting to remove the remaining abuser of _mapcount/page_type along with
> page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
> converted away from "struct page" [1].
>
> Unfortunately, we cannot stop using the page_type field in zsmalloc code
> completely for its own purposes. All other fields in "struct page" are
> used one way or the other. Could we simply store a 2-byte offset value
> at the beginning of each page? Likely, but that will require a bit more
> work; and once we have memdesc we might want to move the offset in there
> (struct zsdesc?) again.
>
> ... but we can limit the abuse to 16 bit, glue it to a page type that
> must be set, and document it. page_has_type() will always successfully
> indicate such zsmalloc pages, and such zsmalloc pages only.
>
> We lose zsmalloc support for PAGE_SIZE > 64 KiB, which should be tolerable.
> We could use more bits from the page type, but 16 bit sounds like a good
> idea for now.
>
> So clarify the _mapcount/page_type documentation, use a proper page_type
> for zsmalloc, and remove page_mapcount_reset().
>
> Briefly tested with zram on x86-64.
>
> [1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org> # zram/zsmalloc workloads
* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-05-30 5:01 ` Sergey Senozhatsky
@ 2024-05-31 14:27 ` Matthew Wilcox
2024-05-31 14:32 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2024-05-31 14:27 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
Mike Rapoport, Minchan Kim, Hyeonggon Yoo
On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> On (24/05/29 13:19), David Hildenbrand wrote:
> > We won't be able to support 256 KiB base pages, which is acceptable.
> [..]
> > +config HAVE_ZSMALLOC
> > + def_bool y
> > + depends on MMU
> > + depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
>
> Can't really say that I'm happy with this, but if mm-folks are
> fine then okay.
I have an idea ...
We use 6 of the bits in the top byte of the page_type to enumerate
a type (ie value 0x80-0xbf) and then the remaining 24 bits are
available. It's actually more efficient:
$ ./scripts/bloat-o-meter prev.o .build-debian/mm/filemap.o
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
Function old new delta
__filemap_add_folio 1102 1098 -4
filemap_unaccount_folio 455 446 -9
replace_page_cache_folio 474 447 -27
Total: Before=41258, After=41218, chg -0.10%
(that's all from PG_hugetlb)
before:
1406: 8b 46 30 mov 0x30(%rsi),%eax
mapcount = atomic_read(&folio->_mapcount) + 1;
1409: 83 c0 01 add $0x1,%eax
if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
140c: 83 f8 81 cmp $0xffffff81,%eax
140f: 7d 6c jge 147d <filemap_unaccount_folio+0x8d>
1411: 8b 43 30 mov 0x30(%rbx),%eax
1414: 25 00 08 00 f0 and $0xf0000800,%eax
1419: 3d 00 00 00 f0 cmp $0xf0000000,%eax
141e: 74 4e je 146e <filemap_unaccount_folio+0x7e>
after:
1406: 8b 46 30 mov 0x30(%rsi),%eax
mapcount = atomic_read(&folio->_mapcount) + 1;
1409: 83 c0 01 add $0x1,%eax
if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
140c: 83 f8 81 cmp $0xffffff81,%eax
140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
if (folio_test_hugetlb(folio))
1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>
so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
Anyway, not tested but this is the patch I used to generate the above.
More for comment than application.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5265b3434b9e..4129d04ac812 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,24 +942,24 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
* mistaken for a page type value.
*/
-#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
-
-#define PageType(page, flag) \
- ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
-#define folio_test_type(folio, flag) \
- ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+/* Reserve 0x0000007f to catch underflows of _mapcount */
+#define PAGE_MAPCOUNT_RESERVE -128
+
+#define PG_buddy 0x80
+#define PG_offline 0x81
+#define PG_table 0x82
+#define PG_guard 0x83
+#define PG_hugetlb 0x84
+#define PG_slab 0x85
+
+#define PageType(page, type) \
+ (((page)->page_type >> 24) == type)
+#define folio_test_type(folio, type) \
+ (((folio)->page.page_type >> 24) == type)
static inline int page_type_has_type(unsigned int page_type)
{
- return (int)page_type < PAGE_MAPCOUNT_RESERVE;
+ return ((int)page_type < 0) && (page_type < 0xc0000000);
}
static inline int page_has_type(const struct page *page)
@@ -975,12 +975,12 @@ static __always_inline bool folio_test_##fname(const struct folio *folio)\
static __always_inline void __folio_set_##fname(struct folio *folio) \
{ \
VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
- folio->page.page_type &= ~PG_##lname; \
+ folio->page.page_type = PG_##lname << 24; \
} \
static __always_inline void __folio_clear_##fname(struct folio *folio) \
{ \
VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \
- folio->page.page_type |= PG_##lname; \
+ folio->page.page_type = 0xffffffff; \
}
#define PAGE_TYPE_OPS(uname, lname, fname) \
@@ -992,12 +992,12 @@ static __always_inline int Page##uname(const struct page *page) \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!PageType(page, 0), page); \
- page->page_type &= ~PG_##lname; \
+ page->page_type = PG_##lname << 24; \
} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
- page->page_type |= PG_##lname; \
+ page->page_type = 0xffffffff; \
}
/*
* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-05-31 14:27 ` Matthew Wilcox
@ 2024-05-31 14:32 ` David Hildenbrand
2024-06-25 22:33 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-31 14:32 UTC (permalink / raw)
To: Matthew Wilcox, Sergey Senozhatsky
Cc: linux-kernel, linux-mm, Andrew Morton, Mike Rapoport,
Minchan Kim, Hyeonggon Yoo
On 31.05.24 16:27, Matthew Wilcox wrote:
> On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
>> On (24/05/29 13:19), David Hildenbrand wrote:
>>> We won't be able to support 256 KiB base pages, which is acceptable.
>> [..]
>>> +config HAVE_ZSMALLOC
>>> + def_bool y
>>> + depends on MMU
>>> + depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
>>
>> Can't really say that I'm happy with this, but if mm-folks are
>> fine then okay.
>
> I have an idea ...
>
> We use 6 of the bits in the top byte of the page_type to enumerate
> a type (ie value 0x80-0xbf) and then the remaining 24 bits are
> available. It's actually more efficient:
>
> $ ./scripts/bloat-o-meter prev.o .build-debian/mm/filemap.o
> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
> Function old new delta
> __filemap_add_folio 1102 1098 -4
> filemap_unaccount_folio 455 446 -9
> replace_page_cache_folio 474 447 -27
> Total: Before=41258, After=41218, chg -0.10%
>
> (that's all from PG_hugetlb)
>
> before:
> 1406: 8b 46 30 mov 0x30(%rsi),%eax
> mapcount = atomic_read(&folio->_mapcount) + 1;
> 1409: 83 c0 01 add $0x1,%eax
> if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> 140c: 83 f8 81 cmp $0xffffff81,%eax
> 140f: 7d 6c jge 147d <filemap_unaccount_folio+0x8d>
> 1411: 8b 43 30 mov 0x30(%rbx),%eax
> 1414: 25 00 08 00 f0 and $0xf0000800,%eax
> 1419: 3d 00 00 00 f0 cmp $0xf0000000,%eax
> 141e: 74 4e je 146e <filemap_unaccount_folio+0x7e>
>
> after:
> 1406: 8b 46 30 mov 0x30(%rsi),%eax
> mapcount = atomic_read(&folio->_mapcount) + 1;
> 1409: 83 c0 01 add $0x1,%eax
> if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> 140c: 83 f8 81 cmp $0xffffff81,%eax
> 140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
> if (folio_test_hugetlb(folio))
> 1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
> 1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>
>
> so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
>
> Anyway, not tested but this is the patch I used to generate the above.
> More for comment than application.
Right, it's likely very similar to my previous proposal to use 8 bit
(uint8_t) for the type.
https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
I would prefer if we would do that separately; unless someone is able to
raise why we care about zram + 256KiB that much right now. (claim: we don't)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5265b3434b9e..4129d04ac812 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -942,24 +942,24 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
> * mistaken for a page type value.
> */
>
> -#define PAGE_TYPE_BASE 0xf0000000
> -/* Reserve 0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE -128
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> -#define PG_hugetlb 0x00000800
> -#define PG_slab 0x00001000
> -
> -#define PageType(page, flag) \
> - ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> -#define folio_test_type(folio, flag) \
> - ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +/* Reserve 0x0000007f to catch underflows of _mapcount */
> +#define PAGE_MAPCOUNT_RESERVE -128
> +
> +#define PG_buddy 0x80
> +#define PG_offline 0x81
> +#define PG_table 0x82
> +#define PG_guard 0x83
> +#define PG_hugetlb 0x84
> +#define PG_slab 0x85
Hoping we can stop calling that PG_ ...
--
Cheers,
David / dhildenb
* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-05-31 14:32 ` David Hildenbrand
@ 2024-06-25 22:33 ` Andrew Morton
2024-06-26 4:41 ` Sergey Senozhatsky
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2024-06-25 22:33 UTC (permalink / raw)
To: David Hildenbrand
Cc: Matthew Wilcox, Sergey Senozhatsky, linux-kernel, linux-mm,
Mike Rapoport, Minchan Kim, Hyeonggon Yoo
On Fri, 31 May 2024 16:32:04 +0200 David Hildenbrand <david@redhat.com> wrote:
> On 31.05.24 16:27, Matthew Wilcox wrote:
> > On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> > 1409: 83 c0 01 add $0x1,%eax
> > if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> > 140c: 83 f8 81 cmp $0xffffff81,%eax
> > 140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
> > if (folio_test_hugetlb(folio))
> > 1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
> > 1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>
> >
> > so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> > be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
> >
> > Anyway, not tested but this is the patch I used to generate the above.
> > More for comment than application.
>
> Right, it's likely very similar to my previous proposal to use 8 bit
> (uint8_t) for the type.
>
> https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
>
> I would prefer if we would do that separately; unless someone is able to
> raise why we care about zram + 256KiB that much right now. (claim: we don't)
>
iow, "this is ok for now", yes?
I'd like to push this into mm-"stable" later this week. Speak now or
forever hold your pieces.
* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-06-25 22:33 ` Andrew Morton
@ 2024-06-26 4:41 ` Sergey Senozhatsky
2024-06-26 5:08 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2024-06-26 4:41 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Sergey Senozhatsky,
linux-kernel, linux-mm, Mike Rapoport, Minchan Kim,
Hyeonggon Yoo
On (24/06/25 15:33), Andrew Morton wrote:
> On Fri, 31 May 2024 16:32:04 +0200 David Hildenbrand <david@redhat.com> wrote:
>
> > On 31.05.24 16:27, Matthew Wilcox wrote:
> > > On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> > > 1409: 83 c0 01 add $0x1,%eax
> > > if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> > > 140c: 83 f8 81 cmp $0xffffff81,%eax
> > > 140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
> > > if (folio_test_hugetlb(folio))
> > > 1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
> > > 1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>
> > >
> > > so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> > > be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
> > >
> > > Anyway, not tested but this is the patch I used to generate the above.
> > > More for comment than application.
> >
> > Right, it's likely very similar to my previous proposal to use 8 bit
> > (uint8_t) for the type.
> >
> > https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
> >
> > I would prefer if we would do that separately; unless someone is able to
> > raise why we care about zram + 256KiB that much right now. (claim: we don't)
> >
>
> iow, "this is ok for now", yes?
Perhaps. I'm not in a position to claim that zram + 256KiB PAGE_SIZE is
irrelevant, but I'm also not in a position to claim the opposite.
Matthew and David have ideas/proposals/patches to fix it should 256KiB
PAGE_SIZE become an issue.
* Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type
2024-06-26 4:41 ` Sergey Senozhatsky
@ 2024-06-26 5:08 ` David Hildenbrand
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-06-26 5:08 UTC (permalink / raw)
To: Sergey Senozhatsky, Andrew Morton
Cc: Matthew Wilcox, linux-kernel, linux-mm, Mike Rapoport,
Minchan Kim, Hyeonggon Yoo
On 26.06.24 06:41, Sergey Senozhatsky wrote:
> On (24/06/25 15:33), Andrew Morton wrote:
>> On Fri, 31 May 2024 16:32:04 +0200 David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 31.05.24 16:27, Matthew Wilcox wrote:
>>>> On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
>>>> 1409: 83 c0 01 add $0x1,%eax
>>>> if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
>>>> 140c: 83 f8 81 cmp $0xffffff81,%eax
>>>> 140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
>>>> if (folio_test_hugetlb(folio))
>>>> 1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
>>>> 1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>
>>>>
>>>> so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
>>>> be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
>>>>
>>>> Anyway, not tested but this is the patch I used to generate the above.
>>>> More for comment than application.
>>>
>>> Right, it's likely very similar to my previous proposal to use 8 bit
>>> (uint8_t) for the type.
>>>
>>> https://lore.kernel.org/all/00ba1dff-7c05-46e8-b0d9-a78ac1cfc198@redhat.com/
>>>
>>> I would prefer if we would do that separately; unless someone is able to
>>> raise why we care about zram + 256KiB that much right now. (claim: we don't)
>>>
>>
>> iow, "this is ok for now", yes?
>
> Perhaps. I'm not in position to claim that zram + 256KiB PAGE_SIZE is
> irrelevant, but I'm also not in position to claim the opposite.
>
> Matthew and David have ideas/proposals/patches to fix it should 256KiB
> PAGE_SIZE become an issue.
Yes, let's keep it simple for now. There are various ways to handle that
if there really is a need to. 256 KiB is not particularly common (quite
the opposite, I would claim), and a simple fix would be dedicating 18
instead of 16 bits.
Long-term, we should handle it more cleanly though, and there are also
various ways forward (store offset in page, separate allocation like
memdesc for metadata, etc.).
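For illustration, the "store the offset in the page itself" direction
could look roughly like the sketch below (a hypothetical helper, not part
of this series; it would require reserving those two bytes in every
subpage, which is the "bit more work" mentioned in the cover letter):

/* Hypothetical: keep the first-object offset in the first two bytes
 * of the subpage instead of in page->page_type.
 * Needs <linux/highmem.h> for kmap_local_page()/kunmap_local(). */
static unsigned int zs_first_obj_offset(struct page *page)
{
	u16 *slot = kmap_local_page(page);
	unsigned int offset = *slot;

	kunmap_local(slot);
	return offset;
}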
Messing with turning page types from flags into values should be a separate
effort, because it requires more care (e.g., PAGE_SLAB_MAPCOUNT_VALUE).
--
Cheers,
David / dhildenb
Thread overview: 15+ messages
2024-05-29 11:18 [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() David Hildenbrand
2024-05-29 11:18 ` [PATCH v2 1/6] mm: update _mapcount and page_type documentation David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type David Hildenbrand
2024-05-29 16:00 ` David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 3/6] mm/zsmalloc: use a proper page type David Hildenbrand
2024-05-30 5:01 ` Sergey Senozhatsky
2024-05-31 14:27 ` Matthew Wilcox
2024-05-31 14:32 ` David Hildenbrand
2024-06-25 22:33 ` Andrew Morton
2024-06-26 4:41 ` Sergey Senozhatsky
2024-06-26 5:08 ` David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly David Hildenbrand
2024-05-29 11:19 ` [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page() David Hildenbrand
2024-05-30 5:02 ` [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset() Sergey Senozhatsky