* [PATCH] mm: add zblock allocator
@ 2025-04-01 17:17 Vitaly Wool
2025-04-01 18:24 ` Nhat Pham
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Vitaly Wool @ 2025-04-01 17:17 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, Vitaly Wool, Igor Belousov
zblock is a special purpose allocator for storing compressed pages.
It stores an integer number of compressed objects per block. These
blocks consist of several physical pages (2**n, i.e. 1/2/4/8).

With zblock, it is possible to densely arrange objects of various sizes,
resulting in low internal fragmentation. This allocator also tries to
fill incomplete blocks instead of adding new ones, in many cases
providing a compression ratio substantially higher than that of z3fold
and zbud (though lower than zsmalloc's).

zblock does not require an MMU to operate and is also superior to
zsmalloc with regard to average performance and worst-case execution
times, thus allowing for better response times and real-time
characteristics of the whole system.

E.g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
5-10% higher bogo ops/s values with zblock compared to zsmalloc.
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
Signed-off-by: Igor Belousov <igor.b@beldev.am>
---
Documentation/mm/zblock.rst | 22 ++
MAINTAINERS | 7 +
mm/Kconfig | 8 +
mm/Makefile | 1 +
mm/zblock.c | 492 ++++++++++++++++++++++++++++++++++++
5 files changed, 530 insertions(+)
create mode 100644 Documentation/mm/zblock.rst
create mode 100644 mm/zblock.c
diff --git a/Documentation/mm/zblock.rst b/Documentation/mm/zblock.rst
new file mode 100644
index 000000000000..754b3dbb9e94
--- /dev/null
+++ b/Documentation/mm/zblock.rst
@@ -0,0 +1,22 @@
+======
+zblock
+======
+
+zblock is a special purpose allocator for storing compressed pages.
+It stores an integer number of compressed objects per block. These
+blocks consist of several physical pages (2**n, i.e. 1/2/4/8).
+
+With zblock, it is possible to densely arrange objects of various sizes,
+resulting in low internal fragmentation. This allocator also tries to
+fill incomplete blocks instead of adding new ones, in many cases providing
+a compression ratio substantially higher than that of z3fold and zbud
+(though lower than zsmalloc's).
+
+zblock does not require an MMU to operate and is also superior to zsmalloc
+with regard to average performance and worst-case execution times, thus
+allowing for better response times and real-time characteristics of the
+whole system.
+
+E.g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
+5-10% higher bogo ops/s values with zblock compared to zsmalloc.
+
diff --git a/MAINTAINERS b/MAINTAINERS
index 991a33bad10e..166e9bfa04dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26313,6 +26313,13 @@ F: Documentation/networking/device_drivers/hamradio/z8530drv.rst
F: drivers/net/hamradio/*scc.c
F: drivers/net/hamradio/z8530.h
+ZBLOCK COMPRESSED SLAB MEMORY ALLOCATOR
+M: Vitaly Wool <vitaly.wool@konsulko.se>
+L: linux-mm@kvack.org
+S: Maintained
+F: Documentation/mm/zblock.rst
+F: mm/zblock.c
+
ZBUD COMPRESSED PAGE ALLOCATOR
M: Seth Jennings <sjenning@redhat.com>
M: Dan Streetman <ddstreet@ieee.org>
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db06417..26b79e3c1300 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -193,6 +193,14 @@ config Z3FOLD_DEPRECATED
page. It is a ZBUD derivative so the simplicity and determinism are
still there.
+config ZBLOCK
+ tristate "Fast compression allocator with high density"
+ depends on ZPOOL
+ help
+ A special purpose allocator for storing compressed pages.
+ It is designed to store compressed pages in blocks of physical
+ pages, with same-size slots within each block.
+
config Z3FOLD
tristate
default y if Z3FOLD_DEPRECATED=y
diff --git a/mm/Makefile b/mm/Makefile
index 850386a67b3e..2018455b7baa 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -116,6 +116,7 @@ obj-$(CONFIG_ZPOOL) += zpool.o
obj-$(CONFIG_ZBUD) += zbud.o
obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
obj-$(CONFIG_Z3FOLD) += z3fold.o
+obj-$(CONFIG_ZBLOCK) += zblock.o
obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
obj-$(CONFIG_CMA) += cma.o
obj-$(CONFIG_NUMA) += numa.o
diff --git a/mm/zblock.c b/mm/zblock.c
new file mode 100644
index 000000000000..a6778653c451
--- /dev/null
+++ b/mm/zblock.c
@@ -0,0 +1,492 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * zblock.c
+ *
+ * Author: Vitaly Wool <vitaly.wool@konsulko.com>
+ * Based on the work from Ananda Badmaev <a.badmaev@clicknet.pro>
+ * Copyright (C) 2022-2024, Konsulko AB.
+ *
+ * Zblock is a small object allocator intended to serve as a zpool
+ * backend. It operates on page blocks consisting of a power-of-2
+ * number of physical pages and stores an integer number of compressed
+ * objects per block, which results in determinism and simplicity.
+ *
+ * zblock doesn't export any API and is meant to be used via zpool API.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/preempt.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/zpool.h>
+
+#define SLOT_FREE 0
+#define BIT_SLOT_OCCUPIED 0
+#define BIT_SLOT_MAPPED 1
+
+#define SLOT_BITS 5
+#define MAX_SLOTS (1 << SLOT_BITS)
+#define SLOT_MASK ((0x1UL << SLOT_BITS) - 1)
+
+#define ZBLOCK_HEADER_SIZE round_up(sizeof(struct zblock_block), sizeof(long))
+#define BLOCK_DATA_SIZE(order) ((PAGE_SIZE << order) - ZBLOCK_HEADER_SIZE)
+#define SLOT_SIZE(nslots, order) (round_down((BLOCK_DATA_SIZE(order) / nslots), sizeof(long)))
+
+#define BLOCK_CACHE_SIZE 32
+
+struct zblock_pool;
+
+/**
+ * struct zblock_block - block metadata
+ * Block consists of several (1/2/4/8) pages and contains fixed
+ * integer number of slots for allocating compressed pages.
+ *
+ * free_slots: number of free slots in the block
+ * slot_info: contains data about free/occupied slots
+ * cache_idx: index of the block in cache
+ */
+struct zblock_block {
+ atomic_t free_slots;
+ u64 slot_info[1];
+ int cache_idx;
+};
+
+/**
+ * struct block_desc - general metadata for block lists
+ * Each block list stores only blocks of corresponding type which means
+ * that all blocks in it have the same number and size of slots.
+ * All slots are aligned to size of long.
+ *
+ * slot_size: size of slot for this list
+ * slots_per_block: number of slots per block for this list
+ * order: order for __get_free_pages
+ */
+static const struct block_desc {
+ const unsigned int slot_size;
+ const unsigned short slots_per_block;
+ const unsigned short order;
+} block_desc[] = {
+ { SLOT_SIZE(32, 0), 32, 0 },
+ { SLOT_SIZE(22, 0), 22, 0 },
+ { SLOT_SIZE(17, 0), 17, 0 },
+ { SLOT_SIZE(13, 0), 13, 0 },
+ { SLOT_SIZE(11, 0), 11, 0 },
+ { SLOT_SIZE(9, 0), 9, 0 },
+ { SLOT_SIZE(8, 0), 8, 0 },
+ { SLOT_SIZE(14, 1), 14, 1 },
+ { SLOT_SIZE(12, 1), 12, 1 },
+ { SLOT_SIZE(11, 1), 11, 1 },
+ { SLOT_SIZE(10, 1), 10, 1 },
+ { SLOT_SIZE(9, 1), 9, 1 },
+ { SLOT_SIZE(8, 1), 8, 1 },
+ { SLOT_SIZE(15, 2), 15, 2 },
+ { SLOT_SIZE(14, 2), 14, 2 },
+ { SLOT_SIZE(13, 2), 13, 2 },
+ { SLOT_SIZE(12, 2), 12, 2 },
+ { SLOT_SIZE(11, 2), 11, 2 },
+ { SLOT_SIZE(10, 2), 10, 2 },
+ { SLOT_SIZE(9, 2), 9, 2 },
+ { SLOT_SIZE(8, 2), 8, 2 },
+ { SLOT_SIZE(15, 3), 15, 3 },
+ { SLOT_SIZE(14, 3), 14, 3 },
+ { SLOT_SIZE(13, 3), 13, 3 },
+ { SLOT_SIZE(12, 3), 12, 3 },
+ { SLOT_SIZE(11, 3), 11, 3 },
+ { SLOT_SIZE(10, 3), 10, 3 },
+ { SLOT_SIZE(9, 3), 9, 3 },
+ { SLOT_SIZE(7, 3), 7, 3 }
+};
+
+/**
+ * struct block_list - stores metadata of particular list
+ * lock: protects block_cache
+ * block_cache: blocks with free slots
+ * block_count: total number of blocks in the list
+ */
+struct block_list {
+ spinlock_t lock;
+ struct zblock_block *block_cache[BLOCK_CACHE_SIZE];
+ unsigned long block_count;
+};
+
+/**
+ * struct zblock_pool - stores metadata for each zblock pool
+ * @block_lists: array of block lists
+ * @zpool: zpool driver
+ * @alloc_flag: protects block allocation from memory leak
+ *
+ * This structure is allocated at pool creation time and maintains metadata
+ * for a particular zblock pool.
+ */
+struct zblock_pool {
+ struct block_list block_lists[ARRAY_SIZE(block_desc)];
+ struct zpool *zpool;
+ atomic_t alloc_flag;
+};
+
+/*****************
+ * Helpers
+ *****************/
+
+static int cache_insert_block(struct zblock_block *block, struct block_list *list)
+{
+ unsigned int i, min_free_slots = atomic_read(&block->free_slots);
+ int min_index = -1;
+
+ if (WARN_ON(block->cache_idx != -1))
+ return -EINVAL;
+
+ min_free_slots = atomic_read(&block->free_slots);
+ for (i = 0; i < BLOCK_CACHE_SIZE; i++) {
+ if (!list->block_cache[i] || !atomic_read(&(list->block_cache[i])->free_slots)) {
+ min_index = i;
+ break;
+ }
+ if (atomic_read(&(list->block_cache[i])->free_slots) < min_free_slots) {
+ min_free_slots = atomic_read(&(list->block_cache[i])->free_slots);
+ min_index = i;
+ }
+ }
+ if (min_index >= 0) {
+ if (list->block_cache[min_index])
+ (list->block_cache[min_index])->cache_idx = -1;
+ list->block_cache[min_index] = block;
+ block->cache_idx = min_index;
+ }
+ return min_index < 0 ? min_index : 0;
+}
+
+static struct zblock_block *cache_find_block(struct block_list *list)
+{
+ int i;
+ struct zblock_block *z = NULL;
+
+ for (i = 0; i < BLOCK_CACHE_SIZE; i++) {
+ if (list->block_cache[i] &&
+ atomic_dec_if_positive(&list->block_cache[i]->free_slots) >= 0) {
+ z = list->block_cache[i];
+ break;
+ }
+ }
+ return z;
+}
+
+static int cache_remove_block(struct block_list *list, struct zblock_block *block)
+{
+ int idx = block->cache_idx;
+
+ block->cache_idx = -1;
+ if (idx >= 0)
+ list->block_cache[idx] = NULL;
+ return idx < 0 ? idx : 0;
+}
+
+/*
+ * Encodes the handle of a particular slot in the pool using metadata
+ */
+static inline unsigned long metadata_to_handle(struct zblock_block *block,
+ unsigned int block_type, unsigned int slot)
+{
+ return (unsigned long)(block) + (block_type << SLOT_BITS) + slot;
+}
+
+/* Returns block, block type and slot in the pool corresponding to handle */
+static inline struct zblock_block *handle_to_metadata(unsigned long handle,
+ unsigned int *block_type, unsigned int *slot)
+{
+ *block_type = (handle & (PAGE_SIZE - 1)) >> SLOT_BITS;
+ *slot = handle & SLOT_MASK;
+ return (struct zblock_block *)(handle & PAGE_MASK);
+}
+
+
+/*
+ * allocate new block and add it to corresponding block list
+ */
+static struct zblock_block *alloc_block(struct zblock_pool *pool,
+ int block_type, gfp_t gfp,
+ unsigned long *handle)
+{
+ struct zblock_block *block;
+ struct block_list *list;
+
+ block = (void *)__get_free_pages(gfp, block_desc[block_type].order);
+ if (!block)
+ return NULL;
+
+ list = &(pool->block_lists)[block_type];
+
+ /* init block data */
+ memset(&block->slot_info, 0, sizeof(block->slot_info));
+ atomic_set(&block->free_slots, block_desc[block_type].slots_per_block - 1);
+ block->cache_idx = -1;
+ set_bit(BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info);
+ *handle = metadata_to_handle(block, block_type, 0);
+
+ spin_lock(&list->lock);
+ cache_insert_block(block, list);
+ list->block_count++;
+ spin_unlock(&list->lock);
+ return block;
+}
+
+/*****************
+ * API Functions
+ *****************/
+/**
+ * zblock_create_pool() - create a new zblock pool
+ * @gfp: gfp flags when allocating the zblock pool structure
+ * @ops: user-defined operations for the zblock pool
+ *
+ * Return: pointer to the new zblock pool or NULL if the metadata allocation
+ * failed.
+ */
+static struct zblock_pool *zblock_create_pool(gfp_t gfp)
+{
+ struct zblock_pool *pool;
+ struct block_list *list;
+ int i, j;
+
+ pool = kmalloc(sizeof(struct zblock_pool), gfp);
+ if (!pool)
+ return NULL;
+
+ /* init each block list */
+ for (i = 0; i < ARRAY_SIZE(block_desc); i++) {
+ list = &(pool->block_lists)[i];
+ spin_lock_init(&list->lock);
+ for (j = 0; j < BLOCK_CACHE_SIZE; j++)
+ list->block_cache[j] = NULL;
+ list->block_count = 0;
+ }
+ atomic_set(&pool->alloc_flag, 0);
+ return pool;
+}
+
+/**
+ * zblock_destroy_pool() - destroys an existing zblock pool
+ * @pool: the zblock pool to be destroyed
+ *
+ */
+static void zblock_destroy_pool(struct zblock_pool *pool)
+{
+ kfree(pool);
+}
+
+
+/**
+ * zblock_alloc() - allocates a slot of appropriate size
+ * @pool: zblock pool from which to allocate
+ * @size: size in bytes of the desired allocation
+ * @gfp: gfp flags used if the pool needs to grow
+ * @handle: handle of the new allocation
+ *
+ * Return: 0 if success and handle is set, otherwise -EINVAL if the size or
+ * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate
+ * a new slot.
+ */
+static int zblock_alloc(struct zblock_pool *pool, size_t size, gfp_t gfp,
+ unsigned long *handle)
+{
+ unsigned int block_type, slot;
+ struct zblock_block *block;
+ struct block_list *list;
+
+ if (!size)
+ return -EINVAL;
+
+ if (size > PAGE_SIZE)
+ return -ENOSPC;
+
+ /* find basic block type with suitable slot size */
+ for (block_type = 0; block_type < ARRAY_SIZE(block_desc); block_type++) {
+ if (size <= block_desc[block_type].slot_size)
+ break;
+ }
+ list = &(pool->block_lists[block_type]);
+
+check:
+ /* check if there are free slots in cache */
+ spin_lock(&list->lock);
+ block = cache_find_block(list);
+ spin_unlock(&list->lock);
+ if (block)
+ goto found;
+
+ /* no block with free slots found, try to allocate a new empty block */
+ if (atomic_cmpxchg(&pool->alloc_flag, 0, 1))
+ goto check;
+ block = alloc_block(pool, block_type, gfp, handle);
+ atomic_set(&pool->alloc_flag, 0);
+ if (block)
+ return 0;
+ return -ENOMEM;
+
+found:
+ /* find the first free slot in block */
+ for (slot = 0; slot < block_desc[block_type].slots_per_block; slot++) {
+ if (!test_and_set_bit(slot*2 + BIT_SLOT_OCCUPIED,
+ (unsigned long *)&block->slot_info))
+ break;
+ }
+ *handle = metadata_to_handle(block, block_type, slot);
+ return 0;
+}
+
+/**
+ * zblock_free() - frees the allocation associated with the given handle
+ * @pool: pool in which the allocation resided
+ * @handle: handle associated with the allocation returned by zblock_alloc()
+ *
+ */
+static void zblock_free(struct zblock_pool *pool, unsigned long handle)
+{
+ unsigned int slot, block_type;
+ struct zblock_block *block;
+ struct block_list *list;
+
+ block = handle_to_metadata(handle, &block_type, &slot);
+ list = &(pool->block_lists[block_type]);
+
+ spin_lock(&list->lock);
+ /* if all slots in block are empty delete whole block */
+ if (atomic_inc_return(&block->free_slots) == block_desc[block_type].slots_per_block) {
+ list->block_count--;
+ cache_remove_block(list, block);
+ spin_unlock(&list->lock);
+ free_pages((unsigned long)block, block_desc[block_type].order);
+ return;
+ }
+
+ if (atomic_read(&block->free_slots) < block_desc[block_type].slots_per_block/2
+ && block->cache_idx == -1)
+ cache_insert_block(block, list);
+ spin_unlock(&list->lock);
+
+ clear_bit(slot*2 + BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info);
+}
+
+/**
+ * zblock_map() - maps the allocation associated with the given handle
+ * @pool: pool in which the allocation resides
+ * @handle: handle associated with the allocation to be mapped
+ *
+ *
+ * Returns: a pointer to the mapped allocation
+ */
+static void *zblock_map(struct zblock_pool *pool, unsigned long handle)
+{
+ unsigned int block_type, slot;
+ struct zblock_block *block;
+ void *p;
+
+ block = handle_to_metadata(handle, &block_type, &slot);
+ p = (void *)block + ZBLOCK_HEADER_SIZE + slot * block_desc[block_type].slot_size;
+ return p;
+}
+
+/**
+ * zblock_unmap() - unmaps the allocation associated with the given handle
+ * @pool: pool in which the allocation resides
+ * @handle: handle associated with the allocation to be unmapped
+ */
+static void zblock_unmap(struct zblock_pool *pool, unsigned long handle)
+{
+}
+
+/**
+ * zblock_get_total_pages() - gets the zblock pool size in pages
+ * @pool: pool being queried
+ *
+ * Returns: size in pages of the given pool.
+ */
+static u64 zblock_get_total_pages(struct zblock_pool *pool)
+{
+ u64 total_size;
+ int i;
+
+ total_size = 0;
+ for (i = 0; i < ARRAY_SIZE(block_desc); i++)
+ total_size += pool->block_lists[i].block_count << block_desc[i].order;
+
+ return total_size;
+}
+
+/*****************
+ * zpool
+ ****************/
+
+static void *zblock_zpool_create(const char *name, gfp_t gfp)
+{
+ return zblock_create_pool(gfp);
+}
+
+static void zblock_zpool_destroy(void *pool)
+{
+ zblock_destroy_pool(pool);
+}
+
+static int zblock_zpool_malloc(void *pool, size_t size, gfp_t gfp,
+ unsigned long *handle)
+{
+ return zblock_alloc(pool, size, gfp, handle);
+}
+
+static void zblock_zpool_free(void *pool, unsigned long handle)
+{
+ zblock_free(pool, handle);
+}
+
+static void *zblock_zpool_map(void *pool, unsigned long handle,
+ enum zpool_mapmode mm)
+{
+ return zblock_map(pool, handle);
+}
+
+static void zblock_zpool_unmap(void *pool, unsigned long handle)
+{
+ zblock_unmap(pool, handle);
+}
+
+static u64 zblock_zpool_total_pages(void *pool)
+{
+ return zblock_get_total_pages(pool);
+}
+
+static struct zpool_driver zblock_zpool_driver = {
+ .type = "zblock",
+ .owner = THIS_MODULE,
+ .create = zblock_zpool_create,
+ .destroy = zblock_zpool_destroy,
+ .malloc = zblock_zpool_malloc,
+ .free = zblock_zpool_free,
+ .map = zblock_zpool_map,
+ .unmap = zblock_zpool_unmap,
+ .total_pages = zblock_zpool_total_pages,
+};
+
+MODULE_ALIAS("zpool-zblock");
+
+static int __init init_zblock(void)
+{
+ pr_info("loaded\n");
+ zpool_register_driver(&zblock_zpool_driver);
+ return 0;
+}
+
+static void __exit exit_zblock(void)
+{
+ zpool_unregister_driver(&zblock_zpool_driver);
+ pr_info("unloaded\n");
+}
+
+module_init(init_zblock);
+module_exit(exit_zblock);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Vitaly Wool <vitaly.wool@konsulko.com>");
+MODULE_DESCRIPTION("Block allocator for compressed pages");
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-01 17:17 [PATCH] mm: add zblock allocator Vitaly Wool
@ 2025-04-01 18:24 ` Nhat Pham
2025-04-01 21:44 ` Vitaly
2025-04-01 23:16 ` Shakeel Butt
2025-04-02 13:03 ` kernel test robot
2 siblings, 1 reply; 8+ messages in thread
From: Nhat Pham @ 2025-04-01 18:24 UTC (permalink / raw)
To: Vitaly Wool; +Cc: linux-mm, akpm, Igor Belousov
On Tue, Apr 1, 2025 at 10:18 AM Vitaly Wool <vitaly.wool@konsulko.se> wrote:
>
> zblock is a special purpose allocator for storing compressed pages.
> It stores integer number of compressed objects per its block. These
> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
Haven't taken a close look yet, but as a general principle I don't
mind having a separate allocator for a separate use case.
Some quick notes (will do a careful review later):
>
> With zblock, it is possible to densely arrange objects of various sizes
> resulting in low internal fragmentation. Also this allocator tries to
> fill incomplete blocks instead of adding new ones, in many cases
> providing a compression ratio substantially higher than z3fold and zbud
> (though lower than zmalloc's).
Do we have data for comparison here?
>
> zblock does not require MMU to operate and also is superior to zsmalloc
This is not an actual meaningful distinction. CONFIG_SWAP depends on CONFIG_MMU:
menuconfig SWAP
bool "Support for paging of anonymous memory (swap)"
depends on MMU && BLOCK && !ARCH_NO_SWAP
> with regard to average performance and worst execution times, thus
> allowing for better response time and real-time characteristics of the
> whole system.
By performance, do you mean latency or throughput or storage density?
>
> E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
>
> Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
> Signed-off-by: Igor Belousov <igor.b@beldev.am>
> ---
> Documentation/mm/zblock.rst | 22 ++
> MAINTAINERS | 7 +
> mm/Kconfig | 8 +
> mm/Makefile | 1 +
> mm/zblock.c | 492 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 530 insertions(+)
> create mode 100644 Documentation/mm/zblock.rst
> create mode 100644 mm/zblock.c
>
> diff --git a/Documentation/mm/zblock.rst b/Documentation/mm/zblock.rst
> new file mode 100644
> index 000000000000..754b3dbb9e94
> --- /dev/null
> +++ b/Documentation/mm/zblock.rst
> @@ -0,0 +1,22 @@
> +======
> +zblock
> +======
> +
> +zblock is a special purpose allocator for storing compressed pages.
> +It stores integer number of compressed objects per its block. These
> +blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
> +
> +With zblock, it is possible to densely arrange objects of various sizes
> +resulting in low internal fragmentation. Also this allocator tries to
> +fill incomplete blocks instead of adding new ones, in many cases
> +providing a compression ratio substantially higher than z3fold and zbud
> +(though lower than zmalloc's).
> +
> +zblock does not require MMU to operate and also is superior to zsmalloc
Same note as above.
> +with regard to average performance and worst execution times, thus
> +allowing for better response time and real-time characteristics of the
> +whole system.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-01 18:24 ` Nhat Pham
@ 2025-04-01 21:44 ` Vitaly
0 siblings, 0 replies; 8+ messages in thread
From: Vitaly @ 2025-04-01 21:44 UTC (permalink / raw)
To: Nhat Pham; +Cc: linux-mm, akpm, Igor Belousov
[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]
Hi Nhat,
On Tuesday, April 1, 2025 at 8:24:15 pm +02:00, Nhat Pham <nphamcs@gmail.com> wrote:
>> With zblock, it is possible to densely arrange objects of various sizes
>> resulting in low internal fragmentation. Also this allocator tries to
>> fill incomplete blocks instead of adding new ones, in many cases
>> providing a compression ratio substantially higher than z3fold and zbud
>> (though lower than zmalloc's).
> Do we have data for comparison here?
Yeah, for instance we normally run *stress-ng --vm 4 --vm-bytes 4G --vm-keep --timeout 10m --metrics-brief*
zblock:
stress-ng: metrc: [499] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [499] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [499] vm 43755994 600.53 2382.34 15.19 72861.92 18250.50
zsmalloc:
stress-ng: metrc: [491] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [491] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [491] vm 41769859 601.37 2381.85 16.56 69457.56 17415.62
which gives just a little short of a 5% advantage for zblock (72861.92 vs 69457.56 bogo ops/s, real time).
>> with regard to average performance and worst execution times, thus
>> allowing for better response time and real-time characteristics of the
>> whole system.
> By performance, do you mean latency or throughput or storage density?
Latency and throughput. It's hard to compete with zsmalloc in storage density indeed.
~Vitaly
[-- Attachment #2.1: Type: text/html, Size: 2274 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-01 17:17 [PATCH] mm: add zblock allocator Vitaly Wool
2025-04-01 18:24 ` Nhat Pham
@ 2025-04-01 23:16 ` Shakeel Butt
2025-04-02 6:45 ` igor.b
2025-04-02 13:03 ` kernel test robot
2 siblings, 1 reply; 8+ messages in thread
From: Shakeel Butt @ 2025-04-01 23:16 UTC (permalink / raw)
To: Vitaly Wool; +Cc: linux-mm, akpm, Igor Belousov
Hi Vitaly,
On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
> zblock is a special purpose allocator for storing compressed pages.
> It stores integer number of compressed objects per its block. These
> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
>
> With zblock, it is possible to densely arrange objects of various sizes
> resulting in low internal fragmentation. Also this allocator tries to
> fill incomplete blocks instead of adding new ones, in many cases
> providing a compression ratio substantially higher than z3fold and zbud
> (though lower than zmalloc's).
>
> zblock does not require MMU
Can you explain why not requiring MMU is important for your use-case?
Also what exactly is your use-case? Are you planning to use zblock
through zram or zswap or something new?
> to operate and also is superior to zsmalloc
> with regard to average performance and worst execution times, thus
> allowing for better response time and real-time characteristics of the
> whole system.
>
> E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
Can you explain a bit more on this test? How is this test using zblock?
thanks,
Shakeel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-01 23:16 ` Shakeel Butt
@ 2025-04-02 6:45 ` igor.b
2025-04-02 16:24 ` Shakeel Butt
2025-04-03 21:54 ` Nhat Pham
0 siblings, 2 replies; 8+ messages in thread
From: igor.b @ 2025-04-02 6:45 UTC (permalink / raw)
To: Shakeel Butt; +Cc: Vitaly Wool, linux-mm, akpm
Hi Shakeel,
2025-04-02 03:16 Shakeel Butt wrote:
> Hi Vitaly,
>
> On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
>> zblock is a special purpose allocator for storing compressed pages.
>> It stores integer number of compressed objects per its block. These
>> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
>>
>> With zblock, it is possible to densely arrange objects of various
>> sizes
>> resulting in low internal fragmentation. Also this allocator tries to
>> fill incomplete blocks instead of adding new ones, in many cases
>> providing a compression ratio substantially higher than z3fold and
>> zbud
>> (though lower than zmalloc's).
>>
>> zblock does not require MMU
>
> Can you explain why not requiring MMU is important for your use-case?
> Also what exactly is your use-case? Are you planning to use zblock
> through zram or zswap or something new?
We have 2 use cases for a zpool backend: data centers (zswap) and small
MMU-less devices, which is where zram comes into play. We know that the
zram-over-zpool patch is still out of tree, but we also know the story
behind that :)
>> to operate and also is superior to zsmalloc
>> with regard to average performance and worst execution times, thus
>> allowing for better response time and real-time characteristics of the
>> whole system.
>>
>> E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
>> 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
>
> Can you explain a bit more on this test? How is this test using zblock?
We're running stress-ng on a number of devices. The results published in
the previous message are from the run on a Raspberry Pi 5. The zpool
settings used for the comparison are lz4/80%.
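For reference, a minimal sketch of how such a run can be set up through
zswap (assuming zblock is built as a module; the parameter names are the
standard upstream zswap ones and the values are just illustrative):

  # load the allocator and point zswap at it
  modprobe zblock
  echo zblock > /sys/module/zswap/parameters/zpool
  echo lz4 > /sys/module/zswap/parameters/compressor
  echo 80 > /sys/module/zswap/parameters/max_pool_percent
  echo 1 > /sys/module/zswap/parameters/enabled

  # workload used for the numbers above
  stress-ng --vm 4 --vm-bytes 4G --vm-keep --timeout 10m --metrics-brief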
Thanks,
Igor
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-01 17:17 [PATCH] mm: add zblock allocator Vitaly Wool
2025-04-01 18:24 ` Nhat Pham
2025-04-01 23:16 ` Shakeel Butt
@ 2025-04-02 13:03 ` kernel test robot
2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-04-02 13:03 UTC (permalink / raw)
To: Vitaly Wool, linux-mm
Cc: llvm, oe-kbuild-all, akpm, Vitaly Wool, Igor Belousov
Hi Vitaly,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.14]
[cannot apply to akpm-mm/mm-everything next-20250402]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vitaly-Wool/mm-add-zblock-allocator/20250402-011953
base: linus/master
patch link: https://lore.kernel.org/r/20250401171754.2686501-1-vitaly.wool%40konsulko.se
patch subject: [PATCH] mm: add zblock allocator
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20250402/202504022017.kHAz8bGB-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250402/202504022017.kHAz8bGB-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504022017.kHAz8bGB-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/zblock.c:56: warning: Function parameter or struct member 'free_slots' not described in 'zblock_block'
>> mm/zblock.c:56: warning: Function parameter or struct member 'slot_info' not described in 'zblock_block'
>> mm/zblock.c:56: warning: Function parameter or struct member 'cache_idx' not described in 'zblock_block'
>> mm/zblock.c:102: warning: Function parameter or struct member 'slot_size' not described in 'block_desc'
>> mm/zblock.c:102: warning: Function parameter or struct member 'slots_per_block' not described in 'block_desc'
>> mm/zblock.c:102: warning: Function parameter or struct member 'order' not described in 'block_desc'
>> mm/zblock.c:102: warning: Function parameter or struct member 'block_desc' not described in 'block_desc'
>> mm/zblock.c:114: warning: Function parameter or struct member 'lock' not described in 'block_list'
>> mm/zblock.c:114: warning: Function parameter or struct member 'block_cache' not described in 'block_list'
>> mm/zblock.c:114: warning: Function parameter or struct member 'block_count' not described in 'block_list'
>> mm/zblock.c:249: warning: Excess function parameter 'ops' description in 'zblock_create_pool'
vim +56 mm/zblock.c
42
43 /**
44 * struct zblock_block - block metadata
45 * Block consists of several (1/2/4/8) pages and contains fixed
46 * integer number of slots for allocating compressed pages.
47 *
48 * free_slots: number of free slots in the block
49 * slot_info: contains data about free/occupied slots
50 * cache_idx: index of the block in cache
51 */
52 struct zblock_block {
53 atomic_t free_slots;
54 u64 slot_info[1];
55 int cache_idx;
> 56 };
57
58 /**
59 * struct block_desc - general metadata for block lists
60 * Each block list stores only blocks of corresponding type which means
61 * that all blocks in it have the same number and size of slots.
62 * All slots are aligned to size of long.
63 *
64 * slot_size: size of slot for this list
65 * slots_per_block: number of slots per block for this list
66 * order: order for __get_free_pages
67 */
68 static const struct block_desc {
69 const unsigned int slot_size;
70 const unsigned short slots_per_block;
71 const unsigned short order;
72 } block_desc[] = {
73 { SLOT_SIZE(32, 0), 32, 0 },
74 { SLOT_SIZE(22, 0), 22, 0 },
75 { SLOT_SIZE(17, 0), 17, 0 },
76 { SLOT_SIZE(13, 0), 13, 0 },
77 { SLOT_SIZE(11, 0), 11, 0 },
78 { SLOT_SIZE(9, 0), 9, 0 },
79 { SLOT_SIZE(8, 0), 8, 0 },
80 { SLOT_SIZE(14, 1), 14, 1 },
81 { SLOT_SIZE(12, 1), 12, 1 },
82 { SLOT_SIZE(11, 1), 11, 1 },
83 { SLOT_SIZE(10, 1), 10, 1 },
84 { SLOT_SIZE(9, 1), 9, 1 },
85 { SLOT_SIZE(8, 1), 8, 1 },
86 { SLOT_SIZE(15, 2), 15, 2 },
87 { SLOT_SIZE(14, 2), 14, 2 },
88 { SLOT_SIZE(13, 2), 13, 2 },
89 { SLOT_SIZE(12, 2), 12, 2 },
90 { SLOT_SIZE(11, 2), 11, 2 },
91 { SLOT_SIZE(10, 2), 10, 2 },
92 { SLOT_SIZE(9, 2), 9, 2 },
93 { SLOT_SIZE(8, 2), 8, 2 },
94 { SLOT_SIZE(15, 3), 15, 3 },
95 { SLOT_SIZE(14, 3), 14, 3 },
96 { SLOT_SIZE(13, 3), 13, 3 },
97 { SLOT_SIZE(12, 3), 12, 3 },
98 { SLOT_SIZE(11, 3), 11, 3 },
99 { SLOT_SIZE(10, 3), 10, 3 },
100 { SLOT_SIZE(9, 3), 9, 3 },
101 { SLOT_SIZE(7, 3), 7, 3 }
> 102 };
103
104 /**
105 * struct block_list - stores metadata of particular list
106 * lock: protects block_cache
107 * block_cache: blocks with free slots
108 * block_count: total number of blocks in the list
109 */
110 struct block_list {
111 spinlock_t lock;
112 struct zblock_block *block_cache[BLOCK_CACHE_SIZE];
113 unsigned long block_count;
> 114 };
115
116 /**
117 * struct zblock_pool - stores metadata for each zblock pool
118 * @block_lists: array of block lists
119 * @zpool: zpool driver
120 * @alloc_flag: protects block allocation from memory leak
121 *
122 * This structure is allocated at pool creation time and maintains metadata
123 * for a particular zblock pool.
124 */
125 struct zblock_pool {
126 struct block_list block_lists[ARRAY_SIZE(block_desc)];
127 struct zpool *zpool;
128 atomic_t alloc_flag;
129 };
130
131 /*****************
132 * Helpers
133 *****************/
134
135 static int cache_insert_block(struct zblock_block *block, struct block_list *list)
136 {
137 unsigned int i, min_free_slots = atomic_read(&block->free_slots);
138 int min_index = -1;
139
140 if (WARN_ON(block->cache_idx != -1))
141 return -EINVAL;
142
143 min_free_slots = atomic_read(&block->free_slots);
144 for (i = 0; i < BLOCK_CACHE_SIZE; i++) {
145 if (!list->block_cache[i] || !atomic_read(&(list->block_cache[i])->free_slots)) {
146 min_index = i;
147 break;
148 }
149 if (atomic_read(&(list->block_cache[i])->free_slots) < min_free_slots) {
150 min_free_slots = atomic_read(&(list->block_cache[i])->free_slots);
151 min_index = i;
152 }
153 }
154 if (min_index >= 0) {
155 if (list->block_cache[min_index])
156 (list->block_cache[min_index])->cache_idx = -1;
157 list->block_cache[min_index] = block;
158 block->cache_idx = min_index;
159 }
160 return min_index < 0 ? min_index : 0;
161 }
162
163 static struct zblock_block *cache_find_block(struct block_list *list)
164 {
165 int i;
166 struct zblock_block *z = NULL;
167
168 for (i = 0; i < BLOCK_CACHE_SIZE; i++) {
169 if (list->block_cache[i] &&
170 atomic_dec_if_positive(&list->block_cache[i]->free_slots) >= 0) {
171 z = list->block_cache[i];
172 break;
173 }
174 }
175 return z;
176 }
177
178 static int cache_remove_block(struct block_list *list, struct zblock_block *block)
179 {
180 int idx = block->cache_idx;
181
182 block->cache_idx = -1;
183 if (idx >= 0)
184 list->block_cache[idx] = NULL;
185 return idx < 0 ? idx : 0;
186 }
187
188 /*
189 * Encodes the handle of a particular slot in the pool using metadata
190 */
191 static inline unsigned long metadata_to_handle(struct zblock_block *block,
192 unsigned int block_type, unsigned int slot)
193 {
194 return (unsigned long)(block) + (block_type << SLOT_BITS) + slot;
195 }
196
197 /* Returns block, block type and slot in the pool corresponding to handle */
198 static inline struct zblock_block *handle_to_metadata(unsigned long handle,
199 unsigned int *block_type, unsigned int *slot)
200 {
201 *block_type = (handle & (PAGE_SIZE - 1)) >> SLOT_BITS;
202 *slot = handle & SLOT_MASK;
203 return (struct zblock_block *)(handle & PAGE_MASK);
204 }
205
206
207 /*
208 * allocate new block and add it to corresponding block list
209 */
210 static struct zblock_block *alloc_block(struct zblock_pool *pool,
211 int block_type, gfp_t gfp,
212 unsigned long *handle)
213 {
214 struct zblock_block *block;
215 struct block_list *list;
216
217 block = (void *)__get_free_pages(gfp, block_desc[block_type].order);
218 if (!block)
219 return NULL;
220
221 list = &(pool->block_lists)[block_type];
222
223 /* init block data */
224 memset(&block->slot_info, 0, sizeof(block->slot_info));
225 atomic_set(&block->free_slots, block_desc[block_type].slots_per_block - 1);
226 block->cache_idx = -1;
227 set_bit(BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info);
228 *handle = metadata_to_handle(block, block_type, 0);
229
230 spin_lock(&list->lock);
231 cache_insert_block(block, list);
232 list->block_count++;
233 spin_unlock(&list->lock);
234 return block;
235 }
236
237 /*****************
238 * API Functions
239 *****************/
240 /**
241 * zblock_create_pool() - create a new zblock pool
242 * @gfp: gfp flags when allocating the zblock pool structure
243 * @ops: user-defined operations for the zblock pool
244 *
245 * Return: pointer to the new zblock pool or NULL if the metadata allocation
246 * failed.
247 */
248 static struct zblock_pool *zblock_create_pool(gfp_t gfp)
> 249 {
250 struct zblock_pool *pool;
251 struct block_list *list;
252 int i, j;
253
254 pool = kmalloc(sizeof(struct zblock_pool), gfp);
255 if (!pool)
256 return NULL;
257
258 /* init each block list */
259 for (i = 0; i < ARRAY_SIZE(block_desc); i++) {
260 list = &(pool->block_lists)[i];
261 spin_lock_init(&list->lock);
262 for (j = 0; j < BLOCK_CACHE_SIZE; j++)
263 list->block_cache[j] = NULL;
264 list->block_count = 0;
265 }
266 atomic_set(&pool->alloc_flag, 0);
267 return pool;
268 }
269
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-02 6:45 ` igor.b
@ 2025-04-02 16:24 ` Shakeel Butt
2025-04-03 21:54 ` Nhat Pham
1 sibling, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2025-04-02 16:24 UTC (permalink / raw)
To: igor.b; +Cc: Vitaly Wool, linux-mm, akpm
Hi Igor,
On Wed, Apr 02, 2025 at 10:45:25AM +0400, igor.b@beldev.am wrote:
> Hi Shakeel,
>
> 2025-04-02 03:16 Shakeel Butt wrote:
> > Hi Vitaly,
> >
> > On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
> > > zblock is a special purpose allocator for storing compressed pages.
> > > It stores integer number of compressed objects per its block. These
> > > blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
> > >
> > > With zblock, it is possible to densely arrange objects of various
> > > sizes
> > > resulting in low internal fragmentation. Also this allocator tries to
> > > fill incomplete blocks instead of adding new ones, in many cases
> > > providing a compression ratio substantially higher than z3fold and
> > > zbud
> > > (though lower than zmalloc's).
> > >
> > > zblock does not require MMU
> >
> > Can you explain why not requiring MMU is important for your use-case?
> > Also what exactly is your use-case? Are you planning to use zblock
> > through zram or zswap or something new?
>
> We have 2 use cases for a zpool backend: data centers (zswap) and small MMU
> less devices, this is where zram comes ito play. We know that zram over
> zpool patch is still out of tree but we also know the story behind that :)
>
So zswap and zram are the use cases. It is usually frowned upon to add a
general-purpose library without an actual user, so I would recommend
adding a user along with zblock.
> > > to operate and also is superior to zsmalloc
> > > with regard to average performance and worst execution times, thus
> > > allowing for better response time and real-time characteristics of the
> > > whole system.
> > >
> > > E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> > > 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
> >
> > Can you explain a bit more on this test? How is this test using zblock?
>
> We're running stress-ng on a number of devices. The results published in the
> previous message are from the run on Raspberry Pi 5. zpool's settings used
> for comparison are lz4/80%.
>
My question was mainly whether this is zswap or zram, and I think it is
zswap based on the stress-ng command. For zswap, I would recommend running
a perf experiment on a server (or a beefy desktop) as well, and picking a
normal server-like workload. I would be interested in looking at zswapin
latencies and zswapout throughput in addition to the metrics the workload
cares about.
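E.g. something along these lines while the workload runs (counter and
debugfs names as in current upstream, shown only as a sketch):

  # swap-in/swap-out activity going through zswap
  grep -E 'zswpin|zswpout' /proc/vmstat
  # compressed pool footprint reported by zswap
  cat /sys/kernel/debug/zswap/pool_total_size
  cat /sys/kernel/debug/zswap/stored_pages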
> Thanks,
> Igor
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: add zblock allocator
2025-04-02 6:45 ` igor.b
2025-04-02 16:24 ` Shakeel Butt
@ 2025-04-03 21:54 ` Nhat Pham
1 sibling, 0 replies; 8+ messages in thread
From: Nhat Pham @ 2025-04-03 21:54 UTC (permalink / raw)
To: igor.b; +Cc: Shakeel Butt, Vitaly Wool, linux-mm, akpm
On Tue, Apr 1, 2025 at 11:45 PM <igor.b@beldev.am> wrote:
>
> Hi Shakeel,
>
> 2025-04-02 03:16 Shakeel Butt wrote:
> > Hi Vitaly,
> >
> > On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
> >> zblock is a special purpose allocator for storing compressed pages.
> >> It stores integer number of compressed objects per its block. These
> >> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
> >>
> >> With zblock, it is possible to densely arrange objects of various
> >> sizes
> >> resulting in low internal fragmentation. Also this allocator tries to
> >> fill incomplete blocks instead of adding new ones, in many cases
> >> providing a compression ratio substantially higher than z3fold and
> >> zbud
> >> (though lower than zmalloc's).
> >>
> >> zblock does not require MMU
> >
> > Can you explain why not requiring MMU is important for your use-case?
> > Also what exactly is your use-case? Are you planning to use zblock
> > through zram or zswap or something new?
>
> We have 2 use cases for a zpool backend: data centers (zswap) and small
> MMU less devices, this is where zram comes ito play. We know that zram
> over zpool patch is still out of tree but we also know the story behind
> that :)
In the upstream tree, zram also depends on MMU. It is probably
different in your private tree, but please keep the conversation and
documentation aligned with the reality of upstream :)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-04-03 21:54 UTC | newest]
Thread overview: 8+ messages
2025-04-01 17:17 [PATCH] mm: add zblock allocator Vitaly Wool
2025-04-01 18:24 ` Nhat Pham
2025-04-01 21:44 ` Vitaly
2025-04-01 23:16 ` Shakeel Butt
2025-04-02 6:45 ` igor.b
2025-04-02 16:24 ` Shakeel Butt
2025-04-03 21:54 ` Nhat Pham
2025-04-02 13:03 ` kernel test robot