From: Ananda Badmaev <a.badmaev@clicknet.pro>
To: linux-mm@kvack.org
Cc: Ananda Badmaev <a.badmaev@clicknet.pro>
Subject: [PATCH] mm: add ztree - new allocator for use via zpool API
Date: Thu, 16 Sep 2021 11:51:02 +0300
Message-Id: <20210916085102.19310-1-a.badmaev@clicknet.pro>
X-Mailer: git-send-email 2.20.1
ztree is a versatile backend for zswap and potentially zram. It gets its
name from its use of red-black trees to store blocks of compressed objects.
These blocks consist of several consecutive pages, and ztree keeps an integer
number of objects per block.

For zram, ztree has better worst-case malloc() and free() times than zsmalloc,
does not deteriorate over time and has a slightly worse but comparable
compression ratio. For zswap, ztree has better worst-case malloc() and free()
times than z3fold, a better compression ratio than z3fold and, unlike
zsmalloc, supports reclaim.

Signed-off-by: Ananda Badmaev <a.badmaev@clicknet.pro>
---
 MAINTAINERS | 7 +
 mm/Kconfig  | 18 ++
 mm/Makefile | 1 +
 mm/ztree.c  | 761 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 787 insertions(+)
 create mode 100644 mm/ztree.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d7b4f32875a9..d22ab5641e98 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20482,6 +20482,13 @@ L:	linux-mm@kvack.org
 S:	Maintained
 F:	mm/zbud.c
 
+ZTREE COMPRESSED PAGE ALLOCATOR
+M:	Ananda Badmaev <a.badmaev@clicknet.pro>
+M:	Vitaly Wool
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	mm/ztree.c
+
 ZD1211RW WIRELESS DRIVER
 M:	Daniel Drake
 M:	Ulrich Kunitz
diff --git a/mm/Kconfig b/mm/Kconfig
index 40a9bfcd5062..96b89ce95a3d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -641,6 +641,12 @@ config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
 	help
 	  Use the z3fold allocator as the default allocator.
 
+config ZSWAP_ZPOOL_DEFAULT_ZTREE
+	bool "ztree"
+	select ZTREE
+	help
+	  Use the ztree allocator as the default allocator.
+
 config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
 	bool "zsmalloc"
 	select ZSMALLOC
@@ -653,6 +659,7 @@ config ZSWAP_ZPOOL_DEFAULT
 	depends on ZSWAP
 	default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD
 	default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD
+	default "ztree" if ZSWAP_ZPOOL_DEFAULT_ZTREE
 	default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
 	default ""
 
@@ -691,6 +698,17 @@ config Z3FOLD
 	  page. It is a ZBUD derivative so the simplicity and determinism are
 	  still there.
 
+config ZTREE
+	tristate "Simple allocator for compressed pages"
+	depends on ZPOOL
+	help
+	  A special purpose allocator for storing compressed pages.
+	  It stores an integer number of compressed pages per block and
+	  each block consists of an integer number of physical pages.
+	  Red-black trees are used for efficient block organization.
+	  ztree provides fast random read/write operations and a
+	  good compression ratio in most scenarios.
+
 config ZSMALLOC
 	tristate "Memory allocator for compressed pages"
 	depends on MMU
diff --git a/mm/Makefile b/mm/Makefile
index e3436741d539..a38dd41e4dbc 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_ZPOOL)	+= zpool.o
 obj-$(CONFIG_ZBUD)	+= zbud.o
 obj-$(CONFIG_ZSMALLOC)	+= zsmalloc.o
 obj-$(CONFIG_Z3FOLD)	+= z3fold.o
+obj-$(CONFIG_ZTREE)	+= ztree.o
 obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
 obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
diff --git a/mm/ztree.c b/mm/ztree.c
new file mode 100644
index 000000000000..df1562ea88e5
--- /dev/null
+++ b/mm/ztree.c
@@ -0,0 +1,761 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ztree.c
+ *
+ * Author: Ananda Badmaev <a.badmaev@clicknet.pro>
+ * Copyright (C) 2021, Konsulko AB.
+ *
+ * This implementation is based on z3fold written by Vitaly Wool.
+ *
+ * ztree is a special purpose allocator for storing compressed pages.
+ * It stores an integer number of objects per block - in the range from 8 to 16.
+ * Blocks consist of several physical pages - from 1 to 8 (always a power of 2).
+ * ztree uses red-black trees for efficient block organization and
+ * creates a compile-time fixed number of block trees.
+ * Each such tree stores only objects with a size in a certain range.
+ *
+ * ztree doesn't export any API and is meant to be used via the zpool API.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/zpool.h>
+
+
+#define SLOT_FREE 0
+#define SLOT_OCCUPIED 1
+#define SLOT_MAPPED 2
+#define SLOT_UNMAPPED 3
+
+#define SLOT_BITS 4
+#define BLOCK_TYPE_BITS 4
+
+#define BLOCK_TYPE_SHIFT (sizeof(long) * 8 - BLOCK_TYPE_BITS)
+#define MAX_BLOCK_INDEX (ULONG_MAX >> (SLOT_BITS + BLOCK_TYPE_BITS))
+#define BLOCK_INDEX_MASK (MAX_BLOCK_INDEX << SLOT_BITS)
+#define MAX_SLOTS (1 << SLOT_BITS)
+#define SLOT_MASK ((0x1UL << SLOT_BITS) - 1)
+
+/*****************
+ * Structures
+ *****************/
+
+struct ztree_pool;
+
+struct ztree_ops {
+	int (*evict)(struct ztree_pool *pool, unsigned long handle);
+};
+
+/**
+ * struct ztree_block - block metadata
+ * Block consists of several (1/2/4/8) pages and contains a fixed
+ * integer number of slots for allocating compressed pages.
+ * @lock: protects slot_info, free_slots and coeff
+ * @block_node: links the block into the relevant tree in the pool
+ * @slot_info: contains data about free/occupied slots
+ * @compressed_data: pointer to the memory block
+ * @block_index: unique for each ztree_block in the tree
+ * @free_slots: number of free slots in the block
+ * @coeff: used to switch between blocks
+ * @under_reclaim: if true, the block is being evicted
+ */
+struct ztree_block {
+	spinlock_t lock;
+	struct rb_node block_node;
+	u8 slot_info[MAX_SLOTS];
+	void *compressed_data;
+	unsigned long block_index;
+	unsigned short free_slots;
+	unsigned short coeff;
+	bool under_reclaim;
+};
+
+
+/**
+ * struct tree_desc - general metadata for block trees
+ * Each block tree stores only blocks of the corresponding type, which means
+ * that all blocks in it have the same number and size of slots.
+ * All slots are aligned to the size of long.
+ * @slot_size: size of slot for this tree
+ * @slots_per_block: number of slots per block for this tree
+ * @order: order for __get_free_pages
+ */
+static struct tree_desc {
+	const unsigned int slot_size;
+	const unsigned short slots_per_block;
+	const unsigned short order;
+} tree_desc[] = {
+	/* 1 page blocks with 16 slots */
+	[0] = { PAGE_SIZE / 16, 0x10, 0 },
+	/* 1 page blocks with 11 slots */
+	[1] = { PAGE_SIZE / (11 * sizeof(long)) * sizeof(long), 0xB, 0 },
+	/* 1 page blocks with 8 slots */
+	[2] = { PAGE_SIZE / 8, 0x8, 0 },
+	/* 2 page blocks with 11 slots */
+	[3] = { 2 * PAGE_SIZE / (11 * sizeof(long)) * sizeof(long), 0xB, 1 },
+	/* 2 page blocks with 8 slots */
+	[4] = { PAGE_SIZE / 4, 0x8, 1 },
+	/* 4 page blocks with 14 slots */
+	[5] = { 4 * PAGE_SIZE / (14 * sizeof(long)) * sizeof(long), 0xE, 2 },
+	/* 4 page blocks with 12 slots */
+	[6] = { 4 * PAGE_SIZE / (12 * sizeof(long)) * sizeof(long), 0xC, 2 },
+	/* 4 page blocks with 10 slots */
+	[7] = { 4 * PAGE_SIZE / (10 * sizeof(long)) * sizeof(long), 0xA, 2 },
+	/* 4 page blocks with 9 slots */
+	[8] = { 4 * PAGE_SIZE / (9 * sizeof(long)) * sizeof(long), 0x9, 2 },
+	/* 4 page blocks with 8 slots */
+	[9] = { PAGE_SIZE / 2, 0x8, 2 },
+	/* 8 page blocks with 14 slots */
+	[10] = { 8 * PAGE_SIZE / (14 * sizeof(long)) * sizeof(long), 0xE, 3 },
+	/* 8 page blocks with 13 slots */
+	[11] = { 8 * PAGE_SIZE / (13 * sizeof(long)) * sizeof(long), 0xD, 3 },
+	/* 8 page blocks with 12 slots */
+	[12] = { 8 * PAGE_SIZE / (12 * sizeof(long)) * sizeof(long), 0xC, 3 },
+	/* 8 page blocks with 11 slots */
+	[13] = { 8 * PAGE_SIZE / (11 * sizeof(long)) * sizeof(long), 0xB, 3 },
+	/* 8 page blocks with 10 slots */
+	[14] = { 8 * PAGE_SIZE / (10 * sizeof(long)) * sizeof(long), 0xA, 3 },
+	/* 8 page blocks with 8 slots */
+	[15] = { PAGE_SIZE, 0x8, 3 }
+};
+
+
+/**
+ * struct block_tree - stores metadata of a particular tree
+ * @lock: protects the tree
+ * @root: root of this tree
+ * @last_block: pointer to the last added block in the tree
+ * @current_block: pointer to the current block for allocation
+ * @counter: counter for block_index in ztree_block
+ * @block_count: total number of blocks in the tree
+ */
+struct block_tree {
+	spinlock_t lock;
+	struct rb_root root;
+	struct ztree_block *last_block;
+	struct ztree_block *current_block;
+	unsigned long counter;
+	unsigned int block_count;
+};
+
+/**
+ * struct ztree_pool - stores metadata for each ztree pool
+ * @block_trees: array of block trees
+ * @block_cache: kmem_cache for allocating block metadata
+ * @ops: pointer to a structure of user defined operations specified at
+ *       pool creation time.
+ * @zpool: pointer to the zpool this pool belongs to
+ * @zpool_ops: pointer to the zpool operations (used for eviction callbacks)
+ *
+ * This structure is allocated at pool creation time and maintains metadata
+ * for a particular ztree pool.
+ */
+struct ztree_pool {
+	struct block_tree *block_trees;
+	struct kmem_cache *block_cache;
+	const struct ztree_ops *ops;
+	struct zpool *zpool;
+	const struct zpool_ops *zpool_ops;
+};
+
+
+/*****************
+ * Helpers
+ *****************/
+
+/* compare 2 nodes by block index, used by rb_add() */
+static bool node_comp(struct rb_node *node1, const struct rb_node *node2)
+{
+	struct ztree_block *block1, *block2;
+
+	block1 = rb_entry(node1, struct ztree_block, block_node);
+	block2 = rb_entry(node2, struct ztree_block, block_node);
+	return block1->block_index < block2->block_index;
+}
+
+/* compare key with block index of node, used by rb_find() */
+static int index_comp(const void *key, const struct rb_node *node)
+{
+	struct ztree_block *block;
+	unsigned long block_index;
+
+	block = rb_entry(node, struct ztree_block, block_node);
+	block_index = *(unsigned long *)key;
+	if (block_index < block->block_index)
+		return -1;
+	else if (block_index > block->block_index)
+		return 1;
+	else
+		return 0;
+}
+
+
+/*
+ * allocate a new block and add it to the corresponding block tree
+ */
+static struct ztree_block *alloc_block(struct ztree_pool *pool,
+					int block_type, gfp_t gfp)
+{
+	struct ztree_block *block;
+	struct block_tree *tree;
+
+	block = kmem_cache_alloc(pool->block_cache,
+				(gfp & ~(__GFP_HIGHMEM | __GFP_MOVABLE)));
+	if (!block)
+		return NULL;
+
+	block->compressed_data = (void *)__get_free_pages(gfp, tree_desc[block_type].order);
+	if (!block->compressed_data) {
+		kmem_cache_free(pool->block_cache, block);
+		return NULL;
+	}
+	tree = &(pool->block_trees)[block_type];
+
+	/* init block data */
+	spin_lock_init(&block->lock);
+	memset(block->slot_info, SLOT_FREE, tree_desc[block_type].slots_per_block);
+	block->free_slots = tree_desc[block_type].slots_per_block;
+	block->coeff = 0;
+	block->under_reclaim = false;
+
+	spin_lock(&tree->lock);
+	/* assign the block index and insert the block into the tree */
+	block->block_index = tree->counter++;
+	tree->counter %= MAX_BLOCK_INDEX;
+	rb_add(&block->block_node, &tree->root, node_comp);
+	tree->last_block = block;
+	tree->block_count++;
+	spin_unlock(&tree->lock);
+	return block;
+}
+
+
+/*
+ * free a block tree with blocks of a particular type
+ */
+static void free_block_tree(struct ztree_pool *pool, int block_type)
+{
+	struct ztree_block *block, *tmp;
+	struct block_tree *tree;
+
+	tree = &(pool->block_trees)[block_type];
+	spin_lock(&tree->lock);
+	rbtree_postorder_for_each_entry_safe(block, tmp, &tree->root, block_node) {
+		free_pages((unsigned long)block->compressed_data,
+			tree_desc[block_type].order);
+		kmem_cache_free(pool->block_cache, block);
+	}
+	spin_unlock(&tree->lock);
+}
+
+/*
+ * Encodes the handle of a particular slot in the pool using metadata
+ */
+static unsigned long metadata_to_handle(unsigned long block_type,
+				unsigned long block_index, unsigned long slot)
+{
+	unsigned long handle;
+
+	handle = (block_type << BLOCK_TYPE_SHIFT) +
+		(block_index << SLOT_BITS) + slot;
+	return handle;
+}
+
+
+/* Returns block type, block index and slot in the pool corresponding to handle */
+static void handle_to_metadata(unsigned long handle, unsigned long *block_type,
+				unsigned long *block_index, unsigned long *slot)
+{
+	*block_type = handle >> BLOCK_TYPE_SHIFT;
+	*block_index = (handle & BLOCK_INDEX_MASK) >> SLOT_BITS;
+	*slot = handle & SLOT_MASK;
+}
+
+
+/*****************
+ * API Functions
+ *****************/
+/**
+ * ztree_create_pool() - create a new ztree pool
+ * @gfp: gfp flags when allocating the ztree pool structure
+ * @ops: user-defined operations for the ztree pool
+ *
+ * Return: pointer to the new ztree pool or NULL if the metadata allocation
+ * failed.
+ */
+struct ztree_pool *ztree_create_pool(gfp_t gfp, const struct ztree_ops *ops)
+{
+	struct ztree_pool *pool;
+	struct block_tree *tree;
+	int i, block_types_nr;
+
+	pool = kmalloc(sizeof(struct ztree_pool), gfp);
+	if (!pool)
+		return NULL;
+
+	block_types_nr = ARRAY_SIZE(tree_desc);
+
+	pool->block_cache = kmem_cache_create("ztree_blocks",
+				sizeof(struct ztree_block), 0, 0, NULL);
+	if (!pool->block_cache) {
+		kfree(pool);
+		return NULL;
+	}
+
+	pool->block_trees = kmalloc(block_types_nr * sizeof(struct block_tree), gfp);
+	if (!pool->block_trees) {
+		kmem_cache_destroy(pool->block_cache);
+		kfree(pool);
+		return NULL;
+	}
+
+	/* init each basic block tree */
+	for (i = 0; i < block_types_nr; i++) {
+		tree = &(pool->block_trees)[i];
+		spin_lock_init(&tree->lock);
+		tree->root = RB_ROOT;
+		tree->last_block = NULL;
+		tree->current_block = NULL;
+		tree->counter = 0;
+		tree->block_count = 0;
+	}
+	pool->ops = ops;
+	return pool;
+}
+
+/**
+ * ztree_destroy_pool() - destroys an existing ztree pool
+ * @pool: the ztree pool to be destroyed
+ *
+ */
+void ztree_destroy_pool(struct ztree_pool *pool)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tree_desc); i++)
+		free_block_tree(pool, i);
+	kmem_cache_destroy(pool->block_cache);
+	kfree(pool->block_trees);
+	kfree(pool);
+}
+
+
+/**
+ * ztree_alloc() - allocates a slot of appropriate size
+ * @pool: ztree pool from which to allocate
+ * @size: size in bytes of the desired allocation
+ * @gfp: gfp flags used if the pool needs to grow
+ * @handle: handle of the new allocation
+ *
+ * Return: 0 if success and handle is set, otherwise -EINVAL if the size or
+ * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate
+ * a new slot.
+ */
+int ztree_alloc(struct ztree_pool *pool, size_t size, gfp_t gfp,
+		unsigned long *handle)
+{
+	unsigned long block_type, slot;
+	struct ztree_block *block;
+	struct block_tree *tree;
+
+	if (!size)
+		return -EINVAL;
+
+	if (size > PAGE_SIZE)
+		return -ENOSPC;
+
+	/* find basic block type with a suitable slot size */
+	for (block_type = 0; block_type < ARRAY_SIZE(tree_desc); block_type++) {
+		if (size <= tree_desc[block_type].slot_size)
+			break;
+	}
+	tree = &(pool->block_trees[block_type]);
+	spin_lock(&tree->lock);
+	/* check if there are free slots in the current and the last added blocks */
+	if (tree->current_block && tree->current_block->free_slots > 0) {
+		block = tree->current_block;
+		goto found;
+	}
+	if (tree->last_block && tree->last_block->free_slots > 0) {
+		block = tree->last_block;
+		goto found;
+	}
+	spin_unlock(&tree->lock);
+
+	/* no block with free slots found, try to allocate a new empty block */
+	block = alloc_block(pool, block_type, gfp);
+	spin_lock(&tree->lock);
+	if (block) {
+		tree->current_block = block;
+		goto found;
+	}
+	spin_unlock(&tree->lock);
+	return -ENOMEM;
+
+found:
+	spin_lock(&block->lock);
+	block->free_slots--;
+	spin_unlock(&tree->lock);
+	/* find the first free slot in block */
+	for (slot = 0; slot < tree_desc[block_type].slots_per_block; slot++) {
+		if (block->slot_info[slot] == SLOT_FREE)
+			break;
+	}
+	block->slot_info[slot] = SLOT_OCCUPIED;
+	block->coeff = block->free_slots *
+		(tree_desc[block_type].slots_per_block - block->free_slots);
+	spin_unlock(&block->lock);
+	*handle = metadata_to_handle(block_type, block->block_index, slot);
+	return 0;
+}
+
+/**
+ * ztree_free() - frees the allocation associated with the given handle
+ * @pool: pool in which the allocation resided
+ * @handle: handle associated with the allocation returned by ztree_alloc()
+ *
+ */
+void ztree_free(struct ztree_pool *pool, unsigned long handle)
+{
+	unsigned long slot, block_type, block_index;
+	struct rb_node *tmp;
+	struct ztree_block *block;
+	struct block_tree *tree;
+
+	handle_to_metadata(handle, &block_type, &block_index, &slot);
+	tree = &(pool->block_trees[block_type]);
+
+	/* find block corresponding to handle */
+	spin_lock(&tree->lock);
+	tmp = rb_find(&block_index, &tree->root, index_comp);
+	if (!tmp) {
+		spin_unlock(&tree->lock);
+		pr_err("ztree block not found\n");
+		return;
+	}
+	block = rb_entry(tmp, struct ztree_block, block_node);
+	if (block->under_reclaim) {
+		spin_unlock(&tree->lock);
+		return;
+	}
+	block->free_slots++;
+	/* if all slots in the block are empty, delete the whole block */
+	if (block->free_slots == tree_desc[block_type].slots_per_block) {
+		rb_erase(&block->block_node, &tree->root);
+		tree->block_count--;
+
+		/* if the block to be deleted is the last added block */
+		if (block == tree->last_block) {
+			tree->current_block = NULL;
+			tree->last_block = NULL;
+		/* otherwise set current block to last block */
+		} else {
+			tree->current_block = tree->last_block;
+		}
+		spin_unlock(&tree->lock);
+		free_pages((unsigned long)block->compressed_data, tree_desc[block_type].order);
+		kmem_cache_free(pool->block_cache, block);
+		return;
+	}
+	/* switch current block */
+	if (!tree->current_block || block->coeff >= tree->current_block->coeff)
+		tree->current_block = block;
+	spin_lock(&block->lock);
+	spin_unlock(&tree->lock);
+	block->slot_info[slot] = SLOT_FREE;
+	block->coeff = block->free_slots *
+		(tree_desc[block_type].slots_per_block - block->free_slots);
+	spin_unlock(&block->lock);
+}
+
+/**
+ * ztree_reclaim_block() - evicts allocations from a block and frees it
+ * @pool: pool from which to attempt to evict a block
+ *
+ * Returns: the number of evicted slots if the block was at least partially freed,
+ *          -EAGAIN if nothing could be evicted from the chosen block,
+ *          or -EINVAL if there are no blocks to evict
+ */
+int ztree_reclaim_block(struct ztree_pool *pool)
+{
+	struct ztree_block *block;
+	struct rb_node *tmp;
+	struct block_tree *tree;
+	unsigned long handle, slot;
+	int block_type, ret, reclaimed;
+
+	/* start with the tree storing blocks with the worst compression and try
+	 * to evict the block with the lowest index (the first element in the tree)
+	 */
+	for (block_type = ARRAY_SIZE(tree_desc) - 1; block_type >= 0; --block_type) {
+		tree = &(pool->block_trees[block_type]);
+		spin_lock(&tree->lock);
+
+		/* find the first block in the tree */
+		tmp = rb_first(&tree->root);
+
+		if (tmp == NULL) {
+			spin_unlock(&tree->lock);
+			continue;
+		}
+		block = rb_entry(tmp, struct ztree_block, block_node);
+
+		/* skip iteration if this block is current or last */
+		if (block == tree->current_block || block == tree->last_block) {
+			spin_unlock(&tree->lock);
+			continue;
+		}
+		block->under_reclaim = true;
+		spin_unlock(&tree->lock);
+		reclaimed = 0;
+
+		/* try to evict all UNMAPPED slots in block */
+		for (slot = 0; slot < tree_desc[block_type].slots_per_block; ++slot) {
+			if (block->slot_info[slot] == SLOT_UNMAPPED) {
+				handle = metadata_to_handle(block_type, block->block_index, slot);
+				ret = pool->ops->evict(pool, handle);
+				if (ret)
+					break;
+
+				++reclaimed;
+				spin_lock(&block->lock);
+				block->slot_info[slot] = SLOT_FREE;
+				spin_unlock(&block->lock);
+				block->free_slots++;
+			}
+		}
+		spin_lock(&tree->lock);
+		/* some occupied slots remained - modify coeff and leave the block */
+		if (block->free_slots != tree_desc[block_type].slots_per_block) {
+			block->under_reclaim = false;
+			block->coeff = block->free_slots *
+				(tree_desc[block_type].slots_per_block - block->free_slots);
+			spin_unlock(&tree->lock);
+		} else {
+			/* all slots are free - delete this block */
+			rb_erase(&block->block_node, &tree->root);
+			tree->block_count--;
+			spin_unlock(&tree->lock);
+			free_pages((unsigned long)block->compressed_data,
+					tree_desc[block_type].order);
+			kmem_cache_free(pool->block_cache, block);
+		}
+		if (reclaimed != 0)
+			return reclaimed;
+		return -EAGAIN;
+	}
+	return -EINVAL;
+}
+
+
+/**
+ * ztree_map() - maps the allocation associated with the given handle
+ * @pool: pool in which the allocation resides
+ * @handle: handle associated with the allocation to be mapped
+ *
+ * Returns: a pointer to the mapped allocation
+ */
+void *ztree_map(struct ztree_pool *pool, unsigned long handle)
+{
+	unsigned long block_type, block_index, slot;
+	struct rb_node *tmp;
+	struct ztree_block *block;
+	struct block_tree *tree;
+
+	handle_to_metadata(handle, &block_type, &block_index, &slot);
+	tree = &(pool->block_trees[block_type]);
+	spin_lock(&tree->lock);
+
+	/* the requested block often turns out to be current_block */
+	if (tree->current_block && tree->current_block->block_index == block_index) {
+		block = tree->current_block;
+		spin_unlock(&tree->lock);
+		goto found;
+	}
+	/* find block with index corresponding to handle */
+	tmp = rb_find(&block_index, &tree->root, index_comp);
+	spin_unlock(&tree->lock);
+	if (!tmp) {
+		pr_err("ztree block not found\n");
+		return NULL;
+	}
+	block = rb_entry(tmp, struct ztree_block, block_node);
+found:
+	spin_lock(&block->lock);
+	block->slot_info[slot] = SLOT_MAPPED;
+	spin_unlock(&block->lock);
+	return block->compressed_data + slot * tree_desc[block_type].slot_size;
+}
+
+/**
+ * ztree_unmap() - unmaps the allocation associated with the given handle
+ * @pool: pool in which the allocation resides
+ * @handle: handle associated with the allocation to be unmapped
+ */
+void ztree_unmap(struct ztree_pool *pool, unsigned long handle)
+{
+	unsigned long block_type, block_index, slot;
+	struct rb_node *tmp;
+	struct ztree_block *block;
+	struct block_tree *tree;
+
+	handle_to_metadata(handle, &block_type, &block_index, &slot);
+	tree = &(pool->block_trees[block_type]);
+	spin_lock(&tree->lock);
+
+	/* the requested block often turns out to be current_block */
+	if (tree->current_block && tree->current_block->block_index == block_index) {
+		block = tree->current_block;
+		spin_unlock(&tree->lock);
+		goto found;
+	}
+	/* find block with index corresponding to handle */
+	tmp = rb_find(&block_index, &tree->root, index_comp);
+	spin_unlock(&tree->lock);
+	if (!tmp) {
+		pr_err("ztree block not found\n");
+		return;
+	}
+	block = rb_entry(tmp, struct ztree_block, block_node);
+found:
+	spin_lock(&block->lock);
+	block->slot_info[slot] = SLOT_UNMAPPED;
+	spin_unlock(&block->lock);
+}
+
+/**
+ * ztree_get_pool_size() - gets the ztree pool size in bytes
+ * @pool: pool whose size is being queried
+ *
+ * Returns: size in bytes of the given pool.
+ */
+u64 ztree_get_pool_size(struct ztree_pool *pool)
+{
+	u64 total_size;
+	int i;
+
+	total_size = 0;
+	for (i = 0; i < ARRAY_SIZE(tree_desc); i++) {
+		total_size += (pool->block_trees)[i].block_count
+				* tree_desc[i].slot_size * tree_desc[i].slots_per_block;
+	}
+	return total_size;
+}
+
+/*****************
+ * zpool
+ ****************/
+
+static int ztree_zpool_evict(struct ztree_pool *pool, unsigned long handle)
+{
+	if (pool->zpool && pool->zpool_ops && pool->zpool_ops->evict)
+		return pool->zpool_ops->evict(pool->zpool, handle);
+	else
+		return -ENOENT;
+}
+
+static const struct ztree_ops ztree_zpool_ops = {
+	.evict = ztree_zpool_evict
+};
+
+static void *ztree_zpool_create(const char *name, gfp_t gfp,
+				const struct zpool_ops *zpool_ops,
+				struct zpool *zpool)
+{
+	struct ztree_pool *pool;
+
+	pool = ztree_create_pool(gfp, &ztree_zpool_ops);
+	if (pool) {
+		pool->zpool = zpool;
+		pool->zpool_ops = zpool_ops;
+	}
+	return pool;
+}
+
+static void ztree_zpool_destroy(void *pool)
+{
+	ztree_destroy_pool(pool);
+}
+
+static int ztree_zpool_malloc(void *pool, size_t size, gfp_t gfp,
+			unsigned long *handle)
+{
+	return ztree_alloc(pool, size, gfp, handle);
+}
+
+static void ztree_zpool_free(void *pool, unsigned long handle)
+{
+	ztree_free(pool, handle);
+}
+
+static int ztree_zpool_shrink(void *pool, unsigned int pages,
+			unsigned int *reclaimed)
+{
+	unsigned int total = 0;
+	int ret = -EINVAL;
+
+	while (total < pages) {
+		ret = ztree_reclaim_block(pool);
+		if (ret < 0)
+			break;
+		total += ret;
+	}
+	if (reclaimed)
+		*reclaimed = total;
+
+	return ret;
+}
+
+static void *ztree_zpool_map(void *pool, unsigned long handle,
+			enum zpool_mapmode mm)
+{
+	return ztree_map(pool, handle);
+}
+
+static void ztree_zpool_unmap(void *pool, unsigned long handle)
+{
+	ztree_unmap(pool, handle);
+}
+
+static u64 ztree_zpool_total_size(void *pool)
+{
+	return ztree_get_pool_size(pool);
+}
+
+static struct zpool_driver ztree_zpool_driver = {
+	.type = "ztree",
+	.owner = THIS_MODULE,
+	.create = ztree_zpool_create,
+	.destroy = ztree_zpool_destroy,
+	.malloc = ztree_zpool_malloc,
+	.free = ztree_zpool_free,
+	.shrink = ztree_zpool_shrink,
+	.map = ztree_zpool_map,
+	.unmap = ztree_zpool_unmap,
+	.total_size = ztree_zpool_total_size,
+};
+
+MODULE_ALIAS("zpool-ztree");
+
+static int __init init_ztree(void)
+{
+	pr_info("loaded\n");
+	zpool_register_driver(&ztree_zpool_driver);
+	return 0;
+}
+
+static void __exit exit_ztree(void)
+{
+	zpool_unregister_driver(&ztree_zpool_driver);
+	pr_info("unloaded\n");
+}
+
+module_init(init_ztree);
+module_exit(exit_ztree);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Ananda Badmaev <a.badmaev@clicknet.pro>");
+MODULE_DESCRIPTION("simple block allocator");
-- 
2.20.1
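
P.S. Not part of the patch, just for context: below is a minimal sketch of how
a zpool client (zswap being the in-tree example) would exercise ztree through
the generic zpool API that this driver registers with. The pool name
"ztree-demo", the stored payload and the demo module wrapper are illustrative
assumptions; only the zpool_* calls and ZPOOL_MM_* modes come from the
existing zpool interface.

/*
 * Illustrative sketch only - drives the "ztree" zpool backend the same way
 * zswap would: create a pool, allocate a slot, map/write/unmap it, then
 * free everything. Pool name and payload are arbitrary demo values.
 */
#include <linux/module.h>
#include <linux/string.h>
#include <linux/zpool.h>

static int __init ztree_demo_init(void)
{
	struct zpool *pool;
	unsigned long handle;
	char *buf;
	int ret;

	/* create a pool backed by the "ztree" driver (no evict callbacks) */
	pool = zpool_create_pool("ztree", "ztree-demo", GFP_KERNEL, NULL);
	if (!pool)
		return -ENOMEM;

	/* allocate a slot big enough for a small compressed object */
	ret = zpool_malloc(pool, 100, GFP_KERNEL, &handle);
	if (ret) {
		zpool_destroy_pool(pool);
		return ret;
	}

	/* map the slot write-only, store data, then unmap it again */
	buf = zpool_map_handle(pool, handle, ZPOOL_MM_WO);
	memcpy(buf, "compressed data would go here", 30);
	zpool_unmap_handle(pool, handle);

	pr_info("ztree demo: pool uses %llu bytes\n",
		(unsigned long long)zpool_get_total_size(pool));

	zpool_free(pool, handle);
	zpool_destroy_pool(pool);
	return 0;
}
module_init(ztree_demo_init);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("illustrative ztree/zpool usage sketch");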