From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 6 Feb 2023 17:02:18 -0800
Subject: Re: [PATCH 2/2] zsmalloc: fine-grained inuse ratio based fullness grouping
To: Sergey Senozhatsky
Cc: Minchan Kim, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20230206092559.2722946-3-senozhatsky@chromium.org>
References: <20230206092559.2722946-1-senozhatsky@chromium.org>
 <20230206092559.2722946-3-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Mon, Feb 6, 2023 at 1:26 AM Sergey Senozhatsky wrote:
>
> Each zspage maintains ->inuse counter which keeps track of the
> number of objects stored in the page. The ->inuse counter also
> determines the page's "fullness group" which is calculated as
> the ratio of the "inuse" objects to the total number of objects
> the page can hold (objs_per_zspage). The closer the ->inuse
> counter is to objs_per_zspage, the better.
>
> Each size class maintains several fullness lists that keep
> track of zspages of particular "fullness". There are four lists
> at the moment:
>
> ZS_EMPTY for pages with zero "inuse" counter
> ZS_FULL for pages with "inuse" equal to objs_per_zspage
> ZS_ALMOST_EMPTY for pages with "inuse" less than or equal to
> 3 * objs_per_zspage / 4
> ZS_ALMOST_FULL for pages with "inuse" greater than
> 3 * objs_per_zspage / 4.
>
> First of all, this makes the ZS_ALMOST_EMPTY fullness list pretty
> busy for certain size classes. For example, the ZS_ALMOST_EMPTY
> list for class-112 (which can store 256 objects per zspage) will
> contain pages with ->inuse counters in the range from 1 to 192.
>
> Second, pages within each fullness list are stored in random
> order with regard to the ->inuse counter. This is because
> sorting the pages by ->inuse counter each time obj_malloc() or
> obj_free() is called would be too expensive.
>
> However, the ->inuse counter is still a crucial factor in many
> situations.
>
> In a busy system with many obj_malloc() and obj_free() calls,
> fullness lists become inefficient. For instance, the ->inuse
> counters for the first 7 zspages of some random classes are:
>
> class-1840 objs_per_zspage 20:
> ZS_ALMOST_EMPTY: 3 13 8 2 11 14 3
> ZS_ALMOST_FULL : empty
>
> class-688 objs_per_zspage 59:
> ZS_ALMOST_EMPTY: 1 3 5 1 18 13 15
> ZS_ALMOST_FULL : empty
>
> For the two major zsmalloc operations, zs_malloc() and zs_compact(),
> we typically select the head page from the corresponding fullness
> list as the best candidate page. However, this assumption is not
> always accurate.
>
> For the zs_malloc() operation, the optimal candidate page should
> have the highest ->inuse counter. This is because the goal is to
> maximize the number of ZS_FULL pages and make full use of all
> allocated memory.
>
> For the zs_compact() operation, the optimal candidate page should
> have the lowest ->inuse counter. This is because compaction needs
> to move objects in use to another page before it can release the
> zspage and return its physical pages to the buddy allocator. The
> fewer objects in use, the quicker compaction can release the page.
> Additionally, compaction is measured by the number of pages it
> releases. For example, assume the following case:
>
> - size class stores 8 objects per zspage
> - ALMOST_FULL list contains one page that has ->inuse equal to 6
> - ALMOST_EMPTY list contains 3 pages: one page has ->inuse
> equal to 2, and two pages have ->inuse equal to 1.
>
> The current compaction algorithm selects the head page of the
> ALMOST_EMPTY list (the source page), which has ->inuse equal to 2,
> moves its objects to the ALMOST_FULL list page (the destination
> page), and then releases the source page. The ALMOST_FULL page
> (destination page) becomes FULL, so further compaction is not
> possible.
>
> At the same time, if compaction were to choose ALMOST_EMPTY pages
> with ->inuse equal to 1, it would be able to release two zspages
> while still performing the same number of memcpy() operations.
>
> This patch reworks the fullness grouping mechanism. Instead of
> relying on a threshold that results in too many pages being
> included in the ALMOST_EMPTY group for specific classes, size
> classes maintain a larger number of fullness lists that give
> strict guarantees on the minimum and maximum ->inuse values
> within each group. Each group represents a 10% change in the
> ->inuse ratio compared to neighboring groups.
> In essence, there are groups for pages with 0%, 10%, 20% usage ratios,
> and so on, up to 100%.
>
> This enhances the selection of candidate pages for both zs_malloc()
> and zs_compact(). A printout of the ->inuse counters of the first 7
> pages per (random) class fullness group:
>
> class-768 objs_per_zspage 16:
> fullness 100%: empty
> fullness 99%: empty
> fullness 90%: empty
> fullness 80%: empty
> fullness 70%: empty
> fullness 60%: 8 8 9 9 8 8 8
> fullness 50%: empty
> fullness 40%: 5 5 6 5 5 5 5
> fullness 30%: 4 4 4 4 4 4 4
> fullness 20%: 2 3 2 3 3 2 2
> fullness 10%: 1 1 1 1 1 1 1
> fullness 0%: empty
>
> The zs_malloc() function searches through the groups of pages
> starting with the one having the highest usage ratio. This means
> that it always selects a page from the group with the least
> internal fragmentation (highest usage ratio) and makes it even
> less fragmented by increasing its usage ratio.
>
> The zs_compact() function, on the other hand, begins by scanning
> the group with the highest fragmentation (lowest usage ratio) to
> locate the source page. The first available page is selected, and
> then the function moves downward to find a destination page in
> the group with the lowest internal fragmentation (highest usage
> ratio).
>
> The example demonstrates that zs_malloc() would choose a page
> with ->inuse of 8 as the candidate page, while zs_compact()
> would pick a page with ->inuse of 1 as the source page and
> another page with ->inuse of 8 as the destination page.
>
> A 1/10 difference in ratio between fullness groups is intentional
> and critical for classes that have a high number of objs_per_zspage.
> For instance, class-624 stores 59 objects per zspage. With a 1/10
> ratio grouping, the difference in inuse values between the page
> with the lowest and highest inuse in a single fullness group is
> only 4 objects (2496 bytes), whereas a 1/5 ratio grouping would
> result in a difference of 10 objects (6240 bytes).

The extra memory overhead would be sizeof(list_head) * nr of classes *
extra fullness groups = 16 * 255 * 6 = 24480 bytes ~= 24KB on a machine
with a 4096-byte page size. Sounds reasonable (although I wonder how it
scales with PAGE_SIZE).
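
To partially answer my own PAGE_SIZE question: if I am reading the macros
right (ZS_MAX_ALLOC_SIZE being PAGE_SIZE and ZS_SIZE_CLASS_DELTA being
PAGE_SIZE >> CLASS_BITS -- please correct me if that is stale), the number
of size classes stays at roughly 255 for any page size, so the extra
list_head overhead should be more or less flat. A quick userspace
back-of-the-envelope, purely illustrative:

#include <stdio.h>

#define CLASS_BITS		8
#define SIZEOF_LIST_HEAD	16	/* two pointers on 64-bit */
#define EXTRA_GROUPS		6	/* extra fullness lists per class */

/* mirrors DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, delta) + 1 */
static long nr_size_classes(long page_size, long min_alloc_size)
{
	long delta = page_size >> CLASS_BITS;

	return (page_size - min_alloc_size + delta - 1) / delta + 1;
}

int main(void)
{
	long page_sizes[] = { 4096, 16384, 65536 };

	for (int i = 0; i < 3; i++) {
		/* 32 is the 4K ZS_MIN_ALLOC_SIZE; larger pages use a bigger minimum */
		long classes = nr_size_classes(page_sizes[i], 32);

		printf("PAGE_SIZE %6ld: ~%ld classes, extra ~%ld bytes\n",
		       page_sizes[i], classes,
		       classes * SIZEOF_LIST_HEAD * EXTRA_GROUPS);
	}
	return 0;
}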

>
>
> Signed-off-by: Sergey Senozhatsky
>
> ---
> mm/zsmalloc.c | 224 +++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 148 insertions(+), 76 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index b57a89ed6f30..1901edd01e38 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -127,7 +127,7 @@
> #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
>
> #define HUGE_BITS 1
> -#define FULLNESS_BITS 2
> +#define FULLNESS_BITS 4
> #define CLASS_BITS 8
> #define ISOLATED_BITS 5
> #define MAGIC_VAL_BITS 8
> @@ -159,24 +159,88 @@
> #define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
> ZS_SIZE_CLASS_DELTA) + 1)
>
> +/*
> + * Pages are distinguished by the ratio of used memory (that is the ratio
> + * of ->inuse objects to all objects that page can store). For example,
> + * USAGE_30 means that the ratio of used objects is > 20% and <= 30%.
> + *
> + * The number of fullness groups is not random. It allows us to keep
> + * diffeence between the least busy page in the group (minimum permitted
> + * number of ->inuse objects) and the most busy page (maximum permitted
> + * number of ->inuse objects) at a reasonable value.
> + */
> enum fullness_group {
> - ZS_EMPTY,
> - ZS_ALMOST_EMPTY,
> - ZS_ALMOST_FULL,
> - ZS_FULL,
> + ZS_USAGE_0,
> + ZS_USAGE_10,
> + ZS_USAGE_20,
> + ZS_USAGE_30,
> + ZS_USAGE_40,
> + ZS_USAGE_50,
> + ZS_USAGE_60,
> + ZS_USAGE_70,
> + ZS_USAGE_80,
> + ZS_USAGE_90,
> + ZS_USAGE_99,
> + ZS_USAGE_100,
> NR_ZS_FULLNESS,
> };
>

Is there a reason why this can't be done with something like
#define FULLNESS_GROUPS 10? We can make sure during build that
(100 % FULLNESS_GROUPS == 0) to make our lives easier. I feel like the
code will be much more concise and easier to navigate, instead of
multiple enums and static arrays.
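
Something along these lines is what I have in mind (a rough sketch only,
in the context of zsmalloc.c; the list-index macro names are made up and
it is completely untested):

#define FULLNESS_GROUPS		10

/* list 0 holds empty zspages, list FULLNESS_GROUPS + 1 holds full ones */
#define ZS_EMPTY_LIST		0
#define ZS_FULL_LIST		(FULLNESS_GROUPS + 1)
#define NR_FULLNESS_LISTS	(FULLNESS_GROUPS + 2)

static_assert(100 % FULLNESS_GROUPS == 0);

static int get_fullness_group(struct size_class *class, struct zspage *zspage)
{
	int inuse = get_zspage_inuse(zspage);
	int objs_per_zspage = class->objs_per_zspage;

	if (inuse == 0)
		return ZS_EMPTY_LIST;
	if (inuse == objs_per_zspage)
		return ZS_FULL_LIST;

	/* partially used zspages land in lists 1..FULLNESS_GROUPS */
	return 1 + (100 * inuse / objs_per_zspage) / (100 / FULLNESS_GROUPS);
}

Then the fullness_asc/fullness_desc tables can go away as well: zs_malloc()
simply walks the lists from FULLNESS_GROUPS down to 1, and the compaction
source scan walks from 1 up to FULLNESS_GROUPS.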

>
> enum class_stat_type {
> - CLASS_EMPTY,
> - CLASS_ALMOST_EMPTY,
> - CLASS_ALMOST_FULL,
> - CLASS_FULL,
> + CLASS_USAGE_0,
> + CLASS_USAGE_10,
> + CLASS_USAGE_20,
> + CLASS_USAGE_30,
> + CLASS_USAGE_40,
> + CLASS_USAGE_50,
> + CLASS_USAGE_60,
> + CLASS_USAGE_70,
> + CLASS_USAGE_80,
> + CLASS_USAGE_90,
> + CLASS_USAGE_99,
> + CLASS_USAGE_100,
> OBJ_ALLOCATED,
> OBJ_USED,
> NR_ZS_STAT_TYPE,
> };
>
> +#define NUM_FULLNESS_GROUPS 10
> +
> +/*
> + * Lookup pages in increasing (from lowest to highest) order of usage ratio.
> + * This is useful, for instance, during compaction, when we want to migrate
> + * as few objects as possible in order to free zspage.
> + */
> +static const enum fullness_group fullness_asc[NUM_FULLNESS_GROUPS] = {
> + ZS_USAGE_10,
> + ZS_USAGE_20,
> + ZS_USAGE_30,
> + ZS_USAGE_40,
> + ZS_USAGE_50,
> + ZS_USAGE_60,
> + ZS_USAGE_70,
> + ZS_USAGE_80,
> + ZS_USAGE_90,
> + ZS_USAGE_99
> +};
> +
> +/*
> + * Lookup pages in decreasing (from highest to lowest) order of usage ratio.
> + * This is useful in zs_malloc() and compaction, when we want to have as
> + * many full pages as possible for more efficient memory usage.
> + */
> +static const enum fullness_group fullness_desc[NUM_FULLNESS_GROUPS] = {
> + ZS_USAGE_99,
> + ZS_USAGE_90,
> + ZS_USAGE_80,
> + ZS_USAGE_70,
> + ZS_USAGE_60,
> + ZS_USAGE_50,
> + ZS_USAGE_40,
> + ZS_USAGE_30,
> + ZS_USAGE_20,
> + ZS_USAGE_10,
> +};
> +
> struct zs_size_stat {
> unsigned long objs[NR_ZS_STAT_TYPE];
> };
> @@ -185,21 +249,6 @@ struct zs_size_stat {
> static struct dentry *zs_stat_root;
> #endif
>
> -/*
> - * We assign a page to ZS_ALMOST_EMPTY fullness group when:
> - * n <= N / f, where
> - * n = number of allocated objects
> - * N = total number of objects zspage can store
> - * f = fullness_threshold_frac
> - *
> - * Similarly, we assign zspage to:
> - * ZS_ALMOST_FULL when n > N / f
> - * ZS_EMPTY when n == 0
> - * ZS_FULL when n == N
> - *
> - * (see: fix_fullness_group())
> - */
> -static const int fullness_threshold_frac = 4;
> static size_t huge_class_size;
>
> struct size_class {
> @@ -652,8 +701,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
> continue;
>
> spin_lock(&pool->lock);
> - class_almost_full = zs_stat_get(class, CLASS_ALMOST_FULL);
> - class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY);
> +
> + /*
> + * Replecate old behaviour for almost_full and almost_empty
> + * stats.
> + */
> + class_almost_full = zs_stat_get(class, CLASS_USAGE_99);
> + class_almost_full += zs_stat_get(class, CLASS_USAGE_90);
> + class_almost_full += zs_stat_get(class, CLASS_USAGE_80);
> + class_almost_full += zs_stat_get(class, CLASS_USAGE_70);
> +
> + class_almost_empty = zs_stat_get(class, CLASS_USAGE_60);
> + class_almost_empty += zs_stat_get(class, CLASS_USAGE_50);
> + class_almost_empty += zs_stat_get(class, CLASS_USAGE_40);
> + class_almost_empty += zs_stat_get(class, CLASS_USAGE_30);
> + class_almost_empty += zs_stat_get(class, CLASS_USAGE_20);
> + class_almost_empty += zs_stat_get(class, CLASS_USAGE_10);
> +
> obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
> obj_used = zs_stat_get(class, OBJ_USED);
> freeable = zs_can_compact(class);
> @@ -723,33 +787,39 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool)
> }
> #endif
>
> -
> /*
> * For each size class, zspages are divided into different groups
> - * depending on how "full" they are. This was done so that we could
> - * easily find empty or nearly empty zspages when we try to shrink
> - * the pool (not yet implemented). This function returns fullness
> + * depending on their usage ratio. This function returns fullness
> * status of the given page.
> */
> static enum fullness_group get_fullness_group(struct size_class *class,
> - struct zspage *zspage)
> -{
> + struct zspage *zspage)
> +{
> + static const enum fullness_group groups[] = {
> + ZS_USAGE_10,
> + ZS_USAGE_20,
> + ZS_USAGE_30,
> + ZS_USAGE_40,
> + ZS_USAGE_50,
> + ZS_USAGE_60,
> + ZS_USAGE_70,
> + ZS_USAGE_80,
> + ZS_USAGE_90,
> + ZS_USAGE_99,
> + };
> int inuse, objs_per_zspage;
> - enum fullness_group fg;
> + int ratio;
>
> inuse = get_zspage_inuse(zspage);
> objs_per_zspage = class->objs_per_zspage;
>
> if (inuse == 0)
> - fg = ZS_EMPTY;
> - else if (inuse == objs_per_zspage)
> - fg = ZS_FULL;
> - else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
> - fg = ZS_ALMOST_EMPTY;
> - else
> - fg = ZS_ALMOST_FULL;
> + return ZS_USAGE_0;
> + if (inuse == objs_per_zspage)
> + return ZS_USAGE_100;
>
> - return fg;
> + ratio = 100 * inuse / objs_per_zspage;
> + return groups[ratio / 10];
> }
>
> /*
> @@ -781,14 +851,13 @@ static void remove_zspage(struct size_class *class,
> /*
> * Each size class maintains zspages in different fullness groups depending
> * on the number of live objects they contain. When allocating or freeing
> - * objects, the fullness status of the page can change, say, from ALMOST_FULL
> - * to ALMOST_EMPTY when freeing an object. This function checks if such
> - * a status change has occurred for the given page and accordingly moves the
> - * page from the freelist of the old fullness group to that of the new
> - * fullness group.
> + * objects, the fullness status of the page can change, say, from USAGE_80
> + * to USAGE_70 when freeing an object. This function checks if such a status
> + * change has occurred for the given page and accordingly moves the page from
> + * the list of the old fullness group to that of the new fullness group.
> */
> static enum fullness_group fix_fullness_group(struct size_class *class,
> - struct zspage *zspage)
> + struct zspage *zspage)
> {
> int class_idx;
> enum fullness_group currfg, newfg;
> @@ -972,7 +1041,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
> assert_spin_locked(&pool->lock);
>
> VM_BUG_ON(get_zspage_inuse(zspage));
> - VM_BUG_ON(fg != ZS_EMPTY);
> + VM_BUG_ON(fg != ZS_USAGE_0);
>
> /* Free all deferred handles from zs_free */
> free_handles(pool, class, zspage);
> @@ -1011,7 +1080,7 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
> return;
> }
>
> - remove_zspage(class, zspage, ZS_EMPTY);
> + remove_zspage(class, zspage, ZS_USAGE_0);
> #ifdef CONFIG_ZPOOL
> list_del(&zspage->lru);
> #endif
> @@ -1142,14 +1211,15 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
> return zspage;
> }
>
> -static struct zspage *find_get_zspage(struct size_class *class)
> +static struct zspage *find_get_zspage(struct size_class *class,
> + const enum fullness_group *groups)
> {
> - int i;
> struct zspage *zspage;
> + int i;
>
> - for (i = ZS_ALMOST_FULL; i >= ZS_EMPTY; i--) {
> + for (i = 0; i < NUM_FULLNESS_GROUPS; i++) {

I think you missed using the passed groups arg here -- the loop below
still indexes fullness_list[] with the raw loop counter i instead of
going through groups[i].
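
IOW, I'd expect something like this (untested):

	for (i = 0; i < NUM_FULLNESS_GROUPS; i++) {
		zspage = list_first_entry_or_null(&class->fullness_list[groups[i]],
						  struct zspage, list);
		if (zspage)
			break;
	}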

>
> zspage = list_first_entry_or_null(&class->fullness_list[i],
> - struct zspage, list);
> + struct zspage, list);
> if (zspage)
> break;
> }
> @@ -1524,7 +1594,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
>
> /* pool->lock effectively protects the zpage migration */
> spin_lock(&pool->lock);
> - zspage = find_get_zspage(class);
> + zspage = find_get_zspage(class, fullness_desc);
> if (likely(zspage)) {
> obj = obj_malloc(pool, zspage, handle);
> /* Now move the zspage to another fullness group, if required */
> @@ -1642,7 +1712,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
> obj_free(class->size, obj, NULL);
>
> fullness = fix_fullness_group(class, zspage);
> - if (fullness == ZS_EMPTY)
> + if (fullness == ZS_USAGE_0)
> free_zspage(pool, class, zspage);
>
> spin_unlock(&pool->lock);
> @@ -1824,22 +1894,19 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> return ret;
> }
>
> -static struct zspage *isolate_zspage(struct size_class *class, bool source)
> +static struct zspage *isolate_zspage(struct size_class *class,
> + const enum fullness_group *groups)
> {
> - int i;
> struct zspage *zspage;
> - enum fullness_group fg[2] = {ZS_ALMOST_EMPTY, ZS_ALMOST_FULL};
> + int i;
>
> - if (!source) {
> - fg[0] = ZS_ALMOST_FULL;
> - fg[1] = ZS_ALMOST_EMPTY;
> - }
> + for (i = 0; i < NUM_FULLNESS_GROUPS; i++) {
> + enum fullness_group fg = groups[i];
>
> - for (i = 0; i < 2; i++) {
> - zspage = list_first_entry_or_null(&class->fullness_list[fg[i]],
> - struct zspage, list);
> + zspage = list_first_entry_or_null(&class->fullness_list[fg],
> + struct zspage, list);
> if (zspage) {
> - remove_zspage(class, zspage, fg[i]);
> + remove_zspage(class, zspage, fg);
> return zspage;
> }
> }
> @@ -2133,7 +2200,8 @@ static void async_free_zspage(struct work_struct *work)
> continue;
>
> spin_lock(&pool->lock);
> - list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages);
> + list_splice_init(&class->fullness_list[ZS_USAGE_0],
> + &free_pages);
> spin_unlock(&pool->lock);
> }
>
> @@ -2142,7 +2210,7 @@ static void async_free_zspage(struct work_struct *work)
> lock_zspage(zspage);
>
> get_zspage_mapping(zspage, &class_idx, &fullness);
> - VM_BUG_ON(fullness != ZS_EMPTY);
> + VM_BUG_ON(fullness != ZS_USAGE_0);
> class = pool->size_class[class_idx];
> spin_lock(&pool->lock);
> #ifdef CONFIG_ZPOOL
> @@ -2215,7 +2283,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> * as well as zpage allocation/free
> */
> spin_lock(&pool->lock);
> - while ((src_zspage = isolate_zspage(class, true))) {
> + while ((src_zspage = isolate_zspage(class, fullness_asc))) {
> /* protect someone accessing the zspage(i.e., zs_map_object) */
> migrate_write_lock(src_zspage);
>
> @@ -2225,10 +2293,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> cc.obj_idx = 0;
> cc.s_page = get_first_page(src_zspage);
>
> - while ((dst_zspage = isolate_zspage(class, false))) {
> + while ((dst_zspage = isolate_zspage(class, fullness_desc))) {
> migrate_write_lock_nested(dst_zspage);
>
> cc.d_page = get_first_page(dst_zspage);
> +
> /*
> * If there is no more space in dst_page, resched
> * and see if anyone had allocated another zspage.
> @@ -2250,7 +2319,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> putback_zspage(class, dst_zspage);
> migrate_write_unlock(dst_zspage);
>
> - if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
> + if (putback_zspage(class, src_zspage) == ZS_USAGE_0) {
> migrate_write_unlock(src_zspage);
> free_zspage(pool, class, src_zspage);
> pages_freed += class->pages_per_zspage;
> @@ -2408,7 +2477,7 @@ struct zs_pool *zs_create_pool(const char *name)
> int pages_per_zspage;
> int objs_per_zspage;
> struct size_class *class;
> - int fullness = 0;
> + int fullness;
>
> size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
> if (size > ZS_MAX_ALLOC_SIZE)
> @@ -2462,9 +2531,12 @@ struct zs_pool *zs_create_pool(const char *name)
> class->pages_per_zspage = pages_per_zspage;
> class->objs_per_zspage = objs_per_zspage;
> pool->size_class[i] = class;
> - for (fullness = ZS_EMPTY; fullness < NR_ZS_FULLNESS;
> - fullness++)
> +
> + fullness = ZS_USAGE_0;
> + while (fullness < NR_ZS_FULLNESS) {
> INIT_LIST_HEAD(&class->fullness_list[fullness]);
> + fullness++;
> + }
>
> prev_class = class;
> }
> @@ -2510,7 +2582,7 @@ void zs_destroy_pool(struct zs_pool *pool)
> if (class->index != i)
> continue;
>
> - for (fg = ZS_EMPTY; fg < NR_ZS_FULLNESS; fg++) {
> + for (fg = ZS_USAGE_0; fg < NR_ZS_FULLNESS; fg++) {
> if (!list_empty(&class->fullness_list[fg])) {
> pr_info("Freeing non-empty class with size %db, fullness group %d\n",
> class->size, fg);
> @@ -2686,7 +2758,7 @@ static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
> * while the page is removed from the pool. Fix it
> * up for the check in __free_zspage().
> */
> - zspage->fullness = ZS_EMPTY;
> + zspage->fullness = ZS_USAGE_0;
>
> __free_zspage(pool, class, zspage);
> spin_unlock(&pool->lock);
> --
> 2.39.1.519.gcb327c4b5f-goog
>
>
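
One more observation, mostly to convince myself I read the new math right:
plugging the class-112 example from the changelog (256 objects per zspage)
into get_fullness_group(), i.e. ratio = 100 * inuse / objs_per_zspage and
group = groups[ratio / 10]:

	inuse =   1  ->  ratio =  0  ->  ZS_USAGE_10
	inuse =  64  ->  ratio = 25  ->  ZS_USAGE_30
	inuse = 128  ->  ratio = 50  ->  ZS_USAGE_60
	inuse = 192  ->  ratio = 75  ->  ZS_USAGE_80

so the old single ALMOST_EMPTY bucket (inuse 1..192 for that class) now
spreads across eight separate lists.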