Date: Tue, 28 Feb 2023 15:14:47 -0800
From: Minchan Kim
To: Sergey Senozhatsky
Cc: Andrew Morton, Yosry Ahmed, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv2 4/6] zsmalloc: rework compaction algorithm
References: <20230223030451.543162-1-senozhatsky@chromium.org> <20230223030451.543162-5-senozhatsky@chromium.org>
In-Reply-To: <20230223030451.543162-5-senozhatsky@chromium.org>

On Thu, Feb 23, 2023 at 12:04:49PM +0900, Sergey Senozhatsky wrote:
> The zsmalloc compaction algorithm has the potential to waste some CPU
> cycles, particularly when compacting pages within the same fullness
> group. This is due to the way it selects the head page of the fullness
> list for source and destination pages, and how it reinserts those pages
> during each iteration. The algorithm may first use a page as a
> migration destination and then as a migration source, leading to an
> unnecessary back-and-forth movement of objects.
>
> Consider the following fullness list:
>
> PageA PageB PageC PageD PageE
>
> During the first iteration, the compaction algorithm will select PageA
> as the source and PageB as the destination. All of PageA's objects will
> be moved to PageB, and then PageA will be released while PageB is
> reinserted into the fullness list.
>
> PageB PageC PageD PageE
>
> During the next iteration, the compaction algorithm will again select
> the head of the list as the source and destination, meaning that PageB
> will now serve as the source and PageC as the destination. This will
> result in the objects being moved away from PageB, the same objects
> that were just moved to PageB in the previous iteration.
>
> To prevent this avalanche effect, the compaction algorithm should not
> reinsert the destination page between iterations. By doing so, the most
> optimal page will continue to be used and its usage ratio will
> increase, reducing internal fragmentation. The destination page should
> only be reinserted into the fullness list if:
> - It becomes full
> - No source page is available.
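
IIUC, the change described above boils down to roughly the following
pseudo-code (just a sketch to restate my understanding of the new flow;
move_objects() here stands in for migrate_zspage() plus the putback/free
of the drained source, it is not a real function, and locking is omitted):

	/* old flow: both src and dst are re-taken from the list head each pass */
	while ((src_zspage = isolate_src_zspage(class))) {
		dst_zspage = isolate_dst_zspage(class);	/* may pick the page we just filled */
		move_objects(src_zspage, dst_zspage);
		putback_zspage(class, dst_zspage);	/* dst goes back to the list head */
	}

	/* new flow: dst is kept across iterations until it fills up */
	dst_zspage = NULL;
	while (zs_can_compact(class)) {
		if (!dst_zspage)
			dst_zspage = isolate_dst_zspage(class);
		src_zspage = isolate_src_zspage(class);
		if (!src_zspage)
			break;
		move_objects(src_zspage, dst_zspage);
		if (zspage_full(class, dst_zspage)) {
			/* only now does dst go back to the fullness list */
			putback_zspage(class, dst_zspage);
			dst_zspage = NULL;
		}
	}
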
>
> Signed-off-by: Sergey Senozhatsky
> ---
>  mm/zsmalloc.c | 82 ++++++++++++++++++++++++---------------------------
>  1 file changed, 38 insertions(+), 44 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 1a92ebe338eb..eacf9e32da5c 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1786,15 +1786,14 @@ struct zs_compact_control {
>  	int obj_idx;
>  };
>
> -static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> -			  struct zs_compact_control *cc)
> +static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> +			   struct zs_compact_control *cc)
>  {
>  	unsigned long used_obj, free_obj;
>  	unsigned long handle;
>  	struct page *s_page = cc->s_page;
>  	struct page *d_page = cc->d_page;
>  	int obj_idx = cc->obj_idx;
> -	int ret = 0;
>
>  	while (1) {
>  		handle = find_alloced_obj(class, s_page, &obj_idx);
> @@ -1807,10 +1806,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		}
>
>  		/* Stop if there is no more space */
> -		if (zspage_full(class, get_zspage(d_page))) {
> -			ret = -ENOMEM;
> +		if (zspage_full(class, get_zspage(d_page)))
>  			break;
> -		}
>
>  		used_obj = handle_to_obj(handle);
>  		free_obj = obj_malloc(pool, get_zspage(d_page), handle);
> @@ -1823,8 +1820,6 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  	/* Remember last position in this iteration */
>  	cc->s_page = s_page;
>  	cc->obj_idx = obj_idx;
> -
> -	return ret;
>  }
>
>  static struct zspage *isolate_src_zspage(struct size_class *class)
> @@ -2228,57 +2223,56 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  	 * as well as zpage allocation/free
>  	 */
>  	spin_lock(&pool->lock);
> -	while ((src_zspage = isolate_src_zspage(class))) {
> -		/* protect someone accessing the zspage(i.e., zs_map_object) */
> -		migrate_write_lock(src_zspage);
> -
> -		if (!zs_can_compact(class))
> -			break;
> -
> -		cc.obj_idx = 0;
> -		cc.s_page = get_first_page(src_zspage);
> -
> -		while ((dst_zspage = isolate_dst_zspage(class))) {
> -			migrate_write_lock_nested(dst_zspage);
> -
> +	while (1) {
> +		if (!dst_zspage) {
> +			dst_zspage = isolate_dst_zspage(class);
> +			if (!dst_zspage)
> +				goto out;
> +			migrate_write_lock(dst_zspage);
>  			cc.d_page = get_first_page(dst_zspage);
> -			/*
> -			 * If there is no more space in dst_page, resched
> -			 * and see if anyone had allocated another zspage.
> -			 */
> -			if (!migrate_zspage(pool, class, &cc))
> -				break;
> +		}
>
> +		if (!zs_can_compact(class)) {
>  			putback_zspage(class, dst_zspage);
>  			migrate_write_unlock(dst_zspage);
> -			dst_zspage = NULL;
> -			if (spin_is_contended(&pool->lock))
> -				break;
> +			goto out;

Just break instead of goto.

>  		}
>
> -		/* Stop if we couldn't find slot */
> -		if (dst_zspage == NULL)
> -			break;
> +		src_zspage = isolate_src_zspage(class);
> +		if (!src_zspage) {
> +			putback_zspage(class, dst_zspage);
> +			migrate_write_unlock(dst_zspage);
> +			goto out;

Just break instead of goto.

> +		}
>
> -		putback_zspage(class, dst_zspage);
> -		migrate_write_unlock(dst_zspage);
> +		migrate_write_lock_nested(src_zspage);
> +
> +		cc.obj_idx = 0;
> +		cc.s_page = get_first_page(src_zspage);
> +		migrate_zspage(pool, class, &cc);
>
>  		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
>  			migrate_write_unlock(src_zspage);
>  			free_zspage(pool, class, src_zspage);
>  			pages_freed += class->pages_per_zspage;
> -		} else
> +		} else {
>  			migrate_write_unlock(src_zspage);

So here, migrate_write_unlock(src_zspage) is done in both branches, so we could change it like this:
	ret = putback_zspage(class, src_zspage);
	migrate_write_unlock(src_zspage);
	if (ret == ZS_INUSE_RATIO_0 or ZS_EMPTY) {
		free_zspage();
		xxx
	}

> -		spin_unlock(&pool->lock);
> -		cond_resched();
> -		spin_lock(&pool->lock);
> -	}
> +		}
>
> -	if (src_zspage) {
> -		putback_zspage(class, src_zspage);
> -		migrate_write_unlock(src_zspage);
> -	}
> +		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
> +		    || spin_is_contended(&pool->lock)) {
> +			putback_zspage(class, dst_zspage);
> +			migrate_write_unlock(dst_zspage);
> +			dst_zspage = NULL;

			spin_unlock(&pool->lock);
			cond_resched();
			spin_lock(&pool->lock);

> +		}
>
> +		if (!dst_zspage) {

Then we could remove this conditional block here.

> +			spin_unlock(&pool->lock);
> +			cond_resched();
> +			spin_lock(&pool->lock);
> +		}
> +	}
> +out:
>  	spin_unlock(&pool->lock);
>
>  	return pages_freed;

So, how about this on top of your patch?

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index eacf9e32da5c..4dfc910f5d89 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2223,40 +2223,33 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * as well as zpage allocation/free
 	 */
 	spin_lock(&pool->lock);
-	while (1) {
+	while (zs_can_compact(class)) {
+		int ret;
+
 		if (!dst_zspage) {
 			dst_zspage = isolate_dst_zspage(class);
 			if (!dst_zspage)
-				goto out;
+				break;
 			migrate_write_lock(dst_zspage);
 			cc.d_page = get_first_page(dst_zspage);
 		}
 
-		if (!zs_can_compact(class)) {
-			putback_zspage(class, dst_zspage);
-			migrate_write_unlock(dst_zspage);
-			goto out;
-		}
-
 		src_zspage = isolate_src_zspage(class);
-		if (!src_zspage) {
-			putback_zspage(class, dst_zspage);
-			migrate_write_unlock(dst_zspage);
-			goto out;
-		}
+		if (!src_zspage)
+			break;
 
 		migrate_write_lock_nested(src_zspage);
-
 		cc.obj_idx = 0;
 		cc.s_page = get_first_page(src_zspage);
+
 		migrate_zspage(pool, class, &cc);
+		ret = putback_zspage(class, src_zspage);
+		migrate_write_unlock(src_zspage);
 
-		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
-			migrate_write_unlock(src_zspage);
+		if (ret == ZS_INUSE_RATIO_0) {
 			free_zspage(pool, class, src_zspage);
 			pages_freed += class->pages_per_zspage;
-		} else {
-			migrate_write_unlock(src_zspage);
+			src_zspage = NULL;
 		}
 
 		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
@@ -2264,14 +2257,22 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 			putback_zspage(class, dst_zspage);
 			migrate_write_unlock(dst_zspage);
 			dst_zspage = NULL;
-		}
 
-		if (!dst_zspage) {
 			spin_unlock(&pool->lock);
 			cond_resched();
 			spin_lock(&pool->lock);
 		}
 	}
+
+	if (src_zspage) {
+		putback_zspage(class, src_zspage);
+		migrate_write_unlock(src_zspage);
+	}
+
+	if (dst_zspage) {
+		putback_zspage(class, dst_zspage);
+		migrate_write_unlock(dst_zspage);
+	}
 out:
 	spin_unlock(&pool->lock);
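
To spell out the idea behind the ret/unlock reshuffle: putback_zspage()
only reports which fullness group the zspage ended up in, and the current
ZS_INUSE_RATIO_0 path already unlocks before freeing, so both branches can
share a single migrate_write_unlock(). A sketch of the resulting shape
(not a literal excerpt of the file):

	migrate_zspage(pool, class, &cc);

	ret = putback_zspage(class, src_zspage);
	migrate_write_unlock(src_zspage);

	if (ret == ZS_INUSE_RATIO_0) {
		/* no objects left in src_zspage, so it can be freed now */
		free_zspage(pool, class, src_zspage);
		pages_freed += class->pages_per_zspage;
	}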