Date: Tue, 28 Feb 2023 15:14:47 -0800
From: Minchan Kim
To: Sergey Senozhatsky
Cc: Andrew Morton, Yosry Ahmed, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv2 4/6] zsmalloc: rework compaction algorithm
References: <20230223030451.543162-1-senozhatsky@chromium.org> <20230223030451.543162-5-senozhatsky@chromium.org>
In-Reply-To: <20230223030451.543162-5-senozhatsky@chromium.org>

On Thu, Feb 23, 2023 at 12:04:49PM +0900, Sergey Senozhatsky wrote:
> The zsmalloc compaction algorithm has the potential to waste some CPU
> cycles, particularly when compacting pages within the same fullness
> group. This is due to the way it selects the head page of the fullness
> list for source and destination pages, and how it reinserts those pages
> during each iteration. The algorithm may first use a page as a
> migration destination and then as a migration source, leading to an
> unnecessary back-and-forth movement of objects.
>
> Consider the following fullness list:
>
> PageA PageB PageC PageD PageE
>
> During the first iteration, the compaction algorithm will select PageA
> as the source and PageB as the destination. All of PageA's objects will
> be moved to PageB, and then PageA will be released while PageB is
> reinserted into the fullness list.
>
> PageB PageC PageD PageE
>
> During the next iteration, the compaction algorithm will again select
> the head of the list as the source and destination, meaning that PageB
> will now serve as the source and PageC as the destination. This will
> result in the objects being moved away from PageB, the same objects
> that were just moved to PageB in the previous iteration.
>
> To prevent this avalanche effect, the compaction algorithm should not
> reinsert the destination page between iterations. By doing so, the most
> optimal page will continue to be used and its usage ratio will
> increase, reducing internal fragmentation. The destination page should
> only be reinserted into the fullness list if:
> - It becomes full
> - No source page is available.
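
IIUC, the change described above boils down to roughly the following
pseudo-code (just a sketch to restate my understanding of the new flow;
move_objects() here stands in for migrate_zspage() plus the putback/free
of the drained source, it is not a real function, and locking is omitted):

	/* old flow: both src and dst are re-taken from the list head each pass */
	while ((src_zspage = isolate_src_zspage(class))) {
		dst_zspage = isolate_dst_zspage(class);	/* may pick the page we just filled */
		move_objects(src_zspage, dst_zspage);
		putback_zspage(class, dst_zspage);	/* dst goes back to the list head */
	}

	/* new flow: dst is kept across iterations until it fills up */
	dst_zspage = NULL;
	while (zs_can_compact(class)) {
		if (!dst_zspage)
			dst_zspage = isolate_dst_zspage(class);
		src_zspage = isolate_src_zspage(class);
		if (!src_zspage)
			break;
		move_objects(src_zspage, dst_zspage);
		if (zspage_full(class, dst_zspage)) {
			/* only now does dst go back to the fullness list */
			putback_zspage(class, dst_zspage);
			dst_zspage = NULL;
		}
	}
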
>
> Signed-off-by: Sergey Senozhatsky
> ---
>  mm/zsmalloc.c | 82 ++++++++++++++++++++++++---------------------------
>  1 file changed, 38 insertions(+), 44 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 1a92ebe338eb..eacf9e32da5c 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1786,15 +1786,14 @@ struct zs_compact_control {
>  	int obj_idx;
>  };
>
> -static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> -			  struct zs_compact_control *cc)
> +static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> +			   struct zs_compact_control *cc)
>  {
>  	unsigned long used_obj, free_obj;
>  	unsigned long handle;
>  	struct page *s_page = cc->s_page;
>  	struct page *d_page = cc->d_page;
>  	int obj_idx = cc->obj_idx;
> -	int ret = 0;
>
>  	while (1) {
>  		handle = find_alloced_obj(class, s_page, &obj_idx);
> @@ -1807,10 +1806,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		}
>
>  		/* Stop if there is no more space */
> -		if (zspage_full(class, get_zspage(d_page))) {
> -			ret = -ENOMEM;
> +		if (zspage_full(class, get_zspage(d_page)))
>  			break;
> -		}
>
>  		used_obj = handle_to_obj(handle);
>  		free_obj = obj_malloc(pool, get_zspage(d_page), handle);
> @@ -1823,8 +1820,6 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  	/* Remember last position in this iteration */
>  	cc->s_page = s_page;
>  	cc->obj_idx = obj_idx;
> -
> -	return ret;
>  }
>
>  static struct zspage *isolate_src_zspage(struct size_class *class)
> @@ -2228,57 +2223,56 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  	 * as well as zpage allocation/free
>  	 */
>  	spin_lock(&pool->lock);
> -	while ((src_zspage = isolate_src_zspage(class))) {
> -		/* protect someone accessing the zspage(i.e., zs_map_object) */
> -		migrate_write_lock(src_zspage);
> -
> -		if (!zs_can_compact(class))
> -			break;
> -
> -		cc.obj_idx = 0;
> -		cc.s_page = get_first_page(src_zspage);
> -
> -		while ((dst_zspage = isolate_dst_zspage(class))) {
> -			migrate_write_lock_nested(dst_zspage);
> -
> +	while (1) {
> +		if (!dst_zspage) {
> +			dst_zspage = isolate_dst_zspage(class);
> +			if (!dst_zspage)
> +				goto out;
> +			migrate_write_lock(dst_zspage);
>  			cc.d_page = get_first_page(dst_zspage);
> -			/*
> -			 * If there is no more space in dst_page, resched
> -			 * and see if anyone had allocated another zspage.
> -			 */
> -			if (!migrate_zspage(pool, class, &cc))
> -				break;
> +		}
>
> +		if (!zs_can_compact(class)) {
>  			putback_zspage(class, dst_zspage);
>  			migrate_write_unlock(dst_zspage);
> -			dst_zspage = NULL;
> -			if (spin_is_contended(&pool->lock))
> -				break;
> +			goto out;

Just break instead of goto.

>  		}
>
> -		/* Stop if we couldn't find slot */
> -		if (dst_zspage == NULL)
> -			break;
> +		src_zspage = isolate_src_zspage(class);
> +		if (!src_zspage) {
> +			putback_zspage(class, dst_zspage);
> +			migrate_write_unlock(dst_zspage);
> +			goto out;

Just break instead of goto.

> +		}
>
> -		putback_zspage(class, dst_zspage);
> -		migrate_write_unlock(dst_zspage);
> +		migrate_write_lock_nested(src_zspage);
> +
> +		cc.obj_idx = 0;
> +		cc.s_page = get_first_page(src_zspage);
> +		migrate_zspage(pool, class, &cc);
>
>  		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
>  			migrate_write_unlock(src_zspage);
>  			free_zspage(pool, class, src_zspage);
>  			pages_freed += class->pages_per_zspage;
> -		} else
> +		} else {
>  			migrate_write_unlock(src_zspage);

So here, migrate_write_unlock(src_zspage) is done in both branches, so we could change it like this:
	ret = putback_zspage(class, src_zspage);
	migrate_write_unlock(src_zspage);
	if (ret == ZS_INUSE_RATIO_0 or ZS_EMPTY) {
		free_zspage();
		xxx
	}

> -		spin_unlock(&pool->lock);
> -		cond_resched();
> -		spin_lock(&pool->lock);
> -	}
> +		}
>
> -	if (src_zspage) {
> -		putback_zspage(class, src_zspage);
> -		migrate_write_unlock(src_zspage);
> -	}
> +		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
> +		    || spin_is_contended(&pool->lock)) {
> +			putback_zspage(class, dst_zspage);
> +			migrate_write_unlock(dst_zspage);
> +			dst_zspage = NULL;

			spin_unlock(&pool->lock);
			cond_resched();
			spin_lock(&pool->lock);

> +		}
>
> +		if (!dst_zspage) {

Then we could remove this conditional block here.

> +			spin_unlock(&pool->lock);
> +			cond_resched();
> +			spin_lock(&pool->lock);
> +		}
> +	}
> +out:
>  	spin_unlock(&pool->lock);
>
>  	return pages_freed;

So, how about this on top of your patch?

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index eacf9e32da5c..4dfc910f5d89 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2223,40 +2223,33 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * as well as zpage allocation/free
 	 */
 	spin_lock(&pool->lock);
-	while (1) {
+	while (zs_can_compact(class)) {
+		int ret;
+
 		if (!dst_zspage) {
 			dst_zspage = isolate_dst_zspage(class);
 			if (!dst_zspage)
-				goto out;
+				break;
 			migrate_write_lock(dst_zspage);
 			cc.d_page = get_first_page(dst_zspage);
 		}
 
-		if (!zs_can_compact(class)) {
-			putback_zspage(class, dst_zspage);
-			migrate_write_unlock(dst_zspage);
-			goto out;
-		}
-
 		src_zspage = isolate_src_zspage(class);
-		if (!src_zspage) {
-			putback_zspage(class, dst_zspage);
-			migrate_write_unlock(dst_zspage);
-			goto out;
-		}
+		if (!src_zspage)
+			break;
 
 		migrate_write_lock_nested(src_zspage);
-
 		cc.obj_idx = 0;
 		cc.s_page = get_first_page(src_zspage);
+
 		migrate_zspage(pool, class, &cc);
+		ret = putback_zspage(class, src_zspage);
+		migrate_write_unlock(src_zspage);
 
-		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
-			migrate_write_unlock(src_zspage);
+		if (ret == ZS_INUSE_RATIO_0) {
 			free_zspage(pool, class, src_zspage);
 			pages_freed += class->pages_per_zspage;
-		} else {
-			migrate_write_unlock(src_zspage);
+			src_zspage = NULL;
 		}
 
 		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
@@ -2264,14 +2257,22 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 			putback_zspage(class, dst_zspage);
 			migrate_write_unlock(dst_zspage);
 			dst_zspage = NULL;
-		}
 
-		if (!dst_zspage) {
 			spin_unlock(&pool->lock);
 			cond_resched();
 			spin_lock(&pool->lock);
 		}
 	}
+
+	if (src_zspage) {
+		putback_zspage(class, src_zspage);
+		migrate_write_unlock(src_zspage);
+	}
+
+	if (dst_zspage) {
+		putback_zspage(class, dst_zspage);
+		migrate_write_unlock(dst_zspage);
+	}
 out:
 	spin_unlock(&pool->lock);
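
To spell out the idea behind the ret/unlock reshuffle: putback_zspage()
only reports which fullness group the zspage ended up in, and the current
ZS_INUSE_RATIO_0 path already unlocks before freeing, so both branches can
share a single migrate_write_unlock(). A sketch of the resulting shape
(not a literal excerpt of the file):

	migrate_zspage(pool, class, &cc);

	ret = putback_zspage(class, src_zspage);
	migrate_write_unlock(src_zspage);

	if (ret == ZS_INUSE_RATIO_0) {
		/* no objects left in src_zspage, so it can be freed now */
		free_zspage(pool, class, src_zspage);
		pages_freed += class->pages_per_zspage;
	}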