From: Nhat Pham <nphamcs@gmail.com>
Date: Wed, 6 Dec 2023 08:56:43 -0800
Subject: Re: [PATCH v8 6/6] zswap: shrinks zswap pool based on memory pressure
To: Yosry Ahmed
Cc: Chengming Zhou, akpm@linux-foundation.org, hannes@cmpxchg.org, cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, shuah@kernel.org
References: <20231130194023.4102148-1-nphamcs@gmail.com> <20231130194023.4102148-7-nphamcs@gmail.com>

On Tue, Dec 5, 2023 at 10:00 PM Yosry Ahmed wrote:
>
> [..]
> > > @@ -526,6 +582,102 @@ static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
> > >                 return entry;
> > >  }
> > >
> > > +/*********************************
> > > +* shrinker functions
> > > +**********************************/
> > > +static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_one *l,
> > > +                                      spinlock_t *lock, void *arg);
> > > +
> > > +static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
> > > +               struct shrink_control *sc)
> > > +{
> > > +       struct lruvec *lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
> > > +       unsigned long shrink_ret, nr_protected, lru_size;
> > > +       struct zswap_pool *pool = shrinker->private_data;
> > > +       bool encountered_page_in_swapcache = false;
> > > +
> > > +       nr_protected =
> > > +               atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > +       lru_size = list_lru_shrink_count(&pool->list_lru, sc);
> > > +
> > > +       /*
> > > +        * Abort if the shrinker is disabled or if we are shrinking into the
> > > +        * protected region.
> > > +        *
> > > +        * This short-circuiting is necessary because if we have too many
> > > +        * concurrent reclaimers getting the freeable zswap object counts at the
> > > +        * same time (before any of them made reasonable progress), the total
> > > +        * number of reclaimed objects might be more than the number of unprotected
> > > +        * objects (i.e. the reclaimers will reclaim into the protected area of the
> > > +        * zswap LRU).
> > > +        */
> > > +       if (!zswap_shrinker_enabled || nr_protected >= lru_size - sc->nr_to_scan) {
> > > +               sc->nr_scanned = 0;
> > > +               return SHRINK_STOP;
> > > +       }
> > > +
> > > +       shrink_ret = list_lru_shrink_walk(&pool->list_lru, sc, &shrink_memcg_cb,
> > > +               &encountered_page_in_swapcache);
> > > +
> > > +       if (encountered_page_in_swapcache)
> > > +               return SHRINK_STOP;
> > > +
> > > +       return shrink_ret ? shrink_ret : SHRINK_STOP;
> > > +}
> > > +
> > > +static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
> > > +               struct shrink_control *sc)
> > > +{
> > > +       struct zswap_pool *pool = shrinker->private_data;
> > > +       struct mem_cgroup *memcg = sc->memcg;
> > > +       struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(sc->nid));
> > > +       unsigned long nr_backing, nr_stored, nr_freeable, nr_protected;
> > > +
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +       cgroup_rstat_flush(memcg->css.cgroup);
> > > +       nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > > +       nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> > > +#else
> > > +       /* use pool stats instead of memcg stats */
> > > +       nr_backing = get_zswap_pool_size(pool) >> PAGE_SHIFT;
> > > +       nr_stored = atomic_read(&pool->nr_stored);
> > > +#endif
> > > +
> > > +       if (!zswap_shrinker_enabled || !nr_stored)
> >
> > When I tested with this series, with !zswap_shrinker_enabled in the default case,
> > I found the performance is much worse than without this patch.
> >
> > Testcase: memory.max=2G, zswap enabled, kernel build -j32 in a tmpfs directory.
> >
> > The reason seems to be the above cgroup_rstat_flush(), which caused much
> > rstat lock contention on the zswap_store() path. And if I put the
> > "zswap_shrinker_enabled" check above the cgroup_rstat_flush(), the
> > performance becomes much better.
> >
> > Maybe we can put the "zswap_shrinker_enabled" check above cgroup_rstat_flush()?
>
> Yes, we should do nothing if !zswap_shrinker_enabled. We should also
> use mem_cgroup_flush_stats() here like other places, unless accuracy is
> crucial, which I doubt given that reclaim itself uses
> mem_cgroup_flush_stats().

Ah, good points on both suggestions. We should not do extra work for
non-users. And this is a best-effort approximation of the memory-saving
factor, so as long as it is not *too* far off I think it's acceptable.

>
> mem_cgroup_flush_stats() has some thresholding to make sure we don't
> do flushes unnecessarily, and I have a pending series in mm-unstable
> that makes that thresholding per-memcg. Keep in mind that adding a
> call to mem_cgroup_flush_stats() will cause a conflict in mm-unstable,
> because the series there adds a memcg argument to
> mem_cgroup_flush_stats(). That should be easily amendable though; I can
> post a fixlet for my series to add the memcg argument there on top of
> users if needed.

Hmm, so how should we proceed from here? How about this:

a) I can send a fixlet to move the enablement check above the stats
flushing and switch to mem_cgroup_flush_stats() (rough sketch below).
b) Then maybe you can send a fixlet to update this new callsite?

Does that sound reasonable?
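For (a), here is roughly what I have in mind. This is an untested sketch
on top of this patch, not the final fixlet; once your per-memcg
thresholding series lands, the mem_cgroup_flush_stats() call below would
presumably grow the memcg argument:

static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
		struct shrink_control *sc)
{
	struct zswap_pool *pool = shrinker->private_data;
	struct mem_cgroup *memcg = sc->memcg;
	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(sc->nid));
	unsigned long nr_backing, nr_stored, nr_freeable, nr_protected;

	/*
	 * Bail out before any stats flushing, so that a disabled shrinker
	 * (the default) costs essentially nothing and cannot contend on
	 * the rstat lock with the zswap_store() path.
	 */
	if (!zswap_shrinker_enabled)
		return 0;

#ifdef CONFIG_MEMCG_KMEM
	/* Thresholded flush; cheaper than an unconditional rstat flush. */
	mem_cgroup_flush_stats();
	nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
	nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
#else
	/* use pool stats instead of memcg stats */
	nr_backing = get_zswap_pool_size(pool) >> PAGE_SHIFT;
	nr_stored = atomic_read(&pool->nr_stored);
#endif

	if (!nr_stored)
		return 0;

	/* ... rest of the function unchanged ... */
}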
> >
> >
> > Thanks!
> >
> > > +               return 0;
> > > +
> > > +       nr_protected =
> > > +               atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > +       nr_freeable = list_lru_shrink_count(&pool->list_lru, sc);
> > > +       /*
> > > +        * Subtract from the lru size an estimate of the number of pages
> > > +        * that should be protected.
> > > +        */
> > > +       nr_freeable = nr_freeable > nr_protected ? nr_freeable - nr_protected : 0;
> > > +
> > > +       /*
> > > +        * Scale the number of freeable pages by the memory saving factor.
> > > +        * This ensures that the better zswap compresses memory, the fewer
> > > +        * pages we will evict to swap (as it would otherwise incur IO for
> > > +        * relatively small memory savings).
> > > +        */
> > > +       return mult_frac(nr_freeable, nr_backing, nr_stored);
> > > +}
> > > +
> > > +static void zswap_alloc_shrinker(struct zswap_pool *pool)
> > > +{
> > > +       pool->shrinker =
> > > +               shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, "mm-zswap");
> > > +       if (!pool->shrinker)
> > > +               return;
> > > +
> > > +       pool->shrinker->private_data = pool;
> > > +       pool->shrinker->scan_objects = zswap_shrinker_scan;
> > > +       pool->shrinker->count_objects = zswap_shrinker_count;
> > > +       pool->shrinker->batch = 0;
> > > +       pool->shrinker->seeks = DEFAULT_SEEKS;
> > > +}
> > > +
> > >  /*********************************
> > >  * per-cpu code
> > >  **********************************/
> [..]
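P.S. To make the count scaling above concrete, a worked example with
made-up numbers: suppose a memcg has nr_stored = 1000 zswap pages that
compress down to nr_backing = 250 backing pages (a 4:1 saving), and
nr_freeable = 800 after subtracting the protected estimate. Then

	mult_frac(800, 250, 1000) == 800 * 250 / 1000 == 200

so the shrinker core is told there are only 200 freeable objects, i.e.
the better the compression ratio, the less eagerly we write zswap
objects back to swap.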