Date: Mon, 11 Mar 2024 22:09:35 +0000
From: Yosry Ahmed
To: Johannes Weiner
Cc: Andrew Morton, Nhat Pham, Chengming Zhou, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: zswap: optimize zswap pool size tracking
In-Reply-To: <20240311161214.1145168-1-hannes@cmpxchg.org>
References: <20240311161214.1145168-1-hannes@cmpxchg.org>
On Mon, Mar 11, 2024 at 12:12:13PM -0400, Johannes Weiner wrote:
> Profiling the munmap() of a zswapped memory region shows 50%(!) of the
> total cycles currently going into updating the zswap_pool_total_size.

Yikes. I have always hated that size update scheme FWIW.

I have also wondered whether it makes sense to just maintain the number
of pages in zswap as an atomic, like zswap_stored_pages.
I guess your proposed scheme is even cheaper for the load/invalidate
paths because we do nothing at all. It could be an option if the
aggregation in other paths ever becomes a problem, but we would need to
make sure it doesn't regress the load/invalidate paths. Just sharing
some thoughts.

> 
> There are three consumers of this counter:
> - store, to enforce the globally configured pool limit
> - meminfo & debugfs, to report the size to the user
> - shrink, to determine the batch size for each cycle
> 
> Instead of aggregating every time an entry enters or exits the zswap
> pool, aggregate the value from the zpools on-demand:
> 
> - Stores aggregate the counter anyway upon success. Aggregating to
>   check the limit instead is the same amount of work.
> 
> - Meminfo & debugfs might benefit somewhat from a pre-aggregated
>   counter, but aren't exactly hotpaths.
> 
> - Shrinking can aggregate once for every cycle instead of doing it for
>   every freed entry. As the shrinker might work on tens or hundreds of
>   objects per scan cycle, this is a large reduction in aggregations.
> 
> The paths that benefit dramatically are swapin, swapoff, and
> unmaps. There could be millions of pages being processed until
> somebody asks for the pool size again. This eliminates the pool size
> updates from those paths entirely.

This looks like a big win, thanks! I wonder if you have any numbers or
perf profiles to share. That would be nice to have, but I think the
benefit is clear regardless.

I also like the implicit cleanup when we switch to maintaining the
number of pages rather than bytes. The code looks much better with all
the shifts and divisions gone :)

I have a couple of comments below. With them addressed, feel free to
add:

Acked-by: Yosry Ahmed

[..]
> @@ -1385,6 +1365,10 @@ static void shrink_worker(struct work_struct *w)
>  {
>  	struct mem_cgroup *memcg;
>  	int ret, failures = 0;
> +	unsigned long thr;
> +
> +	/* Reclaim down to the accept threshold */
> +	thr = zswap_max_pages() * zswap_accept_thr_percent / 100;

This calculation is repeated twice, so I'd rather keep a helper for it
as an alternative to zswap_can_accept(). Perhaps zswap_threshold_page()
or zswap_acceptance_pages()?

> 
>  	/* global reclaim will select cgroup in a round-robin fashion. */
>  	do {
> @@ -1432,10 +1416,9 @@ static void shrink_worker(struct work_struct *w)
>  			break;
>  		if (ret && ++failures == MAX_RECLAIM_RETRIES)
>  			break;
> -
>  resched:
>  		cond_resched();
> -	} while (!zswap_can_accept());
> +	} while (zswap_total_pages() > thr);
>  }

[..]

> @@ -1711,6 +1700,13 @@ void zswap_swapoff(int type)
> 
>  static struct dentry *zswap_debugfs_root;
> 
> +static int debugfs_get_total_size(void *data, u64 *val)
> +{
> +	*val = zswap_total_pages() * PAGE_SIZE;
> +	return 0;
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(total_size_fops, debugfs_get_total_size, NULL, "%llu");

I think we are missing a newline here to maintain the current format
(i.e., "%llu\n").

> +
>  static int zswap_debugfs_init(void)
>  {
>  	if (!debugfs_initialized())
> @@ -1732,8 +1728,8 @@ static int zswap_debugfs_init(void)
>  			   zswap_debugfs_root, &zswap_reject_compress_poor);
>  	debugfs_create_u64("written_back_pages", 0444,
>  			   zswap_debugfs_root, &zswap_written_back_pages);
> -	debugfs_create_u64("pool_total_size", 0444,
> -			   zswap_debugfs_root, &zswap_pool_total_size);
> +	debugfs_create_file("pool_total_size", 0444,
> +			   zswap_debugfs_root, NULL, &total_size_fops);
>  	debugfs_create_atomic_t("stored_pages", 0444,
>  			   zswap_debugfs_root, &zswap_stored_pages);
>  	debugfs_create_atomic_t("same_filled_pages", 0444,
> -- 
> 2.44.0
> 