References: <20240930221221.6981-1-kanchana.p.sridhar@intel.com> <20240930221221.6981-7-kanchana.p.sridhar@intel.com>
In-Reply-To: <20240930221221.6981-7-kanchana.p.sridhar@intel.com>
From: Nhat Pham
Date: Mon, 30 Sep 2024 16:11:22 -0700
Subject: Re: [PATCH v9 6/7] mm: zswap: Support large folios in zswap_store().
To: Kanchana P Sridhar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, shakeel.butt@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, willy@infradead.org, nanhai.zou@intel.com, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
On Mon, Sep 30, 2024 at 3:12 PM Kanchana P Sridhar wrote:
>
> zswap_store() will store large folios by compressing them page by page.
>
> This patch provides a sequential implementation of storing a large folio
> in zswap_store() by iterating through each page in the folio to compress
> and store it in the zswap zpool.
>
> zswap_store() calls the newly added zswap_store_page() function for each
> page in the folio. zswap_store_page() handles compressing and storing each
> page.
>
> We check the global and per-cgroup limits once at the beginning of
> zswap_store(), and only check that the limit is not reached yet. This is
> racy and inaccurate, but it should be sufficient for now. We also obtain
> initial references to the relevant objcg and pool to guarantee that
> subsequent references can be acquired by zswap_store_page(). A new function
> zswap_pool_get() is added to facilitate this.
>
> If these one-time checks pass, we compress the pages of the folio, while
> maintaining a running count of compressed bytes for all the folio's pages.
> If all pages are successfully compressed and stored, we do the cgroup
> zswap charging with the total compressed bytes, and batch update the
> zswap_stored_pages atomic/zswpout event stats with folio_nr_pages() once,
> before returning from zswap_store().
>
> If an error is encountered during the store of any page in the folio,
> all pages in that folio currently stored in zswap will be invalidated.
> Thus, a folio is either entirely stored in zswap, or entirely not stored
> in zswap.
>
> The most important value provided by this patch is it enables swapping out
> large folios to zswap without splitting them. Furthermore, it batches some
> operations while doing so (cgroup charging, stats updates).
>
> This patch also forms the basis for building compress batching of pages in
> a large folio in zswap_store() by compressing up to say, 8 pages of the
> folio in parallel in hardware using the Intel In-Memory Analytics
> Accelerator (Intel IAA).
>
> This change reuses and adapts the functionality in Ryan Roberts' RFC
> patch [1]:
>
>   "[RFC,v1] mm: zswap: Store large folios without splitting"
>
>   [1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
>
> Also, addressed some of the RFC comments from the discussion in [1].
>
> Co-developed-by: Ryan Roberts
> Signed-off-by:
> Signed-off-by: Kanchana P Sridhar
> ---
>  mm/zswap.c | 220 +++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 153 insertions(+), 67 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 2b8da50f6322..b74c8de99646 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -411,6 +411,12 @@ static int __must_check zswap_pool_tryget(struct zswap_pool *pool)
>  	return percpu_ref_tryget(&pool->ref);
>  }
>
> +/* The caller must already have a reference. */
> +static void zswap_pool_get(struct zswap_pool *pool)
> +{
> +	percpu_ref_get(&pool->ref);
> +}
> +
>  static void zswap_pool_put(struct zswap_pool *pool)
>  {
>  	percpu_ref_put(&pool->ref);
> @@ -1402,68 +1408,52 @@ static void shrink_worker(struct work_struct *w)
>  /*********************************
>  * main API
>  **********************************/
> -bool zswap_store(struct folio *folio)
> +
> +/*
> + * Stores the page at specified "index" in a folio.
> + *
> + * @page: The page to store in zswap.
> + * @objcg: The folio's objcg. Caller has a reference.
> + * @pool: The zswap_pool to store the compressed data for the page.
> + *        The caller should have obtained a reference to a valid
> + *        zswap_pool by calling zswap_pool_tryget(), to pass as this
> + *        argument.
> + * @tree: The xarray for the @page's folio's swap.
> + * @compressed_bytes: The compressed entry->length value is added
> + *                    to this, so that the caller can get the total
> + *                    compressed lengths of all sub-pages in a folio.
> + */
> +static bool zswap_store_page(struct page *page,
> +			     struct obj_cgroup *objcg,
> +			     struct zswap_pool *pool,
> +			     struct xarray *tree,
> +			     size_t *compressed_bytes)
>  {
> -	swp_entry_t swp = folio->swap;
> -	pgoff_t offset = swp_offset(swp);
> -	struct xarray *tree = swap_zswap_tree(swp);
>  	struct zswap_entry *entry, *old;
> -	struct obj_cgroup *objcg = NULL;
> -	struct mem_cgroup *memcg = NULL;
> -
> -	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> -	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> -
> -	/* Large folios aren't supported */
> -	if (folio_test_large(folio))
> -		return false;
> -
> -	if (!zswap_enabled)
> -		goto check_old;
> -
> -	/* Check cgroup limits */
> -	objcg = get_obj_cgroup_from_folio(folio);
> -	if (objcg && !obj_cgroup_may_zswap(objcg)) {
> -		memcg = get_mem_cgroup_from_objcg(objcg);
> -		if (shrink_memcg(memcg)) {
> -			mem_cgroup_put(memcg);
> -			goto reject;
> -		}
> -		mem_cgroup_put(memcg);
> -	}
> -
> -	if (zswap_check_limits())
> -		goto reject;
>
>  	/* allocate entry */
> -	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
> +	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(page_folio(page)));
>  	if (!entry) {
>  		zswap_reject_kmemcache_fail++;
>  		goto reject;
>  	}
>
> -	/* if entry is successfully added, it keeps the reference */
> -	entry->pool = zswap_pool_current_get();
> -	if (!entry->pool)
> -		goto freepage;
> +	/* zswap_store() already holds a ref on 'objcg' and 'pool' */
> +	if (objcg)
> +		obj_cgroup_get(objcg);
> +	zswap_pool_get(pool);

Should we also batch-get references to the pool as well? I.e., add a
helper function:

/* The caller must already have a reference. */
static void zswap_pool_get_many(struct zswap_pool *pool, unsigned long nr)
{
	percpu_ref_get_many(&pool->ref, nr);
}

then do it in one fell swoop after you're done storing all individual
subpages (near atomic_long_add(nr_pages, &zswap_stored_pages)).
Do double-check that it is safe - I think it should be, since we have the
folio locked in the swapcache, so there should not be any shenanigans
(e.g., no race with a concurrent free or writeback). Perhaps a fixlet
suffices?