From: Yosry Ahmed
Date: Mon, 6 Jan 2025 17:16:56 -0800
Subject: Re: [PATCH v5 11/12] mm: zswap: Restructure & simplify zswap_store() to make it amenable for batching.
To: Kanchana P Sridhar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
 nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com,
 ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org,
 linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
 davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
 ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com,
 wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
In-Reply-To: <20241221063119.29140-12-kanchana.p.sridhar@intel.com>
References: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>
 <20241221063119.29140-12-kanchana.p.sridhar@intel.com>

On Fri, Dec 20, 2024 at 10:31 PM Kanchana P Sridhar wrote:
>
> This patch introduces zswap_store_folio() that implements all the computes
> done earlier in zswap_store_page() for a single-page, for all the pages in
> a folio. This allows us to move the loop over the folio's pages from
> zswap_store() to zswap_store_folio().
>
> A distinct zswap_compress_folio() is also added, that simply calls
> zswap_compress() for each page in the folio it is called with.

The git diff looks funky, it may make things clearer to introduce
zswap_compress_folio() in a separate patch.

>
> zswap_store_folio() starts by allocating all zswap entries required to
> store the folio. Next, it calls zswap_compress_folio() and finally, adds
> the entries to the xarray and LRU.
>
> The error handling and cleanup required for all failure scenarios that can
> occur while storing a folio in zswap is now consolidated to a
> "store_folio_failed" label in zswap_store_folio().
>
> These changes facilitate developing support for compress batching in
> zswap_store_folio().
>
> Signed-off-by: Kanchana P Sridhar
> ---
>  mm/zswap.c | 183 +++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 116 insertions(+), 67 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 99cd78891fd0..1be0f1807bfc 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1467,77 +1467,129 @@ static void shrink_worker(struct work_struct *w)
>  * main API
>  **********************************/
>
> -static ssize_t zswap_store_page(struct page *page,
> -                                struct obj_cgroup *objcg,
> -                                struct zswap_pool *pool)
> +static bool zswap_compress_folio(struct folio *folio,
> +                                 struct zswap_entry *entries[],
> +                                 struct zswap_pool *pool)
>  {
> -       swp_entry_t page_swpentry = page_swap_entry(page);
> -       struct zswap_entry *entry, *old;
> +       long index, nr_pages = folio_nr_pages(folio);
>
> -       /* allocate entry */
> -       entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> -       if (!entry) {
> -               zswap_reject_kmemcache_fail++;
> -               return -EINVAL;
> +       for (index = 0; index < nr_pages; ++index) {
> +               struct page *page = folio_page(folio, index);
> +
> +               if (!zswap_compress(page, entries[index], pool))
> +                       return false;
>         }
>
> -       if (!zswap_compress(page, entry, pool))
> -               goto compress_failed;
> +       return true;
> +}
>
> -       old = xa_store(swap_zswap_tree(page_swpentry),
> -                      swp_offset(page_swpentry),
> -                      entry, GFP_KERNEL);
> -       if (xa_is_err(old)) {
> -               int err = xa_err(old);
> +/*
> + * Store all pages in a folio.
> + *
> + * The error handling from all failure points is consolidated to the
> + * "store_folio_failed" label, based on the initialization of the zswap entries'
> + * handles to ERR_PTR(-EINVAL) at allocation time, and the fact that the
> + * entry's handle is subsequently modified only upon a successful zpool_malloc()
> + * after the page is compressed.
> + */
> +static ssize_t zswap_store_folio(struct folio *folio,
> +                                 struct obj_cgroup *objcg,
> +                                 struct zswap_pool *pool)
> +{
> +       long index, nr_pages = folio_nr_pages(folio);
> +       struct zswap_entry **entries = NULL;
> +       int node_id = folio_nid(folio);
> +       size_t compressed_bytes = 0;
>
> -               WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> -               zswap_reject_alloc_fail++;
> -               goto store_failed;
> +       entries = kmalloc(nr_pages * sizeof(*entries), GFP_KERNEL);

We can probably use kcalloc() here.

> +       if (!entries)
> +               return -ENOMEM;
> +
> +       /* allocate entries */

This comment can be dropped.

> +       for (index = 0; index < nr_pages; ++index) {
> +               entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
> +
> +               if (!entries[index]) {
> +                       zswap_reject_kmemcache_fail++;
> +                       nr_pages = index;
> +                       goto store_folio_failed;
> +               }
> +
> +               entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
>         }
>
> -       /*
> -        * We may have had an existing entry that became stale when
> -        * the folio was redirtied and now the new version is being
> -        * swapped out. Get rid of the old.
> -        */
> -       if (old)
> -               zswap_entry_free(old);
> +       if (!zswap_compress_folio(folio, entries, pool))
> +               goto store_folio_failed;
>
> -       /*
> -        * The entry is successfully compressed and stored in the tree, there is
> -        * no further possibility of failure. Grab refs to the pool and objcg.
> -        * These refs will be dropped by zswap_entry_free() when the entry is
> -        * removed from the tree.
> -        */
> -       zswap_pool_get(pool);
> -       if (objcg)
> -               obj_cgroup_get(objcg);
> +       for (index = 0; index < nr_pages; ++index) {
> +               swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));
> +               struct zswap_entry *old, *entry = entries[index];
> +
> +               old = xa_store(swap_zswap_tree(page_swpentry),
> +                              swp_offset(page_swpentry),
> +                              entry, GFP_KERNEL);
> +               if (xa_is_err(old)) {
> +                       int err = xa_err(old);
> +
> +                       WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> +                       zswap_reject_alloc_fail++;
> +                       goto store_folio_failed;
> +               }
>
> -       /*
> -        * We finish initializing the entry while it's already in xarray.
> -        * This is safe because:
> -        *
> -        * 1. Concurrent stores and invalidations are excluded by folio lock.
> -        *
> -        * 2. Writeback is excluded by the entry not being on the LRU yet.
> -        *    The publishing order matters to prevent writeback from seeing
> -        *    an incoherent entry.
> -        */
> -       entry->pool = pool;
> -       entry->swpentry = page_swpentry;
> -       entry->objcg = objcg;
> -       entry->referenced = true;
> -       if (entry->length) {
> -               INIT_LIST_HEAD(&entry->lru);
> -               zswap_lru_add(&zswap_list_lru, entry);
> +               /*
> +                * We may have had an existing entry that became stale when
> +                * the folio was redirtied and now the new version is being
> +                * swapped out. Get rid of the old.
> +                */
> +               if (old)
> +                       zswap_entry_free(old);
> +
> +               /*
> +                * The entry is successfully compressed and stored in the tree, there is
> +                * no further possibility of failure. Grab refs to the pool and objcg.
> +                * These refs will be dropped by zswap_entry_free() when the entry is
> +                * removed from the tree.
> +                */
> +               zswap_pool_get(pool);
> +               if (objcg)
> +                       obj_cgroup_get(objcg);
> +
> +               /*
> +                * We finish initializing the entry while it's already in xarray.
> +                * This is safe because:
> +                *
> +                * 1. Concurrent stores and invalidations are excluded by folio lock.
> +                *
> +                * 2. Writeback is excluded by the entry not being on the LRU yet.
> +                *    The publishing order matters to prevent writeback from seeing
> +                *    an incoherent entry.
> +                */
> +               entry->pool = pool;
> +               entry->swpentry = page_swpentry;
> +               entry->objcg = objcg;
> +               entry->referenced = true;
> +               if (entry->length) {
> +                       INIT_LIST_HEAD(&entry->lru);
> +                       zswap_lru_add(&zswap_list_lru, entry);
> +               }
> +
> +               compressed_bytes += entry->length;
>         }
>
> -       return entry->length;
> +       kfree(entries);
> +
> +       return compressed_bytes;
> +
> +store_folio_failed:
> +       for (index = 0; index < nr_pages; ++index) {
> +               if (!IS_ERR_VALUE(entries[index]->handle))
> +                       zpool_free(pool->zpool, entries[index]->handle);
> +
> +               zswap_entry_cache_free(entries[index]);
> +       }

If there is a failure in xa_store() halfway through the entries, this
loop will free all the compressed objects and entries. But, some of
the entries are already in the xarray, and zswap_store() will try to
free them again. This seems like a bug, or did I miss something here?

> +
> +       kfree(entries);
>
> -store_failed:
> -       zpool_free(pool->zpool, entry->handle);
> -compress_failed:
> -       zswap_entry_cache_free(entry);
>         return -EINVAL;
>  }
>
> @@ -1549,8 +1601,8 @@ bool zswap_store(struct folio *folio)
>         struct mem_cgroup *memcg = NULL;
>         struct zswap_pool *pool;
>         size_t compressed_bytes = 0;
> +       ssize_t bytes;
>         bool ret = false;
> -       long index;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>         VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> @@ -1584,15 +1636,11 @@ bool zswap_store(struct folio *folio)
>                 mem_cgroup_put(memcg);
>         }
>
> -       for (index = 0; index < nr_pages; ++index) {
> -               struct page *page = folio_page(folio, index);
> -               ssize_t bytes;
> +       bytes = zswap_store_folio(folio, objcg, pool);
> +       if (bytes < 0)
> +               goto put_pool;
>
> -               bytes = zswap_store_page(page, objcg, pool);
> -               if (bytes < 0)
> -                       goto put_pool;
> -               compressed_bytes += bytes;
> -       }
> +       compressed_bytes = bytes;

What's the point of having both compressed_bytes and bytes now?
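
To make the xa_store() concern on the store_folio_failed loop above
concrete, here is a small userspace model. It is only a sketch: every
name in it (model_entry, model_tree, model_store(), model_unwind()) is
made up for illustration and none of it is kernel code. It shows one
possible way to split ownership, where the store-side error path frees
only the entries that were never published and leaves the published ones
to the caller's unwind, so each entry is freed exactly once. This is not
necessarily how the series should fix it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct zswap_entry; not the kernel type. */
struct model_entry {
	size_t length;
};

/* Hypothetical stand-in for the xarray: a fixed table of owned entries. */
#define MODEL_SLOTS 8
static struct model_entry *model_tree[MODEL_SLOTS];
static int model_frees;	/* frees performed by the store-side error path */

/*
 * Store nr entries. Publishing fails at index fail_at (-1: never fails).
 * Returns the number of published entries, or -1 on failure.
 */
static int model_store(int nr, int fail_at)
{
	struct model_entry *entries[MODEL_SLOTS] = { NULL };
	int i, published = 0;

	if (nr < 0 || nr > MODEL_SLOTS)
		return -1;

	for (i = 0; i < nr; i++) {
		entries[i] = calloc(1, sizeof(*entries[i]));
		if (!entries[i])
			goto fail;
	}

	for (i = 0; i < nr; i++) {
		if (i == fail_at)
			goto fail;
		model_tree[i] = entries[i];	/* the tree owns it from here on */
		published++;
	}
	return published;

fail:
	/* Free only what the tree does not own; the caller unwinds the rest. */
	for (i = published; i < nr; i++) {
		if (entries[i]) {
			free(entries[i]);
			model_frees++;
		}
	}
	return -1;
}

/* Caller-side unwind: frees whatever is in the tree, returns the count. */
static int model_unwind(void)
{
	int i, freed = 0;

	for (i = 0; i < MODEL_SLOTS; i++) {
		if (model_tree[i]) {
			free(model_tree[i]);
			model_tree[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```

With four entries and a failure at index 2, model_store() frees the two
unpublished entries and model_unwind() frees the two published ones. If
the error loop started at index 0 instead, as the quoted loop does, the
first two entries would be freed twice.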
>
>         if (objcg) {
>                 obj_cgroup_charge_zswap(objcg, compressed_bytes);
> @@ -1622,6 +1670,7 @@ bool zswap_store(struct folio *folio)
>                 pgoff_t offset = swp_offset(swp);
>                 struct zswap_entry *entry;
>                 struct xarray *tree;
> +               long index;
>
>                 for (index = 0; index < nr_pages; ++index) {
>                         tree = swap_zswap_tree(swp_entry(type, offset + index));
> --
> 2.27.0
>
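
Coming back to the kcalloc() suggestion for the entries array: the
open-coded nr_pages * sizeof(*entries) is not checked for overflow,
while kcalloc(n, size, flags) checks the multiplication and returns
zeroed memory, so the call would presumably become
kcalloc(nr_pages, sizeof(*entries), GFP_KERNEL). The userspace sketch
below models only that behaviour. model_alloc_array() is a hypothetical
helper for illustration, not a kernel API.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of an overflow-checked, zeroing array
 * allocation. Returns NULL if n * size would wrap around.
 */
static void *model_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* refuse instead of allocating a short buffer */

	return calloc(n, size);	/* zero-initialized on success */
}
```

A zeroed array may also let the error path tell allocated slots from
unallocated ones by checking for NULL, though whether that is simpler
than the nr_pages = index adjustment is a judgement call.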