From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0EB8C0218A for ; Tue, 28 Jan 2025 19:19:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5F69E280220; Tue, 28 Jan 2025 14:19:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5A6C7280010; Tue, 28 Jan 2025 14:19:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4479F280220; Tue, 28 Jan 2025 14:19:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 24323280010 for ; Tue, 28 Jan 2025 14:19:48 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id BF91B1C6EC7 for ; Tue, 28 Jan 2025 19:19:47 +0000 (UTC) X-FDA: 83057825214.29.F061FBD Received: from out-172.mta1.migadu.com (out-172.mta1.migadu.com [95.215.58.172]) by imf18.hostedemail.com (Postfix) with ESMTP id E15361C0014 for ; Tue, 28 Jan 2025 19:19:45 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=JNN1uygY; spf=pass (imf18.hostedemail.com: domain of yosry.ahmed@linux.dev designates 95.215.58.172 as permitted sender) smtp.mailfrom=yosry.ahmed@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1738091986; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=W0rsO5JuIwjI5YH8nG8ffxUOU3TO+bJybM752ZjwPgo=; b=STjCQUSt25Nn0n0Y/6L5O+8Gb0SimUrkmgn8gtxND3R0A1H0/XwlMdvP3+nEUCHcXQhQID V17ETnzwvqt9iIGw/y3nLeb/LtbJpl/5gpKqqCwsAmBU74X/wcGAvOBRKaxJZGGJf6DE7+ SfUaBZs8jgITSZ2NTdPWwSDY6aqTwa8= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=JNN1uygY; spf=pass (imf18.hostedemail.com: domain of yosry.ahmed@linux.dev designates 95.215.58.172 as permitted sender) smtp.mailfrom=yosry.ahmed@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1738091986; a=rsa-sha256; cv=none; b=6g+iEYPUKxgDXEf/Ly8/GzBbamgZ//4BnoTZyek316WFjjsjQPPlHjccU9b6gsToHlMko3 qoTNc4DzwzwWXvW0/rrj1XRQ5UHY+Aq8+UlnT2yyQithF2re4U1Irgoie+RvImP7LzMEeZ WI7b1CRE4G9CHSXL1J7Vpv/D6pnhEhs= Date: Tue, 28 Jan 2025 19:19:39 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1738091983; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=W0rsO5JuIwjI5YH8nG8ffxUOU3TO+bJybM752ZjwPgo=; b=JNN1uygY2u1TyRhqWdGPJY+RiglaJICTc6795+pcxRCMKH/q4U5BfnhW4NJ4PGQULpjMHJ ncIVvQHm9d6Mk29RQqBJ2C8pxefzQ3CsyiN3ft+daa67EiDPomJnlMCVPr5p06ZQD4RoJb l982H3Z4tBaoV3yhD6kEX25q952bbyk= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yosry Ahmed To: "Sridhar, Kanchana P" Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , "linux-mm@kvack.org" , "stable@vger.kernel.org" Subject: Re: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when zswap_store_page() fails Message-ID: References: <20250128185507.2176-1-42.hyeyoo@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Migadu-Flow: FLOW_OUT X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: E15361C0014 X-Stat-Signature: 5y77qeynx96ct5efx96z9z4pswjdjb5n X-HE-Tag: 1738091985-741465 X-HE-Meta: U2FsdGVkX1+ZONby7TSJc9v67Gwbh2vANZWrRE++RVHsztcyOtNDjC61DYJph9mKkxD/e5m3APeK9CSrzopyii67wHdd80ZJp8H65TeahKyp5EpQVIJnlquyc4qfFrmk/K3XaYyjeGnNKVFxiXeKjaGTwQnQPqMgxpV3N0FUzYb8DFwGoAzRsEgmrwaxBbyb96Qm30sECHj3KBvFkRCVzjVc0bvla+9BQLhHZUV7+PnautpCvqgHKma6J7BmNrmKkSGbHMQDccPU5sJNCWVm9+jGErNiMtcNL8886abXUBRa9o2ps06VJl3nMbbvub58mJ6aa3nH4RWlqiVmhHayZ87EzYIp0JGCIHcUtXrfvQBkZ7BG0qZCMgrGA2MfhnK904z1oWS0G9kBhuFGDdT2aoIVh+L6r+FPRYl80x3qw0aDkP/m6BeLt6rFovf7iSeysDuCfiqTZFI7xEiIkhlTfvDnnXQHDMobuFjduf0E0bFw3YvDCXuyZUstvJQzL0pZPldbTtT8S8vDLbm0c5Ely5jAwdCpwGkxeDZzgRSiUgfk7RhDcojnzV3aELei+H7RTHuYWXFrdRq0lviy8DRoeL0XQwS9p3cmN8MjpVDW0Svni6XR40JwWKL6ZwjkALBC4hsVJFRAMuBuZw7fozWzti/Dq5S77OZPuCZSkstLOQUD3Fs/cmPu7GyBWN/CvTTNxvLBmVduQqz7ZsBB2J0JoPIth0501DLFMfcQwt0qZCE45pz8Ki2UBqXhJ8NFL7UlC+CW5uidJcLlAcJXjNVpt0ZRSzfw/NGAcdzPJ6CyOFDlFX189RNmtCMEyDRPRaHEtCNBxqFqDQfvGgNGbAlom02dh6d+NMCghPhsWtN0aBUXEhKSYDk+XwuF76MY7AJyerQ/khNIbSTDwc6AAAzqxaJUx7ZaqUwMpd5UXJXwhrwHHu3KQrjuml4/mwpoObTpz1r1FevDTIhytOCgpal tQrIcYTR yL3UltT+8YW5spDf2SnkOkLOTKqjSB4Xm1/iLJS9ngEu8vzGSlnaYbajIOC1zqLqBeQts9vRIaMIelcv13kFLi+1EkKVxgQGoFB214pxlYnqEWmaxZRrurmHh47+NcQH6T8bDjAvm4knp24Ai8x4PLo+175GPty43xZZ+gzZ+H6x0AyMbKLbanSi0XTx2c3M5BH4r9HO+c7SNMSzH1vGBakJaKeeJa3Dmm3u12z8FRLPlYwVzCzloJCJN93HeWqgFlCR7JLh3RhOV39AomzNDXFUbZgnHaK6Eyb08s3cirDqagJjGfss6Cn10ShW3MOw8Yk0v+lR8ncQnFoyYrF83Z1lPTaM8mx3z3ero7jVb0lnIva/Omloa2PzNjeL9Zm3gWElzjD6eccsE8E5o3ZLAO7E9cJ/vJo+bG0LL0wR/9EZ5GilQjX+7gZ7Vro9GGAfhM4rQ74OxR9YhLQRSVdaL6f9o6K4GhhKiw01wIfheWzP6oYty6YBRQLJ1qmxonXXKnI5COTNO/vh6DjjDmi477S2Ipz4N7qqZZK8KQPDe+QOwidjf3blrxx+T7WQieXqq7tZn7QSvIdDgn7bK0xcPsC6OKkWQdlX5O8wR X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 28, 2025 at 07:09:05PM +0000, Sridhar, Kanchana P wrote: > Hi Hyeonggon, > > > -----Original Message----- > > From: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > Sent: Tuesday, January 28, 2025 10:55 AM > > To: Sridhar, Kanchana P ; Johannes Weiner > > ; Yosry Ahmed ; Nhat > > Pham ; Chengming Zhou > > ; Andrew Morton > foundation.org> > > Cc: linux-mm@kvack.org; Hyeonggon Yoo <42.hyeyoo@gmail.com>; > > stable@vger.kernel.org > > Subject: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when > > zswap_store_page() fails > > > > Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") > > skips charging any zswapped base pages when it failed to zswap the entire > > folio. > > > > However, when some base pages are zswapped but it failed to zswap > > the entire folio, the zswap operation is rolled back. > > When freeing zswap entries for those pages, zswap_entry_free() uncharges > > the pages that were not previously charged, causing zswap charging to > > become inconsistent. > > > > This inconsistency triggers two warnings with following steps: > > # On a machine with 64GiB of RAM and 36GiB of zswap > > $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng > > $ sudo reboot > > > > Two warnings are: > > in mm/memcontrol.c:163, function obj_cgroup_release(): > > WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1)); > > > > in mm/page_counter.c:60, function page_counter_cancel(): > > if (WARN_ONCE(new < 0, "page_counter underflow: %ld > > nr_pages=%lu\n", > > new, nr_pages)) > > > > While objcg events should only be accounted for when the entire folio is > > zswapped, objcg charging should be performed regardlessly. > > Fix accordingly. > > > > After resolving the inconsistency, these warnings disappear. > > > > Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") > > Cc: stable@vger.kernel.org > > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > --- > > > > v1->v2: > > > > Fixed objcg events being accounted for on zswap failure. > > > > Fixed the incorrect description. I misunderstood that the base pages are > > going to be stored in zswap, but their zswap entries are freed immediately. > > > > Added a comment on why it charges pages that are going to be removed > > from zswap. > > > > mm/zswap.c | 14 ++++++++++---- > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 6504174fbc6a..10b30ac46deb 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio) > > > > bytes = zswap_store_page(page, objcg, pool); > > if (bytes < 0) > > - goto put_pool; > > + goto charge_zswap; > > compressed_bytes += bytes; > > } > > > > - if (objcg) { > > - obj_cgroup_charge_zswap(objcg, compressed_bytes); > > + if (objcg) > > count_objcg_events(objcg, ZSWPOUT, nr_pages); > > - } > > > > atomic_long_add(nr_pages, &zswap_stored_pages); > > count_vm_events(ZSWPOUT, nr_pages); > > > > ret = true; > > > > +charge_zswap: > > + /* > > + * Charge zswapped pages even when it failed to zswap the entire > > folio, > > + * because zswap_entry_free() will uncharge them anyway. > > + * Otherwise zswap charging will become inconsistent. > > + */ > > + if (objcg) > > + obj_cgroup_charge_zswap(objcg, compressed_bytes); > > Thanks for finding this bug! I am thinking it might make sense to charge > and increment the zswap_stored_pages counter in zswap_store_page(). > Something like: > > diff --git a/mm/zswap.c b/mm/zswap.c > index b84c20d889b1..fd2a72598a8a 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct page *page, > entry->pool = pool; > entry->swpentry = page_swpentry; > entry->objcg = objcg; > + if (objcg) > + obj_cgroup_charge_zswap(objcg, entry->length); > entry->referenced = true; > if (entry->length) { > INIT_LIST_HEAD(&entry->lru); > zswap_lru_add(&zswap_list_lru, entry); > } > + atomic_long_inc(&zswap_stored_pages); > > return entry->length; > > @@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio) > struct obj_cgroup *objcg = NULL; > struct mem_cgroup *memcg = NULL; > struct zswap_pool *pool; > - size_t compressed_bytes = 0; > bool ret = false; > long index; > > @@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio) > bytes = zswap_store_page(page, objcg, pool); > if (bytes < 0) > goto put_pool; > - compressed_bytes += bytes; > } > > - if (objcg) { > - obj_cgroup_charge_zswap(objcg, compressed_bytes); > + if (objcg) > count_objcg_events(objcg, ZSWPOUT, nr_pages); > - } > > - atomic_long_add(nr_pages, &zswap_stored_pages); > count_vm_events(ZSWPOUT, nr_pages); > > ret = true; > > What do you think? > > Yosry, Nhat, Johannes, please let me know if this would be a cleaner > approach. If so, I don't think we would be losing a lot of performance > by not doing the one-time charge per folio, but please let me know > your thoughts as well. This is certainly cleaner, and thanks for catching that zswap_stored_pages cleanup is also wrong. I am not sure if this has meaningful impact on performance, but it seems like we are doing a bit more work in the common success case to avoid the work in the uncommon failure case. Moving the charge (and atomic addition) above the zswap_store_page() loop would be doing the opposite, albeit less clean. I don't feel strongly either way, but I slightly prefer the latter. > > Thanks, > Kanchana > > > put_pool: > > zswap_pool_put(pool); > > put_objcg: > > -- > > 2.47.1 > >