From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Yosry Ahmed <yosryahmed@google.com>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: RE: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when zswap_store_page() fails
Date: Tue, 28 Jan 2025 19:09:05 +0000
Message-ID: <SA3PR11MB81206771932B54FCFFD0DF2CC9EF2@SA3PR11MB8120.namprd11.prod.outlook.com>
In-Reply-To: <20250128185507.2176-1-42.hyeyoo@gmail.com>
Hi Hyeonggon,
> -----Original Message-----
> From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Sent: Tuesday, January 28, 2025 10:55 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes Weiner
> <hannes@cmpxchg.org>; Yosry Ahmed <yosryahmed@google.com>; Nhat
> Pham <nphamcs@gmail.com>; Chengming Zhou
> <chengming.zhou@linux.dev>; Andrew Morton <akpm@linux-
> foundation.org>
> Cc: linux-mm@kvack.org; Hyeonggon Yoo <42.hyeyoo@gmail.com>;
> stable@vger.kernel.org
> Subject: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when
> zswap_store_page() fails
>
> Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> skips charging any zswapped base pages when it failed to zswap the entire
> folio.
>
> However, when some base pages are zswapped but it failed to zswap
> the entire folio, the zswap operation is rolled back.
> When freeing zswap entries for those pages, zswap_entry_free() uncharges
> the pages that were not previously charged, causing zswap charging to
> become inconsistent.
>
> This inconsistency triggers two warnings with the following steps:
> # On a machine with 64GiB of RAM and 36GiB of zswap
> $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
> $ sudo reboot
>
> Two warnings are:
> in mm/memcontrol.c:163, function obj_cgroup_release():
> WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
>
> in mm/page_counter.c:60, function page_counter_cancel():
> if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
> new, nr_pages))
>
> While objcg events should only be accounted for when the entire folio is
> zswapped, objcg charging should be performed regardless.
> Fix accordingly.
>
> After resolving the inconsistency, these warnings disappear.
>
> Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
>
> v1->v2:
>
> Fixed objcg events being accounted for on zswap failure.
>
> Fixed the incorrect description. I misunderstood that the base pages are
> going to be stored in zswap, but their zswap entries are freed immediately.
>
> Added a comment on why it charges pages that are going to be removed
> from zswap.
>
> mm/zswap.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 6504174fbc6a..10b30ac46deb 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio)
>
> bytes = zswap_store_page(page, objcg, pool);
> if (bytes < 0)
> - goto put_pool;
> + goto charge_zswap;
> compressed_bytes += bytes;
> }
>
> - if (objcg) {
> - obj_cgroup_charge_zswap(objcg, compressed_bytes);
> + if (objcg)
> count_objcg_events(objcg, ZSWPOUT, nr_pages);
> - }
>
> atomic_long_add(nr_pages, &zswap_stored_pages);
> count_vm_events(ZSWPOUT, nr_pages);
>
> ret = true;
>
> +charge_zswap:
> + /*
> + * Charge zswapped pages even when it failed to zswap the entire folio,
> + * because zswap_entry_free() will uncharge them anyway.
> + * Otherwise zswap charging will become inconsistent.
> + */
> + if (objcg)
> + obj_cgroup_charge_zswap(objcg, compressed_bytes);
Thanks for finding this bug! I think it might make sense to charge the
objcg and increment the zswap_stored_pages counter in zswap_store_page()
itself, so that each entry is charged at the point where it is stored.
Something like:
diff --git a/mm/zswap.c b/mm/zswap.c
index b84c20d889b1..fd2a72598a8a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct page *page,
entry->pool = pool;
entry->swpentry = page_swpentry;
entry->objcg = objcg;
+ if (objcg)
+ obj_cgroup_charge_zswap(objcg, entry->length);
entry->referenced = true;
if (entry->length) {
INIT_LIST_HEAD(&entry->lru);
zswap_lru_add(&zswap_list_lru, entry);
}
+ atomic_long_inc(&zswap_stored_pages);
return entry->length;
@@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio)
struct obj_cgroup *objcg = NULL;
struct mem_cgroup *memcg = NULL;
struct zswap_pool *pool;
- size_t compressed_bytes = 0;
bool ret = false;
long index;
@@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio)
bytes = zswap_store_page(page, objcg, pool);
if (bytes < 0)
goto put_pool;
- compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ if (objcg)
count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
- atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
What do you think?
Yosry, Nhat, Johannes, please let me know if this would be a cleaner
approach. I don't think we would lose much performance by charging
per page instead of once per folio, but please let me know your
thoughts as well.
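To illustrate the imbalance outside the kernel, here is a minimal userspace
sketch. It is only a model under stated assumptions, not kernel code:
charged_bytes, ENTRY_BYTES, entry_free() and store_folio() are hypothetical
names, every page is assumed to compress to the same size, and rollback is
assumed to free each already-stored entry the way zswap_entry_free()
uncharges it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the objcg's zswap charge, in bytes. */
static long charged_bytes;

/* Assumption: every base page compresses to the same size. */
#define ENTRY_BYTES 100L

/* Models zswap_entry_free(): it always uncharges the entry's length. */
static void entry_free(void)
{
	charged_bytes -= ENTRY_BYTES;
}

/*
 * Models storing a folio of nr_pages base pages. The page at index
 * fail_at fails to store; pass fail_at == nr_pages for full success.
 * charge_per_page == true charges as each page is stored; false
 * charges once, and only if the whole folio was stored.
 */
static bool store_folio(int nr_pages, int fail_at, bool charge_per_page)
{
	long compressed_bytes = 0;
	int stored = 0;

	assert(fail_at >= 0 && fail_at <= nr_pages);

	for (int i = 0; i < nr_pages; i++) {
		if (i == fail_at)
			goto rollback;
		if (charge_per_page)
			charged_bytes += ENTRY_BYTES;
		compressed_bytes += ENTRY_BYTES;
		stored++;
	}

	if (!charge_per_page)
		charged_bytes += compressed_bytes;
	return true;

rollback:
	/* Every entry stored so far is freed, and each free uncharges. */
	while (stored--)
		entry_free();
	return false;
}
```

In this model, a four-page folio that fails on its third page leaves
charged_bytes negative when charging is done once per folio (two entries
are uncharged that were never charged), and leaves it at zero when each
page is charged as it is stored.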
Thanks,
Kanchana
> put_pool:
> zswap_pool_put(pool);
> put_objcg:
> --
> 2.47.1