From: Yosry Ahmed <yosryahmed@google.com>
To: Erhard Furtner <erhard_f@mailbox.org>
Cc: Yu Zhao <yuzhao@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
Date: Mon, 3 Jun 2024 16:24:02 -0700
Message-ID: <CAJD7tkbCRLdy0vD2Pd17fNrxHgkzW1VucN4qMkohLFLBLaaeCQ@mail.gmail.com>
In-Reply-To: <20240604001304.5420284f@yea>
[-- Attachment #1: Type: text/plain, Size: 2292 bytes --]
On Mon, Jun 3, 2024 at 3:13 PM Erhard Furtner <erhard_f@mailbox.org> wrote:
>
> On Sun, 2 Jun 2024 20:03:32 +0200
> Erhard Furtner <erhard_f@mailbox.org> wrote:
>
> > On Sat, 1 Jun 2024 00:01:48 -0600
> > Yu Zhao <yuzhao@google.com> wrote:
> >
> > > The OOM kills on both kernel versions seem to be reasonable to me.
> > >
> > > Your system has 2GB memory and it uses zswap with zsmalloc (which is
> > > good since it can allocate from the highmem zone) and zstd/lzo (which
> > > doesn't matter much). Somehow -- I couldn't figure out why -- it
> > > splits the 2GB into a 0.75GB DMA zone and a 1.25GB highmem zone:
> > >
> > > [ 0.000000] Zone ranges:
> > > [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> > > [ 0.000000] Normal empty
> > > [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
> > >
> > > The kernel can't allocate from the highmem zone -- only userspace and
> > > zsmalloc can. OOM kills were due to the low memory conditions in the
> > > DMA zone where the kernel itself failed to allocate from.
> > >
> > > Do you know a kernel version that doesn't have OOM kills while running
> > > the same workload? If so, could you send that .config to me? If not,
> > > could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
> > > of ideas at the moment.)
>
> Ok, the bisect I did actually revealed something meaningful:
>
> # git bisect good
> b8cf32dc6e8c75b712cbf638e0fd210101c22f17 is the first bad commit
> commit b8cf32dc6e8c75b712cbf638e0fd210101c22f17
> Author: Yosry Ahmed <yosryahmed@google.com>
> Date: Tue Jun 20 19:46:44 2023 +0000
>
> mm: zswap: multiple zpools support

Thanks for bisecting. From the thread, it seems that the kernel has a
very limited region of memory to allocate from. One possible reason
that commit causes a problem is that each zsmalloc zpool creates its
own 'zspage' and 'zs_handle' slab caches, so we now have multiple
instances of each cache, which may increase fragmentation in slab
memory.

Do you have /proc/slabinfo from a good and a bad run by any chance?
Also, could you check whether the attached patch helps? It ensures
that even when we use multiple zsmalloc zpools, only a single slab
cache of each type is used.
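A toy model may help show why one cache per zpool can waste more memory than a single shared cache. This is a hypothetical, simplified sketch, not SLUB's actual behaviour: it assumes each cache packs a fixed number of objects per slab and keeps only its last slab partially filled, and it ignores per-CPU partial slabs, per-node slabs and slab cache merging. The helper names (slack_objects(), slack_private(), slack_shared()) and the numbers in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model: a cache packs objs_per_slab objects
 * into each slab and only its last slab is partially filled. Per-CPU
 * partial slabs, per-node slabs and cache merging are ignored.
 */
static size_t slack_objects(size_t live, size_t objs_per_slab)
{
	/* Free slots left in the last, partially used slab (0 if full). */
	return (objs_per_slab - live % objs_per_slab) % objs_per_slab;
}

/* One private cache per pool: each pool has its own partial slab. */
static size_t slack_private(size_t nr_pools, size_t live_per_pool,
			    size_t objs_per_slab)
{
	return nr_pools * slack_objects(live_per_pool, objs_per_slab);
}

/* One cache shared by all pools: all live objects are packed together. */
static size_t slack_shared(size_t nr_pools, size_t live_per_pool,
			   size_t objs_per_slab)
{
	return slack_objects(nr_pools * live_per_pool, objs_per_slab);
}
```

With hypothetical numbers of 32 pools, 10 live objects per pool and 100 objects per slab, the per-pool layout leaves 2880 idle slots while a shared cache leaves 80. In this model the slack of separate caches grows with the number of pools, while a shared cache has at most one partially filled slab.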
[-- Attachment #2: 0001-mm-zsmalloc-share-slab-caches-for-all-zsmalloc-zpool.patch --]
[-- Type: application/octet-stream, Size: 3744 bytes --]
From 1ac0fa34783f66ea7334fd2e9129bc362bee51dd Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Sat, 13 May 2023 01:10:34 +0000
Subject: [PATCH] mm: zsmalloc: share slab caches for all zsmalloc zpools

Zswap creates 32 zpools to improve concurrency. Each zsmalloc zpool
creates its own 'zs_handle' and 'zspage' slab caches, so we end up with
32 slab caches of each type.

Since each slab cache holds some free objects, we end up with many free
objects scattered across the separate per-zpool caches. Slab caches are
designed to handle concurrent allocations using percpu structures, so a
single instance of each cache should be enough, and it avoids wasting
memory due to fragmentation.

Additionally, having more slab caches than needed unnecessarily slows
down code paths that iterate over slab_caches.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
mm/zsmalloc.c | 60 ++++++++++++++++++++++++++++-----------------------
1 file changed, 33 insertions(+), 27 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b42d3545ca856..c3c90618ec8b4 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -289,29 +289,8 @@ static void init_deferred_free(struct zs_pool *pool) {}
static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage) {}
#endif
-static int create_cache(struct zs_pool *pool)
-{
- pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
- 0, 0, NULL);
- if (!pool->handle_cachep)
- return 1;
-
- pool->zspage_cachep = kmem_cache_create("zspage", sizeof(struct zspage),
- 0, 0, NULL);
- if (!pool->zspage_cachep) {
- kmem_cache_destroy(pool->handle_cachep);
- pool->handle_cachep = NULL;
- return 1;
- }
-
- return 0;
-}
-
-static void destroy_cache(struct zs_pool *pool)
-{
- kmem_cache_destroy(pool->handle_cachep);
- kmem_cache_destroy(pool->zspage_cachep);
-}
+static struct kmem_cache *zs_handle_cache;
+static struct kmem_cache *zspage_cache;
static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
{
@@ -2107,13 +2086,13 @@ struct zs_pool *zs_create_pool(const char *name)
spin_lock_init(&pool->lock);
atomic_set(&pool->compaction_in_progress, 0);
+ pool->handle_cachep = zs_handle_cache;
+ pool->zspage_cachep = zspage_cache;
+
pool->name = kstrdup(name, GFP_KERNEL);
if (!pool->name)
goto err;
- if (create_cache(pool))
- goto err;
-
/*
* Iterate reversely, because, size of size_class that we want to use
* for merging should be larger or equal to current size.
@@ -2234,16 +2213,41 @@ void zs_destroy_pool(struct zs_pool *pool)
kfree(class);
}
- destroy_cache(pool);
kfree(pool->name);
kfree(pool);
}
EXPORT_SYMBOL_GPL(zs_destroy_pool);
+static void zs_destroy_caches(void)
+{
+ kmem_cache_destroy(zs_handle_cache);
+ kmem_cache_destroy(zspage_cache);
+ zs_handle_cache = NULL;
+ zspage_cache = NULL;
+}
+
+static int zs_create_caches(void)
+{
+ zs_handle_cache = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
+ 0, 0, NULL);
+ zspage_cache = kmem_cache_create("zspage", sizeof(struct zspage),
+ 0, 0, NULL);
+
+ if (!zs_handle_cache || !zspage_cache) {
+ zs_destroy_caches();
+ return -ENOMEM;
+ }
+ return 0;
+}
+
static int __init zs_init(void)
{
int ret;
+ ret = zs_create_caches();
+ if (ret)
+ goto out;
+
ret = cpuhp_setup_state(CPUHP_MM_ZS_PREPARE, "mm/zsmalloc:prepare",
zs_cpu_prepare, zs_cpu_dead);
if (ret)
@@ -2258,6 +2262,7 @@ static int __init zs_init(void)
return 0;
out:
+ zs_destroy_caches();
return ret;
}
@@ -2269,6 +2274,7 @@ static void __exit zs_exit(void)
cpuhp_remove_state(CPUHP_MM_ZS_PREPARE);
zs_stat_exit();
+ zs_destroy_caches();
}
module_init(zs_init);
--
2.45.1.288.g0e0cd299f1-goog
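The lifecycle the patch sets up can be modelled in userspace: both caches are created once at init, every pool only borrows the pointers, and teardown goes through one helper that tolerates NULL. This is a sketch under assumptions: struct toy_cache, toy_cache_create(), toy_cache_destroy() and the fail_second_create switch are hypothetical stand-ins for struct kmem_cache and the slab API, with malloc()/free() in place of the slab allocator. It relies on the same property the patch relies on: kmem_cache_destroy() returns early for a NULL cache, so a single cleanup path is safe after a partial failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct toy_cache {
	const char *name;
	size_t obj_size;
};

/* For testing only: when set, creating the second cache fails. */
static int fail_second_create;

static struct toy_cache *toy_cache_create(const char *name, size_t size,
					  int should_fail)
{
	struct toy_cache *c;

	if (should_fail)
		return NULL;
	c = malloc(sizeof(*c));
	if (c) {
		c->name = name;
		c->obj_size = size;
	}
	return c;
}

/* Like kmem_cache_destroy(), a NULL cache is a no-op (free(NULL) is). */
static void toy_cache_destroy(struct toy_cache *c)
{
	free(c);
}

/* Shared by every pool, mirroring zs_handle_cache and zspage_cache. */
static struct toy_cache *handle_cache;
static struct toy_cache *page_cache;

struct toy_pool {
	struct toy_cache *handle_cachep;
	struct toy_cache *page_cachep;
};

static void destroy_caches(void)
{
	toy_cache_destroy(handle_cache);
	toy_cache_destroy(page_cache);
	handle_cache = NULL;
	page_cache = NULL;
}

static int create_caches(void)
{
	handle_cache = toy_cache_create("handle", sizeof(unsigned long), 0);
	page_cache = toy_cache_create("page", 64, fail_second_create);
	/* One cleanup path covers either (or both) creations failing. */
	if (!handle_cache || !page_cache) {
		destroy_caches();
		return -ENOMEM;
	}
	return 0;
}

/* Pools only borrow the shared caches; they never create or free them. */
static void pool_init(struct toy_pool *pool)
{
	pool->handle_cachep = handle_cache;
	pool->page_cachep = page_cache;
}
```

Because pools never own the caches, destroying a pool no longer touches them; only module exit (or a failed init) tears them down.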