From: Yosry Ahmed <yosryahmed@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Erhard Furtner <erhard_f@mailbox.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Minchan Kim <minchan@kernel.org>
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
Date: Tue, 4 Jun 2024 11:01:39 -0700
Message-ID: <CAJD7tkZ+QY55GTzW9A7ZCm=rxAEfrW76cWXf8o5nwiKSXp8z=w@mail.gmail.com>
In-Reply-To: <CAOUHufb6zXr14Wm3T-4-OJh7iAq+vzDKwVYfHLhMMt96SpiZXg@mail.gmail.com>
On Tue, Jun 4, 2024 at 10:54 AM Yu Zhao <yuzhao@google.com> wrote:
>
> On Tue, Jun 4, 2024 at 11:34 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Tue, Jun 4, 2024 at 10:19 AM Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner <erhard_f@mailbox.org> wrote:
> > > > >
> > > > > On Mon, 3 Jun 2024 16:24:02 -0700
> > > > > Yosry Ahmed <yosryahmed@google.com> wrote:
> > > > >
> > > > > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > > > > have a very limited area of memory to allocate kernel memory from. One
> > > > > > possible reason that commit can cause an issue is that we will
> > > > > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > > > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > > > > >
> > > > > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > > > > >
> > > > > > Also, could you check if the attached patch helps? It makes sure that
> > > > > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > > > > cache of each type.
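
A minimal userspace sketch of the "single cache of each type" idea, assuming a reference-counted shared cache: the first pool to be created sets up the 'zs_handle' and 'zspage' caches, later pools reuse them, and the last pool to go away frees them. This is not the attached patch. The names obj_cache, pool_get_caches() and pool_put_caches() are hypothetical, malloc() stands in for kmem_cache_create(), and the object sizes are placeholders.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache. In the kernel this role is played
 * by a struct kmem_cache from kmem_cache_create(); malloc() is used here only
 * so the sketch runs in userspace. */
struct obj_cache {
	const char *name;
	size_t obj_size;
};

/* One cache of each type, shared by every pool, plus a user count.
 * Not thread-safe: a kernel version would need a mutex around get/put. */
static struct obj_cache *shared_handle_cache;
static struct obj_cache *shared_zspage_cache;
static unsigned int shared_cache_users;

static struct obj_cache *cache_create(const char *name, size_t obj_size)
{
	struct obj_cache *c = malloc(sizeof(*c));

	if (c) {
		c->name = name;
		c->obj_size = obj_size;
	}
	return c;
}

static void caches_free(void)
{
	free(shared_handle_cache);
	free(shared_zspage_cache);
	shared_handle_cache = NULL;
	shared_zspage_cache = NULL;
}

/* Pool creation: only the first pool actually creates the caches. */
static int pool_get_caches(void)
{
	if (shared_cache_users == 0) {
		/* Object sizes are placeholders, not zsmalloc's real sizes. */
		shared_handle_cache = cache_create("zs_handle", sizeof(unsigned long));
		shared_zspage_cache = cache_create("zspage", 64);
		if (!shared_handle_cache || !shared_zspage_cache) {
			caches_free();
			return -1;
		}
	}
	shared_cache_users++;
	return 0;
}

/* Pool destruction: the last pool to go away frees the caches. */
static void pool_put_caches(void)
{
	if (shared_cache_users == 0)
		return;
	if (--shared_cache_users == 0)
		caches_free();
}
```

With 32 pools each calling pool_get_caches(), there are two caches in total instead of 64, which is where the reduction in partially used slabs would come from.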
> > > > >
> > > > > Thanks for looking into this! I got you the output of 'cat /proc/slabinfo' from a good HEAD, a bad HEAD, and the bad HEAD with your patch applied.
> > > > >
> > > > > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17, both of which I got from my bisect.log. I captured the slabinfo shortly after boot and a second time shortly before the OOM or the kswapd0: page allocation failure happened. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB of RAM was exhausted and captured the slabinfo then.
> > > > >
> > > > > Unfortunately, the patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 didn't make a difference; I got the kswapd0: page allocation failure nevertheless.
> > > >
> > > > Thanks for trying this out. The patch reduces the amount of wasted
> > > > memory due to the 'zs_handle' and 'zspage' caches by an order of
> > > > magnitude, but it was a small number to begin with (~250K).
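
To compare the good and bad slabinfo captures cache by cache, a rough waste figure is (num_slabs * pagesperslab * page size) - (active_objs * objsize). The sketch below assumes the version 2.1 /proc/slabinfo layout and a caller-supplied page size; slab_wasted_bytes() is a hypothetical helper, and since it ignores allocator metadata and alignment, treat the result as an estimate.

```c
#include <stdio.h>

/* Approximate bytes held by one slab cache but not used by active objects,
 * from one data line of /proc/slabinfo (version 2.1 layout):
 *   name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
 *        : tunables <limit> <batchcount> <sharedfactor>
 *        : slabdata <active_slabs> <num_slabs> <sharedavail>
 * Returns -1 if the line does not have that layout (e.g. the header lines). */
static long slab_wasted_bytes(const char *line, long page_size)
{
	char name[64];
	unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
	unsigned long limit, batchcount, sharedfactor;
	unsigned long active_slabs, num_slabs, sharedavail;

	int n = sscanf(line,
		       "%63s %lu %lu %lu %lu %lu : tunables %lu %lu %lu"
		       " : slabdata %lu %lu %lu",
		       name, &active_objs, &num_objs, &objsize, &objperslab,
		       &pagesperslab, &limit, &batchcount, &sharedfactor,
		       &active_slabs, &num_slabs, &sharedavail);
	if (n != 12)
		return -1;

	/* Memory backing the cache's slabs minus memory in live objects. */
	long total = (long)(num_slabs * pagesperslab) * page_size;
	return total - (long)(active_objs * objsize);
}
```

Summing this over every 'zs_handle' and 'zspage' line in a capture gives a single number that can be compared between the good HEAD, the bad HEAD, and the patched kernel.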
> > > >
> > > > I cannot think of other reasons why having multiple zsmalloc pools
> > > > would end up using more memory in the 0.25 GB zone from which
> > > > kernel allocations can be made.
> > > >
> > > > The number of zpools can be made configurable or determined at runtime
> > > > by the size of the machine, but I don't want to do this without
> > > > understanding the problem here first. Adding other zswap and zsmalloc
> > > > folks in case they have any ideas.
> > >
> > > Hi Erhard,
> > >
> > > If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> > > on kernels before and after the bad commit? It'd be great if you could
> > > run the grep command right before the OOM kills.
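
If the counter needs to be sampled from a program rather than with grep (for example, in a loop that runs right up to the OOM kill), a sketch like the following scans a /proc/vmstat-style stream of "name value" lines for one key. vmstat_read() is a hypothetical helper; nr_zspages is counted in pages and, as far as I know, appears only on kernels built with CONFIG_ZSMALLOC.

```c
#include <stdio.h>
#include <string.h>

/* Scan a /proc/vmstat-style stream ("name value" per line) for `key` and
 * store its value in *out. Returns 0 on success, -1 if the key is absent. */
static int vmstat_read(FILE *f, const char *key, unsigned long *out)
{
	char name[128];
	unsigned long val;

	while (fscanf(f, "%127s %lu", name, &val) == 2) {
		if (strcmp(name, key) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}
```

Usage: open the file with fopen("/proc/vmstat", "r"), call vmstat_read(f, "nr_zspages", &n), and multiply n by the page size to get bytes.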
> > >
> > > The overall internal fragmentation of multiple zsmalloc pools might be
> > > higher than a single one. I suspect this might be the cause.
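
A back-of-the-envelope model of why splitting the same data across pools can cost more pages: each pool rounds its objects up to whole zspages on its own, so every pool can carry a partially filled zspage per size class. The sketch assumes 4 KiB pages, a single size class, and a fixed zspage size; pool_pages() and multi_pool_pages() are hypothetical, and the numbers in the example are illustrative rather than taken from Erhard's machine.

```c
/* Simplified model, not zsmalloc's actual accounting. */
#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Pages needed to store `nr_objs` objects of `obj_size` bytes in one pool,
 * assuming each zspage spans `pages_per_zspage` pages and every zspage except
 * the last is full. obj_size must not exceed the zspage size. */
static unsigned long pool_pages(unsigned long nr_objs, unsigned long obj_size,
				unsigned long pages_per_zspage)
{
	unsigned long objs_per_zspage =
		pages_per_zspage * MODEL_PAGE_SIZE / obj_size;
	unsigned long zspages =
		(nr_objs + objs_per_zspage - 1) / objs_per_zspage;

	return zspages * pages_per_zspage;
}

/* The same objects spread evenly over `nr_pools` pools (the remainder goes
 * one extra object per pool); each pool rounds up independently. */
static unsigned long multi_pool_pages(unsigned long nr_objs,
				      unsigned long obj_size,
				      unsigned long pages_per_zspage,
				      unsigned long nr_pools)
{
	unsigned long total = 0;

	if (nr_pools == 0)
		nr_pools = 1;
	for (unsigned long i = 0; i < nr_pools; i++) {
		unsigned long share = nr_objs / nr_pools +
				      (i < nr_objs % nr_pools ? 1 : 0);
		total += pool_pages(share, obj_size, pages_per_zspage);
	}
	return total;
}
```

With 3072-byte objects and 3-page zspages (4 objects per zspage), 100 objects fit in 75 pages in one pool but occupy 96 pages when spread over 32 pools, because every pool ends up with its own partially empty zspage.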
> >
> > I thought about the internal fragmentation of pools, but zsmalloc
> > should have access to highmem, and if I understand correctly the
> > problem here is that we are running out of space in the DMA zone when
> > making kernel allocations.
> >
> > Do you suspect zsmalloc is allocating memory from the DMA zone
> > initially, even though it has access to highmem?
>
> There was a lot of user memory in the DMA zone. So at some point the
> highmem zone was full and allocations fell back to the lower zone.
>
> The problem with zone fallback is that recent allocations go into
> lower zones, meaning they are further back on the LRU list. This
> applies to both user memory and zsmalloc memory -- the latter has a
> writeback LRU. On top of this, neither the zswap shrinker nor the
> zsmalloc shrinker (compaction) is zone aware. So page reclaim might
> have trouble hitting the right target zone.
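
A deliberately simplified model of the fallback described above, assuming two zones (DMA and HighMem), no watermarks, and no reclaim: highmem-capable requests such as user pages and zsmalloc pages try HighMem first and then DMA, while kernel-only requests can use DMA only. model_alloc() and the zone sizes are hypothetical.

```c
#include <stdbool.h>

/* Simplified two-zone model; zone names follow the thread, everything else
 * (sizes, no watermarks, no reclaim) is an assumption. */
enum zone_id { Z_DMA, Z_HIGHMEM, NR_ZONES };

struct zone_model {
	unsigned long free_pages[NR_ZONES];
};

/* Try zones from the highest the request may use down to Z_DMA.
 * Returns the zone a page was taken from, or -1 if every allowed zone
 * is empty. */
static int model_alloc(struct zone_model *m, bool highmem_ok)
{
	int start = highmem_ok ? Z_HIGHMEM : Z_DMA;

	for (int z = start; z >= Z_DMA; z--) {
		if (m->free_pages[z] > 0) {
			m->free_pages[z]--;
			return z;
		}
	}
	return -1;
}
```

Once HighMem is exhausted, highmem-capable allocations start consuming DMA pages, so a later kernel-only allocation can fail even though much of the memory in use belongs to user pages or zsmalloc that reclaim could, in principle, free.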

I see what you mean. In this case, yeah, I think the internal
fragmentation in the zsmalloc pools may be the reason behind the
problem.

How many CPUs does this machine have? I am wondering if 32 is
overkill for small machines; perhaps the number of pools should be
min(nr_cpus, 32)?

Alternatively, the number of pools could scale with the memory size
in some way, such that we only increase fragmentation when it's
tolerable.
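
One way the pool count could be derived at runtime, sketched under assumptions: cap it at min(nr_cpus, 32), optionally cap it again by RAM size using a hypothetical bytes_per_pool knob, and round down to a power of two on the assumption that pool selection keeps hashing into 2^k buckets. nr_pools_for() and bytes_per_pool are illustrative, not existing kernel interfaces.

```c
/* Hypothetical heuristic, not kernel code. */
#define MAX_POOLS 32UL

/* Largest power of two <= x; x must be >= 1. */
static unsigned long rounddown_pow2(unsigned long x)
{
	unsigned long p = 1;

	while (p <= x / 2)
		p *= 2;
	return p;
}

/* min(nr_cpus, MAX_POOLS), further capped at ram_bytes / bytes_per_pool
 * when bytes_per_pool is non-zero, never below 1, rounded down to a power
 * of two. */
static unsigned long nr_pools_for(unsigned long nr_cpus,
				  unsigned long long ram_bytes,
				  unsigned long long bytes_per_pool)
{
	unsigned long n = nr_cpus < MAX_POOLS ? nr_cpus : MAX_POOLS;

	if (bytes_per_pool) {
		unsigned long long by_mem = ram_bytes / bytes_per_pool;

		if (by_mem < n)
			n = (unsigned long)by_mem;
	}
	if (n == 0)
		n = 1;
	return rounddown_pow2(n);
}
```

With a 256 MiB per-pool budget, a 2 GiB machine would get 8 pools even with 32 or more CPUs, and a 2-CPU machine with 2 GiB would get 2.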
>
> We can't really tell how zspages are distributed across zones, but the
> overall number might be helpful. It'd be great if someone could make
> nr_zspages per zone :)