Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU

linux-mm.kvack.org archive mirror

From: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Daniel Secik <daniel.secik@gooddata.com>,
	 Charan Teja Kalla <quic_charante@quicinc.com>,
	Igor Raits <igor.raits@gooddata.com>,
	 Kalesh Singh <kaleshsingh@google.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
Date: Thu, 4 Jan 2024 15:34:20 +0100
Message-ID: <CAK8fFZ4v3zJXseEDDP5cvArD-eQYwJf-6VQFPPQOphRQ6L-PiA@mail.gmail.com>
In-Reply-To: <CAK8fFZ6_9SieEz_JdOxUXKBpai17XbAHPJUddjig=kQZ0gP4iQ@mail.gmail.com>

>
> >
> > On Wed, Jan 3, 2024 at 2:30 PM Jaroslav Pulchart
> > <jaroslav.pulchart@gooddata.com> wrote:
> > >
> > > >
> > > > >
> > > > > Hi Yu,
> > > > >
> > > > > On 12/2/2023 5:22 AM, Yu Zhao wrote:
> > > > > > Charan, does the fix previously attached seem acceptable to you? Any
> > > > > > additional feedback? Thanks.
> > > > >
> > > > > First, thanks for taking this patch to upstream.
> > > > >
> > > > > A comment on the code snippet: checking only the 'high wmark' pages might
> > > > > succeed here but can fail in the kswapd sleep check that immediately
> > > > > follows (see prepare_kswapd_sleep()). This can show up as an increased
> > > > > KSWAPD_HIGH_WMARK_HIT_QUICKLY count, and therefore as unnecessary kswapd run time.
> > > > > @Jaroslav: Have you observed something like the above?
> > > >
> > > > I do not see any unnecessary kswapd run time. On the contrary, it
> > > > fixes the issue of kswapd running continuously.
> > > >
> > > > >
> > > > > So, downstream, we have something like this for zone_watermark_ok():
> > > > > unsigned long size = wmark_pages(zone, mark) + (MIN_LRU_BATCH << 2);
> > > > >
> > > > > This empirical 'MIN_LRU_BATCH << 2' value is hard to justify. Maybe we
> > > > > should at least use 'MIN_LRU_BATCH', with the reasoning mentioned above.
> > > > > That is all I can say about this patch.
> > > > >
> > > > > +       mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
> > > > > +              WMARK_PROMO : WMARK_HIGH;
> > > > > +       for (i = 0; i <= sc->reclaim_idx; i++) {
> > > > > +               struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
> > > > > +               unsigned long size = wmark_pages(zone, mark);
> > > > > +
> > > > > +               if (managed_zone(zone) &&
> > > > > +                   !zone_watermark_ok(zone, sc->order, size, sc->reclaim_idx, 0))
> > > > > +                       return false;
> > > > > +       }
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Charan
> > > >
> > > >
> > > >
> > > > --
> > > > Jaroslav Pulchart
> > > > Sr. Principal SW Engineer
> > > > GoodData
> > >
> > >
> > > Hello,
> > >
> > > Today we tried to update our servers to 6.6.9, which contains the MGLRU
> > > fixes (from 6.6.8), and the server behaves much, much worse.
> > >
> > > Multiple kswapd* threads immediately went to ~100% CPU:
> > >     555 root      20   0       0      0      0 R  99.7   0.0   4:32.86 kswapd1
> > >     554 root      20   0       0      0      0 R  99.3   0.0   3:57.76 kswapd0
> > >     556 root      20   0       0      0      0 R  97.7   0.0   3:42.27 kswapd2
> > > Are the changes in upstream different from the initial patch that
> > > I tested?
> > >
> > > Best regards,
> > > Jaroslav Pulchart
> >
> > Hi Jaroslav,
> >
> > My apologies for all the trouble!
> >
> > Yes, there is a slight difference between the fix you verified and
> > what went into 6.6.9. The fix in 6.6.9 is disabled under a special
> > condition which I thought wouldn't affect you.
> >
> > Could you try the attached fix again on top of 6.6.9? It removed that
> > special condition.
> >
> > Thanks!
>
> Thanks for the prompt response. I tested the patch and it didn't
> help. The situation is very strange.
>
> I tried kernels 6.6.7, 6.6.8 and 6.6.9. With 6.6.9, I see high memory
> utilization on all NUMA nodes of the first CPU socket, which is the worst
> case. However, the kswapd load is already visible starting with 6.6.8.
>
> Setup of this server:
> * 2 sockets, with 4 chiplets per socket
> * 32 GB of RAM per chiplet, 28 GB of which are in hugepages
>   Note: previously I had 29 GB in hugepages. I freed up 1 GB to avoid
>   memory pressure, but it is actually even worse now.
>
> kernel 6.6.7: I see no kswapd usage when the application starts == OK
> NUMA nodes:     0     1     2     3     4     5     6     7
> HPTotalGiB:    28    28    28    28    28    28    28    28
> HPFreeGiB:     28    28    28    28    28    28    28    28
> MemTotal:   32264 32701 32701 32686 32701 32659 32701 32696
> MemFree:     2766  2715    63  2366  3495  2990  3462   252
>
> kernel 6.6.8: I see kswapd on nodes 2 and 3 when the application starts
> NUMA nodes:     0     1     2     3     4     5     6     7
> HPTotalGiB:    28    28    28    28    28    28    28    28
> HPFreeGiB:     28    28    28    28    28    28    28    28
> MemTotal:   32264 32701 32701 32686 32701 32701 32659 32696
> MemFree:     2744  2788    65   581  3304  3215  3266  2226
>
> kernel 6.6.9: I see kswapd on nodes 0, 1, 2 and 3 when the application starts
> NUMA nodes:     0     1     2     3     4     5     6     7
> HPTotalGiB:    28    28    28    28    28    28    28    28
> HPFreeGiB:     28    28    28    28    28    28    28    28
> MemTotal:   32264 32701 32701 32686 32659 32701 32701 32696
> MemFree:       75    60    60    60  3169  2784  3203  2944

I ran a few more combinations. Here are the results and findings:

  6.6.7-1  (vanilla)                            == OK, no issue

  6.6.8-1  (vanilla)                            == one kswapd at 100% !
  6.6.8-1  (vanilla plus mglru-fix-6.6.9.patch) == OK, no issue
  6.6.8-1  (revert four mglru patches)          == OK, no issue

  6.6.9-1  (vanilla)                            == four kswapd at 100% !!!!
  6.6.9-2  (vanilla plus mglru-fix-6.6.9.patch) == four kswapd at 100% !!!!
  6.6.9-3  (revert four mglru patches)          == four kswapd at 100% !!!!

Summary:
* With kernel 6.6.8, either mglru-fix-6.6.9.patch or reverting the four
  MGLRU patches fixes the problem.
* There is a (new?) problem in the 6.6.9 kernel that does not appear to
  be related to the MGLRU patches at all.


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAK8fFZ4v3zJXseEDDP5cvArD-eQYwJf-6VQFPPQOphRQ6L-PiA@mail.gmail.com \
    --to=jaroslav.pulchart@gooddata.com \
    --cc=akpm@linux-foundation.org \
    --cc=daniel.secik@gooddata.com \
    --cc=igor.raits@gooddata.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=quic_charante@quicinc.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox