From: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Daniel Secik <daniel.secik@gooddata.com>,
	 Charan Teja Kalla <quic_charante@quicinc.com>,
	Igor Raits <igor.raits@gooddata.com>,
	 Kalesh Singh <kaleshsingh@google.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
Date: Thu, 4 Jan 2024 10:46:49 +0100
Message-ID: <CAK8fFZ6_9SieEz_JdOxUXKBpai17XbAHPJUddjig=kQZ0gP4iQ@mail.gmail.com>
In-Reply-To: <CAOUHufaTYEuKcgnpjk5C9QgDhiEtnv0B4S8FdARQhN5=T2MPew@mail.gmail.com>

>
> On Wed, Jan 3, 2024 at 2:30 PM Jaroslav Pulchart
> <jaroslav.pulchart@gooddata.com> wrote:
> >
> > >
> > > >
> > > > > Hi Yu,
> > > >
> > > > On 12/2/2023 5:22 AM, Yu Zhao wrote:
> > > > > Charan, does the fix previously attached seem acceptable to you? Any
> > > > > additional feedback? Thanks.
> > > >
> > > > First, thanks for taking this patch to upstream.
> > > >
> > > > > One comment on the code snippet: checking just the 'high wmark'
> > > > > pages might succeed here but can fail in the kswapd sleep check
> > > > > that follows immediately, see prepare_kswapd_sleep(). This can show
> > > > > up as an increased KSWAPD_HIGH_WMARK_HIT_QUICKLY, and thus as
> > > > > unnecessary kswapd run time.
> > > > @Jaroslav: Have you observed something like above?
> > >
> > > > I do not see any unnecessary kswapd run time; on the contrary, it
> > > > fixes the continuous kswapd run issue.
> > >
> > > >
> > > > So, in downstream, we have something like for zone_watermark_ok():
> > > > unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH << 2;
> > > >
> > > > > It is hard to justify this empirical 'MIN_LRU_BATCH << 2' value;
> > > > > maybe we should at least use 'MIN_LRU_BATCH', with the reasoning
> > > > > mentioned above. That is all I can say about this patch.
> > > >
> > > > +       mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
> > > > +              WMARK_PROMO : WMARK_HIGH;
> > > > +       for (i = 0; i <= sc->reclaim_idx; i++) {
> > > > +               struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
> > > > +               unsigned long size = wmark_pages(zone, mark);
> > > > +
> > > > +               if (managed_zone(zone) &&
> > > > +                   !zone_watermark_ok(zone, sc->order, size, sc->reclaim_idx, 0))
> > > > +                       return false;
> > > > +       }
> > > >
> > > >
> > > > Thanks,
> > > > Charan
> > >
> > >
> > >
> > > --
> > > Jaroslav Pulchart
> > > Sr. Principal SW Engineer
> > > GoodData
> >
> >
> > Hello,
> >
> > today we tried to update servers to 6.6.9, which contains the mglru
> > fixes (from 6.6.8), and the server behaves much, much worse.
> >
> > Multiple kswapd* threads went to ~100% CPU immediately:
> >     555 root      20   0       0      0      0 R  99.7   0.0   4:32.86 kswapd1
> >     554 root      20   0       0      0      0 R  99.3   0.0   3:57.76 kswapd0
> >     556 root      20   0       0      0      0 R  97.7   0.0   3:42.27 kswapd2
> > Are the changes in upstream different from the initial patch that I
> > tested?
> >
> > Best regards,
> > Jaroslav Pulchart
>
> Hi Jaroslav,
>
> My apologies for all the trouble!
>
> Yes, there is a slight difference between the fix you verified and
> what went into 6.6.9. The fix in 6.6.9 is disabled under a special
> condition which I thought wouldn't affect you.
>
> Could you try the attached fix again on top of 6.6.9? It removed that
> special condition.
>
> Thanks!

Thanks for the prompt response. I ran a test with the patch and it
didn't help. The situation is super strange.

I tried kernels 6.6.7, 6.6.8 and 6.6.9. With 6.6.9 I see high memory
utilization on all NUMA nodes of the first CPU socket, which is the
worst case, but the kswapd load is already visible with 6.6.8.

Setup of this server:
* 2 sockets, 4 chiplets per socket
* 32 GB of RAM per chiplet, 28 GB of it in hugepages
  Note: previously I had 29 GB in hugepages; I freed up 1 GB to reduce
  memory pressure, but on the contrary it is even worse now.

kernel 6.6.7: I do not see kswapd usage when the application starts == OK
NUMA nodes:     0     1     2     3     4     5     6     7
HPTotalGiB:    28    28    28    28    28    28    28    28
HPFreeGiB:     28    28    28    28    28    28    28    28
MemTotal:   32264 32701 32701 32686 32701 32659 32701 32696
MemFree:     2766  2715    63  2366  3495  2990  3462   252

kernel 6.6.8: I see kswapd on nodes 2 and 3 when the application starts
NUMA nodes:     0     1     2     3     4     5     6     7
HPTotalGiB:    28    28    28    28    28    28    28    28
HPFreeGiB:     28    28    28    28    28    28    28    28
MemTotal:   32264 32701 32701 32686 32701 32701 32659 32696
MemFree:     2744  2788    65   581  3304  3215  3266  2226

kernel 6.6.9: I see kswapd on nodes 0, 1, 2 and 3 when the application starts
NUMA nodes:     0     1     2     3     4     5     6     7
HPTotalGiB:    28    28    28    28    28    28    28    28
HPFreeGiB:     28    28    28    28    28    28    28    28
MemTotal:   32264 32701 32701 32686 32659 32701 32701 32696
MemFree:       75    60    60    60  3169  2784  3203  2944
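
For anyone who wants to collect comparable per-node figures: the kernel
exposes them in /sys/devices/system/node/nodeN/meminfo, one
"Node N Key: value" line per field. The following is only a minimal
sketch, not the tool used for the tables above (this message does not
say how they were gathered). The function names
parse_node_meminfo_line() and read_node_field() are made up for
illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a per-node meminfo file, for example
 *   "Node 2 MemFree:          64512 kB"
 * (the numbers in this comment are illustrative, not measured data).
 * If the line has that shape and its key equals `key`, store the node id
 * and the numeric value and return 1. Otherwise return 0.
 */
int parse_node_meminfo_line(const char *line, const char *key,
                            int *node, unsigned long long *value)
{
    char name[64];
    int n;
    unsigned long long v;

    /* "Node <id> <key>: <number>"; a trailing unit such as "kB" is ignored. */
    if (sscanf(line, "Node %d %63[^:]: %llu", &n, name, &v) != 3)
        return 0;
    if (strcmp(name, key) != 0)
        return 0;

    *node = n;
    *value = v;
    return 1;
}

/*
 * Look up one field (e.g. "MemFree") for one NUMA node in sysfs.
 * Returns 1 and fills *value on success, 0 if the file cannot be opened
 * or the key is not present.
 */
int read_node_field(int node, const char *key, unsigned long long *value)
{
    char path[64];
    char line[256];
    int parsed_node;
    int found = 0;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/meminfo", node);
    f = fopen(path, "r");
    if (!f)
        return 0;

    /* Stop at the first line whose key matches. */
    while (!found && fgets(line, sizeof(line), f))
        found = parse_node_meminfo_line(line, key, &parsed_node, value);

    fclose(f);
    return found;
}
```

Calling read_node_field(n, "MemFree", &kb) for each node and dividing
the result by 1024 gives MiB. Note that HugePages_Total and
HugePages_Free in the same file are page counts rather than kB values,
so they need to be multiplied by the hugepage size separately.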

