From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yu Zhao
Date: Fri, 1 Dec 2023 16:52:14 -0700
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Jaroslav Pulchart, Charan Teja Kalla
Cc: Daniel Secik, Igor Raits, Kalesh Singh, akpm@linux-foundation.org, linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"

On Wed, Nov 29, 2023 at 6:54 AM Jaroslav Pulchart wrote:
>
> On Wed, Nov 22, 2023 at 12:31 AM Jaroslav Pulchart wrote:
> >>
> >> >
> >> > > On Mon, Nov 20, 2023 at 1:42 AM Jaroslav Pulchart wrote:
> >> > > >
> >> > > > > On Tue, Nov 14, 2023 at 12:30 AM Jaroslav Pulchart wrote:
> >> > > > > >
> >> > > > > > > On Mon, Nov 13, 2023 at 1:36 AM Jaroslav Pulchart wrote:
> >> > > > > > > >
> >> > > > > > > > > On Thu, Nov 9, 2023 at 3:58 AM Jaroslav Pulchart wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > On Wed, Nov 8, 2023 at 10:39 PM Jaroslav Pulchart wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > On Wed, Nov 8, 2023 at 12:04 PM Jaroslav Pulchart wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Hi Jaroslav,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi Yu Zhao
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > thanks for the response, see answers inline:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hello,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I would like to report to you an unpleasant behavior of multi-gen LRU
> >> > > > > > > > > > > > > > > > with strange swap in/out usage on my Dell 7525 two-socket AMD 74F3
> >> > > > > > > > > > > > > > > > system (16 NUMA domains).
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Kernel version please?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > 6.5.y, but we saw it sooner as it has been under investigation since 23rd May
> >> > > > > > > > > > > > > > (6.4.y and maybe even the 6.3.y).
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > v6.6 has a few critical fixes for MGLRU, I can backport them to v6.5
> >> > > > > > > > > > > > > for you if you run into other problems with v6.6.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I will give it a try using 6.6.y. When it works we can switch to
> >> > > > > > > > > > > > 6.6.y instead of backporting the stuff to 6.5.y.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Symptoms of my issue are
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > /A/ if multi-gen LRU is enabled
> >> > > > > > > > > > > > > > > > 1/ [kswapd3] is consuming 100% CPU
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Just thinking out loud: kswapd3 means the fourth node was under memory pressure.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > top - 15:03:11 up 34 days, 1:51, 2 users, load average: 23.34,
> >> > > > > > > > > > > > > > > > 18.26, 15.01
> >> > > > > > > > > > > > > > > > Tasks: 1226 total, 2 running, 1224 sleeping, 0 stopped, 0 zombie
> >> > > > > > > > > > > > > > > > %Cpu(s): 12.5 us, 4.7 sy, 0.0 ni, 82.1 id, 0.0 wa, 0.4 hi,
> >> > > > > > > > > > > > > > > > 0.4 si, 0.0 st
> >> > > > > > > > > > > > > > > > MiB Mem : 1047265.+total, 28382.7 free, 1021308.+used, 767.6 buff/cache
> >> > > > > > > > > > > > > > > > MiB Swap: 8192.0 total, 8187.7 free, 4.2 used. 25956.7 avail Mem
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 765 root 20 0 0 0 0 R 98.3 0.0
> >> > > > > > > > > > > > > > > > 34969:04 kswapd3
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 2/ swap space usage is low, about ~4MB from 8GB as swap in zram (was
> >> > > > > > > > > > > > > > > > observed with swap disk as well and causes IO latency issues due to
> >> > > > > > > > > > > > > > > > some kind of locking)
> >> > > > > > > > > > > > > > > > 3/ swap In/Out is huge and symmetrical ~12MB/s in and ~12MB/s out
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > /B/ if multi-gen LRU is disabled
> >> > > > > > > > > > > > > > > > 1/ [kswapd3] is consuming 3%-10% CPU
> >> > > > > > > > > > > > > > > > top - 15:02:49 up 34 days, 1:51, 2 users, load average: 23.05,
> >> > > > > > > > > > > > > > > > 17.77, 14.77
> >> > > > > > > > > > > > > > > > Tasks: 1226 total, 1 running, 1225 sleeping, 0 stopped, 0 zombie
> >> > > > > > > > > > > > > > > > %Cpu(s): 14.7 us, 2.8 sy, 0.0 ni, 81.8 id, 0.0 wa, 0.4 hi,
> >> > > > > > > > > > > > > > > > 0.4 si, 0.0 st
> >> > > > > > > > > > > > > > > > MiB Mem : 1047265.+total, 28378.5 free, 1021313.+used, 767.3 buff/cache
> >> > > > > > > > > > > > > > > > MiB Swap: 8192.0 total, 8189.0 free, 3.0 used. 25952.4 avail Mem
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 765 root 20 0 0 0 0 S 3.6 0.0
> >> > > > > > > > > > > > > > > > 34966:46 [kswapd3]
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 2/ swap space usage is low (4MB)
> >> > > > > > > > > > > > > > > > 3/ swap In/Out is huge and symmetrical ~500kB/s in and ~500kB/s out
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Both situations are wrong as they are using swap in/out extensively,
> >> > > > > > > > > > > > > > > > however the multi-gen LRU situation is 10 times worse.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > From the stats below, node 3 had the lowest free memory. So I think in
> >> > > > > > > > > > > > > > > both cases, the reclaim activities were as expected.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I do not see a reason for the memory pressure and reclaims. This node
> >> > > > > > > > > > > > > > has the lowest free memory of all nodes (~302MB free), that is true,
> >> > > > > > > > > > > > > > however the swap space usage is just 4MB (still going in and out). So
> >> > > > > > > > > > > > > > what can be the reason for that behaviour?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > The best analogy is that refuel (reclaim) happens before the tank
> >> > > > > > > > > > > > > becomes empty, and it happens even sooner when there is a long road
> >> > > > > > > > > > > > > ahead (high order allocations).
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > The workers/application is running in pre-allocated HugePages and the
> >> > > > > > > > > > > > > > rest is used for a small set of system services and drivers of
> >> > > > > > > > > > > > > > devices. It is static and not growing. The issue persists when I stop
> >> > > > > > > > > > > > > > the system services and free the memory.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Yes, this helps.
> >> > > > > > > > > > > > > Also could you attach /proc/buddyinfo from the moment
> >> > > > > > > > > > > > > you hit the problem?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I can. The problem is continuous, it is 100% of time continuously
> >> > > > > > > > > > > > doing in/out and consuming 100% of CPU and locking IO.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The output of /proc/buddyinfo is:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > # cat /proc/buddyinfo
> >> > > > > > > > > > > > Node 0, zone DMA 7 2 2 1 1 2 1
> >> > > > > > > > > > > > 1 1 2 1
> >> > > > > > > > > > > > Node 0, zone DMA32 4567 3395 1357 846 439 190 93
> >> > > > > > > > > > > > 61 43 23 4
> >> > > > > > > > > > > > Node 0, zone Normal 19 190 140 129 136 75 66
> >> > > > > > > > > > > > 41 9 1 5
> >> > > > > > > > > > > > Node 1, zone Normal 194 1210 2080 1800 715 255 111
> >> > > > > > > > > > > > 56 42 36 55
> >> > > > > > > > > > > > Node 2, zone Normal 204 768 3766 3394 1742 468 185
> >> > > > > > > > > > > > 194 238 47 74
> >> > > > > > > > > > > > Node 3, zone Normal 1622 2137 1058 846 388 208 97
> >> > > > > > > > > > > > 44 14 42 10
> >> > > > > > > > > > >
> >> > > > > > > > > > > Again, thinking out loud: there is only one zone on node 3, i.e., the
> >> > > > > > > > > > > normal zone, and this excludes the problem commit
> >> > > > > > > > > > > 669281ee7ef731fb5204df9d948669bf32a5e68d ("Multi-gen LRU: fix per-zone
> >> > > > > > > > > > > reclaim") fixed in v6.6.
> >> > > > > > > > > >
> >> > > > > > > > > > I built vanilla 6.6.1 and did the first fast test - spin up and destroy
> >> > > > > > > > > > VMs only - This test does not always trigger the kswapd3 continuous
> >> > > > > > > > > > swap in/out usage but it uses it and it looks like there is a
> >> > > > > > > > > > change:
> >> > > > > > > > > >
> >> > > > > > > > > > I can see kswapd non-continuous (15s and more) usage with 6.5.y
> >> > > > > > > > > > # ps ax | grep [k]swapd
> >> > > > > > > > > > 753 ? S 0:00 [kswapd0]
> >> > > > > > > > > > 754 ? S 0:00 [kswapd1]
> >> > > > > > > > > > 755 ? S 0:00 [kswapd2]
> >> > > > > > > > > > 756 ? S 0:15 [kswapd3] <<<<<<<<<
> >> > > > > > > > > > 757 ? S 0:00 [kswapd4]
> >> > > > > > > > > > 758 ? S 0:00 [kswapd5]
> >> > > > > > > > > > 759 ? S 0:00 [kswapd6]
> >> > > > > > > > > > 760 ? S 0:00 [kswapd7]
> >> > > > > > > > > > 761 ? S 0:00 [kswapd8]
> >> > > > > > > > > > 762 ? S 0:00 [kswapd9]
> >> > > > > > > > > > 763 ? S 0:00 [kswapd10]
> >> > > > > > > > > > 764 ? S 0:00 [kswapd11]
> >> > > > > > > > > > 765 ? S 0:00 [kswapd12]
> >> > > > > > > > > > 766 ? S 0:00 [kswapd13]
> >> > > > > > > > > > 767 ? S 0:00 [kswapd14]
> >> > > > > > > > > > 768 ? S 0:00 [kswapd15]
> >> > > > > > > > > >
> >> > > > > > > > > > and no kswapd usage with 6.6.1, that looks to be a promising path
> >> > > > > > > > > >
> >> > > > > > > > > > # ps ax | grep [k]swapd
> >> > > > > > > > > > 808 ? S 0:00 [kswapd0]
> >> > > > > > > > > > 809 ? S 0:00 [kswapd1]
> >> > > > > > > > > > 810 ? S 0:00 [kswapd2]
> >> > > > > > > > > > 811 ? S 0:00 [kswapd3] <<<< nice
> >> > > > > > > > > > 812 ? S 0:00 [kswapd4]
> >> > > > > > > > > > 813 ? S 0:00 [kswapd5]
> >> > > > > > > > > > 814 ? S 0:00 [kswapd6]
> >> > > > > > > > > > 815 ? S 0:00 [kswapd7]
> >> > > > > > > > > > 816 ? S 0:00 [kswapd8]
> >> > > > > > > > > > 817 ? S 0:00 [kswapd9]
> >> > > > > > > > > > 818 ? S 0:00 [kswapd10]
> >> > > > > > > > > > 819 ? S 0:00 [kswapd11]
> >> > > > > > > > > > 820 ? S 0:00 [kswapd12]
> >> > > > > > > > > > 821 ? S 0:00 [kswapd13]
> >> > > > > > > > > > 822 ? S 0:00 [kswapd14]
> >> > > > > > > > > > 823 ? S 0:00 [kswapd15]
> >> > > > > > > > > >
> >> > > > > > > > > > I will install the 6.6.1 on the server which is doing some work and
> >> > > > > > > > > > observe it later today.
> >> > > > > > > > >
> >> > > > > > > > > Thanks. Fingers crossed.
> >> > > > > > > >
> >> > > > > > > > The 6.6.y was deployed and used from 9th Nov 3PM CEST. So far so good.
> >> > > > > > > > The node 3 has 163MiB of free memory and I see
> >> > > > > > > > just a little in/out swap usage sometimes (which is expected) and minimal
> >> > > > > > > > kswapd3 process usage for almost 4 days.
> >> > > > > > >
> >> > > > > > > Thanks for the update!
> >> > > > > > >
> >> > > > > > > Just to confirm:
> >> > > > > > > 1. MGLRU was enabled, and
> >> > > > > >
> >> > > > > > Yes, MGLRU is enabled
> >> > > > > >
> >> > > > > > > 2. The v6.6 deployed did NOT have the patch I attached earlier.
> >> > > > > >
> >> > > > > > Vanilla 6.6, attached patch NOT applied.
> >> > > > > >
> >> > > > > > > Are both correct?
> >> > > > > > >
> >> > > > > > > If so, I'd very much appreciate it if you could try the attached patch on
> >> > > > > > > top of v6.5 and see if it helps. My suspicion is that the problem is
> >> > > > > > > compaction related, i.e., kswapd was woken up by high order
> >> > > > > > > allocations but didn't properly stop. But what causes the behavior
> >> > > > > >
> >> > > > > > Sure, I can try it. Will inform you about progress.
> >> > > > >
> >> > > > > Thanks!
> >> > > > >
> >> > > > > > > difference on v6.5 between MGLRU and the active/inactive LRU still
> >> > > > > > > puzzles me -- the problem might be somehow masked rather than fixed on
> >> > > > > > > v6.6.
> >> > > > > >
> >> > > > > > I'm not sure how I can help with the issue. Any suggestions on what to
> >> > > > > > change/try?
> >> > > > >
> >> > > > > Trying the attached patch is good enough for now :)
> >> > > >
> >> > > > So far I'm running the "6.5.y + patch" for 4 days without triggering
> >> > > > the infinite swap in/out usage.
> >> > > >
> >> > > > I'm observing a similar pattern in kswapd usage - "if it uses kswapd,
> >> > > > then it is in majority the kswapd3" - like the vanilla 6.5.y, which is
> >> > > > not observed with 6.6.y. (The Node 3 free mem is 159 MB)
> >> > > > # ps ax | grep [k]swapd
> >> > > > 750 ? S 0:00 [kswapd0]
> >> > > > 751 ? S 0:00 [kswapd1]
> >> > > > 752 ? S 0:00 [kswapd2]
> >> > > > 753 ? S 0:02 [kswapd3] <<<< it uses kswapd3, good
> >> > > > is that it is not continuous
> >> > > > 754 ? S 0:00 [kswapd4]
> >> > > > 755 ? S 0:00 [kswapd5]
> >> > > > 756 ? S 0:00 [kswapd6]
> >> > > > 757 ? S 0:00 [kswapd7]
> >> > > > 758 ? S 0:00 [kswapd8]
> >> > > > 759 ? S 0:00 [kswapd9]
> >> > > > 760 ? S 0:00 [kswapd10]
> >> > > > 761 ? S 0:00 [kswapd11]
> >> > > > 762 ? S 0:00 [kswapd12]
> >> > > > 763 ? S 0:00 [kswapd13]
> >> > > > 764 ? S 0:00 [kswapd14]
> >> > > > 765 ? S 0:00 [kswapd15]
> >> > > >
> >> > > > Good stuff is that the system did not end in a continuous loop of swap
> >> > > > in/out usage (at least so far), which is great. See attached
> >> > > > swap_in_out_good_vs_bad.png. I will keep it running for the next 3
> >> > > > days.
> >> > >
> >> > > Thanks again, Jaroslav!
> >> > >
> >> > > Just a note here: I suspect the problem still exists on v6.6 but
> >> > > somehow is masked, possibly by reduced memory usage from the kernel
> >> > > itself and more free memory for userspace. So to be on the safe side,
> >> > > I'll post the patch and credit you as the reporter and tester.
> >> >
> >> > Morning, let's wait. I reviewed the graph and the swap in/out started
> >> > to be happening from 1:50 AM CET. Slower than before (CPU util
> >> > 0.3%) but it is doing in/out, see attached png.
> >>
> >> I investigated it more, there was an operation issue and the system
> >> disabled multi-gen lru yesterday ~10 AM CET (our temporary workaround
> >> for this problem) by
> >> echo N > /sys/kernel/mm/lru_gen/enabled
> >> when an alert was triggered by an unexpected setup of the server.
> >> Could it be that the patch is not functional if lru_gen/enabled is
> >> 0x0000?
> >
> > That's correct.
> >
> >> I need to reboot the system and do the whole week's test again.
> >
> > Thanks a lot!
>
> The server with 6.5.y + lru patch is stable, no continuous swap in/out
> is observed in the last 7 days!
>
> I assume the fix is correct. Can you share with me the final patch for
> 6.6.y? I will use it in our kernel builds until it is upstream.

Will do. Thank you.

Charan, does the fix previously attached seem acceptable to you? Any
additional feedback? Thanks.
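
For anyone following along, a minimal sketch of the checks referenced in
this thread, assuming the standard lru_gen sysfs interface and procfs
counters; the node number (3) and the watch interval are only examples,
not taken from the original reports:

# check whether MGLRU is enabled (0x0007 = enabled, 0x0000 = disabled)
cat /sys/kernel/mm/lru_gen/enabled

# temporary workaround mentioned above: disable MGLRU at runtime
echo N > /sys/kernel/mm/lru_gen/enabled

# watch kswapd3 CPU time, cumulative swap in/out and node 3 free pages
watch -n 5 'ps -o pid,time,comm -C kswapd3; grep -E "^pswp(in|out)" /proc/vmstat; grep "Node 3" /proc/buddyinfo'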