From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yu Zhao
Date: Fri, 1 Dec 2023 16:52:14 -0700
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Jaroslav Pulchart, Charan Teja Kalla
Cc: Daniel Secik, Igor Raits, Kalesh Singh, akpm@linux-foundation.org, linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"

On Wed, Nov 29, 2023 at 6:54 AM Jaroslav Pulchart wrote:
>
> On Wed, Nov 22, 2023 at 12:31 AM Jaroslav Pulchart wrote:
> >>
> >> >
> >> > > On Mon, Nov 20, 2023 at 1:42 AM Jaroslav Pulchart wrote:
> >> > > >
> >> > > > > On Tue, Nov 14, 2023 at 12:30 AM Jaroslav Pulchart wrote:
> >> > > > > >
> >> > > > > > > On Mon, Nov 13, 2023 at 1:36 AM Jaroslav Pulchart wrote:
> >> > > > > > > >
> >> > > > > > > > > On Thu, Nov 9, 2023 at 3:58 AM Jaroslav Pulchart wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > On Wed, Nov 8, 2023 at 10:39 PM Jaroslav Pulchart wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > On Wed, Nov 8, 2023 at 12:04 PM Jaroslav Pulchart wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Hi Jaroslav,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi Yu Zhao
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > thanks for the response, see answers inline:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hello,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I would like to report to you an unpleasant behavior of multi-gen LRU
> >> > > > > > > > > > > > > > > > with strange swap in/out usage on my Dell 7525 two-socket AMD 74F3
> >> > > > > > > > > > > > > > > > system (16 NUMA domains).
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Kernel version please?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > 6.5.y, but we saw it sooner as it has been under investigation since 23rd May
> >> > > > > > > > > > > > > > (6.4.y and maybe even the 6.3.y).
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > v6.6 has a few critical fixes for MGLRU, I can backport them to v6.5
> >> > > > > > > > > > > > > for you if you run into other problems with v6.6.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I will give it a try using 6.6.y. When it works we can switch to
> >> > > > > > > > > > > > 6.6.y instead of backporting the stuff to 6.5.y.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Symptoms of my issue are
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > /A/ if multi-gen LRU is enabled
> >> > > > > > > > > > > > > > > > 1/ [kswapd3] is consuming 100% CPU
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Just thinking out loud: kswapd3 means the fourth node was under memory pressure.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > top - 15:03:11 up 34 days, 1:51, 2 users, load average: 23.34,
> >> > > > > > > > > > > > > > > > 18.26, 15.01
> >> > > > > > > > > > > > > > > > Tasks: 1226 total, 2 running, 1224 sleeping, 0 stopped, 0 zombie
> >> > > > > > > > > > > > > > > > %Cpu(s): 12.5 us, 4.7 sy, 0.0 ni, 82.1 id, 0.0 wa, 0.4 hi,
> >> > > > > > > > > > > > > > > > 0.4 si, 0.0 st
> >> > > > > > > > > > > > > > > > MiB Mem : 1047265.+total, 28382.7 free, 1021308.+used, 767.6 buff/cache
> >> > > > > > > > > > > > > > > > MiB Swap: 8192.0 total, 8187.7 free, 4.2 used. 25956.7 avail Mem
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 765 root 20 0 0 0 0 R 98.3 0.0
> >> > > > > > > > > > > > > > > > 34969:04 kswapd3
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 2/ swap space usage is low, about ~4MB from 8GB as swap in zram (was
> >> > > > > > > > > > > > > > > > observed with swap disk as well and causes IO latency issues due to
> >> > > > > > > > > > > > > > > > some kind of locking)
> >> > > > > > > > > > > > > > > > 3/ swap In/Out is huge and symmetrical ~12MB/s in and ~12MB/s out
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > /B/ if multi-gen LRU is disabled
> >> > > > > > > > > > > > > > > > 1/ [kswapd3] is consuming 3%-10% CPU
> >> > > > > > > > > > > > > > > > top - 15:02:49 up 34 days, 1:51, 2 users, load average: 23.05,
> >> > > > > > > > > > > > > > > > 17.77, 14.77
> >> > > > > > > > > > > > > > > > Tasks: 1226 total, 1 running, 1225 sleeping, 0 stopped, 0 zombie
> >> > > > > > > > > > > > > > > > %Cpu(s): 14.7 us, 2.8 sy, 0.0 ni, 81.8 id, 0.0 wa, 0.4 hi,
> >> > > > > > > > > > > > > > > > 0.4 si, 0.0 st
> >> > > > > > > > > > > > > > > > MiB Mem : 1047265.+total, 28378.5 free, 1021313.+used, 767.3 buff/cache
> >> > > > > > > > > > > > > > > > MiB Swap: 8192.0 total, 8189.0 free, 3.0 used. 25952.4 avail Mem
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 765 root 20 0 0 0 0 S 3.6 0.0
> >> > > > > > > > > > > > > > > > 34966:46 [kswapd3]
> >> > > > > > > > > > > > > > > > ...
> >> > > > > > > > > > > > > > > > 2/ swap space usage is low (4MB)
> >> > > > > > > > > > > > > > > > 3/ swap In/Out is huge and symmetrical ~500kB/s in and ~500kB/s out
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Both situations are wrong as they are using swap in/out extensively,
> >> > > > > > > > > > > > > > > > however the multi-gen LRU situation is 10 times worse.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > From the stats below, node 3 had the lowest free memory. So I think in
> >> > > > > > > > > > > > > > > both cases, the reclaim activities were as expected.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I do not see a reason for the memory pressure and reclaims. This node
> >> > > > > > > > > > > > > > has the lowest free memory of all nodes (~302MB free), that is true,
> >> > > > > > > > > > > > > > however the swap space usage is just 4MB (still going in and out). So
> >> > > > > > > > > > > > > > what can be the reason for that behaviour?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > The best analogy is that refuel (reclaim) happens before the tank
> >> > > > > > > > > > > > > becomes empty, and it happens even sooner when there is a long road
> >> > > > > > > > > > > > > ahead (high order allocations).
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > The workers/application is running in pre-allocated HugePages and the
> >> > > > > > > > > > > > > > rest is used for a small set of system services and drivers of
> >> > > > > > > > > > > > > > devices. It is static and not growing. The issue persists when I stop
> >> > > > > > > > > > > > > > the system services and free the memory.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Yes, this helps.
> >> > > > > > > > > > > > > Also could you attach /proc/buddyinfo from the moment
> >> > > > > > > > > > > > > you hit the problem?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I can. The problem is continuous, it is 100% of time continuously
> >> > > > > > > > > > > > doing in/out and consuming 100% of CPU and locking IO.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The output of /proc/buddyinfo is:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > # cat /proc/buddyinfo
> >> > > > > > > > > > > > Node 0, zone DMA 7 2 2 1 1 2 1
> >> > > > > > > > > > > > 1 1 2 1
> >> > > > > > > > > > > > Node 0, zone DMA32 4567 3395 1357 846 439 190 93
> >> > > > > > > > > > > > 61 43 23 4
> >> > > > > > > > > > > > Node 0, zone Normal 19 190 140 129 136 75 66
> >> > > > > > > > > > > > 41 9 1 5
> >> > > > > > > > > > > > Node 1, zone Normal 194 1210 2080 1800 715 255 111
> >> > > > > > > > > > > > 56 42 36 55
> >> > > > > > > > > > > > Node 2, zone Normal 204 768 3766 3394 1742 468 185
> >> > > > > > > > > > > > 194 238 47 74
> >> > > > > > > > > > > > Node 3, zone Normal 1622 2137 1058 846 388 208 97
> >> > > > > > > > > > > > 44 14 42 10
> >> > > > > > > > > > >
> >> > > > > > > > > > > Again, thinking out loud: there is only one zone on node 3, i.e., the
> >> > > > > > > > > > > normal zone, and this excludes the problem commit
> >> > > > > > > > > > > 669281ee7ef731fb5204df9d948669bf32a5e68d ("Multi-gen LRU: fix per-zone
> >> > > > > > > > > > > reclaim") fixed in v6.6.
> >> > > > > > > > > >
> >> > > > > > > > > > I built vanilla 6.6.1 and did the first fast test - spin up and destroy
> >> > > > > > > > > > VMs only - This test does not always trigger the kswapd3 continuous
> >> > > > > > > > > > swap in/out usage but it uses it and it looks like there is a
> >> > > > > > > > > > change:
> >> > > > > > > > > >
> >> > > > > > > > > > I can see kswapd non-continuous (15s and more) usage with 6.5.y
> >> > > > > > > > > > # ps ax | grep [k]swapd
> >> > > > > > > > > > 753 ? S 0:00 [kswapd0]
> >> > > > > > > > > > 754 ? S 0:00 [kswapd1]
> >> > > > > > > > > > 755 ? S 0:00 [kswapd2]
> >> > > > > > > > > > 756 ? S 0:15 [kswapd3] <<<<<<<<<
> >> > > > > > > > > > 757 ? S 0:00 [kswapd4]
> >> > > > > > > > > > 758 ? S 0:00 [kswapd5]
> >> > > > > > > > > > 759 ? S 0:00 [kswapd6]
> >> > > > > > > > > > 760 ? S 0:00 [kswapd7]
> >> > > > > > > > > > 761 ? S 0:00 [kswapd8]
> >> > > > > > > > > > 762 ? S 0:00 [kswapd9]
> >> > > > > > > > > > 763 ? S 0:00 [kswapd10]
> >> > > > > > > > > > 764 ? S 0:00 [kswapd11]
> >> > > > > > > > > > 765 ? S 0:00 [kswapd12]
> >> > > > > > > > > > 766 ? S 0:00 [kswapd13]
> >> > > > > > > > > > 767 ? S 0:00 [kswapd14]
> >> > > > > > > > > > 768 ? S 0:00 [kswapd15]
> >> > > > > > > > > >
> >> > > > > > > > > > and no kswapd usage with 6.6.1, that looks to be a promising path
> >> > > > > > > > > >
> >> > > > > > > > > > # ps ax | grep [k]swapd
> >> > > > > > > > > > 808 ? S 0:00 [kswapd0]
> >> > > > > > > > > > 809 ? S 0:00 [kswapd1]
> >> > > > > > > > > > 810 ? S 0:00 [kswapd2]
> >> > > > > > > > > > 811 ? S 0:00 [kswapd3] <<<< nice
> >> > > > > > > > > > 812 ? S 0:00 [kswapd4]
> >> > > > > > > > > > 813 ? S 0:00 [kswapd5]
> >> > > > > > > > > > 814 ? S 0:00 [kswapd6]
> >> > > > > > > > > > 815 ? S 0:00 [kswapd7]
> >> > > > > > > > > > 816 ? S 0:00 [kswapd8]
> >> > > > > > > > > > 817 ? S 0:00 [kswapd9]
> >> > > > > > > > > > 818 ? S 0:00 [kswapd10]
> >> > > > > > > > > > 819 ? S 0:00 [kswapd11]
> >> > > > > > > > > > 820 ? S 0:00 [kswapd12]
> >> > > > > > > > > > 821 ? S 0:00 [kswapd13]
> >> > > > > > > > > > 822 ? S 0:00 [kswapd14]
> >> > > > > > > > > > 823 ? S 0:00 [kswapd15]
> >> > > > > > > > > >
> >> > > > > > > > > > I will install the 6.6.1 on the server which is doing some work and
> >> > > > > > > > > > observe it later today.
> >> > > > > > > > >
> >> > > > > > > > > Thanks. Fingers crossed.
> >> > > > > > > >
> >> > > > > > > > The 6.6.y was deployed and used from 9th Nov 3PM CEST. So far so good.
> >> > > > > > > > The node 3 has 163MiB of free memory and I see
> >> > > > > > > > just a little in/out swap usage sometimes (which is expected) and minimal
> >> > > > > > > > kswapd3 process usage for almost 4 days.
> >> > > > > > >
> >> > > > > > > Thanks for the update!
> >> > > > > > >
> >> > > > > > > Just to confirm:
> >> > > > > > > 1. MGLRU was enabled, and
> >> > > > > >
> >> > > > > > Yes, MGLRU is enabled
> >> > > > > >
> >> > > > > > > 2. The v6.6 deployed did NOT have the patch I attached earlier.
> >> > > > > >
> >> > > > > > Vanilla 6.6, attached patch NOT applied.
> >> > > > > >
> >> > > > > > > Are both correct?
> >> > > > > > >
> >> > > > > > > If so, I'd very much appreciate it if you could try the attached patch on
> >> > > > > > > top of v6.5 and see if it helps. My suspicion is that the problem is
> >> > > > > > > compaction related, i.e., kswapd was woken up by high order
> >> > > > > > > allocations but didn't properly stop. But what causes the behavior
> >> > > > > >
> >> > > > > > Sure, I can try it. Will inform you about progress.
> >> > > > >
> >> > > > > Thanks!
> >> > > > >
> >> > > > > > > difference on v6.5 between MGLRU and the active/inactive LRU still
> >> > > > > > > puzzles me -- the problem might be somehow masked rather than fixed on
> >> > > > > > > v6.6.
> >> > > > > >
> >> > > > > > I'm not sure how I can help with the issue. Any suggestions on what to
> >> > > > > > change/try?
> >> > > > >
> >> > > > > Trying the attached patch is good enough for now :)
> >> > > >
> >> > > > So far I'm running the "6.5.y + patch" for 4 days without triggering
> >> > > > the infinite swap in/out usage.
> >> > > >
> >> > > > I'm observing a similar pattern in kswapd usage - "if it uses kswapd,
> >> > > > then it is in majority the kswapd3" - like the vanilla 6.5.y, which is
> >> > > > not observed with 6.6.y. (The Node 3 free mem is 159 MB)
> >> > > > # ps ax | grep [k]swapd
> >> > > > 750 ? S 0:00 [kswapd0]
> >> > > > 751 ? S 0:00 [kswapd1]
> >> > > > 752 ? S 0:00 [kswapd2]
> >> > > > 753 ? S 0:02 [kswapd3] <<<< it uses kswapd3, good
> >> > > > is that it is not continuous
> >> > > > 754 ? S 0:00 [kswapd4]
> >> > > > 755 ? S 0:00 [kswapd5]
> >> > > > 756 ? S 0:00 [kswapd6]
> >> > > > 757 ? S 0:00 [kswapd7]
> >> > > > 758 ? S 0:00 [kswapd8]
> >> > > > 759 ? S 0:00 [kswapd9]
> >> > > > 760 ? S 0:00 [kswapd10]
> >> > > > 761 ? S 0:00 [kswapd11]
> >> > > > 762 ? S 0:00 [kswapd12]
> >> > > > 763 ? S 0:00 [kswapd13]
> >> > > > 764 ? S 0:00 [kswapd14]
> >> > > > 765 ? S 0:00 [kswapd15]
> >> > > >
> >> > > > Good stuff is that the system did not end in a continuous loop of swap
> >> > > > in/out usage (at least so far), which is great. See attached
> >> > > > swap_in_out_good_vs_bad.png. I will keep it running for the next 3
> >> > > > days.
> >> > >
> >> > > Thanks again, Jaroslav!
> >> > >
> >> > > Just a note here: I suspect the problem still exists on v6.6 but
> >> > > somehow is masked, possibly by reduced memory usage from the kernel
> >> > > itself and more free memory for userspace. So to be on the safe side,
> >> > > I'll post the patch and credit you as the reporter and tester.
> >> >
> >> > Morning, let's wait. I reviewed the graph and the swap in/out started
> >> > to be happening from 1:50 AM CET. Slower than before (CPU util
> >> > 0.3%) but it is doing in/out, see attached png.
> >>
> >> I investigated it more, there was an operation issue and the system
> >> disabled multi-gen lru yesterday ~10 AM CET (our temporary workaround
> >> for this problem) by
> >> echo N > /sys/kernel/mm/lru_gen/enabled
> >> when an alert was triggered by an unexpected setup of the server.
> >> Could it be that the patch is not functional if lru_gen/enabled is
> >> 0x0000?
> >
> > That's correct.
> >
> >> I need to reboot the system and do the whole week's test again.
> >
> > Thanks a lot!
>
> The server with 6.5.y + lru patch is stable, no continuous swap in/out
> is observed in the last 7 days!
>
> I assume the fix is correct. Can you share with me the final patch for
> 6.6.y? I will use it in our kernel builds until it is upstream.

Will do. Thank you.

Charan, does the fix previously attached seem acceptable to you? Any
additional feedback? Thanks.
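
For anyone following along, a minimal sketch of the checks referenced in
this thread, assuming the standard lru_gen sysfs interface and procfs
counters; the node number (3) and the watch interval are only examples,
not taken from the original reports:

# check whether MGLRU is enabled (0x0007 = enabled, 0x0000 = disabled)
cat /sys/kernel/mm/lru_gen/enabled

# temporary workaround mentioned above: disable MGLRU at runtime
echo N > /sys/kernel/mm/lru_gen/enabled

# watch kswapd3 CPU time, cumulative swap in/out and node 3 free pages
watch -n 5 'ps -o pid,time,comm -C kswapd3; grep -E "^pswp(in|out)" /proc/vmstat; grep "Node 3" /proc/buddyinfo'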