From: Yu Zhao <yuzhao@google.com>
Date: Wed, 22 Nov 2023 07:18:49 -0700
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: Charan Teja Kalla, Daniel Secik, Igor Raits, Kalesh Singh,
 akpm@linux-foundation.org, linux-mm@kvack.org

On Wed, Nov 22, 2023 at 12:31 AM Jaroslav Pulchart
<jaroslav.pulchart@gooddata.com> wrote:
> >
> > >
> > > On Mon, Nov 20, 2023 at 1:42 AM Jaroslav Pulchart
> > > <jaroslav.pulchart@gooddata.com> wrote:
> > > >
> > > > > On Tue, Nov 14, 2023 at 12:30 AM Jaroslav Pulchart
> > > > > <jaroslav.pulchart@gooddata.com> wrote:
> > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 13, 2023 at 1:36 AM Jaroslav Pulchart
> > > > > > > <jaroslav.pulchart@gooddata.com> wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Nov 9, 2023 at 3:58 AM Jaroslav Pulchart
> > > > > > > > > <jaroslav.pulchart@gooddata.com> wrote:
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 8, 2023 at 10:39 PM Jaroslav Pulchart
> > > > > > > > > > > <jaroslav.pulchart@gooddata.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Nov 8, 2023 at 12:04 PM Jaroslav Pulchart
> > > > > > > > > > > > > <jaroslav.pulchart@gooddata.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Jaroslav,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Yu Zhao
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > thanks for the response, see answers inline:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart
> > > > > > > > > > > > > > > <jaroslav.pulchart@gooddata.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would like to report to you an unpleasant behavior of multi-gen LRU
> > > > > > > > > > > > > > > > with strange swap in/out usage on my Dell 7525 two-socket AMD 74F3
> > > > > > > > > > > > > > > > system (16 NUMA domains).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Kernel version please?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 6.5.y, but we saw it sooner, as it has been under investigation since 23rd May
> > > > > > > > > > > > > > (6.4.y and maybe even 6.3.y).
> > > > > > > > > > > > >
> > > > > > > > > > > > > v6.6 has a few critical fixes for MGLRU; I can backport them to v6.5
> > > > > > > > > > > > > for you if you run into other problems with v6.6.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I will give it a try using 6.6.y. When it works we can switch to
> > > > > > > > > > > > 6.6.y instead of backporting the stuff to 6.5.y.
> > > > > > > > > > >
> > > > > > > > > > > > > > > > Symptoms of my issue are
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > /A/ if multi-gen LRU is enabled
> > > > > > > > > > > > > > > > 1/ [kswapd3] is consuming 100% CPU
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Just thinking out loud: kswapd3 means the fourth node was under
> > > > > > > > > > > > > > > memory pressure.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > top - 15:03:11 up 34 days,  1:51,  2 users,  load average: 23.34, 18.26, 15.01
> > > > > > > > > > > > > > > > Tasks: 1226 total,   2 running, 1224 sleeping,   0 stopped,   0 zombie
> > > > > > > > > > > > > > > > %Cpu(s): 12.5 us,  4.7 sy,  0.0 ni, 82.1 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
> > > > > > > > > > > > > > > > MiB Mem : 1047265.+total,  28382.7 free, 1021308.+used,    767.6 buff/cache
> > > > > > > > > > > > > > > > MiB Swap:   8192.0 total,   8187.7 free,      4.2 used.  25956.7 avail Mem
> > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > >   765 root      20   0       0      0      0 R  98.3   0.0  34969:04 kswapd3
> > > > > > > > > > > > > > > > ...
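
[A quick way to see the per-node numbers behind this observation, assuming
numactl is installed on the box (without it, the same data is in sysfs):

# numactl --hardware | grep free
# grep MemFree /sys/devices/system/node/node*/meminfo

The node with the smallest free value is the one whose kswapd will run.]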
> > > > > > > > > > > > > > > > 2/ swap space usage is low, about ~4MB from 8GB, as swap in zram (was
> > > > > > > > > > > > > > > > observed with a swap disk as well and caused IO latency issues due to
> > > > > > > > > > > > > > > > some kind of locking)
> > > > > > > > > > > > > > > > 3/ swap In/Out is huge and symmetrical, ~12MB/s in and ~12MB/s out
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > /B/ if multi-gen LRU is disabled
> > > > > > > > > > > > > > > > 1/ [kswapd3] is consuming 3%-10% CPU
> > > > > > > > > > > > > > > > top - 15:02:49 up 34 days,  1:51,  2 users,  load average: 23.05, 17.77, 14.77
> > > > > > > > > > > > > > > > Tasks: 1226 total,   1 running, 1225 sleeping,   0 stopped,   0 zombie
> > > > > > > > > > > > > > > > %Cpu(s): 14.7 us,  2.8 sy,  0.0 ni, 81.8 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
> > > > > > > > > > > > > > > > MiB Mem : 1047265.+total,  28378.5 free, 1021313.+used,    767.3 buff/cache
> > > > > > > > > > > > > > > > MiB Swap:   8192.0 total,   8189.0 free,      3.0 used.  25952.4 avail Mem
> > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > >   765 root      20   0       0      0      0 S   3.6   0.0  34966:46 [kswapd3]
> > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > > 2/ swap space usage is low (4MB)
> > > > > > > > > > > > > > > > 3/ swap In/Out is huge and symmetrical, ~500kB/s in and ~500kB/s out
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Both situations are wrong as they are using swap in/out extensively;
> > > > > > > > > > > > > > > > however, the multi-gen LRU situation is 10 times worse.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From the stats below, node 3 had the lowest free memory. So I think in
> > > > > > > > > > > > > > > both cases, the reclaim activities were as expected.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I do not see a reason for the memory pressure and reclaims. This node
> > > > > > > > > > > > > > has the lowest free memory of all nodes (~302MB free), that is true;
> > > > > > > > > > > > > > however, the swap space usage is just 4MB (still going in and out). So
> > > > > > > > > > > > > > what can be the reason for that behaviour?
> > > > > > > > > > > > >
> > > > > > > > > > > > > The best analogy is that refuel (reclaim) happens before the tank
> > > > > > > > > > > > > becomes empty, and it happens even sooner when there is a long road
> > > > > > > > > > > > > ahead (high order allocations).
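
[The "tank levels" in this analogy are the per-zone min/low/high watermarks:
kswapd wakes when free pages drop below the low watermark and keeps
reclaiming until the high watermark is met at the requested order. A rough
way to inspect them per zone (the exact field layout can vary a little
between kernel versions):

# awk '/^Node/ || /pages free/ || /^ +(min|low|high) /' /proc/zoneinfo]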
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The workers/application is running in pre-allocated HugePages and the
> > > > > > > > > > > > > > rest is used for a small set of system services and drivers of
> > > > > > > > > > > > > > devices. It is static and not growing. The issue persists when I stop
> > > > > > > > > > > > > > the system services and free the memory.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, this helps. Also could you attach /proc/buddyinfo from the moment
> > > > > > > > > > > > > you hit the problem?
> > > > > > > > > > > >
> > > > > > > > > > > > I can. The problem is continuous; it is 100% of the time continuously
> > > > > > > > > > > > doing in/out and consuming 100% of CPU and locking IO.
> > > > > > > > > > > >
> > > > > > > > > > > > The output of /proc/buddyinfo is:
> > > > > > > > > > > >
> > > > > > > > > > > > # cat /proc/buddyinfo
> > > > > > > > > > > > Node 0, zone      DMA      7      2      2      1      1      2      1      1      1      2      1
> > > > > > > > > > > > Node 0, zone    DMA32   4567   3395   1357    846    439    190     93     61     43     23      4
> > > > > > > > > > > > Node 0, zone   Normal     19    190    140    129    136     75     66     41      9      1      5
> > > > > > > > > > > > Node 1, zone   Normal    194   1210   2080   1800    715    255    111     56     42     36     55
> > > > > > > > > > > > Node 2, zone   Normal    204    768   3766   3394   1742    468    185    194    238     47     74
> > > > > > > > > > > > Node 3, zone   Normal   1622   2137   1058    846    388    208     97     44     14     42     10
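
[For readers: the eleven columns are counts of free blocks of order 0 through
10, i.e. 4 KiB up to 4 MiB of contiguous memory on x86-64. A rough one-liner,
not from the thread, that turns each row into a total; it reproduces the
~302MB free figure for node 3 quoted above:

# awk '{s = 0; for (i = 5; i <= NF; i++) s += $i * 2^(i-5) * 4096;
       printf "%s %s %-7s %8.1f MiB free\n", $1, $2, $4, s / 1048576}' /proc/buddyinfo]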
> > > > > > > > > > >
> > > > > > > > > > > Again, thinking out loud: there is only one zone on node 3, i.e., the
> > > > > > > > > > > normal zone, and this excludes the problem commit
> > > > > > > > > > > 669281ee7ef731fb5204df9d948669bf32a5e68d ("Multi-gen LRU: fix per-zone
> > > > > > > > > > > reclaim") fixed in v6.6.
> > > > > > > > > >
> > > > > > > > > > I built vanilla 6.6.1 and did the first fast test - spin up and destroy
> > > > > > > > > > VMs only. This test does not always trigger the continuous kswapd3
> > > > > > > > > > swap in/out usage, but it does use kswapd, and it looks like there is
> > > > > > > > > > a change:
> > > > > > > > > >
> > > > > > > > > > I can see non-continuous (15s and more) kswapd usage with 6.5.y
> > > > > > > > > > # ps ax | grep [k]swapd
> > > > > > > > > >    753 ?        S      0:00 [kswapd0]
> > > > > > > > > >    754 ?        S      0:00 [kswapd1]
> > > > > > > > > >    755 ?        S      0:00 [kswapd2]
> > > > > > > > > >    756 ?        S      0:15 [kswapd3]    <<<<<<<<<
> > > > > > > > > >    757 ?        S      0:00 [kswapd4]
> > > > > > > > > >    758 ?        S      0:00 [kswapd5]
> > > > > > > > > >    759 ?        S      0:00 [kswapd6]
> > > > > > > > > >    760 ?        S      0:00 [kswapd7]
> > > > > > > > > >    761 ?        S      0:00 [kswapd8]
> > > > > > > > > >    762 ?        S      0:00 [kswapd9]
> > > > > > > > > >    763 ?        S      0:00 [kswapd10]
> > > > > > > > > >    764 ?        S      0:00 [kswapd11]
> > > > > > > > > >    765 ?        S      0:00 [kswapd12]
> > > > > > > > > >    766 ?        S      0:00 [kswapd13]
> > > > > > > > > >    767 ?        S      0:00 [kswapd14]
> > > > > > > > > >    768 ?        S      0:00 [kswapd15]
> > > > > > > > > >
> > > > > > > > > > and no kswapd usage with 6.6.1, which looks to be a promising path
> > > > > > > > > >
> > > > > > > > > > # ps ax | grep [k]swapd
> > > > > > > > > >    808 ?        S      0:00 [kswapd0]
> > > > > > > > > >    809 ?        S      0:00 [kswapd1]
> > > > > > > > > >    810 ?        S      0:00 [kswapd2]
> > > > > > > > > >    811 ?        S      0:00 [kswapd3]    <<<< nice
> > > > > > > > > >    812 ?        S      0:00 [kswapd4]
> > > > > > > > > >    813 ?        S      0:00 [kswapd5]
> > > > > > > > > >    814 ?        S      0:00 [kswapd6]
> > > > > > > > > >    815 ?        S      0:00 [kswapd7]
> > > > > > > > > >    816 ?        S      0:00 [kswapd8]
> > > > > > > > > >    817 ?        S      0:00 [kswapd9]
> > > > > > > > > >    818 ?        S      0:00 [kswapd10]
> > > > > > > > > >    819 ?        S      0:00 [kswapd11]
> > > > > > > > > >    820 ?        S      0:00 [kswapd12]
> > > > > > > > > >    821 ?        S      0:00 [kswapd13]
> > > > > > > > > >    822 ?        S      0:00 [kswapd14]
> > > > > > > > > >    823 ?        S      0:00 [kswapd15]
> > > > > > > > > >
> > > > > > > > > > I will install the 6.6.1 on the server which is doing some work and
> > > > > > > > > > observe it later today.
> > > > > > > > >
> > > > > > > > > Thanks. Fingers crossed.
> > > > > > > >
> > > > > > > > The 6.6.y was deployed and used from 9th Nov 3PM CEST. So far so good.
> > > > > > > > Node 3 has 163MiB of free memory, and I see
> > > > > > > > just a little in/out swap usage sometimes (which is expected) and minimal
> > > > > > > > kswapd3 process usage for almost 4 days.
> > > > > > >
> > > > > > > Thanks for the update!
> > > > > > >
> > > > > > > Just to confirm:
> > > > > > > 1. MGLRU was enabled, and
> > > > > >
> > > > > > Yes, MGLRU is enabled
> > > > > >
> > > > > > > 2. The v6.6 deployed did NOT have the patch I attached earlier.
> > > > > >
> > > > > > Vanilla 6.6, attached patch NOT applied.
> > > > > >
> > > > > > > Are both correct?
> > > > > > >
> > > > > > > If so, I'd very much appreciate it if you could try the attached patch on
> > > > > > > top of v6.5 and see if it helps. My suspicion is that the problem is
> > > > > > > compaction related, i.e., kswapd was woken up by high order
> > > > > > > allocations but didn't properly stop. But what causes the behavior
> > > > > >
> > > > > > Sure, I can try it. Will inform you about progress.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > > difference on v6.5 between MGLRU and the active/inactive LRU still
> > > > > > > puzzles me -- the problem might be somehow masked rather than fixed on
> > > > > > > v6.6.
> > > > > >
> > > > > > I'm not sure how I can help with the issue. Any suggestions on what to
> > > > > > change/try?
> > > > >
> > > > > Trying the attached patch is good enough for now :)
> > > >
> > > > So far I'm running the "6.5.y + patch" for 4 days without triggering
> > > > the infinite swap in/out usage.
> > > >
> > > > I'm observing a similar pattern in kswapd usage - "if it uses kswapd,
> > > > then it is in majority the kswapd3" - like the vanilla 6.5.y, which is
> > > > not observed with 6.6.y. (The node 3 free mem is 159 MB.)
> > > > # ps ax | grep [k]swapd
> > > >    750 ?        S      0:00 [kswapd0]
> > > >    751 ?        S      0:00 [kswapd1]
> > > >    752 ?        S      0:00 [kswapd2]
> > > >    753 ?        S      0:02 [kswapd3]    <<<< it uses kswapd3; good
> > > > is that it is not continuous
> > > >    754 ?        S      0:00 [kswapd4]
> > > >    755 ?        S      0:00 [kswapd5]
> > > >    756 ?        S      0:00 [kswapd6]
> > > >    757 ?        S      0:00 [kswapd7]
> > > >    758 ?        S      0:00 [kswapd8]
> > > >    759 ?        S      0:00 [kswapd9]
> > > >    760 ?        S      0:00 [kswapd10]
> > > >    761 ?        S      0:00 [kswapd11]
> > > >    762 ?        S      0:00 [kswapd12]
> > > >    763 ?        S      0:00 [kswapd13]
> > > >    764 ?        S      0:00 [kswapd14]
> > > >    765 ?        S      0:00 [kswapd15]
> > > >
> > > > Good stuff is that the system did not end in a continuous loop of swap
> > > > in/out usage (at least so far), which is great. See the attached
> > > > swap_in_out_good_vs_bad.png. I will keep it running for the next 3
> > > > days.
> > >
> > > Thanks again, Jaroslav!
> > >
> > > Just a note here: I suspect the problem still exists on v6.6 but
> > > somehow is masked, possibly by reduced memory usage from the kernel
> > > itself and more free memory for userspace. So to be on the safe side,
> > > I'll post the patch and credit you as the reporter and tester.
> >
> > Morning, let's wait. I reviewed the graph, and the swap in/out started
> > happening from 1:50 AM CET. Slower than before (CPU util
> > 0.3%) but it is doing in/out; see the attached png.
>
> I investigated it more; there was an operational issue, and the system
> disabled multi-gen LRU yesterday ~10 AM CET (our temporary workaround
> for this problem) by
>    echo N > /sys/kernel/mm/lru_gen/enabled
> when an alert was triggered by an unexpected setup of the server.
> Could it be that the patch is not functional if lru_gen/enabled is
> 0x0000?

That's correct.
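
[The attached patch changes MGLRU code paths only, so with lru_gen disabled
it is inert. The state can be checked and restored at runtime; per
Documentation/admin-guide/mm/multigen_lru.rst, 0x0007 means all MGLRU
features are on and 0x0000 means MGLRU is off:

# cat /sys/kernel/mm/lru_gen/enabled
0x0007
# echo y > /sys/kernel/mm/lru_gen/enabled    # re-enable all features]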

> I need to reboot the system and do the whole week's test again.

Thanks a lot!
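
[For anyone reproducing this: the symmetric swap pattern discussed above is
easiest to watch via the kernel's cumulative swap counters; a minimal
sampling loop (pswpin/pswpout are in pages), with the si/so columns of
"vmstat 1" giving the same per-second view:

# while sleep 1; do date +%T; grep pswp /proc/vmstat; done]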