From: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Date: Wed, 8 Nov 2023 21:04:00 +0100
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Yu Zhao
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Igor Raits, Daniel Secik

>
> Hi Jaroslav,

Hi Yu Zhao,

thanks for the response, see answers inline:

>
> On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart wrote:
> >
> > Hello,
> >
> > I would like to report an unpleasant behavior of multi-gen LRU with
> > strange swap in/out usage on my Dell 7525 two-socket AMD 74F3 system
> > (16 NUMA domains).
>
> Kernel version please?

6.5.y, but we saw it earlier as well; it has been under investigation
since 23 May (on 6.4.y and maybe even 6.3.y).
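Note for anyone reproducing the /A/ vs /B/ comparison below: a minimal
sketch of how the two cases can be switched, assuming the standard
lru_gen sysfs interface on a kernel built with CONFIG_LRU_GEN:

    # check whether multi-gen LRU is currently active
    # (prints a feature bitmask, e.g. 0x0007 when enabled, 0x0000 when off)
    cat /sys/kernel/mm/lru_gen/enabled

    # enable / disable it at runtime
    echo y > /sys/kernel/mm/lru_gen/enabled
    echo n > /sys/kernel/mm/lru_gen/enabled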
>
> > Symptoms of my issue are
> >
> > /A/ if multi-gen LRU is enabled
> > 1/ [kswapd3] is consuming 100% CPU
>
> Just thinking out loud: kswapd3 means the fourth node was under memory pressure.
>
> > top - 15:03:11 up 34 days, 1:51, 2 users, load average: 23.34, 18.26, 15.01
> > Tasks: 1226 total, 2 running, 1224 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 12.5 us, 4.7 sy, 0.0 ni, 82.1 id, 0.0 wa, 0.4 hi, 0.4 si, 0.0 st
> > MiB Mem : 1047265.+total, 28382.7 free, 1021308.+used, 767.6 buff/cache
> > MiB Swap: 8192.0 total, 8187.7 free, 4.2 used. 25956.7 avail Mem
> > ...
> >   765 root      20   0       0      0      0 R  98.3   0.0  34969:04 kswapd3
> > ...
> > 2/ swap space usage is low, about ~4MB out of the 8GB of swap in zram (this was
> > observed with a swap disk as well, where it caused IO latency issues due to
> > some kind of locking)
> > 3/ swap in/out is huge and symmetrical, ~12MB/s in and ~12MB/s out
> >
> > /B/ if multi-gen LRU is disabled
> > 1/ [kswapd3] is consuming 3%-10% CPU
> > top - 15:02:49 up 34 days, 1:51, 2 users, load average: 23.05, 17.77, 14.77
> > Tasks: 1226 total, 1 running, 1225 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 14.7 us, 2.8 sy, 0.0 ni, 81.8 id, 0.0 wa, 0.4 hi, 0.4 si, 0.0 st
> > MiB Mem : 1047265.+total, 28378.5 free, 1021313.+used, 767.3 buff/cache
> > MiB Swap: 8192.0 total, 8189.0 free, 3.0 used. 25952.4 avail Mem
> > ...
> >   765 root      20   0       0      0      0 S   3.6   0.0  34966:46 [kswapd3]
> > ...
> > 2/ swap space usage is low (4MB)
> > 3/ swap in/out is huge and symmetrical, ~500kB/s in and ~500kB/s out
> >
> > Both situations are wrong as they are using swap in/out extensively;
> > however, the multi-gen LRU situation is 10 times worse.
>
> From the stats below, node 3 had the lowest free memory. So I think in
> both cases, the reclaim activities were as expected.

I do not see a reason for the memory pressure and reclaims. It is true
that this node has the lowest free memory of all nodes (~302MB free),
but the swap space usage is just 4MB (still going in and out). So what
can be the reason for that behaviour?

The workers/application run in pre-allocated HugePages and the rest is
used for a small set of system services and device drivers. It is
static and not growing. The issue persists when I stop the system
services and free the memory.

> > Could I ask for any suggestions on how to avoid the kswapd utilization
> > pattern?
>
> The easiest way is to disable NUMA domain so that there would be only
> two nodes with 8x more memory. IOW, you have fewer pools but each pool
> has more memory and therefore they are less likely to become empty.
>
> > There is free RAM in each NUMA node for the few MB used in swap:
> >
> > NUMA stats:
> > NUMA nodes: 0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
> > MemTotal:   65048 65486 65486 65486 65486 65486 65486 65469 65486 65486 65486 65486 65486 65486 65486 65424
> > MemFree:    468   601   1200  302   548   1879  2321  2478  1967  2239  1453  2417  2623  2833  2530  2269
> >
> > The swap in/out usage does not make sense to me, nor does the CPU
> > utilization by multi-gen LRU.
>
> My questions:
> 1. Were there any OOM kills with either case?

There is no OOM. Neither the memory usage nor the swap space usage is
growing; the swap usage stays at a few MB.

> 2. Was THP enabled?

Both situations occur, with THP enabled and with it disabled.

> MGLRU might have spent the extra CPU cycles just to avoid OOM kills or
> produce more THPs.
>
> If disabling the NUMA domain isn't an option, I'd recommend:

Disabling NUMA is not an option. However, we are now testing a setup
with 1GB less in HugePages on each NUMA node.
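For reference, a minimal sketch of how the per-node HugePage
reservation can be lowered at runtime; this assumes 1 GiB pages (for
2 MiB pages the directory is hugepages-2048kB) and the node number is
only illustrative:

    # drop one 1 GiB HugePage on node 3 (repeat for each node)
    node=3
    f=/sys/devices/system/node/node${node}/hugepages/hugepages-1048576kB/nr_hugepages
    cur=$(cat "$f")
    echo $((cur - 1)) > "$f"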
> 1. Try the latest kernel (6.6.1) if you haven't.

Not yet; 6.6.1 was released today.

> 2. Disable THP if it was enabled, to verify whether it has an impact.

I tried disabling THP, without any effect.

>
> Thanks.
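P.S. For completeness, the counters I am watching to correlate the
kswapd3 activity with per-node memory; a minimal sketch, assuming
stock procfs/sysfs paths and that the numactl/numastat tools are
installed:

    # global swap and kswapd reclaim counters (the deltas over time are what matter)
    grep -E 'pswpin|pswpout|pgscan_kswapd|pgsteal_kswapd' /proc/vmstat

    # swap in/out rate per second (si/so columns)
    vmstat 1

    # per-node memory breakdown; node 3 is the one kswapd3 serves
    numastat -m
    cat /sys/devices/system/node/node3/meminfo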