From: Yu Zhao
Date: Thu, 9 Nov 2023 18:31:17 -0700
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Jaroslav Pulchart
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Igor Raits, Daniel Secik, Charan Teja Kalla

On Thu, Nov 9, 2023 at 3:58 AM Jaroslav Pulchart wrote:
>
> >
> > On Wed, Nov 8, 2023 at 10:39 PM Jaroslav Pulchart wrote:
> > >
> > > >
> > > > On Wed, Nov 8, 2023 at 12:04 PM Jaroslav Pulchart wrote:
> > > > >
> > > > > >
> > > > > > Hi Jaroslav,
> > > > >
> > > > > Hi Yu Zhao
> > > > >
> > > > > thanks for response, see answers inline:
> > > > >
> > > > > >
> > > > > > On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I would like to report to you an unpleasant behavior of multi-gen LRU
> > > > > > > with strange swap in/out usage on my Dell 7525 two socket AMD 74F3
> > > > > > > system (16 NUMA domains).
> > > > > >
> > > > > > Kernel version please?
> > > > >
> > > > > 6.5.y, but we saw it sooner as it is in investigation from 23rd May
> > > > > (6.4.y and maybe even the 6.3.y).
> > > >
> > > > v6.6 has a few critical fixes for MGLRU, I can backport them to v6.5
> > > > for you if you run into other problems with v6.6.
> > > >
> > >
> > > I will give it a try using 6.6.y. When it will work we can switch to
> > > 6.6.y instead of backporting the stuff to 6.5.y.
> > >
> > > > > > > Symptoms of my issue are
> > > > > > >
> > > > > > > /A/ if multi-gen LRU is enabled
> > > > > > > 1/ [kswapd3] is consuming 100% CPU
> > > > > >
> > > > > > Just thinking out loud: kswapd3 means the fourth node was under memory pressure.
> > > > > >
> > > > > > >     top - 15:03:11 up 34 days,  1:51,  2 users,  load average: 23.34, 18.26, 15.01
> > > > > > >     Tasks: 1226 total,   2 running, 1224 sleeping,   0 stopped,   0 zombie
> > > > > > >     %Cpu(s): 12.5 us,  4.7 sy,  0.0 ni, 82.1 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
> > > > > > >     MiB Mem : 1047265.+total,  28382.7 free, 1021308.+used,    767.6 buff/cache
> > > > > > >     MiB Swap:   8192.0 total,   8187.7 free,      4.2 used.  25956.7 avail Mem
> > > > > > >     ...
> > > > > > >     765 root      20   0       0      0      0 R  98.3   0.0  34969:04 kswapd3
> > > > > > >     ...
> > > > > > > 2/ swap space usage is low about ~4MB from 8GB as swap in zram (was
> > > > > > > observed with swap disk as well and caused IO latency issues due to
> > > > > > > some kind of locking)
> > > > > > > 3/ swap In/Out is huge and symmetrical ~12MB/s in and ~12MB/s out
> > > > > > >
> > > > > > >
> > > > > > > /B/ if multi-gen LRU is disabled
> > > > > > > 1/ [kswapd3] is consuming 3%-10% CPU
> > > > > > >     top - 15:02:49 up 34 days,  1:51,  2 users,  load average: 23.05, 17.77, 14.77
> > > > > > >     Tasks: 1226 total,   1 running, 1225 sleeping,   0 stopped,   0 zombie
> > > > > > >     %Cpu(s): 14.7 us,  2.8 sy,  0.0 ni, 81.8 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
> > > > > > >     MiB Mem : 1047265.+total,  28378.5 free, 1021313.+used,    767.3 buff/cache
> > > > > > >     MiB Swap:   8192.0 total,   8189.0 free,      3.0 used.  25952.4 avail Mem
> > > > > > >     ...
> > > > > > >     765 root      20   0       0      0      0 S   3.6   0.0  34966:46 [kswapd3]
> > > > > > >     ...
> > > > > > > 2/ swap space usage is low (4MB)
> > > > > > > 3/ swap In/Out is huge and symmetrical ~500kB/s in and ~500kB/s out
> > > > > > >
> > > > > > > Both situations are wrong as they are using swap in/out extensively,
> > > > > > > however the multi-gen LRU situation is 10 times worse.
> > > > > >
> > > > > > From the stats below, node 3 had the lowest free memory. So I think in
> > > > > > both cases, the reclaim activities were as expected.
> > > > >
> > > > > I do not see a reason for the memory pressure and reclaims. This node
> > > > > has the lowest free memory of all nodes (~302MB free) that is true,
> > > > > however the swap space usage is just 4MB (still going in and out). So
> > > > > what can be the reason for that behaviour?
> > > >
> > > > The best analogy is that refuel (reclaim) happens before the tank
> > > > becomes empty, and it happens even sooner when there is a long road
> > > > ahead (high order allocations).
> > > >
> > > > > The workers/application is running in pre-allocated HugePages and the
> > > > > rest is used for a small set of system services and drivers of
> > > > > devices. It is static and not growing. The issue persists when I stop
> > > > > the system services and free the memory.
> > > >
> > > > Yes, this helps.
> > > > Also could you attach /proc/buddyinfo from the moment
> > > > you hit the problem?
> > > >
> > >
> > > I can. The problem is continuous, it is 100% of time continuously
> > > doing in/out and consuming 100% of CPU and locking IO.
> > >
> > > The output of /proc/buddyinfo is:
> > >
> > > # cat /proc/buddyinfo
> > > Node 0, zone      DMA      7      2      2      1      1      2      1      1      1      2      1
> > > Node 0, zone    DMA32   4567   3395   1357    846    439    190     93     61     43     23      4
> > > Node 0, zone   Normal     19    190    140    129    136     75     66     41      9      1      5
> > > Node 1, zone   Normal    194   1210   2080   1800    715    255    111     56     42     36     55
> > > Node 2, zone   Normal    204    768   3766   3394   1742    468    185    194    238     47     74
> > > Node 3, zone   Normal   1622   2137   1058    846    388    208     97     44     14     42     10
> >
> > Again, thinking out loud: there is only one zone on node 3, i.e., the
> > normal zone, and this excludes the problem commit
> > 669281ee7ef731fb5204df9d948669bf32a5e68d ("Multi-gen LRU: fix per-zone
> > reclaim") fixed in v6.6.
>
> I built vanilla 6.6.1 and did the first fast test - spin up and destroy
> VMs only - This test does not always trigger the kswapd3 continuous
> swap in/out usage but it uses it and it looks like there is a
> change:
>
> I can see kswapd non-continuous (15s and more) usage with 6.5.y
> # ps ax | grep [k]swapd
>     753 ?        S      0:00 [kswapd0]
>     754 ?        S      0:00 [kswapd1]
>     755 ?        S      0:00 [kswapd2]
>     756 ?        S      0:15 [kswapd3]    <<<<<<<<<
>     757 ?        S      0:00 [kswapd4]
>     758 ?        S      0:00 [kswapd5]
>     759 ?        S      0:00 [kswapd6]
>     760 ?        S      0:00 [kswapd7]
>     761 ?        S      0:00 [kswapd8]
>     762 ?        S      0:00 [kswapd9]
>     763 ?        S      0:00 [kswapd10]
>     764 ?        S      0:00 [kswapd11]
>     765 ?        S      0:00 [kswapd12]
>     766 ?        S      0:00 [kswapd13]
>     767 ?        S      0:00 [kswapd14]
>     768 ?        S      0:00 [kswapd15]
>
> and no kswapd usage with 6.6.1, that looks to be a promising path
>
> # ps ax | grep [k]swapd
>     808 ?        S      0:00 [kswapd0]
>     809 ?        S      0:00 [kswapd1]
>     810 ?        S      0:00 [kswapd2]
>     811 ?        S      0:00 [kswapd3]    <<<< nice
>     812 ?        S      0:00 [kswapd4]
>     813 ?        S      0:00 [kswapd5]
>     814 ?        S      0:00 [kswapd6]
>     815 ?        S      0:00 [kswapd7]
>     816 ?        S      0:00 [kswapd8]
>     817 ?        S      0:00 [kswapd9]
>     818 ?        S      0:00 [kswapd10]
>     819 ?        S      0:00 [kswapd11]
>     820 ?        S      0:00 [kswapd12]
>     821 ?        S      0:00 [kswapd13]
>     822 ?        S      0:00 [kswapd14]
>     823 ?        S      0:00 [kswapd15]
>
> I will install the 6.6.1 on the server which is doing some work and
> observe it later today.

Thanks. Fingers crossed.
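
For anyone following along, the /proc/buddyinfo numbers quoted above can be turned into per-node free memory with a short script. This is only a rough sketch, not something used in the thread: the helper names parse_buddyinfo and free_mib are made up for illustration, and it assumes 4 KiB base pages (the usual x86-64 value). Each column of /proc/buddyinfo is the number of free blocks of order 0, 1, 2, ..., where an order-n block is 2^n contiguous pages.

```python
# Sketch: summarize /proc/buddyinfo-style output per node and zone.
# Assumption: 4 KiB base pages (typical for x86-64); adjust PAGE_KIB otherwise.
PAGE_KIB = 4

# Sample copied from the report quoted in this thread.
SAMPLE = """\
Node 0, zone      DMA      7      2      2      1      1      2      1      1      1      2      1
Node 0, zone    DMA32   4567   3395   1357    846    439    190     93     61     43     23      4
Node 0, zone   Normal     19    190    140    129    136     75     66     41      9      1      5
Node 1, zone   Normal    194   1210   2080   1800    715    255    111     56     42     36     55
Node 2, zone   Normal    204    768   3766   3394   1742    468    185    194    238     47     74
Node 3, zone   Normal   1622   2137   1058    846    388    208     97     44     14     42     10
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <count> <count> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        zones[(int(fields[1]), fields[3])] = [int(n) for n in fields[4:]]
    return zones


def free_mib(counts, min_order=0):
    """Free memory in MiB held in blocks of order >= min_order."""
    pages = sum(n << order for order, n in enumerate(counts) if order >= min_order)
    return pages * PAGE_KIB / 1024


if __name__ == "__main__":
    for (node, zone), counts in sorted(parse_buddyinfo(SAMPLE).items()):
        print(f"node {node} {zone:<6} total {free_mib(counts):8.1f} MiB, "
              f"order>=4 {free_mib(counts, 4):8.1f} MiB")
```

With the sample above, node 3 sums to roughly 300 MiB free, which is in line with the ~302MB mentioned earlier in the thread. To run it against a live system, pass the contents of /proc/buddyinfo to parse_buddyinfo instead of SAMPLE.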