From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yu Zhao <yuzhao@google.com>
Date: Tue, 14 Nov 2023 00:47:08 -0700
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Jaroslav Pulchart
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Igor Raits, Daniel Secik, Charan Teja Kalla

On Tue, Nov 14, 2023 at 12:30 AM Jaroslav Pulchart wrote:
>
> > On Mon, Nov 13, 2023 at 1:36 AM Jaroslav Pulchart wrote:
> > >
> > > > On Thu, Nov 9, 2023 at 3:58 AM Jaroslav Pulchart wrote:
> > > > >
> > > > > > On Wed, Nov 8, 2023 at 10:39 PM Jaroslav Pulchart wrote:
> > > > > > >
> > > > > > > > On Wed, Nov 8, 2023 at 12:04 PM Jaroslav Pulchart wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jaroslav,
> > > > > > > > >
> > > > > > > > > Hi Yu Zhao,
> > > > > > > > >
> > > > > > > > > thanks for the response, see answers inline:
> > > > > > > > >
> > > > > > > > > > On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > I would like to report an unpleasant behavior of multi-gen LRU
> > > > > > > > > > > with strange swap in/out usage on my Dell 7525 two-socket AMD 74F3
> > > > > > > > > > > system (16 NUMA domains).
> > > > > > > > > >
> > > > > > > > > > Kernel version please?
> > > > > > > > >
> > > > > > > > > 6.5.y, but we saw it earlier; it has been under investigation since
> > > > > > > > > 23rd May (on 6.4.y and maybe even 6.3.y).
> > > > > > > >
> > > > > > > > v6.6 has a few critical fixes for MGLRU; I can backport them to v6.5
> > > > > > > > for you if you run into other problems with v6.6.
> > > > > > >
> > > > > > > I will give it a try using 6.6.y. If it works we can switch to 6.6.y
> > > > > > > instead of backporting the fixes to 6.5.y.
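(A quick aside for anyone reproducing this: whether MGLRU is active can be
checked at runtime via its sysfs knob -- a minimal sketch, assuming a kernel
built with CONFIG_LRU_GEN:

  # cat /sys/kernel/mm/lru_gen/enabled
  0x0007

A non-zero value means MGLRU is enabled; writing y or n to the same file
toggles it without a reboot, which is how the /A/ vs /B/ comparison below can
be switched on a live system. The 0x0007 shown is only an example value.)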
> > > > > > > > > > > Symptoms of my issue are
> > > > > > > > > > >
> > > > > > > > > > > /A/ if multi-gen LRU is enabled
> > > > > > > > > > > 1/ [kswapd3] is consuming 100% CPU
> > > > > > > > > >
> > > > > > > > > > Just thinking out loud: kswapd3 means the fourth node was under
> > > > > > > > > > memory pressure.
> > > > > > > > > >
> > > > > > > > > > > top - 15:03:11 up 34 days, 1:51, 2 users, load average: 23.34, 18.26, 15.01
> > > > > > > > > > > Tasks: 1226 total, 2 running, 1224 sleeping, 0 stopped, 0 zombie
> > > > > > > > > > > %Cpu(s): 12.5 us, 4.7 sy, 0.0 ni, 82.1 id, 0.0 wa, 0.4 hi, 0.4 si, 0.0 st
> > > > > > > > > > > MiB Mem : 1047265.+total, 28382.7 free, 1021308.+used, 767.6 buff/cache
> > > > > > > > > > > MiB Swap: 8192.0 total, 8187.7 free, 4.2 used. 25956.7 avail Mem
> > > > > > > > > > > ...
> > > > > > > > > > > 765 root 20 0 0 0 0 R 98.3 0.0 34969:04 kswapd3
> > > > > > > > > > > ...
> > > > > > > > > > > 2/ swap space usage is low, about ~4MB of the 8GB swap in zram (this
> > > > > > > > > > > was observed with a swap disk as well, where it caused IO latency
> > > > > > > > > > > issues due to some kind of locking)
> > > > > > > > > > > 3/ swap in/out is huge and symmetrical, ~12MB/s in and ~12MB/s out
> > > > > > > > > > >
> > > > > > > > > > > /B/ if multi-gen LRU is disabled
> > > > > > > > > > > 1/ [kswapd3] is consuming 3%-10% CPU
> > > > > > > > > > > top - 15:02:49 up 34 days, 1:51, 2 users, load average: 23.05, 17.77, 14.77
> > > > > > > > > > > Tasks: 1226 total, 1 running, 1225 sleeping, 0 stopped, 0 zombie
> > > > > > > > > > > %Cpu(s): 14.7 us, 2.8 sy, 0.0 ni, 81.8 id, 0.0 wa, 0.4 hi, 0.4 si, 0.0 st
> > > > > > > > > > > MiB Mem : 1047265.+total, 28378.5 free, 1021313.+used, 767.3 buff/cache
> > > > > > > > > > > MiB Swap: 8192.0 total, 8189.0 free, 3.0 used. 25952.4 avail Mem
> > > > > > > > > > > ...
> > > > > > > > > > > 765 root 20 0 0 0 0 S 3.6 0.0 34966:46 [kswapd3]
> > > > > > > > > > > ...
> > > > > > > > > > > 2/ swap space usage is low (4MB)
> > > > > > > > > > > 3/ swap in/out is huge and symmetrical, ~500kB/s in and ~500kB/s out
> > > > > > > > > > >
> > > > > > > > > > > Both situations are wrong as they are using swap in/out extensively;
> > > > > > > > > > > the multi-gen LRU situation is, however, 10 times worse.
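(For anyone trying to reproduce the rates above: pswpin/pswpout in
/proc/vmstat are cumulative page counts, so sampling them a second apart
gives pages/s -- a rough sketch, assuming 4KiB pages:

  # grep -E '^pswp(in|out)' /proc/vmstat; sleep 1; grep -E '^pswp(in|out)' /proc/vmstat

Multiply the per-second delta by 4096 to get bytes/s; `vmstat 1` reports the
same activity in its si/so columns.)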
> > > > > > > > > > From the stats below, node 3 had the lowest free memory. So I think in
> > > > > > > > > > both cases, the reclaim activities were as expected.
> > > > > > > > >
> > > > > > > > > I do not see a reason for the memory pressure and reclaims. This node
> > > > > > > > > has the lowest free memory of all nodes (~302MB free), that is true;
> > > > > > > > > however, the swap space usage is just 4MB (still going in and out). So
> > > > > > > > > what can be the reason for that behaviour?
> > > > > > > >
> > > > > > > > The best analogy is that refuel (reclaim) happens before the tank
> > > > > > > > becomes empty, and it happens even sooner when there is a long road
> > > > > > > > ahead (high-order allocations).
> > > > > > > >
> > > > > > > > > The workers/application is running in pre-allocated HugePages and the
> > > > > > > > > rest is used for a small set of system services and device drivers.
> > > > > > > > > It is static and not growing. The issue persists when I stop the
> > > > > > > > > system services and free the memory.
> > > > > > > >
> > > > > > > > Yes, this helps. Also, could you attach /proc/buddyinfo from the moment
> > > > > > > > you hit the problem?
> > > > > > >
> > > > > > > I can. The problem is continuous: 100% of the time it is doing swap
> > > > > > > in/out, consuming 100% of a CPU, and locking IO.
> > > > > > >
> > > > > > > The output of /proc/buddyinfo is:
> > > > > > >
> > > > > > > # cat /proc/buddyinfo
> > > > > > > Node 0, zone      DMA      7      2      2      1      1      2      1      1      1      2      1
> > > > > > > Node 0, zone    DMA32   4567   3395   1357    846    439    190     93     61     43     23      4
> > > > > > > Node 0, zone   Normal     19    190    140    129    136     75     66     41      9      1      5
> > > > > > > Node 1, zone   Normal    194   1210   2080   1800    715    255    111     56     42     36     55
> > > > > > > Node 2, zone   Normal    204    768   3766   3394   1742    468    185    194    238     47     74
> > > > > > > Node 3, zone   Normal   1622   2137   1058    846    388    208     97     44     14     42     10
> > > > > >
> > > > > > Again, thinking out loud: there is only one zone on node 3, i.e., the
> > > > > > Normal zone, and this excludes the problem that commit
> > > > > > 669281ee7ef731fb5204df9d948669bf32a5e68d ("Multi-gen LRU: fix per-zone
> > > > > > reclaim"), fixed in v6.6, addresses.
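(To make the buddyinfo above easier to read: column N of each row is the
number of free order-N blocks, i.e. 2^N contiguous pages, in that zone. A
rough one-liner to convert the rows into per-order free MiB -- a sketch,
assuming 4KiB pages:

  # awk '{printf "%s %s %s %-7s", $1, $2, $3, $4;
      for (i = 5; i <= NF; i++) printf " %8.1f", $i * 2^(i-5) * 4096 / 2^20;
      print ""}' /proc/buddyinfo

How much of node 3's free memory sits in the higher orders is what matters
for the high-order-allocation suspicion discussed further down.)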
> > > > > I built vanilla 6.6.1 and did a first quick test - spinning up and
> > > > > destroying VMs only. This test does not always trigger the continuous
> > > > > kswapd3 swap in/out usage, but it does exercise it, and it looks like
> > > > > there is a change.
> > > > >
> > > > > I can see non-continuous kswapd usage (15s of CPU time and more) with 6.5.y:
> > > > >
> > > > > # ps ax | grep [k]swapd
> > > > > 753 ?        S      0:00 [kswapd0]
> > > > > 754 ?        S      0:00 [kswapd1]
> > > > > 755 ?        S      0:00 [kswapd2]
> > > > > 756 ?        S      0:15 [kswapd3]   <<<<<<<<<
> > > > > 757 ?        S      0:00 [kswapd4]
> > > > > 758 ?        S      0:00 [kswapd5]
> > > > > 759 ?        S      0:00 [kswapd6]
> > > > > 760 ?        S      0:00 [kswapd7]
> > > > > 761 ?        S      0:00 [kswapd8]
> > > > > 762 ?        S      0:00 [kswapd9]
> > > > > 763 ?        S      0:00 [kswapd10]
> > > > > 764 ?        S      0:00 [kswapd11]
> > > > > 765 ?        S      0:00 [kswapd12]
> > > > > 766 ?        S      0:00 [kswapd13]
> > > > > 767 ?        S      0:00 [kswapd14]
> > > > > 768 ?        S      0:00 [kswapd15]
> > > > >
> > > > > and no kswapd usage with 6.6.1, which looks to be a promising path:
> > > > >
> > > > > # ps ax | grep [k]swapd
> > > > > 808 ?        S      0:00 [kswapd0]
> > > > > 809 ?        S      0:00 [kswapd1]
> > > > > 810 ?        S      0:00 [kswapd2]
> > > > > 811 ?        S      0:00 [kswapd3]   <<<< nice
> > > > > 812 ?        S      0:00 [kswapd4]
> > > > > 813 ?        S      0:00 [kswapd5]
> > > > > 814 ?        S      0:00 [kswapd6]
> > > > > 815 ?        S      0:00 [kswapd7]
> > > > > 816 ?        S      0:00 [kswapd8]
> > > > > 817 ?        S      0:00 [kswapd9]
> > > > > 818 ?        S      0:00 [kswapd10]
> > > > > 819 ?        S      0:00 [kswapd11]
> > > > > 820 ?        S      0:00 [kswapd12]
> > > > > 821 ?        S      0:00 [kswapd13]
> > > > > 822 ?        S      0:00 [kswapd14]
> > > > > 823 ?        S      0:00 [kswapd15]
> > > > >
> > > > > I will install 6.6.1 on the server which is doing some real work and
> > > > > observe it later today.
> > > >
> > > > Thanks. Fingers crossed.
> > >
> > > The 6.6.y kernel was deployed and has been in use since 9th Nov, 3PM CEST.
> > > So far so good. Node 3 has 163MiB of free memory, and I see just a little
> > > swap in/out sometimes (which is expected) and minimal kswapd3 process
> > > usage, for almost 4 days now.
> >
> > Thanks for the update!
> >
> > Just to confirm:
> > 1. MGLRU was enabled, and
>
> Yes, MGLRU is enabled.
>
> > 2. The v6.6 deployed did NOT have the patch I attached earlier.
>
> Vanilla 6.6, attached patch NOT applied.
>
> > Are both correct?
> >
> > If so, I'd very much appreciate it if you could try the attached patch on
> > top of v6.5 and see if it helps. My suspicion is that the problem is
> > compaction related, i.e., kswapd was woken up by high-order allocations
> > but didn't properly stop. But what causes the behavior difference on v6.5
> > between MGLRU and the active/inactive LRU still puzzles me -- the problem
> > might be somehow masked rather than fixed in v6.6.
>
> Sure, I can try it. Will inform you about the progress. Thanks!
>
> I'm not sure how I can help with the issue. Any suggestions on what to
> change/try?

Trying the attached patch is good enough for now :)
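(One more data point that may help while testing: kswapd wakeups and
compaction activity show up as counters in /proc/vmstat, e.g.

  # grep -E 'kswapd_(low|high)_wmark_hit_quickly|pageoutrun|^compact_' /proc/vmstat

If kswapd3 spins while kswapd_high_wmark_hit_quickly keeps climbing, that
would fit the "woken by high-order allocations but not stopping properly"
suspicion above. Just a suggestion for narrowing things down; the patch does
not depend on it.)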