From: Jaroslav Pulchart
Date: Thu, 4 Jan 2024 15:34:20 +0100
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Yu Zhao
Cc: Daniel Secik, Charan Teja Kalla, Igor Raits, Kalesh Singh, akpm@linux-foundation.org, linux-mm@kvack.org
References: <7df7e478-bd93-03df-5b10-19308f416e95@quicinc.com>

>
> >
> > On Wed, Jan 3, 2024 at 2:30 PM Jaroslav Pulchart wrote:
> > >
> > > >
> > > > >
> > > > > Hi Yu,
> > > > >
> > > > > On 12/2/2023 5:22 AM, Yu Zhao wrote:
> > > > > > Charan, does the fix previously attached seem acceptable to you? Any
> > > > > > additional feedback? Thanks.
> > > > >
> > > > > First, thanks for taking this patch to upstream.
> > > > >
> > > > > One comment on the code snippet: checking just the 'high wmark' pages
> > > > > might succeed here but can fail in the kswapd sleep check that
> > > > > immediately follows, see prepare_kswapd_sleep(). This can show up as
> > > > > increased KSWAPD_HIGH_WMARK_HIT_QUICKLY, and thus unnecessary kswapd
> > > > > run time.
> > > > > @Jaroslav: Have you observed something like the above?
> > > >
> > > > I do not see any unnecessary kswapd run time; on the contrary, it is
> > > > fixing the kswapd continuous run issue.
> > > >
> > > > >
> > > > > So, in downstream, we have something like this for zone_watermark_ok():
> > > > > unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH << 2;
> > > > >
> > > > > It is hard to justify this empirical 'MIN_LRU_BATCH << 2' value; maybe
> > > > > we should at least use 'MIN_LRU_BATCH' with the mentioned reasoning.
> > > > > That is all I can say for this patch.
> > > > >
> > > > > +	mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
> > > > > +		WMARK_PROMO : WMARK_HIGH;
> > > > > +	for (i = 0; i <= sc->reclaim_idx; i++) {
> > > > > +		struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
> > > > > +		unsigned long size = wmark_pages(zone, mark);
> > > > > +
> > > > > +		if (managed_zone(zone) &&
> > > > > +		    !zone_watermark_ok(zone, sc->order, size, sc->reclaim_idx, 0))
> > > > > +			return false;
> > > > > +	}
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Charan
> > > >
> > > >
> > > >
> > > > --
> > > > Jaroslav Pulchart
> > > > Sr. Principal SW Engineer
> > > > GoodData
> > >
> > >
> > > Hello,
> > >
> > > today we tried to update servers to 6.6.9, which contains the mglru fixes
> > > (from 6.6.8), and the server behaves much, much worse.
> > >
> > > Multiple kswapd* threads went to ~100% load immediately:
> > >     555 root      20   0       0      0      0 R  99.7   0.0   4:32.86 kswapd1
> > >     554 root      20   0       0      0      0 R  99.3   0.0   3:57.76 kswapd0
> > >     556 root      20   0       0      0      0 R  97.7   0.0   3:42.27 kswapd2
> > > Are the changes in upstream different compared to the initial patch
> > > which I tested?
> > >
> > > Best regards,
> > > Jaroslav Pulchart
> >
> > Hi Jaroslav,
> >
> > My apologies for all the trouble!
> >
> > Yes, there is a slight difference between the fix you verified and
> > what went into 6.6.9. The fix in 6.6.9 is disabled under a special
> > condition which I thought wouldn't affect you.
> >
> > Could you try the attached fix again on top of 6.6.9? It removed that
> > special condition.
> >
> > Thanks!
>
> Thanks for the prompt response. I did a test with the patch and it didn't
> help. The situation is super strange.
>
> I tried kernels 6.6.7, 6.6.8 and 6.6.9. With 6.6.9 I see high memory
> utilization on all NUMA nodes of the first CPU socket, and it is the
> worst situation, but the kswapd load is already visible from 6.6.8.
>
> Setup of this server:
> * 4 chiplets per socket, 2 sockets
> * 32 GB of RAM for each chiplet, 28 GB of which are in hugepages
>   Note: previously I had 29 GB in hugepages; I freed up 1 GB to avoid
>   memory pressure, but on the contrary it is even worse now.
>
> kernel 6.6.7: I do not see kswapd usage when application started == OK
> NUMA nodes:     0     1     2     3     4     5     6     7
> HPTotalGiB:    28    28    28    28    28    28    28    28
> HPFreeGiB:     28    28    28    28    28    28    28    28
> MemTotal:   32264 32701 32701 32686 32701 32659 32701 32696
> MemFree:     2766  2715    63  2366  3495  2990  3462   252
>
> kernel 6.6.8: I see kswapd on nodes 2 and 3 when application started
> NUMA nodes:     0     1     2     3     4     5     6     7
> HPTotalGiB:    28    28    28    28    28    28    28    28
> HPFreeGiB:     28    28    28    28    28    28    28    28
> MemTotal:   32264 32701 32701 32686 32701 32701 32659 32696
> MemFree:     2744  2788    65   581  3304  3215  3266  2226
>
> kernel 6.6.9: I see kswapd on nodes 0, 1, 2 and 3 when application started
> NUMA nodes:     0     1     2     3     4     5     6     7
> HPTotalGiB:    28    28    28    28    28    28    28    28
> HPFreeGiB:     28    28    28    28    28    28    28    28
> MemTotal:   32264 32701 32701 32686 32659 32701 32701 32696
> MemFree:       75    60    60    60  3169  2784  3203  2944

I ran a few more combinations; here are the results:

  6.6.7-1 (vanilla)                             == OK, no issue
  6.6.8-1 (vanilla)                             == single kswapd at 100% !
  6.6.8-1 (vanilla plus mglru-fix-6.6.9.patch)  == OK, no issue
  6.6.8-1 (revert four mglru patches)           == OK, no issue
  6.6.9-1 (vanilla)                             == four kswapd at 100% !!!!
  6.6.9-2 (vanilla plus mglru-fix-6.6.9.patch)  == four kswapd at 100% !!!!
  6.6.9-3 (revert four mglru patches)           == four kswapd at 100% !!!!

Summary:
* mglru-fix-6.6.9.patch or reverting the mglru patches helps on kernel 6.6.8,
* there is a (new?) problem on the 6.6.9 kernel. Since it shows up even with
  the four mglru patches reverted, it does not appear to be related to the
  mglru patches at all.
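
For anyone who wants to collect comparable per-node numbers on their own
machine, here is a minimal sketch. It assumes the values are read from
/sys/devices/system/node/node<N>/meminfo, that NUMA nodes are numbered
contiguously from 0, and that values are reported in MiB (the tables above
do not state how they were gathered or in which unit). The helper names
parse_node_meminfo_line(), read_node_mib() and print_node_table() are
hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a per-node meminfo file, for example
 *   "Node 2 MemFree:           66560 kB"
 * Returns 1 and stores the value converted to MiB when the line carries
 * the requested key, 0 otherwise.
 */
int parse_node_meminfo_line(const char *line, const char *key,
                            unsigned long *mib)
{
    int node;
    char name[64];
    unsigned long kb;

    assert(line && key && mib);
    if (sscanf(line, "Node %d %63[^:]: %lu kB", &node, name, &kb) != 3)
        return 0;
    if (strcmp(name, key) != 0)
        return 0;
    *mib = kb / 1024;
    return 1;
}

/* Read one key (in MiB) for one NUMA node; 0 on success, -1 on failure. */
int read_node_mib(int node, const char *key, unsigned long *mib)
{
    char path[64], line[256];
    FILE *f;
    int found = 0;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/meminfo", node);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (!found && fgets(line, sizeof(line), f))
        found = parse_node_meminfo_line(line, key, mib);
    fclose(f);
    return found ? 0 : -1;
}

/*
 * Print one row per key, one column per node, stopping at the first node
 * whose meminfo file cannot be read (assumes contiguous node numbering).
 */
void print_node_table(FILE *out)
{
    static const char *keys[] = { "MemTotal", "MemFree" };
    unsigned long mib;

    for (size_t k = 0; k < sizeof(keys) / sizeof(keys[0]); k++) {
        fprintf(out, "%s:", keys[k]);
        for (int node = 0; read_node_mib(node, keys[k], &mib) == 0; node++)
            fprintf(out, " %6lu", mib);
        fputc('\n', out);
    }
}
```

Calling print_node_table(stdout) from main() before and after starting the
application gives two snapshots that can be compared across kernels in the
same way as the tables above.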