From: Jaroslav Pulchart
Date: Thu, 4 Jan 2024 10:46:49 +0100
Subject: Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU
To: Yu Zhao
Cc: Daniel Secik, Charan Teja Kalla, Igor Raits, Kalesh Singh, akpm@linux-foundation.org, linux-mm@kvack.org

>
> On Wed, Jan 3, 2024 at 2:30 PM Jaroslav Pulchart wrote:
> >
> > >
> > > >
> > > > Hi yu,
> > > >
> > > > On 12/2/2023 5:22 AM, Yu Zhao wrote:
> > > > > Charan, does the fix previously attached seem acceptable to you? Any
> > > > > additional feedback? Thanks.
> > > >
> > > > First, thanks for taking this patch to upstream.
> > > >
> > > > A comment in code snippet is checking just 'high wmark' pages might
> > > > succeed here but can fail in the immediate kswapd sleep, see
> > > > prepare_kswapd_sleep(). This can show up into the increased
> > > > KSWAPD_HIGH_WMARK_HIT_QUICKLY, thus unnecessary kswapd run time.
> > > > @Jaroslav: Have you observed something like above?
> > >
> > > I do not see any unnecessary kswapd run time, on the contrary it is
> > > fixing the kswapd continuous run issue.
> > >
> > >
> > > > So, in downstream, we have something like for zone_watermark_ok():
> > > > unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH << 2;
> > > >
> > > > Hard to convince of this 'MIN_LRU_BATCH << 2' empirical value, may be we
> > > > should atleast use the 'MIN_LRU_BATCH' with the mentioned reasoning, is
> > > > what all I can say for this patch.
> > > >
> > > > +       mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
> > > > +              WMARK_PROMO : WMARK_HIGH;
> > > > +       for (i = 0; i <= sc->reclaim_idx; i++) {
> > > > +               struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
> > > > +               unsigned long size = wmark_pages(zone, mark);
> > > > +
> > > > +               if (managed_zone(zone) &&
> > > > +                   !zone_watermark_ok(zone, sc->order, size, sc->reclaim_idx, 0))
> > > > +                       return false;
> > > > +       }
> > > >
> > > >
> > > > Thanks,
> > > > Charan
> > >
> > >
> > >
> > > --
> > > Jaroslav Pulchart
> > > Sr. Principal SW Engineer
> > > GoodData
> >
> >
> > Hello,
> >
> > today we try to update servers to 6.6.9 which contains the mglru fixes
> > (from 6.6.8) and the server behaves much much worse.
> >
> > I got multiple kswapd* load to ~100% imediatelly.
> >     555 root      20   0       0      0      0 R  99.7   0.0   4:32.86 kswapd1
> >     554 root      20   0       0      0      0 R  99.3   0.0   3:57.76 kswapd0
> >     556 root      20   0       0      0      0 R  97.7   0.0   3:42.27 kswapd2
> > are the changes in upstream different compared to the initial patch
> > which I tested?
> >
> > Best regards,
> > Jaroslav Pulchart
>
> Hi Jaroslav,
>
> My apologies for all the trouble!
>
> Yes, there is a slight difference between the fix you verified and
> what went into 6.6.9. The fix in 6.6.9 is disabled under a special
> condition which I thought wouldn't affect you.
>
> Could you try the attached fix again on top of 6.6.9? It removed that
> special condition.
>
> Thanks!

Thanks for the prompt response. I tested with the patch and it didn't
help. The situation is very strange.

I tried kernels 6.6.7, 6.6.8 and 6.6.9. With 6.6.9 I see high memory
utilization on all NUMA nodes of the first CPU socket, which is the
worst case, but the kswapd load is already visible with 6.6.8.

Setup of this server:
* 2 sockets, 4 chiplets per socket
* 32 GB of RAM per chiplet, of which 28 GB are in hugepages
  Note: previously I had 29 GB in hugepages. I freed up 1 GB to avoid
  memory pressure, but on the contrary it is even worse now.

kernel 6.6.7: I do not see kswapd usage when the application is started == OK
NUMA nodes:      0     1     2     3     4     5     6     7
HPTotalGiB:     28    28    28    28    28    28    28    28
HPFreeGiB:      28    28    28    28    28    28    28    28
MemTotal:    32264 32701 32701 32686 32701 32659 32701 32696
MemFree:      2766  2715    63  2366  3495  2990  3462   252

kernel 6.6.8: I see kswapd on nodes 2 and 3 when the application is started
NUMA nodes:      0     1     2     3     4     5     6     7
HPTotalGiB:     28    28    28    28    28    28    28    28
HPFreeGiB:      28    28    28    28    28    28    28    28
MemTotal:    32264 32701 32701 32686 32701 32701 32659 32696
MemFree:      2744  2788    65   581  3304  3215  3266  2226

kernel 6.6.9: I see kswapd on nodes 0, 1, 2 and 3 when the application is started
NUMA nodes:      0     1     2     3     4     5     6     7
HPTotalGiB:     28    28    28    28    28    28    28    28
HPFreeGiB:      28    28    28    28    28    28    28    28
MemTotal:    32264 32701 32701 32686 32659 32701 32701 32696
MemFree:        75    60    60    60  3169  2784  3203  2944
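
To put the 6.6.9 figures above in proportion, here is a rough sketch of
the arithmetic: how much memory each node has left outside the hugepage
pool, and what share of it is free. It assumes that MemTotal and MemFree
are in MiB and that MemFree does not include the hugepage pool; the
units are not stated above, so treat both as assumptions. The helper
names are made up for this illustration.

```python
# Per-node figures copied from the 6.6.9 table above.
# Assumption: MemTotal/MemFree are in MiB and MemFree excludes the hugepage pool.
HP_TOTAL_GIB = [28] * 8
MEM_TOTAL = [32264, 32701, 32701, 32686, 32659, 32701, 32701, 32696]
MEM_FREE = [75, 60, 60, 60, 3169, 2784, 3203, 2944]


def non_hugepage_mib(mem_total_mib, hp_total_gib):
    """Memory left for regular pages once the hugepage pool is carved out."""
    return mem_total_mib - hp_total_gib * 1024


def free_share(mem_total_mib, mem_free_mib, hp_total_gib):
    """Fraction of the non-hugepage memory that is currently free."""
    return mem_free_mib / non_hugepage_mib(mem_total_mib, hp_total_gib)


if __name__ == "__main__":
    rows = zip(MEM_TOTAL, MEM_FREE, HP_TOTAL_GIB)
    for node, (total, free, hp) in enumerate(rows):
        outside = non_hugepage_mib(total, hp)
        share = free_share(total, free, hp)
        print(f"node {node}: {outside} MiB outside hugepages, {share:.1%} free")
```

Under those assumptions each node has only about 4 GiB outside
hugepages, so the nodes of the first socket (0 to 3) are close to fully
used on 6.6.9 while the nodes of the second socket still have most of
that memory free.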