From: Yu Zhao <yuzhao@google.com>
Date: Tue, 4 Jun 2024 11:18:25 -0600
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
To: Erhard Furtner
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Johannes Weiner, Nhat Pham, Chengming Zhou, Sergey Senozhatsky, Minchan Kim, Yosry Ahmed

On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed wrote:
>
> On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner wrote:
> >
> > On Mon, 3 Jun 2024 16:24:02 -0700
> > Yosry Ahmed wrote:
> >
> > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > have a very limited area of memory to allocate kernel memory from.
> > > One possible reason why that commit can cause an issue is because we
> > > will have multiple instances of the zsmalloc slab caches 'zspage' and
> > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > >
> > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > >
> > > Also, could you check if the attached patch helps? It makes sure that
> > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > cache of each type.
> >
> > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a
> > good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
> >
> > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was
> > b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my
> > bisect.log. I got the slabinfo shortly after boot and a 2nd time
> > shortly before the OOM or the kswapd0: page allocation failure
> > happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M
> > --verify -v) manually shortly before the 2 GiB RAM exhausted and got
> > the slabinfo then.
> >
> > The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17
> > unfortunately didn't make a difference, I got the kswapd0: page
> > allocation failure nevertheless.
>
> Thanks for trying this out. The patch reduces the amount of wasted
> memory due to the 'zs_handle' and 'zspage' caches by an order of
> magnitude, but it was a small number to begin with (~250K).
>
> I cannot think of other reasons why having multiple zsmalloc pools
> will end up using more memory in the 0.25GB zone that the kernel
> allocations can be made from.
>
> The number of zpools can be made configurable or determined at runtime
> by the size of the machine, but I don't want to do this without
> understanding the problem here first. Adding other zswap and zsmalloc
> folks in case they have any ideas.

Hi Erhard,

If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
on kernels before and after the bad commit?
It'd be great if you could run the grep command right before the OOM
kills.

The overall internal fragmentation of multiple zsmalloc pools might be
higher than a single one. I suspect this might be the cause.

Thank you.
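
In case timing the grep by hand is awkward, here is a minimal sketch that samples the same counter at a fixed interval, so the last printed value approximates the state right before the OOM. The helper names (parse_vmstat, read_nr_zspages, sample) are made up for illustration and are not part of any kernel tooling; the only thing taken from the request above is the nr_zspages field of /proc/vmstat. It also assumes the sampler itself survives long enough to print, which is not guaranteed under memory pressure.

```python
import time
from pathlib import Path
from typing import Dict, List, Optional


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat-style "name value" lines into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not exactly "<name> <unsigned integer>".
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def read_nr_zspages(path: str = "/proc/vmstat") -> Optional[int]:
    """Return nr_zspages, or None if the file does not list that counter."""
    return parse_vmstat(Path(path).read_text()).get("nr_zspages")


def sample(count: int, interval_s: float = 1.0,
           path: str = "/proc/vmstat") -> List[Optional[int]]:
    """Print and collect `count` timestamped samples of nr_zspages."""
    seen: List[Optional[int]] = []
    for i in range(count):
        value = read_nr_zspages(path)
        # Flush each line so it reaches the terminal or log immediately.
        print(time.strftime("%H:%M:%S"), "nr_zspages", value, flush=True)
        seen.append(value)
        if i + 1 < count:
            time.sleep(interval_s)
    return seen
```

Calling sample(count=600, interval_s=1.0) alongside the stress-ng workload on both the good and the bad kernel, with the output redirected to a file on persistent storage, would give two series that can be compared directly.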