From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 4 Jun 2024 11:01:39 -0700
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
To: Yu Zhao
Cc: Erhard Furtner, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Johannes Weiner, Nhat Pham, Chengming Zhou, Sergey Senozhatsky, Minchan Kim
References: <20240508202111.768b7a4d@yea> <20240515224524.1c8befbe@yea> <20240602200332.3e531ff1@yea> <20240604001304.5420284f@yea> <20240604134458.3ae4396a@yea>

On Tue, Jun 4, 2024 at 10:54 AM Yu Zhao wrote:
>
> On Tue, Jun 4, 2024 at 11:34 AM Yosry Ahmed wrote:
> >
> > On Tue, Jun 4, 2024 at 10:19 AM Yu Zhao wrote:
> > >
> > > On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed wrote:
> > > >
> > > > On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner wrote:
> > > > >
> > > > > On Mon, 3 Jun 2024 16:24:02 -0700
> > > > > Yosry Ahmed wrote:
> > > > >
> > > > > > Thanks
> > > > > > for bisecting. Taking a look at the thread, it seems like you
> > > > > > have a very limited area of memory to allocate kernel memory from. One
> > > > > > possible reason why that commit can cause an issue is that we will
> > > > > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > > > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > > > > >
> > > > > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > > > > >
> > > > > > Also, could you check if the attached patch helps? It makes sure that
> > > > > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > > > > cache of each type.
> > > > >
> > > > > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a
> > > > > good HEAD, from a bad HEAD, and from the bad HEAD + your patch applied.
> > > > >
> > > > > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was
> > > > > b8cf32dc6e8c75b712cbf638e0fd210101c22f17; I got both from my
> > > > > bisect.log. I captured the slabinfo shortly after boot and a 2nd time
> > > > > shortly before the OOM or the kswapd0: page allocation failure
> > > > > happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M
> > > > > --verify -v) manually shortly before the 2 GiB of RAM were exhausted
> > > > > and got the slabinfo then.
> > > > >
> > > > > The patch applied on top of b8cf32dc6e8c75b712cbf638e0fd210101c22f17
> > > > > unfortunately didn't make a difference; I got the kswapd0: page
> > > > > allocation failure nevertheless.
> > > >
> > > > Thanks for trying this out. The patch reduces the amount of wasted
> > > > memory due to the 'zs_handle' and 'zspage' caches by an order of
> > > > magnitude, but it was a small number to begin with (~250K).
> > > >
> > > > I cannot think of other reasons why having multiple zsmalloc pools
> > > > will end up using more memory in the 0.25GB zone that the kernel
> > > > allocations can be made from.
> > > >
> > > > The number of zpools can be made configurable or determined at runtime
> > > > by the size of the machine, but I don't want to do this without
> > > > understanding the problem here first. Adding other zswap and zsmalloc
> > > > folks in case they have any ideas.
> > >
> > > Hi Erhard,
> > >
> > > If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> > > on kernels before and after the bad commit? It'd be great if you could
> > > run the grep command right before the OOM kills.
> > >
> > > The overall internal fragmentation of multiple zsmalloc pools might be
> > > higher than that of a single one. I suspect this might be the cause.
> >
> > I thought about the internal fragmentation of pools, but zsmalloc
> > should have access to highmem, and if I understand correctly the
> > problem here is that we are running out of space in the DMA zone when
> > making kernel allocations.
> >
> > Do you suspect zsmalloc is allocating memory from the DMA zone
> > initially, even though it has access to highmem?
>
> There was a lot of user memory in the DMA zone. So at some point the
> highmem zone was full and allocation fallback happened.
>
> The problem with zone fallback is that recent allocations go into
> lower zones, meaning they are further back on the LRU list. This
> applies to both user memory and zsmalloc memory -- the latter has a
> writeback LRU. On top of this, neither the zswap shrinker nor the
> zsmalloc shrinker (compaction) is zone aware. So page reclaim might
> have trouble hitting the right target zone.

I see what you mean. In this case, yeah, I think the internal
fragmentation in the zsmalloc pools may be the reason behind the
problem.

How many CPUs does this machine have? I am wondering if 32 can be an
overkill for small machines, perhaps the number of pools should be
min(nr_cpus, 32)?
Alternatively, the number of pools should scale with the memory size in
some way, such that we only increase fragmentation when it's tolerable.

> We can't really tell how zspages are distributed across zones, but the
> overall number might be helpful. It'd be great if someone could make
> nr_zspages per zone :)
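The two measurements the thread relies on (comparing /proc/slabinfo between a good and a bad kernel, and reading nr_zspages right before the OOM) can be scripted. A minimal sketch, not from the thread itself: the helper names are hypothetical, and each helper defaults to the live /proc file but also accepts a saved snapshot so the comparison can be done offline.

```shell
#!/bin/sh
# Hypothetical helpers for the measurements discussed above.

# Print per-cache growth in active objects between two slabinfo
# snapshots, largest delta first. Both files follow the /proc/slabinfo
# layout: two header lines, then "<name> <active_objs> <num_objs> ...".
# A cache absent from the first snapshot counts from zero.
slab_delta() {
    awk 'NR == FNR { if (FNR > 2) before[$1] = $2; next }
         FNR > 2   { d = $2 - before[$1]; if (d > 0) print d, $1 }' \
        "$1" "$2" | sort -rn
}

# Read nr_zspages from a vmstat-style file (defaults to /proc/vmstat),
# i.e. the value Yu asks to capture right before the OOM kill.
nr_zspages() {
    awk '$1 == "nr_zspages" { print $2 }' "${1:-/proc/vmstat}"
}
```

Sampling it in a loop, e.g. `while :; do nr_zspages; sleep 1; done >> zspages.log`, would leave the last logged line as the closest reading before the kswapd0 failure.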