From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66A9CC4345F for ; Wed, 17 Apr 2024 14:33:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E7A7E6B0092; Wed, 17 Apr 2024 10:33:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E29A86B0095; Wed, 17 Apr 2024 10:33:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D17FD6B0098; Wed, 17 Apr 2024 10:33:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id B022A6B0092 for ; Wed, 17 Apr 2024 10:33:35 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 5D736811A6 for ; Wed, 17 Apr 2024 14:33:34 +0000 (UTC) X-FDA: 82019267148.27.3B18063 Received: from mail-qk1-f169.google.com (mail-qk1-f169.google.com [209.85.222.169]) by imf11.hostedemail.com (Postfix) with ESMTP id 2730C4001F for ; Wed, 17 Apr 2024 14:33:32 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=R1a7Zpn6; spf=pass (imf11.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.222.169 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org; dmarc=pass (policy=none) header.from=cmpxchg.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1713364412; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fYjWrnxXzVPnBVhxJlTn/8OmhjvM1CO6cGLjT7MOA9I=; b=txlDjkpmsxd2OIGo7iBgMkhJw9k0bfXB+0Th9hd0+b3NUka1HML6xzxwOlAZQK+v0FxI3X dZUlSTpuMOScxZ5fs1vWIhacKGeh373AdNLqplpajde9LFeJC8FpS2EHVrcbbxyMUHhX7q 3I/jU0unKVKGC43dS89hUdejkHiMsVI= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=R1a7Zpn6; spf=pass (imf11.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.222.169 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org; dmarc=pass (policy=none) header.from=cmpxchg.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1713364412; a=rsa-sha256; cv=none; b=U/8Ouw6kV8wNUWb9aGZSLxUDTlYcvymMPYzir+tWjb7oq4uLFSDQM9VmUg56UrW0tWChCz XLOBDt+hPOWbXi4gQ+2ZS7xAm2YZmgQKWsZrUERIOYG1InqW4ExwDYpblhmrKbidC4J5gg u3atarx4q6j4TrKMtyK76Hi0a29ddng= Received: by mail-qk1-f169.google.com with SMTP id af79cd13be357-78efd533a00so62228785a.0 for ; Wed, 17 Apr 2024 07:33:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1713364411; x=1713969211; darn=kvack.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=fYjWrnxXzVPnBVhxJlTn/8OmhjvM1CO6cGLjT7MOA9I=; b=R1a7Zpn6eeH+AydQ7lj8Kl9dLKPWAwguluzw4coWWKKJSYVKsx2bIFuX6qULmBbnK3 r95498F/e+FqpEYLD3nEapv8fS01TmALbnXz5pxpt5akPwpiouGiayxkLAQNUL3GXzZi y5h2vQbhPsq5GnjVtN8f/kdt9ARmj/aETaVS62h+4eTkM5iHlMBzhSCmGFd7BuI/gfS8 2bYT7rM0V5EhfC1Z9XiaCiBuP7PtB66y0qD5SnjsUjdxz7vAGW/dgFXNbWcjlVgDXUG1 bdW9mm77OkVzQXOe4a+52u2cgIFruokZp8Z3meOAc3hpWctbc/zymLKSSntdAqnObbx7 1vwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713364411; x=1713969211; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=fYjWrnxXzVPnBVhxJlTn/8OmhjvM1CO6cGLjT7MOA9I=; b=oOM5HArTfEBL7altAQhIEeRxtuI7CFhaRZivySumpYxh6IZ08383YSxuZQ1wxbv1Qj npd4+8OWWksgZBiVkt8jn6fPviPB2P/n9M1WHD1nE83xdIZi6ETeOpEPC7zYnqa4EZK9 NcE7J6S3tR0bM9DdvY5SIPxCtezjAik4786YEwGKh5HhTWdQe3JYowAvjk8J6Wwk67pa yXKIaMHSsjnEIlcro55SsNAb3D8Era1bTI2qi8AC9eXp9STquzLqVUzINCzg+8tEwlnC ulfpCBOsCDlJZB21YqI7neh/rMvuKr1TJx6j6Cz83CtZ7ZfkRIeeyQNtL7ap7irGg1wg ulgg== X-Forwarded-Encrypted: i=1; AJvYcCVZsTDICUSxY9Nm7f/VG/m8w3+JrUOV99ohiv7Ce/8LLvywQT2pV9HO1sBeAy5jIO8EWWezPs+hbrl5uNjoi6HiLDI= X-Gm-Message-State: AOJu0YwXoIWc3IzvkwWTVpu8jfaXe6XNepv33ZoFFRdMds8Ik4+GYZMd crA5ztmNtLPB3SAwJmVPh37P8de+U/wbMp5jZ55zeV/L0O7DGzHStCBj9nSB0ZY= X-Google-Smtp-Source: AGHT+IGMO064THKZbI3D+TjKDUvnCQ9eceCJdFB3I+efweCrewGFx2TWhVjKd7pJ/Unkt/w7tOJsaw== X-Received: by 2002:a0c:dc8f:0:b0:69b:71f8:b3fe with SMTP id n15-20020a0cdc8f000000b0069b71f8b3femr10445124qvk.42.1713364411072; Wed, 17 Apr 2024 07:33:31 -0700 (PDT) Received: from localhost ([2603:7000:c01:2716:da5e:d3ff:fee7:26e7]) by smtp.gmail.com with ESMTPSA id gv5-20020a056214262500b0069b58f8c33dsm6787082qvb.45.2024.04.17.07.33.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Apr 2024 07:33:30 -0700 (PDT) Date: Wed, 17 Apr 2024 10:33:24 -0400 From: Johannes Weiner To: Chengming Zhou Cc: Nhat Pham , Christian Heusel , Seth Jennings , Dan Streetman , Vitaly Wool , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Runge , "Richard W.M. Jones" , Mark W , regressions@lists.linux.dev, Yosry Ahmed Subject: Re: [REGRESSION] Null pointer dereference while shrinking zswap Message-ID: <20240417143324.GA1055428@cmpxchg.org> References: <3iccc6vjl5gminut3lvpl4va2lbnsgku5ei2d7ylftoofy3n2v@gcfdvtsq6dx2> <246c1f4d-af13-40fa-b968-fbaf36b8f91f@linux.dev> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <246c1f4d-af13-40fa-b968-fbaf36b8f91f@linux.dev> X-Rspamd-Queue-Id: 2730C4001F X-Stat-Signature: 3tfnjbkz5mi7bdwre7brnze1qc5oz9k3 X-Rspam-User: X-Rspamd-Server: rspam07 X-HE-Tag: 1713364411-279781 X-HE-Meta: U2FsdGVkX19zUzZ7XwE67JxcCvLSRMBBzPtYRazpY/T+Nk4sLVhtWQwROflXF7cFPu1+fywIUlXgC9/JBqsbSZY5dwWKI5fAgBAASffriVmaehtyfDnsImRHiSXGJiIhheUyUKJE3qjG6I36ykbH0CNS7jZHUkpJOznTirPdWfu+P1r5rg6ukNn7ewX5E5piafVduy12z+aAEzNH4twKSbWaZ7z89dbR9QQnbWznU6pq4bdT0VKs5WDaJsBNFqnUH9GNIYJoBg7GmEhAnYZzfqfhqtcOTc+W178HcdDkV+r0o8HgOp+U7xsaP1F76gWhtLbSK/e96sU7P1q4s0R/XBQFTwuzL323tMmC0vvX84hVX4Sf2zcp1ZYVRULTvq/63LDQjKD+c1If2ra/s21saQwOYZ8DIjZG/tIaNMpaDhlfYR6tZT2EApXnuG6T2at7VoJ5RTQ3mDVUGH++K7wwF8DxJOk2ToB+tULI8KDqNfpGWNBWjaekjjTGxoB8TlJl0HYken/28J5noPU9XXAo+Z9RxCaqPXt91s2aeaVARDpjTg5W7bRW7uyKMgw3WG+ovAN6siALKwxuqegFZFamTyB8LGFzNJ1K1VC04LcK48Q09ngC1HAMv2mwFWg0s8kWGXNq7Qd6fRtGlsw9FV/kq0cTk6dXoT4Hf5N34UNous5ZE1X96pDCbkCdkVoNf9th6VZN+FjKVXqfJ5DH4Tg8vVCaN6a1wowEpE5Zh9vkvlZULg52NeaWHBpCIBvp1qbLfawq4j1G5lnYFhKZOzcN2nA1Vpm8rgCxe3fzu9N4J3CUWx+agH7UWyIIW6+IqJUIY2nmbi5B3XS5Byuje5QMIMjfXB0d2u5f868GoYeqg/8MIC87psujT3IC0lDdbYLqnHAqEglii7CG2y8qcufeX6mKdeWMAymNLReR1rjSPotL8Ne20IuokpWBuslty/Dler6RLbhN2hqLGf4/A3Q VKR+40GB /hv3CQm38YuwJ/cRp4L71faCPDLRI4DXJXMKtXdYbL7fKfSxBOGlr35ReusG95sFMzsEvRuVp2DWbgxwP79SYNhlKfYd3ht5+4bwDY8tu+Wszo+DDzvEzHGKZJjnxgMvHObJxkzzj8GfNTTlk1MrcR+tTzFzaQyuGSG4HfmzrsMr7HxitMZEOerREBC0zJL1+BH0pYrnhXOLLXvBGfg2hpTuHZ82KJ2suxMvuijvFpecufD2UeJgU3QIGw5KinwJMWgQ59MECs5eV91HYwM5AgX+WffYrXFK6ts7XdkIhddi8q6e6g+PcG+/whiyvKYn2DTQ0vvTZxg8OF0su5FaGWK7Ei+s/2uOURf1Sn29GHs9Isx/Pg95pQFnhS3RDqlZCqF4XSLdQLp538DSaXrWRmxozli+eNZAj/tZjt02NGN1zoHWmifSEd1rT+odOVvjMu5TRlmkOL7jko7zxFrWpam7GDrHoNNka7SL0xR9NsoFXcORJA94AtNuhIW0/3tXFn5+xDVau1wFxYe4Jc2UIwZff5gOcPbWxrxpdZIMmQeMz/KGINOBuXrrZ7POkJINGODwBxeSH9xVCgZf/MzahQ/vU0VlM2p0gq3FV X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Apr 17, 2024 at 11:44:45AM +0800, Chengming Zhou wrote: > On 2024/4/17 08:22, Nhat Pham wrote: > > On Tue, Apr 16, 2024 at 4:29 PM Nhat Pham wrote: > >> > >> On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham wrote: > >>> > >>> On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel wrote: > >>>> > >>>> Hello everyone, > >>> > >>> Thanks for the report, Christian! Looking at it now. > >>> > >>>> > >>>> while rebuilding a few packages in Arch Linux we have recently come > >>>> across a regression in the linux kernel which was made visible by a test > >>>> failure in libguestfs[0], where the booted kernel showed a Call Trace > >>>> like the following one: > >>>> > >>>> [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M > >>> > >>> Is this one of the kernel versions that was broken? That looks a bit > >>> odd, as zswap shrinker landed on 6.8... > >> > >> Ah ignore this - I understand the versioning now... > >> > >>> > >>>> [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > >>>> [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > >>>> [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > >>>> [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > >>>> [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > >>>> [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > >>>> [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > >>>> [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > >>>> [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > >>>> [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > >>>> [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > >>>> [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > >>>> [ 218.743494] PKRU: 55555554^M > >>>> [ 218.743593] Call Trace:^M > >>>> [ 218.743733] ^M > >>>> [ 218.743847] ? __die+0x23/0x70^M > >>>> [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > >>>> [ 218.744056] ? free_unref_page+0xf6/0x180^M > >>>> [ 218.744458] ? exc_page_fault+0x7f/0x180^M > >>>> [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > >>>> [ 218.744684] ? memcg_page_state+0x9/0x30^M > >>>> [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > >>>> [ 218.744896] do_shrink_slab+0x3a/0x360^M > >>>> [ 218.744990] shrink_slab+0xc7/0x3c0^M > >>>> [ 218.745609] drop_slab+0x85/0x140^M > >>>> [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > >>>> [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > >>>> [ 218.745912] vfs_write+0x23d/0x400^M > >>>> [ 218.745998] ksys_write+0x6f/0xf0^M > >>>> [ 218.746080] do_syscall_64+0x64/0xe0^M > >>>> [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > >>>> [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > >>>> > > > > Actually, inspecting the code a bit more - can memcg be null here? > > > > Specifically, if mem_cgroup_disabled() is true, can we see null memcg > > here? Looks like in this case, mem_cgroup_iter() can return null, and > > the first iteration of drop_slab_node() can have null memcg if it's > > returned by mem_cgroup_iter(). > > > > shrink_slab() will still proceed and call do_shrink_slab() if the > > memcg is null - provided that mem_cgroup_disabled() holds: > > > > if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) > > return shrink_slab_memcg(gfp_mask, nid, memcg, priority); > > > > Ah, I think your analysis is very right, here memcg is NULL but memcg_page_state > won't check. +1. I could reproduce the NULL crash locally with cgroup_disable=memory, the shrinker enabled, and echo 3 >/proc/sys/vm/drop_caches. > > Inside zswap_shrink_count(), all the memcg accessors in this area seem > > to always check for null memcg (mem_cgroup_lruvec, > > mem_cgroup_zswap_writeback_enabled), *except* memcg_page_state, which > > is the one line that fail. > > > > If this is all to it, we can simply add a null check or > > mem_cgroup_disabled() check here, and use pool stats instead? > > Both look ok to me. The VM could only set sc->memcg to NULL when memcg > disabled, right? Christian, can you please test the below patch on top of current upstream? diff --git a/mm/zswap.c b/mm/zswap.c index caed028945b0..6f8850c44b61 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker, if (!gfp_has_io_fs(sc->gfp_mask)) return 0; -#ifdef CONFIG_MEMCG_KMEM - mem_cgroup_flush_stats(memcg); - nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; - nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); -#else - /* use pool stats instead of memcg stats */ - nr_backing = zswap_pool_total_size >> PAGE_SHIFT; - nr_stored = atomic_read(&zswap_nr_stored); -#endif + /* + * For memcg, use the cgroup-wide ZSWAP stats since we don't + * have them per-node and thus per-lruvec. Careful if memcg is + * runtime-disabled: we can get sc->memcg == NULL, which is ok + * for the lruvec, but not for memcg_page_state(). + * + * Without memcg, use the zswap pool-wide metrics. + */ + if (!mem_cgroup_disabled()) { + mem_cgroup_flush_stats(memcg); + nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; + nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); + } else { + nr_backing = zswap_pool_total_size >> PAGE_SHIFT; + nr_stored = atomic_read(&zswap_nr_stored); + } if (!nr_stored) return 0;