From: Barry Song <21cnbao@gmail.com>
Date: Wed, 8 Jan 2025 12:01:09 +1300
Subject: Re: [PATCH v2 1/2] Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou, Vitaly Wool, Sam Sun, Kanchana P Sridhar, linux-mm@kvack.org, linux-kernel@vger.kernel.org, syzbot
In-Reply-To: <20250107222236.2715883-1-yosryahmed@google.com>
References: <20250107222236.2715883-1-yosryahmed@google.com>
Content-Type: text/plain; charset="UTF-8"
On Wed, Jan 8, 2025 at 11:22 AM Yosry Ahmed wrote:
>
> This reverts commit eaebeb93922ca6ab0dd92027b73d0112701706ef.
>
> Commit eaebeb93922c ("mm: zswap: fix race between [de]compression and
> CPU hotunplug") used the CPU hotplug lock in zswap compress/decompress
> operations to protect against a race with CPU hotunplug making some
> per-CPU resources go away.
>
> However, zswap compress/decompress can be reached through reclaim while
> the lock is held, resulting in a potential deadlock as reported by
> syzbot:
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0 Not tainted
> ------------------------------------------------------
> kswapd0/89 is trying to acquire lock:
> ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: acomp_ctx_get_cpu mm/zswap.c:886 [inline]
> ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_compress mm/zswap.c:908 [inline]
> ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store_page mm/zswap.c:1439 [inline]
> ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
>
> but task is already holding lock:
> ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline]
> ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253
>
> which lock already depends on the new lock.

We have functions like percpu_is_write_locked(), percpu_is_read_locked(),
and cpus_read_trylock(). Could they help prevent circular locking
dependencies if we perform a check before acquiring the lock?
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (fs_reclaim){+.+.}-{0:0}:
>        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
>        __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>        fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>        might_alloc include/linux/sched/mm.h:318 [inline]
>        slab_pre_alloc_hook mm/slub.c:4070 [inline]
>        slab_alloc_node mm/slub.c:4148 [inline]
>        __kmalloc_cache_node_noprof+0x40/0x3a0 mm/slub.c:4337
>        kmalloc_node_noprof include/linux/slab.h:924 [inline]
>        alloc_worker kernel/workqueue.c:2638 [inline]
>        create_worker+0x11b/0x720 kernel/workqueue.c:2781
>        workqueue_prepare_cpu+0xe3/0x170 kernel/workqueue.c:6628
>        cpuhp_invoke_callback+0x48d/0x830 kernel/cpu.c:194
>        __cpuhp_invoke_callback_range kernel/cpu.c:965 [inline]
>        cpuhp_invoke_callback_range kernel/cpu.c:989 [inline]
>        cpuhp_up_callbacks kernel/cpu.c:1020 [inline]
>        _cpu_up+0x2b3/0x580 kernel/cpu.c:1690
>        cpu_up+0x184/0x230 kernel/cpu.c:1722
>        cpuhp_bringup_mask+0xdf/0x260 kernel/cpu.c:1788
>        cpuhp_bringup_cpus_parallel+0xf9/0x160 kernel/cpu.c:1878
>        bringup_nonboot_cpus+0x2b/0x50 kernel/cpu.c:1892
>        smp_init+0x34/0x150 kernel/smp.c:1009
>        kernel_init_freeable+0x417/0x5d0 init/main.c:1569
>        kernel_init+0x1d/0x2b0 init/main.c:1466
>        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> -> #0 (cpu_hotplug_lock){++++}-{0:0}:
>        check_prev_add kernel/locking/lockdep.c:3161 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3280 [inline]
>        validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
>        __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
>        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
>        percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
>        cpus_read_lock+0x42/0x150 kernel/cpu.c:490
>        acomp_ctx_get_cpu mm/zswap.c:886 [inline]
>        zswap_compress mm/zswap.c:908 [inline]
>        zswap_store_page mm/zswap.c:1439 [inline]
>        zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
>        swap_writepage+0x647/0xce0 mm/page_io.c:279
>        shmem_writepage+0x1248/0x1610 mm/shmem.c:1579
>        pageout mm/vmscan.c:696 [inline]
>        shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374
>        shrink_inactive_list mm/vmscan.c:1967 [inline]
>        shrink_list mm/vmscan.c:2205 [inline]
>        shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734
>        mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575
>        mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline]
>        memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362
>        balance_pgdat mm/vmscan.c:6975 [inline]
>        kswapd+0x17b3/0x2f30 mm/vmscan.c:7253
>        kthread+0x2f0/0x390 kernel/kthread.c:389
>        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(fs_reclaim);
>                                lock(cpu_hotplug_lock);
>                                lock(fs_reclaim);
>   rlock(cpu_hotplug_lock);
>
>  *** DEADLOCK ***
>
> 1 lock held by kswapd0/89:
>  #0: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline]
>  #0: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 89 Comm: kswapd0 Not tainted 6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Call Trace:
>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>  print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
>  check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
>  check_prev_add kernel/locking/lockdep.c:3161 [inline]
>  check_prevs_add kernel/locking/lockdep.c:3280 [inline]
>  validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
>  __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
>  lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
>  percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
>  cpus_read_lock+0x42/0x150 kernel/cpu.c:490
>  acomp_ctx_get_cpu mm/zswap.c:886 [inline]
>  zswap_compress mm/zswap.c:908 [inline]
>  zswap_store_page mm/zswap.c:1439 [inline]
>  zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
>  swap_writepage+0x647/0xce0 mm/page_io.c:279
>  shmem_writepage+0x1248/0x1610 mm/shmem.c:1579
>  pageout mm/vmscan.c:696 [inline]
>  shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374
>  shrink_inactive_list mm/vmscan.c:1967 [inline]
>  shrink_list mm/vmscan.c:2205 [inline]
>  shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734
>  mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575
>  mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline]
>  memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362
>  balance_pgdat mm/vmscan.c:6975 [inline]
>  kswapd+0x17b3/0x2f30 mm/vmscan.c:7253
>  kthread+0x2f0/0x390 kernel/kthread.c:389
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
>
> Revert the change. A different fix for the race with CPU hotunplug will
> follow.
>
> Reported-by: syzbot
> Signed-off-by: Yosry Ahmed
> ---
>
> The patches apply on top of mm-hotfixes-unstable and are meant for
> v6.13.
>
> Andrew, I am not sure what's the best way to handle this. This fix is
> already merged into Linus's tree and had CC:stable, so I thought it's
> best to revert it and replace it with a separate fix that would be easy
> to backport instead of the revert patch, especially since functionally
> the new fix is different anyway.
>
> v1 -> v2:
> - Disable migration as an alternative fix instead of SRCU, and explain
>   why SRCU and cpus_read_lock() cannot be used in the commit log of
>   patch 2.
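If it helps review: my reading of the "disable migration" direction above would look roughly like the following kernel-style sketch. This is only a guess at what patch 2 does (it is not quoted here), reusing the names of the reverted helpers for illustration:

```c
/* Sketch only, not the actual patch 2. The idea as I understand it:
 * migrate_disable() pins the task to its current CPU without taking any
 * global lock, so it cannot create the fs_reclaim -> cpu_hotplug_lock
 * dependency that lockdep reported above. */
static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
{
	migrate_disable();
	return raw_cpu_ptr(acomp_ctx);
}

static void acomp_ctx_put_cpu(void)
{
	migrate_enable();
}
```

Whether pinning alone is enough to keep the per-CPU resources alive across a concurrent hotunplug is exactly the part I would want the patch 2 commit log to spell out.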
>
> ---
>  mm/zswap.c | 19 +++----------------
>  1 file changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 5a27af8d86ea9..f6316b66fb236 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -880,18 +880,6 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>         return 0;
>  }
>
> -/* Prevent CPU hotplug from freeing up the per-CPU acomp_ctx resources */
> -static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
> -{
> -       cpus_read_lock();
> -       return raw_cpu_ptr(acomp_ctx);
> -}
> -
> -static void acomp_ctx_put_cpu(void)
> -{
> -       cpus_read_unlock();
> -}
> -
>  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>                            struct zswap_pool *pool)
>  {
> @@ -905,7 +893,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         gfp_t gfp;
>         u8 *dst;
>
> -       acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx);
> +       acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> +
>         mutex_lock(&acomp_ctx->mutex);
>
>         dst = acomp_ctx->buffer;
> @@ -961,7 +950,6 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>                 zswap_reject_alloc_fail++;
>
>         mutex_unlock(&acomp_ctx->mutex);
> -       acomp_ctx_put_cpu();
>         return comp_ret == 0 && alloc_ret == 0;
>  }
>
> @@ -972,7 +960,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         struct crypto_acomp_ctx *acomp_ctx;
>         u8 *src;
>
> -       acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx);
> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>         mutex_lock(&acomp_ctx->mutex);
>
>         src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> @@ -1002,7 +990,6 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>
>         if (src != acomp_ctx->buffer)
>                 zpool_unmap_handle(zpool, entry->handle);
> -       acomp_ctx_put_cpu();
>  }
>
>  /*********************************
> --
> 2.47.1.613.gc27f4b7a9f-goog
>

Thanks
Barry