Date: Mon, 8 May 2023 10:06:58 -0400
From: Johannes Weiner
To: Sergey Senozhatsky
Cc: Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, minchan@kernel.org, ngupta@vflare.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, kernel-team@meta.com
Subject: Re: [PATCH] zsmalloc: move LRU update from zs_map_object() to zs_malloc()
Message-ID: <20230508140658.GA3421@cmpxchg.org>
References: <20230505185054.2417128-1-nphamcs@gmail.com> <20230506030140.GC3281499@google.com>
In-Reply-To: <20230506030140.GC3281499@google.com>

On Sat, May 06, 2023 at 12:01:40PM +0900, Sergey Senozhatsky wrote:
> On (23/05/05 11:50), Nhat Pham wrote:
> > Under memory pressure, we sometimes observe the following crash:
> >
> > [ 5694.832838] ------------[ cut here ]------------
> > [ 5694.842093] list_del corruption, ffff888014b6a448->next is LIST_POISON1 (dead000000000100)
> > [ 5694.858677] WARNING: CPU: 33 PID: 418824 at lib/list_debug.c:47 __list_del_entry_valid+0x42/0x80
> > [ 5694.961820] CPU: 33 PID: 418824 Comm: fuse_counters.s Kdump: loaded Tainted: G S 5.19.0-0_fbk3_rc3_hoangnhatpzsdynshrv41_10870_g85a9558a25de #1
> > [ 5694.990194] Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM16 05/24/2021
> > [ 5695.007072] RIP: 0010:__list_del_entry_valid+0x42/0x80
> > [ 5695.017351] Code: 08 48 83 c2 22 48 39 d0 74 24 48 8b 10 48 39 f2 75 2c 48 8b 51 08 b0 01 48 39 f2 75 34 c3 48 c7 c7 55 d7 78 82 e8 4e 45 3b 00 <0f> 0b eb 31 48 c7 c7 27 a8 70 82 e8 3e 45 3b 00 0f 0b eb 21 48 c7
> > [ 5695.054919] RSP: 0018:ffffc90027aef4f0 EFLAGS: 00010246
> > [ 5695.065366] RAX: 41fe484987275300 RBX: ffff888008988180 RCX: 0000000000000000
> > [ 5695.079636] RDX: ffff88886006c280 RSI: ffff888860060480 RDI: ffff888860060480
> > [ 5695.093904] RBP: 0000000000000002 R08: 0000000000000000 R09: ffffc90027aef370
> > [ 5695.108175] R10: 0000000000000000 R11: ffffffff82fdf1c0 R12: 0000000010000002
> > [ 5695.122447] R13: ffff888014b6a448 R14: ffff888014b6a420 R15: 00000000138dc240
> > [ 5695.136717] FS: 00007f23a7d3f740(0000) GS:ffff888860040000(0000) knlGS:0000000000000000
> > [ 5695.152899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5695.164388] CR2: 0000560ceaab6ac0 CR3: 000000001c06c001 CR4: 00000000007706e0
> > [ 5695.178659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 5695.192927] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 5695.207197] PKRU: 55555554
> > [ 5695.212602] Call Trace:
> > [ 5695.217486]
> > [ 5695.221674] zs_map_object+0x91/0x270
> > [ 5695.229000] zswap_frontswap_store+0x33d/0x870
> > [ 5695.237885] ? do_raw_spin_lock+0x5d/0xa0
> > [ 5695.245899] __frontswap_store+0x51/0xb0
> > [ 5695.253742] swap_writepage+0x3c/0x60
> > [ 5695.261063] shrink_page_list+0x738/0x1230
> > [ 5695.269255] shrink_lruvec+0x5ec/0xcd0
> > [ 5695.276749] ? shrink_slab+0x187/0x5f0
> > [ 5695.284240] ? mem_cgroup_iter+0x6e/0x120
> > [ 5695.292255] shrink_node+0x293/0x7b0
> > [ 5695.299402] do_try_to_free_pages+0xea/0x550
> > [ 5695.307940] try_to_free_pages+0x19a/0x490
> > [ 5695.316126] __folio_alloc+0x19ff/0x3e40
> > [ 5695.323971] ? __filemap_get_folio+0x8a/0x4e0
> > [ 5695.332681] ? walk_component+0x2a8/0xb50
> > [ 5695.340697] ? generic_permission+0xda/0x2a0
> > [ 5695.349231] ? __filemap_get_folio+0x8a/0x4e0
> > [ 5695.357940] ? walk_component+0x2a8/0xb50
> > [ 5695.365955] vma_alloc_folio+0x10e/0x570
> > [ 5695.373796] ? walk_component+0x52/0xb50
> > [ 5695.381634] wp_page_copy+0x38c/0xc10
> > [ 5695.388953] ? filename_lookup+0x378/0xbc0
> > [ 5695.397140] handle_mm_fault+0x87f/0x1800
> > [ 5695.405157] do_user_addr_fault+0x1bd/0x570
> > [ 5695.413520] exc_page_fault+0x5d/0x110
> > [ 5695.421017] asm_exc_page_fault+0x22/0x30
> >
> > After some investigation, I have found the following issue: unlike other
> > zswap backends, zsmalloc performs the LRU list update at the object
> > mapping time, rather than when the slot for the object is allocated.
> > This deviation was discussed and agreed upon during the review process
> > of the zsmalloc writeback patch series:
> >
> > https://lore.kernel.org/lkml/Y3flcAXNxxrvy3ZH@cmpxchg.org/
> >
> > Unfortunately, this introduces a subtle bug that occurs when there is a
> > concurrent store and reclaim, which interleave as follows:
> >
> > zswap_frontswap_store()            shrink_worker()
> >   zs_malloc()                        zs_zpool_shrink()
> >     spin_lock(&pool->lock)             zs_reclaim_page()
> >     zspage = find_get_zspage()
> >     spin_unlock(&pool->lock)
> >                                          spin_lock(&pool->lock)
> >                                          zspage = list_first_entry(&pool->lru)
> >                                          list_del(&zspage->lru)
> >                                            zspage->lru.next = LIST_POISON1
> >                                            zspage->lru.prev = LIST_POISON2
>
> Will list_del_init() there do the trick?
>
> >                                          spin_unlock(&pool->lock)
> >   zs_map_object()
> >     spin_lock(&pool->lock)
> >     if (!list_empty(&zspage->lru))
> >       list_del(&zspage->lru)
>
> list_del_init()

The deeper bug here is that zs_map_object() tries to add the page to
the LRU list while the shrinker has it isolated for reclaim. This is
way too subtle and error-prone. Even if it worked now, it would cause
corruption issues down the line.

For example, Nhat is adding a secondary entry point to reclaim.
Reclaim expects that a page that's on the LRU is also on the fullness
list, so this would lead to a double remove_zspage() and BUG_ON().

This patch doesn't just fix the crash; it also eliminates the deeper
LRU isolation issue and makes the code simpler and more robust.
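
To make the list_del() vs. list_del_init() point concrete, here is a
minimal userspace sketch. It is not the kernel's list.h or zsmalloc
code: the helpers are simplified stand-ins written for illustration,
the poison constants omit the kernel's pointer delta, and the two
lru_check_* functions are hypothetical names. It shows why the
!list_empty() check in the store path does not help once reclaim has
removed the entry with list_del(): a poisoned entry does not point
back at itself, so it still looks like it is on a list.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Poison markers modeled on LIST_POISON1/2; never dereferenced here. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x122)

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* An entry counts as "empty" only if it points back at itself. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_unlink(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Unlink and poison: the entry must not be touched again. */
static void list_del(struct list_head *entry)
{
	list_unlink(entry);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Unlink and reinitialize: list_empty(entry) is true afterwards. */
static void list_del_init(struct list_head *entry)
{
	list_unlink(entry);
	init_list_head(entry);
}

/* Hypothetical: would the store path still try to delete the entry? */
bool lru_check_passes_after_list_del(void)
{
	struct list_head lru, page;

	init_list_head(&lru);
	list_add(&page, &lru);
	list_del(&page);		/* reclaim side isolates the page */
	return !list_empty(&page);	/* store side's check */
}

bool lru_check_passes_after_list_del_init(void)
{
	struct list_head lru, page;

	init_list_head(&lru);
	list_add(&page, &lru);
	list_del_init(&page);
	return !list_empty(&page);
}

#ifdef LIST_DEMO_MAIN
int main(void)
{
	printf("after list_del():      check passes = %d\n",
	       lru_check_passes_after_list_del());
	printf("after list_del_init(): check passes = %d\n",
	       lru_check_passes_after_list_del_init());
	return 0;
}
#endif
```

Building with -DLIST_DEMO_MAIN gives a standalone program. In this
sketch, list_del_init() only makes the emptiness check behave; it does
nothing about the store path then putting an isolated page back on the
LRU, which is the isolation problem described above.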