From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAD7EC77B75 for ; Mon, 8 May 2023 16:01:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3C7206B0071; Mon, 8 May 2023 12:01:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 376856B0074; Mon, 8 May 2023 12:01:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 23DD66B0075; Mon, 8 May 2023 12:01:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 13BE46B0071 for ; Mon, 8 May 2023 12:01:03 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 839C8407AE for ; Mon, 8 May 2023 16:01:02 +0000 (UTC) X-FDA: 80767551564.14.7E69830 Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170]) by imf16.hostedemail.com (Postfix) with ESMTP id 9F95F180052 for ; Mon, 8 May 2023 16:00:28 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=pS50okJn; spf=pass (imf16.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.222.170 as permitted sender) smtp.mailfrom=nphamcs@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1683561629; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=+pB2YUBbkkhRqdEuS86bLaceZppwl5VGLTAnx/EHATw=; b=Eo9NW/fvyGoCnuHzp52nZu5zsNsx7hm+bBfTq9KqJrhQd2Cpkz8gW315ey3SaNE2OsUyx7 53VrhOK6phUp0/LlAD1YSDLdRVPMZ9ehCBwdzGtwLRBU+2hYLvgHU+wKlo8FLrWtRmKlDL e3GMzVJQbuZjdpy4cdim4ot0U0/NViI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1683561629; a=rsa-sha256; cv=none; b=wp1xHUzQXQBhH3wE5nHfHL0Dwbs+XoAoJuhcDbiveRir8GOL/ynPQmwEY2/gwM8WhDwkCw VsMZiD5wnla/mxfc19NsiHVivSelkGgRsxVu8VLxSoqyLr4LuTTlZPuHSIgN7HhskWPv2I AgEw5ET4kex2yD8MkJ79RfI1rZj9JC4= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=pS50okJn; spf=pass (imf16.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.222.170 as permitted sender) smtp.mailfrom=nphamcs@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-qk1-f170.google.com with SMTP id af79cd13be357-7575ff76964so103824185a.2 for ; Mon, 08 May 2023 09:00:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683561628; x=1686153628; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=+pB2YUBbkkhRqdEuS86bLaceZppwl5VGLTAnx/EHATw=; b=pS50okJn+DEj1FhKDW07zXVSpLfpXPb/fBXu4OCYGcVBBjkLHrJ4kZfooEZvcXP9rp 0r1p5nAfrYx3zKpDb378iLebIlAjdrbzyWuA3S+bip2eB7QW9rgKCIMNkS9/xKjy/PsL SlSbI2PQ0ofEYV5cCMDDLftwf25XD3w2FGW6tI55Ka0+/K2uR3qaqrz8Ue0paqd3jc9l hmNsCbMTEyq9N06q3jW0texS/Zmou8gS7BhWp07lP+Jr9+rsLT1H5QViG3UMNnXPMCBg Jfwt6zsLNRcrTH3KIdrSxBbOp2dH9nDVtntlvvwzuVxH06ysu3XImTp41a9v5emfoPa/ jTaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683561628; x=1686153628; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+pB2YUBbkkhRqdEuS86bLaceZppwl5VGLTAnx/EHATw=; b=BmHH19bDu5AVexx2pEj4Ky6Bz7RKe/AxEePMFctp+/CVUeK7bLU95RaOxeYUOrfsRE ouYxS0I32xVXJ3Qk3DtKe0oKaWlEtWpH6pgrdy/G9rVWpg2/xaiyC5+j2HT9Ay78od90 ifHldGtn1OPeD0z0TMHUk4Q9o0XDoFseJdrHP70GwudND9Qfi6iqg/7nHmxF07OSrYax QPs4uTV0+bvqq3XQ298JUR/9AQcu0aFYS2hh0dZLmn3RAcoET6oaMB6TVLZdkILvFMEG 1kXuZMWSCWOEY+Yi2k2Vv4xKFhKT6VeWAK63bBu5adaWlg7yqvdI7owMAFlOn4WIgNpz S12g== X-Gm-Message-State: AC+VfDxYkteE/+dtLnPaLKdDxg9DF78iT/ZWsTD2ZKrdjW6YtZcCDRd+ 0eSPeA0oQMIDopJCqGgiLEBCfsY3n3BiX8Eef2c= X-Google-Smtp-Source: ACHHUZ4+YaRzP/2WFqgxw9MEO6lRKxnBbuXGeKO0JmvxRM5A20ZI9JXGXms558Gyu90FkMXNicwXp1YM+KyUP5eGb9c= X-Received: by 2002:a05:6214:2427:b0:61a:c2bf:9f0b with SMTP id gy7-20020a056214242700b0061ac2bf9f0bmr18496822qvb.6.1683561627381; Mon, 08 May 2023 09:00:27 -0700 (PDT) MIME-Version: 1.0 References: <20230505185054.2417128-1-nphamcs@gmail.com> <20230506030140.GC3281499@google.com> <20230508140658.GA3421@cmpxchg.org> In-Reply-To: <20230508140658.GA3421@cmpxchg.org> From: Nhat Pham Date: Mon, 8 May 2023 09:00:16 -0700 Message-ID: Subject: Re: [PATCH] zsmalloc: move LRU update from zs_map_object() to zs_malloc() To: Johannes Weiner Cc: Sergey Senozhatsky , akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, minchan@kernel.org, ngupta@vflare.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, kernel-team@meta.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 9F95F180052 X-Stat-Signature: mbxzh54dzxin6qjit7c4q9iz5gbt5tam X-Rspam-User: X-Rspamd-Server: rspam09 X-HE-Tag: 1683561628-314875 X-HE-Meta: U2FsdGVkX18UWMngTtIbLdJd477x/7WHAnB5n83S/LdVCECtsJgJfPapLipp4TWRLdJMNTRMpWPuJenbHAZlqdIqx97PA+HCl2OfTr7VPbmh4GQOyMzhT3bH9HKgffE3E7OOztMfARFqTfKDNzTRMwZl2M6+rJ06xsW5GAGq3c6a2Njrc4Ad+CM7AN+FndDmq2waBjzENSK+36Tzz30ros4oaGPhKuvWDzJxfQGbyv+ZFMjW1iHnLTWcxtPQVukGE7ifVbFsdCY+jP1iG5UE9L5qctQ+uDR7vP0uesH64HHCqvs/CipJpcaqnFSJzn/1iBWnSe+Xs8juW3csWaM7vLTWZyIANdEUIHcOh2hkjsQQugIpcHyohM7/m7x82Yq2CoonM2pSJlpAL5XQbYZBo6lVtrJvNn4msxnemcz7ldcyzKkgzlZzfXBI0l1BVFG6wIGOB8OucSCFwH1hxQtSRRE6/22Is8dB3jvT/vt9ZvtqKr21S0fLC9l2IhiQ8PCIkvbnGeWYtLdTwQgNmiC2STbTnihwi27APEyQXmbgxgQcnH9PYvVVCsJqR82GG8hh+G1X38d4ArF7SZ4VuOX+RagUGWr79KkVKRI0y5M0ua6LaqWXzns1QJLxuRYkFmwkDW7ZzwIze8VSUXsWkTN/GzZrxdwnoGSyzc7w2dgmM6QDsaKQ1wY3ijVoNv3l1J8FrM26y3KKkf56dUKwnQvxLSDHINMnGdQlPPSISN9hLPLFM+2GkqPidXUcqI/0VubdILrPZ+4aPFtqKPdSDUxNhyHkoezfvAoMz0vHm2QL00PeOpzHR+Hte3PJtgg1WbW5kNKYcdwG2OwLjNu3co2g+4wi/UyvBYoGqLQ9C7fgZKZb0C1MkQx1M74B7jtb32hJuzWB+277a4bm1JW0bbidLH3YcjrGxuV+P0G5Bmgsvmh+1DTIgRqpCg71N/8YS0haUW15FoAGgSw57az2ycG MQUjJY+X 9/d19Nezm3Ev0Um186jv+JXdf2w01/zv5lKH169kbzIl7S9iKkx6O9RJIXiiXbVevGFcvH86ndEO9SJ50WzLTF5u1nO76YAbfik9l5S1LnWhy1wZGnume6YWlPteBJ7WtgjSGbYvCJVJlbAMHQfuyk8YPOJBMNqFRdnKK0Hd5KLPt1UH40X+6QxrFsDZ0DSPrk8I4D3LNNuQfHpjp6DVUdfjRT1vqqMiTkaqhQI0rT1hdyBw1gVCv8z+GvHI+IZCJ7DbhromU2QaDL5H9NEAdBXIes33+LuOPTNWnCo8tCGJAz6KZB+SdfRYfVN9DZ1oqj+uuFP18iE8oCGZWGA+hnYZ3TB2bkpnqsQTNcVjg6JwPqIkX5krw/LuwuP9qc1AwVRaTKIMyIown1pz3SHvxLx22teiUxG94yYYx5OjFF8yqsk5xD0Bae8pBihtdwxoZPRzawv8fCW3bQwFwoOp4dCTPJ8YtMO6wHzG5xkpHWCrpzyMkvU8mvMvrpqGymQBceP2uF3cU2QdKEtQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, May 8, 2023 at 7:07=E2=80=AFAM Johannes Weiner = wrote: > > On Sat, May 06, 2023 at 12:01:40PM +0900, Sergey Senozhatsky wrote: > > On (23/05/05 11:50), Nhat Pham wrote: > > > Under memory pressure, we sometimes observe the following crash: > > > > > > [ 5694.832838] ------------[ cut here ]------------ > > > [ 5694.842093] list_del corruption, ffff888014b6a448->next is LIST_PO= ISON1 (dead000000000100) > > > [ 5694.858677] WARNING: CPU: 33 PID: 418824 at lib/list_debug.c:47 __= list_del_entry_valid+0x42/0x80 > > > [ 5694.961820] CPU: 33 PID: 418824 Comm: fuse_counters.s Kdump: loade= d Tainted: G S 5.19.0-0_fbk3_rc3_hoangnhatpzsdynshrv41_10870= _g85a9558a25de #1 > > > [ 5694.990194] Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive= MP, BIOS YMM16 05/24/2021 > > > [ 5695.007072] RIP: 0010:__list_del_entry_valid+0x42/0x80 > > > [ 5695.017351] Code: 08 48 83 c2 22 48 39 d0 74 24 48 8b 10 48 39 f2 = 75 2c 48 8b 51 08 b0 01 48 39 f2 75 34 c3 48 c7 c7 55 d7 78 82 e8 4e 45 3b = 00 <0f> 0b eb 31 48 c7 c7 27 a8 70 82 e8 3e 45 3b 00 0f 0b eb 21 48 c7 > > > [ 5695.054919] RSP: 0018:ffffc90027aef4f0 EFLAGS: 00010246 > > > [ 5695.065366] RAX: 41fe484987275300 RBX: ffff888008988180 RCX: 00000= 00000000000 > > > [ 5695.079636] RDX: ffff88886006c280 RSI: ffff888860060480 RDI: ffff8= 88860060480 > > > [ 5695.093904] RBP: 0000000000000002 R08: 0000000000000000 R09: ffffc= 90027aef370 > > > [ 5695.108175] R10: 0000000000000000 R11: ffffffff82fdf1c0 R12: 00000= 00010000002 > > > [ 5695.122447] R13: ffff888014b6a448 R14: ffff888014b6a420 R15: 00000= 000138dc240 > > > [ 5695.136717] FS: 00007f23a7d3f740(0000) GS:ffff888860040000(0000) = knlGS:0000000000000000 > > > [ 5695.152899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 5695.164388] CR2: 0000560ceaab6ac0 CR3: 000000001c06c001 CR4: 00000= 000007706e0 > > > [ 5695.178659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000= 00000000000 > > > [ 5695.192927] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 00000= 00000000400 > > > [ 5695.207197] PKRU: 55555554 > > > [ 5695.212602] Call Trace: > > > [ 5695.217486] > > > [ 5695.221674] zs_map_object+0x91/0x270 > > > [ 5695.229000] zswap_frontswap_store+0x33d/0x870 > > > [ 5695.237885] ? do_raw_spin_lock+0x5d/0xa0 > > > [ 5695.245899] __frontswap_store+0x51/0xb0 > > > [ 5695.253742] swap_writepage+0x3c/0x60 > > > [ 5695.261063] shrink_page_list+0x738/0x1230 > > > [ 5695.269255] shrink_lruvec+0x5ec/0xcd0 > > > [ 5695.276749] ? shrink_slab+0x187/0x5f0 > > > [ 5695.284240] ? mem_cgroup_iter+0x6e/0x120 > > > [ 5695.292255] shrink_node+0x293/0x7b0 > > > [ 5695.299402] do_try_to_free_pages+0xea/0x550 > > > [ 5695.307940] try_to_free_pages+0x19a/0x490 > > > [ 5695.316126] __folio_alloc+0x19ff/0x3e40 > > > [ 5695.323971] ? __filemap_get_folio+0x8a/0x4e0 > > > [ 5695.332681] ? walk_component+0x2a8/0xb50 > > > [ 5695.340697] ? generic_permission+0xda/0x2a0 > > > [ 5695.349231] ? __filemap_get_folio+0x8a/0x4e0 > > > [ 5695.357940] ? walk_component+0x2a8/0xb50 > > > [ 5695.365955] vma_alloc_folio+0x10e/0x570 > > > [ 5695.373796] ? walk_component+0x52/0xb50 > > > [ 5695.381634] wp_page_copy+0x38c/0xc10 > > > [ 5695.388953] ? filename_lookup+0x378/0xbc0 > > > [ 5695.397140] handle_mm_fault+0x87f/0x1800 > > > [ 5695.405157] do_user_addr_fault+0x1bd/0x570 > > > [ 5695.413520] exc_page_fault+0x5d/0x110 > > > [ 5695.421017] asm_exc_page_fault+0x22/0x30 > > > > > > After some investigation, I have found the following issue: unlike ot= her > > > zswap backends, zsmalloc performs the LRU list update at the object > > > mapping time, rather than when the slot for the object is allocated. > > > This deviation was discussed and agreed upon during the review proces= s > > > of the zsmalloc writeback patch series: > > > > > > https://lore.kernel.org/lkml/Y3flcAXNxxrvy3ZH@cmpxchg.org/ > > > > > > Unfortunately, this introduces a subtle bug that occurs when there is= a > > > concurrent store and reclaim, which interleave as follows: > > > > > > zswap_frontswap_store() shrink_worker() > > > zs_malloc() zs_zpool_shrink() > > > spin_lock(&pool->lock) zs_reclaim_page() > > > zspage =3D find_get_zspage() > > > spin_unlock(&pool->lock) > > > spin_lock(&pool->lock) > > > zspage =3D list_first_entry(= &pool->lru) > > > list_del(&zspage->lru) > > > zspage->lru.next =3D LIST_= POISON1 > > > zspage->lru.prev =3D LIST_= POISON2 > > > > Will list_del_init() there do the= trick? > > > > > spin_unlock(&pool->lock) > > > zs_map_object() > > > spin_lock(&pool->lock) > > > if (!list_empty(&zspage->lru)) > > > list_del(&zspage->lru) > > > > list_del_init() > > The deeper bug here is that zs_map_object() tries to add the page to > the LRU list while the shrinker has it isolated for reclaim. This is > way too sutble and error prone. Even if it worked now, it'll cause > corruption issues down the line. > > For example, Nhat is adding a secondary entry point to reclaim. > Reclaim expects that a page that's on the LRU is also on the fullness > list, so this would lead to a double remove_zspage() and BUG_ON(). > > This patch doesn't just fix the crash, it eliminates the deeper LRU > isolation issue and makes the code more robust and simple. I agree. IMO, less unnecessary concurrent interaction is always a win for developers' and maintainers' cognitive load. As a side benefit - this also gets rid of the inelegant check (mm =3D=3D ZS_MM_WO). The fact that we had to include a a multi-paragraph explanation for a 3-line piece of code should have been a red flag.