From: Pedro Falcato
Date: Fri, 19 Jul 2024 19:57:21 +0100
Subject: Re: zswap_writeback_entry crashes in 6.9.5
To: Nhat Pham
Cc: mm, Linux Regressions, Johannes Weiner, Yosry Ahmed, Chengming Zhou, Christian Heusel, yshuiv7@gmail.com

On Mon, Jul 1, 2024 at 2:07 AM Nhat Pham wrote:
>
> On Sun, Jun 30, 2024 at 10:58 AM Pedro Falcato wrote:
> >
> > Hi everyone,
>
> Hi Pedro,
>

Hi Nhat,

Sorry for necroing, but I've been really busy and this issue totally
left my mind. Following up so I don't leave you hanging:

> Thanks for the bug report!
> Taking a look now - some preliminary
> questions to narrow down the suspects and aid the debugging process:
>
> a) Do you observe this bug in 6.8? 6.10?

Yes.

>
> b) Have you run the faddr2line script to verify that the line that
> triggers the crash is count_objcg_event(entry->objcg, ZSWPWB);?

I did not (distro kernel does not seem to have debug symbols), but
it's pretty clear from disassembly.

>
> c) Do you have a full dmesg log? Or maybe some other reproduction instructions?
>
> If entry->objcg is garbage, then this smells like a lifetime/reference
> counting issue. Either:
>
> a) The zswap entry itself is garbage. Not impossible, but seems
> unlikely. In 6.9, we effectively isolate the entry first through the
> swap cache, then check and remove it from the zswap tree (under the
> tree's lock). The former locks out concurrent accessors, and the
> latter should have taken care of invalidated entries (and prevents
> future invalidation attempts). Furthermore, after this, if the entry
> is somehow garbage (i.e freed and recycled), it should also be
> possible to blow up in the decompression step first, by feeding a
> garbage handle to zsmalloc and crashing the kernel at that point. IOW,
> we should also see zsmalloc crashes in addition to this particular
> crash, no? I cannot think of any protection mechanism that applies to
> the decompression step and not to count_objcg_event().
>
> b) entry->objcg has been freed/recycled under us. This is much
> trickier, as the culprit could be any holder of the objcg reference
> who accidentally double-released the reference it held. That said, if
> it only happened on zswap shrinker path, then maybe there is something
> to this...
>

I have a separate theory. I also run the NVIDIA proprietary drivers.
slabinfo -a shows us:

[...]
:0000080 <- zswap_entry Acpi-Parse kernfs_iattrs_cache uvm_tools_replay_data_t Acpi-State audit_tree_mark
[...]

See the uvm_tools_replay_data_t there?
Yeah, it's entirely possible some random nvidia.ko bug has been
corrupting zswap_entry from time to time (which would explain why e.g.
the big server people have not seen this). I'm not sure if Yuxuan is
running the same driver, but their kernel is also proprietary-tainted.

As such, I'll refrain from posting more about this or similar bugs
until I can get a guarantee it happens with a non-tainted kernel
(fwiw, I have not seen crashes for 2 weeks or so; hopefully this issue
is fixed).

Again, sorry for not checking the taint before posting this, and thank
you for your time :)

-- 
Pedro