From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3259DD6C2AF for ; Tue, 19 Nov 2024 23:45:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 64A786B007B; Tue, 19 Nov 2024 18:45:24 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5D1B96B0082; Tue, 19 Nov 2024 18:45:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 472D76B0083; Tue, 19 Nov 2024 18:45:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 2154D6B007B for ; Tue, 19 Nov 2024 18:45:24 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 972E51C6900 for ; Tue, 19 Nov 2024 23:45:23 +0000 (UTC) X-FDA: 82804478022.28.E46D79E Received: from mail-vk1-f175.google.com (mail-vk1-f175.google.com [209.85.221.175]) by imf05.hostedemail.com (Postfix) with ESMTP id 7FD9010000A for ; Tue, 19 Nov 2024 23:43:45 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=L1rk7P61; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf05.hostedemail.com: domain of yosryahmed@google.com designates 209.85.221.175 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732059855; a=rsa-sha256; cv=none; b=P0aDaN62NvegsBPJ7bIMDNrFnHDJisnOTBk0nqtec/y//HSKpJhMoOvW4ku/gsaJp3zyK1 EDeWEwK8r8ZqNmvpuggPmkY7Krt0vovRrBffXSXu1DxbyUGKhMFzQdF+8j3U+YrsBGEJw2 MQvYAQkGbtg6MgC9o2Gs9FpcJFLF8go= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=L1rk7P61; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf05.hostedemail.com: domain of yosryahmed@google.com designates 209.85.221.175 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732059855; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cWlUuRMKiApRmaxAbXuXbiTLO2hmGeeb6kTTIbzYnVg=; b=gLEhK6PJrmS5CmNrTsyFf2uNBCntgI8gmaJgJtDBl1iVA/1fiDh63U0h2s26fPxhgDv4Hq tEvJUUnLL4afVwLC+X8rESmV+13tNOi2YrZSMHn2OwxW1EpYG0fIsqcJ9T9DTx/wD5oUPP r8L+pTo0Jp64XSI+erz27wIfwTCKtbc= Received: by mail-vk1-f175.google.com with SMTP id 71dfb90a1353d-50d4a6ef70aso1721020e0c.3 for ; Tue, 19 Nov 2024 15:45:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1732059920; x=1732664720; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=cWlUuRMKiApRmaxAbXuXbiTLO2hmGeeb6kTTIbzYnVg=; b=L1rk7P61vcecWjY1wcj5IIlELUuarw+CB8auGqTY1pPMy0pDG+oyoFtl2+NuJZdOts 5rvADa3a1++eI4m/Fv/GGwdI/xpuigkcy5gBPFv60ZOMlSb7cfbBRs+DXrz+QJwMW2To q7lfDU66gO4kaQj7kg0fyQREDXn4tDmjG3tAzV9xlpS0dyNHCzRmxNZKA1Q/LCBBkJmf Eed85fbWcKAEjtqFEH9XUZZIpMeyaTpJ6wQId2Yfj4PrK+TnnQzef3US5PcBtkPNcjaY Qty2p3IS68VGh9jXqpmtYGv1La7a+KsS33cx13+agJUjjzlkkE7uXh4owBJCuiSyr5R5 shiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732059920; x=1732664720; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cWlUuRMKiApRmaxAbXuXbiTLO2hmGeeb6kTTIbzYnVg=; b=tnNeEjwtlUx+eIBJu6HagNW9R1H94+/tu4PRCJcPNhBfrTiPmBRJrSX37Q38Ox2Ce+ n2YpKb79rYJ+ttWWjhTjeZrYTbeQ0bABH0mmb1AYK4GZc3C23OpG2gExUWIwuOC+tbPu HjVrOO2wu+O1LnD04iRYoeAUgxfaeUNX1SRONTufY5Sifc+YUR9AYZilv6uEtbCwaBTb SkRJKsFDyYz2U2P+5rr7vKVwEeH422KAXyxC7P/zKwdQkHE6DXnnxNNf17oi/M4l1jLF E2EubSwzjCJe+pbiyb9yuUBkeT6fZkHCWbFvGZ17wVDvPDfQJLFFDxEG/kJLgkQIYN38 ctbQ== X-Forwarded-Encrypted: i=1; AJvYcCWd4rPUWPtfmSCw8RhSogrvFU+tqP+y7FCaALqlF90wDieQLBXpK4UuumCnvfl2CVBWLi7A0xFYVw==@kvack.org X-Gm-Message-State: AOJu0YylSgrR5wrGpXSMg70gJqd02PXP8qrsvAT8rP9YQ5KPr12R0/5X XMQMxBHhpVgfmAnYtS1/Wrwr2kFwbgDNvxBep/eKY8QEoBf9iwhUlEM+ijnBKNurvAqc5/ydmYA 6mCQu5g0Uzwx2I83RgLdNXSZGhhjMD7u7EzVN X-Google-Smtp-Source: AGHT+IEiyvDLitEaGAqt71fDospHnsnVYuKxpk76DhOs0+vvYOYhS+67eX7noEcBYj784G6tFdvXmAoy7R1dRc5/Ues= X-Received: by 2002:a05:6122:1da5:b0:50c:55f4:b529 with SMTP id 71dfb90a1353d-514cf965221mr883450e0c.8.1732059920404; Tue, 19 Nov 2024 15:45:20 -0800 (PST) MIME-Version: 1.0 References: <20241113052413.157039-1-kanchana.p.sridhar@intel.com> <20241113213007.GB1564047@cmpxchg.org> <20241114051149.GC1564047@cmpxchg.org> <9a807484-6693-4e2a-a087-97bbc5ee4ed9@linux.dev> In-Reply-To: From: Yosry Ahmed Date: Tue, 19 Nov 2024 15:44:43 -0800 Message-ID: Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in zswap_decompress(). To: "Sridhar, Kanchana P" Cc: Chengming Zhou , Johannes Weiner , Nhat Pham , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "usamaarif642@gmail.com" , "ryan.roberts@arm.com" , "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org" , "Feghali, Wajdi K" , "Gopal, Vinodh" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: 7FD9010000A X-Rspamd-Server: rspam11 X-Stat-Signature: h149qnjd1zutfpdkzo8teujfnfca383s X-HE-Tag: 1732059825-383486 X-HE-Meta: U2FsdGVkX18jqH895JCm/NApDlCHXj3m5XiRfOaY22eGI6y0kYd3p38DB5EIXi7LWz7wjuXqgvapQNGwSxUGchY2teA0JI1Qvaew76pNGvd2S59xyllkySd/bvrWe8pMFHhavTUqWD/IXVu0TQmv0PVV6MFbbrSI84c9fSE1S6fGw9wE04wrgqLE2ROf3Ma05b6jbQGay+OenrQ+Y4QeQLbRqnWsiGmlQluzt4Tp43Y89VfUJhMYz4mJkhmZCvyAdtRYqFiFr2yjJSwP40KYi0r1IV5/g7kyrPaQAIUa6tJNpZ2V0SqAivyxudr3hVf7eo6Fhtu2I+Bk34NHwvbI2eZCVobp+WGyvx492Xb3DgH7IqvasfEmEi3EUw48piPg8DpMtiq+fCbxQrpu8a9Ib7gW0t0Ea6aiLjJKBaZE4+9/8QdjithCwvnmONKTYUVrXmlk2aMd+M9OIevrWmDMuUtlYiON21gDbC9ouOFp3eQh82TKHSNIMnQb4K2iYjira+gDM9pv5Jntgk4KrsyC1WZu8I1+7D5eW+xkd1j1H5NHCZbtEHPbVJMLQ1CS7gVTXi23nVN6Qp94BiWo6BkoArlCzrtYwEk1gaC8XCXl9VjI0z8w8qhftA9ZbH/Q407x79R7mt2uPVMZTk/UN6uXB5qYclQP2gjo5AC2H60T7ATjzCPA/Eqa3tas5vaApnN/mJoBj9juGh1iNRntUgW26aWK43GlyYUGVnur0wWb/2++G2/8jaUUOTY622lMCVuG6jVT2pptgM2gp5JDb8Of5+RYVA9aZruvoFiRXp7U5sUKAyubJ6ixvXCm5wd1l/YPtoI71U6O6Y1JQRblAL4inK77jMScoXYJurX6kugevRDcZEEFOaSWyw8CCUZV5i8d/mqUVrqX9oxVaUeszctkVpBcc0TR9j9128sroagyaaFlFiHuJNGEH9abDYOiqJEdyplnNOZH4gUDVzUIuGA pLC/1yMB ArZ8LjHQ9VxQgeVqXsCSrsUr8FFBZx8E0CDjFoY5dGREITzAaKrRNdZxqXFj6OGG2GUbVXXFzGoZD5h2DPc9rNWhAnxgQa2sMDcp6OBd2WtsMO/BXg2i4s1kzINbUBifLpxCuve4bs5gyOmGPxfoGy7LcFG2lMYByoGKLsYH7sSLFqICWtRs/8tcP9aqsrnWXaDj9h/w8+wnpDPRPTNTn4Cgk4TmZPqQG6AZq6WjXoErmdvzmPiIDxQwUH7hpO6VHvK9j5IfQ9oFbK68V3VF3pwIvQz1da+4S/NT9nJcMC+P4/gGoapaam/jJpOqUpKSCt5EIRufPYGxfRJGW31DCPq9MwK3V6qoYTa5BYfJWi+R5TF7qt2S7Tpv/NqVGPeYL5N9xUbMRytOHyPsz6flkzOqTIklo4MfG7/uX++5XuBhKqbqrgPTdbq0EILNz/XhmoSjMeQ65aH0VemgVqAFgmxQ8/j/wJ6c0wjva3kQcXcqemEMGH1ec/MjXn5hLLyHnSty87lM+iOUyr3dZnUTTn0svUu7k+KyvVSfb/N6ULt8+AYyvXp9Q6lTMt40J6ANSAjiKleX/wk303Gy3CPkqLKtRQ/cqL0zpsIJWLbc6WkMpev/D7pezDFEAo39c18TBfCXhVBsug6RqasdIQTKWVwJhSQvQm2zs26VbyNTkV8gXhuA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Nov 19, 2024 at 2:35=E2=80=AFPM Sridhar, Kanchana P wrote: > > > > -----Original Message----- > > From: Yosry Ahmed > > Sent: Tuesday, November 19, 2024 11:51 AM > > To: Sridhar, Kanchana P > > Cc: Chengming Zhou ; Johannes Weiner > > ; Nhat Pham ; linux- > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > > Feghali, Wajdi K ; Gopal, Vinodh > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 19, 2024 at 11:42=E2=80=AFAM Sridhar, Kanchana P > > wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed > > > > Sent: Tuesday, November 19, 2024 11:27 AM > > > > To: Sridhar, Kanchana P > > > > Cc: Chengming Zhou ; Johannes Weiner > > > > ; Nhat Pham ; linux- > > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > > > ryan.roberts@arm.com; Huang, Ying ; > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > ; Gopal, Vinodh > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > On Tue, Nov 19, 2024 at 11:22=E2=80=AFAM Sridhar, Kanchana P > > > > wrote: > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Yosry Ahmed > > > > > > Sent: Friday, November 15, 2024 1:49 PM > > > > > > To: Sridhar, Kanchana P > > > > > > Cc: Chengming Zhou ; Johannes Weiner > > > > > > ; Nhat Pham ; linux- > > > > > > kernel@vger.kernel.org; linux-mm@kvack.org; > > usamaarif642@gmail.com; > > > > > > ryan.roberts@arm.com; Huang, Ying ; > > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > > > ; Gopal, Vinodh > > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak = in > > > > > > zswap_decompress(). > > > > > > > > > > > > On Fri, Nov 15, 2024 at 1:14=E2=80=AFPM Sridhar, Kanchana P > > > > > > wrote: > > > > > > > > > > > > > > Hi Chengming, > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Chengming Zhou > > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > > > > To: Sridhar, Kanchana P ; > > Johannes > > > > > > Weiner > > > > > > > > > > > > > > > > Cc: Nhat Pham ; Yosry Ahmed > > > > > > > > ; linux-kernel@vger.kernel.org; linu= x- > > > > > > > > mm@kvack.org; usamaarif642@gmail.com; > > ryan.roberts@arm.com; > > > > > > Huang, > > > > > > > > Ying ; 21cnbao@gmail.com; akpm@linux- > > > > > > > > foundation.org; Feghali, Wajdi K ; > > Gopal, > > > > > > Vinodh > > > > > > > > > > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory l= eak in > > > > > > > > zswap_decompress(). > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > > > >> From: Johannes Weiner > > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > > > > >> To: Sridhar, Kanchana P > > > > > > > > >> Cc: Nhat Pham ; Yosry Ahmed > > > > > > > > >> ; linux-kernel@vger.kernel.org; > > linux- > > > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > > > > > usamaarif642@gmail.com; > > > > > > > > >> ryan.roberts@arm.com; Huang, Ying = ; > > > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, > > Wajdi K > > > > > > > > >> ; Gopal, Vinodh > > > > > > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memor= y > > leak in > > > > > > > > >> zswap_decompress(). > > > > > > > > >> > > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanch= ana > > P > > > > > > wrote: > > > > > > > > >>> So my question was, can we prevent the migration to a > > different > > > > cpu > > > > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > > > > >> > > > > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > > > > > > > Sure, however, is this also applicable to holding the mut= ex of a > > per- > > > > cpu > > > > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock()= to > > protect > > > > > > > > this section. > > > > > > > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu = prior > > to > > > > > > > > > the migration (in the UAF scenario you described) from be= ing > > > > deleted? > > > > > > > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->= buffer. > > > > > > > > > > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient= to > > prevent > > > > the > > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ct= x > > from > > > > being > > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to n= ot use > > the > > > > > > > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for th= e check, > > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since = cpu > > offline > > > > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > I see. How would the refcounts work? Would this add latency t= o > > zswap > > > > > > > ops? In low memory situations, could the cpu offlining code o= ver-ride > > > > > > > the refcounts? > > > > > > > > > > > > I think what Johannes meant is that the zswap compress/decompre= ss > > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > > > > offlining code only drops the initial ref, and does not free th= e > > > > > > buffer directly. The buffer is only freed when the ref drops to= zero. > > > > > > > > > > > > I am not familiar with CPU hotplug, would it be simpler if we h= ave a > > > > > > wrapper like get_acomp_ctx() that disables migration or calls > > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A simil= ar > > > > > > wrapper, put_acompt_ctx() will be used after we are done using = the > > > > > > acomp_ctx. > > > > > > > > > > Would it be sufficient to add a check for mutex_is_locked() in > > > > > zswap_cpu_comp_dead() and if this returns true, to exit without > > deleting > > > > > the acomp? > > > > > > > > I don't think this works. First of all, it's racy. It's possible th= e > > > > mutex gets locked after we check mutex_is_locked() but before we > > > > delete the acomp_ctx. Also, if we find that the mutex is locked, th= en > > > > we do nothing and essentially leak the memory. > > > > > > Yes, this would assume the cpu offlining code retries at some interva= l, > > > which could prevent the memory leak. > > > > I am not sure about that, but even so, it wouldn't handle the first > > scenario where the mutex gets locked after we check mutex_is_locked(). > > > > > > > > > > > > > Second, and probably more important, this only checks if anyone is > > > > currently holding the mutex. What about tasks that may be sleeping > > > > waiting for the mutex to be unlocked? The mutex will be deleted fro= m > > > > under them as well. > > > > > > Wouldn't this and the race described above, also be issues for the > > > refcount based approach? > > > > I don't think so, at least if implemented correctly. There are a lot > > of examples around the kernel that use RCU + refcounts for such use > > cases. I think there are also some examples in kernel docs. > > > > That being said, I am wondering if we can get away with something > > simpler like holding the cpus read lock or disabling migration as I > > suggested earlier, but I am not quite sure. > > Another idea to consider is how zsmalloc avoids this issue through > its use of the local_lock() on the per-cpu mapping area. This disables > preemption from zs_map_object() through zs_unmap_object(). > Would changing the acomp_ctx's mutex to a local_lock solve the > problem? This is similar to disabling migration as I suggested, but disabling preemption means that we cannot sleep, we spin on a lock instead. In zswap_compress(), we send the compression request and may sleep waiting for it. We also make a non-atomic allocation later that may also sleep but that's less of a problem. In zswap_decompress() we may also sleep, which is why we sometimes copy the data into acomp_ctx->buffer and unmap the handle to begin with. So I don't think we can just replace the mutex with a lock. > > > > > > > > > Also, I am wondering if the mutex design already handles cases where > > > tasks are sleeping, waiting for a mutex that disappears? > > > > I don't believe so. It doesn't make sense for someone to free a mutex > > while someone is waiting for it. How would the waiter know if the > > memory backing the mutex was freed? > > Thanks Yosry, all good points. There would need to be some sort of > arbiter (for e.g., the cpu offlining code) that would reschedule tasks > running on a cpu before shutting it down, which could address > this specific issue. I was thinking these are not problems unique to > zswap's per-cpu acomp_ctx->mutex wrt the offlining? There are a few reasons why zswap has this problem and other code may not have it. For example the data structure is dynamically allocated and is freed during offlining, it wouldn't be a problem if it was static. Also the fact that we don't disable preemption when accessing the per-CPU data, as I mentioned earlier, which would prevent the CPU we are running on from going offline while we access the per-CPU data. I think we should either: (a) Use refcounts. (b) Disable migration. (c) Hold the CPUs read lock. I was hoping someone with more knowledge about CPU offlining would confirm (b) and (c) would work, but I am pretty confident they would.