From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B373D6207F for ; Wed, 20 Nov 2024 02:32:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C469C6B007B; Tue, 19 Nov 2024 21:32:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BF6546B0083; Tue, 19 Nov 2024 21:32:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ABE7B6B0085; Tue, 19 Nov 2024 21:32:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 939DB6B007B for ; Tue, 19 Nov 2024 21:32:11 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 094B5808E6 for ; Wed, 20 Nov 2024 02:32:11 +0000 (UTC) X-FDA: 82804895250.17.F2D12EB Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) by imf14.hostedemail.com (Postfix) with ESMTP id 8B04C100009 for ; Wed, 20 Nov 2024 02:31:12 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=UnG5Y3Ma; spf=pass (imf14.hostedemail.com: domain of chengming.zhou@linux.dev designates 91.218.175.186 as permitted sender) smtp.mailfrom=chengming.zhou@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732069684; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7YpUv7rVcJFYRzzR3PZNYnI2fm9zbQhap8P3eqbO/jA=; b=tapWonQwNIth687ogvLtGnm+dcdtIiET7mxS1b9qSlRqIXAEOpd7Iq4IIkV05ZEarM9tGk B+WNdct1UIFapC9qvvRAK22R9KXTMrtqu3CfPhFXnIFhH2BReVRqOgvKujS/BLqs1vNRkX N6iwXuZabh5CY8/koYJe1f19lKeY2JI= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=UnG5Y3Ma; spf=pass (imf14.hostedemail.com: domain of chengming.zhou@linux.dev designates 91.218.175.186 as permitted sender) smtp.mailfrom=chengming.zhou@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732069684; a=rsa-sha256; cv=none; b=ic7ELqA/K3QleVL50Hq3tbcLX9PJVDGQBq/xK2F9EvJxms2851aoO4Ylwz7m5mlj5MG24B ierpUOeyM9Ve6Re7KI1VdV6JOiLzDAH9l6cnPsBJw9CESnr9xY89MKc5xP0TCcnGBttelM xBlvZxxPmeNny+6r17CXTAsUVFrm7sI= Message-ID: <78374b54-1a68-4dc1-a220-dcef30a338c1@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1732069926; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7YpUv7rVcJFYRzzR3PZNYnI2fm9zbQhap8P3eqbO/jA=; b=UnG5Y3Ma0zxvODOk1WVsIt4vW/4aUWryiIWUbzxytKAqczBJ4qnd1iplphXbukQUOTD2K6 TTXMzHT1kwgWNXkvZydlCA5X2SduFfYEwHTdJi0vUIBcb7ZsbRNixmmtdnjKcE2Xf3N52w Fm8c4UpKMC3sAfGcWn4z35tC3A+QVcw= Date: Wed, 20 Nov 2024 10:31:53 +0800 MIME-Version: 1.0 Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in zswap_decompress(). To: Yosry Ahmed , "Sridhar, Kanchana P" Cc: Johannes Weiner , Nhat Pham , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "usamaarif642@gmail.com" , "ryan.roberts@arm.com" , "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org" , "Feghali, Wajdi K" , "Gopal, Vinodh" References: <20241113052413.157039-1-kanchana.p.sridhar@intel.com> <20241114051149.GC1564047@cmpxchg.org> <9a807484-6693-4e2a-a087-97bbc5ee4ed9@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 8B04C100009 X-Stat-Signature: 1ic8tawytoh8u7htgh9ozp89apgq3md3 X-Rspam-User: X-HE-Tag: 1732069872-806255 X-HE-Meta: U2FsdGVkX18LU9MQrOmMk05cG5qvYp4EMr8th9fKE+1GJ6my94xGvTYgcJwjeTttSPbpIAsP7ySGu9lcsTEpr++Y4zbhhVm61Hc2oTe5s6hr12UuE4xFdvA8dVtUoN0H6KHxnMzyHNehfuoMvERvPSszsL8oxHdScv6crndm3+6bMMhhEozSTupVQT7RmyISTYXdr/VajE5rXRVGSNUClcwGJ/j7mfDVv4/lyFvGZFOeDbAgRpudxx/2VqtMkut6YynMHUgTLy4iLsKeGNU+NROyQDELShQZrQngdj3yoTA2SPsHqMQB53mkG25Sn9DCAJvLs3wNNe8wPkCBj4v/kicBKgODi4d9Tg4otxURN2aX6KpQxYwOsfJ4PRVsjETPc2e3m1SozTBWR09H7KtLuoDLzdquWBCks620IhJgI9JtrqYHPzuwiRKJx7ftictC6ymjMFgGdL+3X7e/yutdXfarvMrfIAukeTGB38CI7+g7zN+pCpZHpFJcD4qdymi08oGRbxJ3fRScyNNxcwJ8cMZlTaANtVHpX0uvp/lVgUqEuz5aan4btE5Yfu0EYplaIn9rBbJeoTtxjy/clJSYTmjdJPB0jhicEBmWo/FzTXzrD03MxvSVwHIjFCrfEvItc00j8xa3Tn6ebWc+qBG7+7VJuyizjo8ChOe9Pi0dHg/qiPqBTm7qLQOO68bwFDAQ5Xj9BRmP23Uhz3pxNoOz2V58wlrHYQg4xo6HUSpn21b0Gj/bmGYFigCOrXKUsM00l1G5bx9barI1k9T1aSpZ/68PDC3QAz4Fiqav6RVf4UjXGwrAanHjYotLEWx/ZDbERQ3VUIsYarR6N6zPaP4edO39bo2Cfw3gOpBp0SDmoYthxVFstEOUT6fd53J/wAIm/vVpu94qVHbvsTWjwIZN73xJKFNfZkXk/9FNGFzgkEvwCWmB77PQHV615+db/MqX5gldnQzi+1texwKK3Ri dYMf4tIc HWzFqABccqbrj1LKPzNMpy2Ue7RrtsBjOtAtuscoOSaff/bDiu86UUupRmFf2WQZTLlc6XPwQyO0aTrbYEhK1FO9WdidXd5N+Sa+8Eshu8d2qa+z4dWd2Ig0oKlTVQNHB7zTY+YV5jtM3a3BHMRLvtpfQzx7AFy/F48ZWp0iOsgRrVmWQ+qPAEkS84OEXNZzFrIgI97Nlji0RxL9O4WyJBZuEM+jFfvXGo81lvPtvVisAHP9MpJjr6rXTCX01ELJcPmHaJoXMCvqWbqmuDM5BdTJHL71H9iTEHGh+1QN8One06hivmS0T3VIqftJEI0XGiMNVBEKx34j5zfqLsyZMrRBPNsaRxsLk38fkDpefeIdxIq9dg2Y7HVKxAThuRfAySh3W15L30VwHrIvUYrDVo9XhbRCMGc2CG1gHFwJ8Z0jAfuPonTswLlWYJmuWw5BH06pZftUPoXNOXfyFAC1b2js7mf6SI5IJoivglkX4hAKcSz5x9nyldyL3yWJHvJRqdES1StQOZhJ3//5W+M0R/1qVDJA3KjGdktVB5XLfFHohceoBJxQir5Kpw8SM1giykEBf9bP6vLTH4rPmk4GZLJdY1eWiy+uDJPHqruPeSj4b/ANkYosI0Vfj82LdHp5xOj4BJqOZiXhhDVkHPAJEHhKyfA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/11/20 07:44, Yosry Ahmed wrote: > On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P > wrote: >> >> >>> -----Original Message----- >>> From: Yosry Ahmed >>> Sent: Tuesday, November 19, 2024 11:51 AM >>> To: Sridhar, Kanchana P >>> Cc: Chengming Zhou ; Johannes Weiner >>> ; Nhat Pham ; linux- >>> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; >>> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; >>> Feghali, Wajdi K ; Gopal, Vinodh >>> >>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>> zswap_decompress(). >>> >>> On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P >>> wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Yosry Ahmed >>>>> Sent: Tuesday, November 19, 2024 11:27 AM >>>>> To: Sridhar, Kanchana P >>>>> Cc: Chengming Zhou ; Johannes Weiner >>>>> ; Nhat Pham ; linux- >>>>> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; >>>>> ryan.roberts@arm.com; Huang, Ying ; >>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K >>>>> ; Gopal, Vinodh >>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>>>> zswap_decompress(). >>>>> >>>>> On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P >>>>> wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Yosry Ahmed >>>>>>> Sent: Friday, November 15, 2024 1:49 PM >>>>>>> To: Sridhar, Kanchana P >>>>>>> Cc: Chengming Zhou ; Johannes Weiner >>>>>>> ; Nhat Pham ; linux- >>>>>>> kernel@vger.kernel.org; linux-mm@kvack.org; >>> usamaarif642@gmail.com; >>>>>>> ryan.roberts@arm.com; Huang, Ying ; >>>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K >>>>>>> ; Gopal, Vinodh >>> >>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>>>>>> zswap_decompress(). >>>>>>> >>>>>>> On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P >>>>>>> wrote: >>>>>>>> >>>>>>>> Hi Chengming, >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Chengming Zhou >>>>>>>>> Sent: Wednesday, November 13, 2024 11:24 PM >>>>>>>>> To: Sridhar, Kanchana P ; >>> Johannes >>>>>>> Weiner >>>>>>>>> >>>>>>>>> Cc: Nhat Pham ; Yosry Ahmed >>>>>>>>> ; linux-kernel@vger.kernel.org; linux- >>>>>>>>> mm@kvack.org; usamaarif642@gmail.com; >>> ryan.roberts@arm.com; >>>>>>> Huang, >>>>>>>>> Ying ; 21cnbao@gmail.com; akpm@linux- >>>>>>>>> foundation.org; Feghali, Wajdi K ; >>> Gopal, >>>>>>> Vinodh >>>>>>>>> >>>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>>>>>>>> zswap_decompress(). >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> On 2024/11/14 14:37, Sridhar, Kanchana P wrote: >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Johannes Weiner >>>>>>>>>>> Sent: Wednesday, November 13, 2024 9:12 PM >>>>>>>>>>> To: Sridhar, Kanchana P >>>>>>>>>>> Cc: Nhat Pham ; Yosry Ahmed >>>>>>>>>>> ; linux-kernel@vger.kernel.org; >>> linux- >>>>>>>>>>> mm@kvack.org; chengming.zhou@linux.dev; >>>>>>> usamaarif642@gmail.com; >>>>>>>>>>> ryan.roberts@arm.com; Huang, Ying ; >>>>>>>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, >>> Wajdi K >>>>>>>>>>> ; Gopal, Vinodh >>>>> >>>>>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory >>> leak in >>>>>>>>>>> zswap_decompress(). >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana >>> P >>>>>>> wrote: >>>>>>>>>>>> So my question was, can we prevent the migration to a >>> different >>>>> cpu >>>>>>>>>>>> by relinquishing the mutex lock after this conditional >>>>>>>>>>> >>>>>>>>>>> Holding the mutex doesn't prevent preemption/migration. >>>>>>>>>> >>>>>>>>>> Sure, however, is this also applicable to holding the mutex of a >>> per- >>>>> cpu >>>>>>>>>> structure obtained via raw_cpu_ptr()? >>>>>>>>> >>>>>>>>> Yes, unless you use migration_disable() or cpus_read_lock() to >>> protect >>>>>>>>> this section. >>>>>>>> >>>>>>>> Ok. >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Would holding the mutex prevent the acomp_ctx of the cpu prior >>> to >>>>>>>>>> the migration (in the UAF scenario you described) from being >>>>> deleted? >>>>>>>>> >>>>>>>>> No, cpu offline can kick in anytime to free the acomp_ctx->buffer. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> If holding the per-cpu acomp_ctx's mutex isn't sufficient to >>> prevent >>>>> the >>>>>>>>>> UAF, I agree, we might need a way to prevent the acomp_ctx >>> from >>>>> being >>>>>>>>>> deleted, e.g. with refcounts as you've suggested, or to not use >>> the >>>>>>>>> >>>>>>>>> Right, refcount solution from Johannes is very good IMHO. >>>>>>>>> >>>>>>>>>> acomp_ctx at all for the check, instead use a boolean. >>>>>>>>> >>>>>>>>> But this is not enough to just avoid using acomp_ctx for the check, >>>>>>>>> the usage of acomp_ctx inside the mutex is also UAF, since cpu >>> offline >>>>>>>>> can kick in anytime to free the acomp_ctx->buffer. >>>>>>>> >>>>>>>> I see. How would the refcounts work? Would this add latency to >>> zswap >>>>>>>> ops? In low memory situations, could the cpu offlining code over-ride >>>>>>>> the refcounts? >>>>>>> >>>>>>> I think what Johannes meant is that the zswap compress/decompress >>>>>>> paths grab a ref on the acomp_ctx before using it, and the CPU >>>>>>> offlining code only drops the initial ref, and does not free the >>>>>>> buffer directly. The buffer is only freed when the ref drops to zero. >>>>>>> >>>>>>> I am not familiar with CPU hotplug, would it be simpler if we have a >>>>>>> wrapper like get_acomp_ctx() that disables migration or calls >>>>>>> cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar >>>>>>> wrapper, put_acompt_ctx() will be used after we are done using the >>>>>>> acomp_ctx. >>>>>> >>>>>> Would it be sufficient to add a check for mutex_is_locked() in >>>>>> zswap_cpu_comp_dead() and if this returns true, to exit without >>> deleting >>>>>> the acomp? >>>>> >>>>> I don't think this works. First of all, it's racy. It's possible the >>>>> mutex gets locked after we check mutex_is_locked() but before we >>>>> delete the acomp_ctx. Also, if we find that the mutex is locked, then >>>>> we do nothing and essentially leak the memory. >>>> >>>> Yes, this would assume the cpu offlining code retries at some interval, >>>> which could prevent the memory leak. >>> >>> I am not sure about that, but even so, it wouldn't handle the first >>> scenario where the mutex gets locked after we check mutex_is_locked(). >>> >>>> >>>>> >>>>> Second, and probably more important, this only checks if anyone is >>>>> currently holding the mutex. What about tasks that may be sleeping >>>>> waiting for the mutex to be unlocked? The mutex will be deleted from >>>>> under them as well. >>>> >>>> Wouldn't this and the race described above, also be issues for the >>>> refcount based approach? >>> >>> I don't think so, at least if implemented correctly. There are a lot >>> of examples around the kernel that use RCU + refcounts for such use >>> cases. I think there are also some examples in kernel docs. >>> >>> That being said, I am wondering if we can get away with something >>> simpler like holding the cpus read lock or disabling migration as I >>> suggested earlier, but I am not quite sure. >> >> Another idea to consider is how zsmalloc avoids this issue through >> its use of the local_lock() on the per-cpu mapping area. This disables >> preemption from zs_map_object() through zs_unmap_object(). >> Would changing the acomp_ctx's mutex to a local_lock solve the >> problem? > > This is similar to disabling migration as I suggested, but disabling > preemption means that we cannot sleep, we spin on a lock instead. > > In zswap_compress(), we send the compression request and may sleep > waiting for it. We also make a non-atomic allocation later that may > also sleep but that's less of a problem. > > In zswap_decompress() we may also sleep, which is why we sometimes > copy the data into acomp_ctx->buffer and unmap the handle to begin > with. > > So I don't think we can just replace the mutex with a lock. > >> >>> >>>> >>>> Also, I am wondering if the mutex design already handles cases where >>>> tasks are sleeping, waiting for a mutex that disappears? >>> >>> I don't believe so. It doesn't make sense for someone to free a mutex >>> while someone is waiting for it. How would the waiter know if the >>> memory backing the mutex was freed? >> >> Thanks Yosry, all good points. There would need to be some sort of >> arbiter (for e.g., the cpu offlining code) that would reschedule tasks >> running on a cpu before shutting it down, which could address >> this specific issue. I was thinking these are not problems unique to >> zswap's per-cpu acomp_ctx->mutex wrt the offlining? > > There are a few reasons why zswap has this problem and other code may > not have it. For example the data structure is dynamically allocated > and is freed during offlining, it wouldn't be a problem if it was > static. Also the fact that we don't disable preemption when accessing > the per-CPU data, as I mentioned earlier, which would prevent the CPU > we are running on from going offline while we access the per-CPU data. > > I think we should either: > (a) Use refcounts. > (b) Disable migration. > (c) Hold the CPUs read lock. > > I was hoping someone with more knowledge about CPU offlining would > confirm (b) and (c) would work, but I am pretty confident they would. +1, I also think (b) or (c) should work with my limited knowledge about CPU offlining :-) And they seem simpler than refcounts implementation. Thanks.