Date: Wed, 5 Feb 2025 19:06:08 +0000
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton, Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
References: <20250131090658.3386285-1-senozhatsky@chromium.org> <20250131090658.3386285-15-senozhatsky@chromium.org> <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv>
In-Reply-To: <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv>

On Wed, Feb 05, 2025 at 11:43:16AM +0900, Sergey Senozhatsky wrote:
> On (25/02/04 17:19), Yosry Ahmed wrote:
> > > sizeof(struct zs_page) change is one thing. Another thing is that
> > > zspage->lock is taken from atomic sections, pretty much everywhere.
> > > compaction/migration write-lock it under the pool rwlock and the class
> > > spinlock, but both compaction and migration now EAGAIN if the lock is
> > > already locked, so that is sorted out.
> > >
> > > The remaining problem is map(), which takes the zspage read-lock under
> > > the pool rwlock. The RFC series (which you hated with passion :P)
> > > converted all zsmalloc locks into preemptible ones because of this -
> > > zspage->lock is a nested leaf-lock, so it cannot schedule unless the
> > > locks it's nested under permit it (needless to say, neither rwlock nor
> > > spinlock permit it).
> >
> > Hmm, so we want the lock to be preemptible, but we don't want to use an
> > existing preemptible lock because it may be held from atomic context.
> >
> > I think one problem here is that the lock you are introducing is a
> > spinning lock whose holder can be preempted. This is exactly why
> > spinning locks do not allow preemption: other waiters can end up
> > spinning on a task that has been scheduled out.
> >
> > For example, the compaction/migration code could be sleeping holding the
> > write lock, and a map() call would spin waiting for that sleeping task.

> write-lock holders cannot sleep, that's the key part.
>
> So the rules are:
>
> 1) writer cannot sleep
>    - migration/compaction runs in atomic context and grabs the
>      write-lock only from atomic context
>    - the write-locking function disables preemption before lock(), just
>      to be safe, and enables it after unlock()
>
> 2) writer does not spin waiting
>    - that's why there is only a write_try_lock function
>    - compaction and migration bail out when they cannot lock the zspage
>
> 3) readers can sleep and can spin waiting for a lock
>    - other (even preempted) readers don't block new readers
>    - writers don't sleep, they always unlock

That's useful, thanks. If we go with custom locking, we need to document
this clearly and add debug checks where possible.
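Just to check my understanding, the semantics you describe would look
roughly like this (a minimal sketch only; the zspage_lock/zspl_* names
are made up for illustration, not taken from your patch):

#include <linux/atomic.h>
#include <linux/preempt.h>
#include <linux/processor.h>

struct zspage_lock {
        atomic_t state; /* <0: write-locked, 0: unlocked, >0: reader count */
};

#define ZSPL_UNLOCKED   0
#define ZSPL_WRLOCKED   (-1)

/* Rule 2: writers never spin, so only a trylock is provided. */
static inline bool zspl_write_trylock(struct zspage_lock *l)
{
        /* Rule 1: no sleeping (or preemption) while write-locked. */
        preempt_disable();
        if (atomic_cmpxchg(&l->state, ZSPL_UNLOCKED, ZSPL_WRLOCKED) ==
            ZSPL_UNLOCKED)
                return true;
        preempt_enable();
        return false;
}

static inline void zspl_write_unlock(struct zspage_lock *l)
{
        atomic_set(&l->state, ZSPL_UNLOCKED);
        preempt_enable();
}

/*
 * Rule 3: readers may spin, but only ever on a writer, and writers are
 * non-preemptible and never sleep, so the spin is bounded. A preempted
 * reader never blocks other readers, only writers (who then bail out).
 */
static inline void zspl_read_lock(struct zspage_lock *l)
{
        int old;

        for (;;) {
                old = atomic_read(&l->state);
                if (old >= ZSPL_UNLOCKED &&
                    atomic_cmpxchg(&l->state, old, old + 1) == old)
                        return;
                cpu_relax();
        }
}

static inline void zspl_read_unlock(struct zspage_lock *l)
{
        atomic_dec(&l->state);
}

IIUC map() would then take zspl_read_lock(), while migration/compaction
would zspl_write_trylock() and EAGAIN on failure, matching 1-3 above.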
> > I wonder if there's a way to rework the locking instead to avoid the
> > nesting. It seems like sometimes we lock the zspage with the pool lock
> > held, sometimes with the class lock held, and sometimes with no lock
> > held.
> >
> > What are the rules here for acquiring the zspage lock?
>
> Most of that code was not written by me, but I think the rule is to
> disable "migration", be it via the pool lock or the class lock.

It seems like we're not holding either of these locks in
async_free_zspage() when we call lock_zspage(). Is it safe for a
different reason?

> > Do we need to hold another lock just to make sure the zspage does not
> > go away from under us?
>
> Yes, the zspage cannot go away via the "normal" path:
>
>   zs_free(last object) -> zspage becomes empty -> free zspage
>
> so when we have an active mapping, it's only migration and compaction
> that can free the zspage (its content is migrated away and so it
> becomes empty).
>
> > Can we use RCU or something similar to do that instead?
>
> Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
> patterns the clients have. I suspect we'd need to synchronize RCU every
> time a zspage is freed: on zs_free() [this one is complicated], on
> migration, or on compaction. Sounds like an anti-pattern for RCU?

Can't we use kfree_rcu() instead of synchronizing? Not sure if this
would still be an anti-pattern tbh. It just seems like the current
locking scheme is really complicated :/
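To spell out the kfree_rcu() idea (a hypothetical sketch only -- the rcu
member and helper names are made up, and since the real struct zspage
comes from a kmem_cache it would likely need call_rcu() with a callback
that frees back to the cache instead of kfree_rcu()):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct zspage {
        struct zspage_lock lock;        /* per-zspage lock from the sketch above */
        struct rcu_head rcu;            /* new: for deferred freeing */
        /* ... existing zspage fields ... */
};

static void free_zspage_deferred(struct zspage *zspage)
{
        /*
         * No synchronize_rcu() on the free path; the actual free
         * happens after a grace period, off the critical path.
         */
        kfree_rcu(zspage, rcu);
}

/*
 * map() would pin the zspage struct with an RCU read section just long
 * enough to take the per-zspage lock, without the pool/class locks.
 * Assumes the zspage pointer was looked up from the handle inside the
 * same RCU read section.
 */
static void map_pin_example(struct zspage *zspage)
{
        rcu_read_lock();                /* struct cannot be freed under us... */
        zspl_read_lock(&zspage->lock);
        rcu_read_unlock();              /* ...and now the read-lock keeps it alive,
                                           since freeing requires the write-lock */
}

At least on paper that keeps zs_free() cheap (no synchronous grace
period) while letting map() run without holding the pool rwlock.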