Date: Thu, 6 Feb 2025 16:16:09 +0000
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Kairui Song, Andrew Morton, Minchan Kim, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 02/17] zram: do not use per-CPU compression streams
References: <20250131090658.3386285-1-senozhatsky@chromium.org>
    <20250131090658.3386285-3-senozhatsky@chromium.org>

On Thu, Feb 06, 2025 at 04:22:27PM +0900, Sergey Senozhatsky wrote:
> On (25/02/06 14:55), Kairui Song wrote:
> > > On (25/02/01 17:21), Kairui Song wrote:
> > > > This seems like it will cause a huge performance regression on
> > > > multi-core systems, and it gets especially significant as the
> > > > number of concurrent tasks increases.
> > > >
> > > > Test: build the Linux kernel using ZRAM as SWAP (1G memcg):
> > > >
> > > > Before:
> > > > + /usr/bin/time make -s -j48
> > > > 2495.77user 2604.77system 2:12.95elapsed 3836%CPU (0avgtext+0avgdata
> > > > 863304maxresident)k
> > > >
> > > > After:
> > > > + /usr/bin/time make -s -j48
> > > > 2403.60user 6676.09system 3:38.22elapsed 4160%CPU (0avgtext+0avgdata
> > > > 863276maxresident)k
> > >
> > > How many CPUs do you have? I assume preemption gets in the way, which
> > > is sort of expected, to be honest... Using per-CPU compression streams
> > > disables preemption and uses the CPU exclusively, at the price of
> > > other tasks not being able to run. I do tend to think that I made a
> > > mistake by switching zram to per-CPU compression streams.
> > >
> > > What preemption model do you use, and to what extent do you overload
> > > your system?
> > >
> > > My tests don't show anything unusual (but I don't overload the system):
> > >
> > > CONFIG_PREEMPT
> >
> > I'm using CONFIG_PREEMPT_VOLUNTARY=y, and there are 96 logical CPUs
> > (48c96t); make -j48 shouldn't be considered overload, I think. make
> > -j32 also showed an obvious slowdown.
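For reference, the "per-CPU compression streams disable preemption" point
above is the usual get_cpu_ptr()/put_cpu_ptr() pattern: the task has to
stay on the CPU whose stream it grabbed until the compression finishes.
A minimal sketch, with hypothetical names rather than the actual zram
code:

#include <linux/percpu.h>

struct my_strm {
        void *buffer;           /* per-CPU compression scratch space */
};

static struct my_strm *my_strm_get(struct my_strm __percpu *streams)
{
        /* Disables preemption; no sleeping allowed until my_strm_put(). */
        return get_cpu_ptr(streams);
}

static void my_strm_put(struct my_strm __percpu *streams)
{
        put_cpu_ptr(streams);   /* re-enables preemption */
}

Everything between those two calls runs with preemption off, which is why
other runnable tasks cannot use the CPU for the duration of a compression.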
> Hmm, there should be more than enough compression streams then; the
> limit is num_online_cpus. That's strange. I wonder if it's the zsmalloc
> handle allocation ("remove two-staged handle allocation" in the series).
>
> [..]
> > > Hmm, it's just
> > >
> > >     spin_lock()
> > >     list first entry
> > >     spin_unlock()
> > >
> > > It shouldn't be "a big spin lock"; that's very odd. I'm not familiar
> > > with perf lock contention, let me take a look.
> >
> > I can debug this a bit more later to figure out why the contention is
> > so heavy,
>
> That would be appreciated, thank you.
>
> > but my first thought is that, as Yosry also mentioned in another
> > reply, making it preemptible doesn't necessarily mean the per-CPU
> > streams have to go.
>
> I was going to reply to Yosry's email today or tomorrow, as I haven't
> had time to look into it yet, but I will reply here.
>
> So for the spin-lock contention - yes, but that lock really should not
> be so visible. Other than that, we limit the number of compression
> streams to the number of CPUs and permit preemption, so it should be
> roughly the same as "preemptible per-CPU" streams.

I think one other problem is that with a pool of streams guarded by a
single lock, all CPUs have to serialize on that lock, even if there are,
in theory, enough streams for all CPUs.

> The difference, perhaps, is that we don't pre-allocate streams, but
> allocate them only as needed. This has two sides: later allocations can
> fail, but we also don't allocate streams that we never use, especially
> secondary streams (priority 1 and 2, which are used for recompression).
> I didn't know it was possible to use per-CPU data and still have
> preemption enabled at the same time, so I'm not opposed to the idea of
> keeping per-CPU streams and doing what the zswap folks did.

Note that it's not a free lunch. If preemption is allowed, there is
nothing keeping the CPU whose data you are using online, and it can be
offlined. I see that zcomp_cpu_dead() would then free the compression
stream from under its user. We had a similar problem recently in zswap,
and it took me a couple of iterations to fix it properly. In short, you
need to synchronize the CPU hotplug callbacks with the users of the
compression stream to make sure the stream is not freed under the user.
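Concretely, "synchronize the CPU hotplug callbacks with the users" can be
as simple as having both sides take the same per-stream lock, so the
teardown callback cannot free a stream while a task is compressing with
it. A minimal, hypothetical sketch (made-up names, not the actual zram or
zswap code):

#include <linux/cpuhotplug.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/slab.h>

struct my_strm {
        struct mutex lock;      /* serializes users vs. hotplug teardown */
        void *buffer;           /* compression scratch space */
        bool online;            /* cleared by the teardown callback */
};

static struct my_strm __percpu *my_streams;     /* alloc_percpu()'d at init */

static struct my_strm *my_strm_get(void)
{
        /* raw_cpu_ptr(): the mutex, not preemption, protects the stream. */
        struct my_strm *strm = raw_cpu_ptr(my_streams);

        mutex_lock(&strm->lock);        /* sleepable, preemption stays enabled */
        if (!strm->online) {
                /* This CPU's stream was torn down; caller picks another. */
                mutex_unlock(&strm->lock);
                return NULL;
        }
        return strm;
}

static void my_strm_put(struct my_strm *strm)
{
        mutex_unlock(&strm->lock);
}

/* Registered as the teardown callback via cpuhp_setup_state(). */
static int my_strm_cpu_dead(unsigned int cpu)
{
        struct my_strm *strm = per_cpu_ptr(my_streams, cpu);

        /* Blocks until any in-flight user drops the lock, then frees. */
        mutex_lock(&strm->lock);
        strm->online = false;
        kfree(strm->buffer);
        strm->buffer = NULL;
        mutex_unlock(&strm->lock);
        return 0;
}

Because the stream is protected by a sleepable lock rather than by
disabling preemption, this also keeps the compression path preemptible;
the NULL return above is the fallback for the case where the stream's CPU
went away while the task was waiting on the mutex.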