From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Andrew Morton, Minchan Kim,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Thu, 6 Feb 2025 12:05:55 +0900
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Message-ID: <6uhsj4bckhursiblkxe54azfgyqal6tq2de3lpkxw6omkised6@uylodcjruuei>
References: <20250131090658.3386285-1-senozhatsky@chromium.org>
	<20250131090658.3386285-15-senozhatsky@chromium.org>
	<6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv>
On (25/02/05 19:06), Yosry Ahmed wrote:
> > > For example, the compaction/migration code could be sleeping holding
> > > the write lock, and a map() call would spin waiting for that
> > > sleeping task.
> >
> > Write-lock holders cannot sleep, that's the key part.
> >
> > So the rules are:
> >
> > 1) writer cannot sleep
> >    - migration/compaction runs in atomic context and grabs the
> >      write-lock only from atomic context
> >    - the write-locking function disables preemption before lock(),
> >      just to be safe, and enables it after unlock()
> >
> > 2) writer does not spin waiting
> >    - that's why there is only a write_try_lock function
> >    - compaction and migration bail out when they cannot lock the
> >      zspage
> >
> > 3) readers can sleep and can spin waiting for a lock
> >    - other (even preempted) readers don't block new readers
> >    - writers don't sleep, they always unlock
>
> That's useful, thanks. If we go with custom locking we need to document
> this clearly and add debug checks where possible.

Sure. This is what it currently looks like (it can always be improved):

---
/*
 * The zspage lock permits preemption on the reader-side (there can be
 * multiple readers). Writers (exclusive zspage ownership), on the other
 * hand, always run in atomic context and cannot spin waiting for a
 * (potentially preempted) reader to unlock the zspage. This, basically,
 * means that writers can only call write-try-lock and must bail out if
 * it didn't succeed.
 *
 * At the same time, writers cannot reschedule under the zspage
 * write-lock, so readers can spin waiting for the writer to unlock the
 * zspage.
 */
static void zspage_read_lock(struct zspage *zspage)
{
	atomic_t *lock = &zspage->lock;
	int old = atomic_read_acquire(lock);

	do {
		if (old == ZS_PAGE_WRLOCKED) {
			/* spinning is fine: the writer never sleeps */
			cpu_relax();
			old = atomic_read_acquire(lock);
			continue;
		}
	} while (!atomic_try_cmpxchg_acquire(lock, &old, old + 1));

#ifdef CONFIG_DEBUG_LOCK_ALLOC
	rwsem_acquire_read(&zspage->lockdep_map, 0, 0, _RET_IP_);
#endif
}

static void zspage_read_unlock(struct zspage *zspage)
{
	atomic_dec_return_release(&zspage->lock);

#ifdef CONFIG_DEBUG_LOCK_ALLOC
	rwsem_release(&zspage->lockdep_map, _RET_IP_);
#endif
}

static bool zspage_try_write_lock(struct zspage *zspage)
{
	atomic_t *lock = &zspage->lock;
	int old = ZS_PAGE_UNLOCKED;

	/* preemption stays disabled for the whole write-locked section */
	preempt_disable();
	if (atomic_try_cmpxchg_acquire(lock, &old, ZS_PAGE_WRLOCKED)) {
#ifdef CONFIG_DEBUG_LOCK_ALLOC
		rwsem_acquire(&zspage->lockdep_map, 0, 0, _RET_IP_);
#endif
		return true;
	}

	preempt_enable();
	return false;
}

static void zspage_write_unlock(struct zspage *zspage)
{
	atomic_set_release(&zspage->lock, ZS_PAGE_UNLOCKED);

#ifdef CONFIG_DEBUG_LOCK_ALLOC
	rwsem_release(&zspage->lockdep_map, _RET_IP_);
#endif
	preempt_enable();
}
---

Maybe I'll just copy-paste the locking rules list into the comment; a
list is always cleaner.
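For illustration, here is a minimal usage sketch of the rules above.
Only the four lock helpers are from the patch; the caller names and
bodies are made up:

---
/* Hypothetical reader path: may sleep under the lock; blocks only
 * writers, never other readers. */
static void example_map_object(struct zspage *zspage)
{
	zspage_read_lock(zspage);
	/* ... access the object, sleeping here is allowed ... */
	zspage_read_unlock(zspage);
}

/* Hypothetical writer path (migration/compaction): runs in atomic
 * context, so it cannot wait for a possibly preempted reader; it
 * try-locks and bails out on failure. */
static int example_migrate_zspage(struct zspage *zspage)
{
	if (!zspage_try_write_lock(zspage))
		return -EBUSY;	/* a reader holds the lock, retry later */

	/* ... move backing pages, no rescheduling allowed here ... */
	zspage_write_unlock(zspage);
	return 0;
}
---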
> > > I wonder if there's a way to rework the locking instead to avoid
> > > the nesting. It seems like sometimes we lock the zspage with the
> > > pool lock held, sometimes with the class lock held, and sometimes
> > > with no lock held.
> > >
> > > What are the rules here for acquiring the zspage lock?
> >
> > Most of that code was not written by me, but I think the rule is to
> > disable "migration", be it via the pool lock or the class lock.
>
> It seems like we're not holding either of these locks in
> async_free_zspage() when we call lock_zspage(). Is it safe for a
> different reason?

I think we hold the size class lock there. Async-free only runs for
zspages that have reached a 0% usage ratio (the empty fullness group),
so they don't hold any objects anymore; from there such zspages either
get freed, or find_get_zspage() recovers them from fullness group 0 and
allocates an object in them. Both paths are synchronized by the size
class lock.

> > Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever
> > data patterns the clients have. I suspect we'd need to synchronize
> > RCU every time a zspage is freed: zs_free() [this one is
> > complicated], or migration, or compaction? Sounds like an
> > anti-pattern for RCU?
>
> Can't we use kfree_rcu() instead of synchronizing? Not sure if this
> would still be an antipattern tbh.

Yeah, I don't know. The last time I wrongly used kfree_rcu() it caused
a 27% performance drop (in some internal code). This zspage thingy may
behave better, but it still has the potential to generate a high number
of RCU calls, depending on the clients. The chances of that are
probably too high.

Apart from that, kvfree_rcu() can sleep, as far as I understand, so
zram might have some extra things to deal with, namely slot-free
notifications, which can be called from softirq and are always called
under spinlock:

  mm slot-free -> zram slot-free -> zs_free -> empty zspage -> kfree_rcu

> It just seems like the current locking scheme is really complicated :/

That's very true.
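P.S. For concreteness, a sketch of the deferred-free variant being
discussed. This is not from any patch: it assumes struct zspage grows a
struct rcu_head field (it has no such field today), and the teardown
body is just a placeholder:

---
/*
 * Sketch only: defer zspage teardown to an RCU callback instead of
 * freeing synchronously. Assumes a hypothetical new field in
 * struct zspage:
 *
 *	struct rcu_head rcu;
 *
 * call_rcu() callbacks typically run from softirq context, so the
 * teardown below must not sleep.
 */
static void zspage_free_rcu(struct rcu_head *head)
{
	struct zspage *zspage = container_of(head, struct zspage, rcu);

	/* placeholder: the real teardown (cache_free_zspage()) needs
	 * the pool, which this sketch doesn't carry */
	kfree(zspage);
}

static void example_release_zspage(struct zspage *zspage)
{
	/* queue the free after a grace period instead of freeing now */
	call_rcu(&zspage->rcu, zspage_free_rcu);
}
---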