From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 5 Feb 2025 11:43:16 +0900
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Andrew Morton, Minchan Kim,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Message-ID: <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv>
References: <20250131090658.3386285-1-senozhatsky@chromium.org>
 <20250131090658.3386285-15-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On (25/02/04 17:19), Yosry Ahmed wrote:
> > sizeof(struct zs_page) change is one thing. Another thing is that
> > zspage->lock is taken from atomic sections, pretty much everywhere.
> > compaction/migration write-lock it under the pool rwlock and class
> > spinlock, but both compaction and migration now EAGAIN if the lock
> > is locked already, so that is sorted out.
> >
> > The remaining problem is map(), which takes the zspage read-lock under
> > the pool rwlock. The RFC series (which you hated with passion :P)
> > converted all zsmalloc locks into preemptible ones because of this -
> > zspage->lock is a nested leaf-lock, so it cannot schedule unless the
> > locks it's nested under permit it (needless to say neither rwlock nor
> > spinlock permit it).

> Hmm, so we want the lock to be preemptible, but we don't want to use an
> existing preemptible lock because it may be held from atomic context.
>
> I think one problem here is that the lock you are introducing is a
> spinning lock, but the lock holder can be preempted. This is why
> spinning locks do not allow preemption: others waiting for the lock can
> spin waiting for a process that is scheduled out.
>
> For example, the compaction/migration code could be sleeping holding the
> write lock, and a map() call would spin waiting for that sleeping task.

Write-lock holders cannot sleep, that's the key part.

So the rules are:

1) a writer cannot sleep
   - migration/compaction runs in atomic context and grabs the
     write-lock only from atomic context
   - the write-locking function disables preemption before lock(), just
     to be safe, and enables it after unlock()

2) a writer does not spin waiting
   - that's why there is only a write_try_lock function
   - compaction and migration bail out when they cannot lock the zspage

3) readers can sleep and can spin waiting for the lock
   - other (even preempted) readers don't block new readers
   - writers don't sleep, they always unlock

> I wonder if there's a way to rework the locking instead to avoid the
> nesting. It seems like sometimes we lock the zspage with the pool lock
> held, sometimes with the class lock held, and sometimes with no lock
> held.
>
> What are the rules here for acquiring the zspage lock?

Most of that code was not written by me, but I think the rule is to
disable "migration", be it via the pool lock or the class lock.
> Do we need to hold another lock just to make sure the zspage does not
> go away from under us?

Yes, the page cannot go away via the "normal" path:

	zs_free(last object) -> zspage becomes empty -> free zspage

so when we have an active mapping it's only migration and compaction
that can free the zspage (its content is migrated away and so it
becomes empty).

> Can we use RCU or something similar to do that instead?

Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
patterns the clients have. I suspect we'd need to synchronize RCU every
time a zspage is freed: in zs_free() [this one is complicated], or in
migration, or in compaction? Sounds like an anti-pattern for RCU.