From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 28 Jan 2025 14:29:56 +0900
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Andrew Morton, Minchan Kim, Johannes Weiner,
 Nhat Pham, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API
References: <20250127080254.1302026-1-senozhatsky@chromium.org>
 <20250127080254.1302026-6-senozhatsky@chromium.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On (25/01/28 01:36), Yosry Ahmed wrote:
> > Yes, for (a) mentioned above.
> >
> > > I guess in the WO case the buffer is not needed and we can just pass
> > > NULL?
> >
> > Yes.
>
> Perhaps we want to document this and enforce it (make sure that the
> NULL-ness of the buffer matches the access type).

Right.

> > But, and it's a big but. And it's (b) from the above.
> > I wasn't brave enough to just drop the (b) optimization and replace
> > it with memcpy(), especially when we work with relatively large
> > objects (say size-class 3600 bytes and above). This certainly would
> > not make battery-powered devices happier. Maybe in zswap the page is
> > only read once (is that correct?), but in zram a page can be read
> > multiple times (e.g. when zram is used as a raw block-dev, or has a
> > mounted fs on it), which means multiple extra memcpy()-s.
>
> In zswap, because we use the crypto_acomp API, when we cannot sleep
> with the object mapped (which is true for zsmalloc), we just copy the
> compressed object into a preallocated buffer anyway. So having a
> zs_obj_load() interface would move that copy inside zsmalloc.

Yeah, I saw zpool_can_sleep_mapped() and had the same thought. zram, as
of now, doesn't support algos that can/need to schedule internally for
whatever reason - kmalloc, mutex, H/W wait, etc.

> With your series, zswap can drop the memcpy and save some cycles on
> the compress side. I didn't realize that zram does not perform any
> copies on the read/decompress side.
>
> Maybe the load interface can still provide a buffer to avoid the copy
> where possible? I suspect with that we don't need the state and can
> just pass a pointer. We'd need another call to potentially unmap, so
> maybe load_start/load_end, or read_start/read_end.
>
> Something like:
>
> zs_obj_read_start(.., buf)
> {
>         if (contained in one page)
>                 return kmapped obj
>         else
>                 memcpy to buf
>                 return buf
> }
>
> zs_obj_read_end(.., buf)
> {
>         if (contained in one page)
>                 kunmap
> }
>
> The interface is more straightforward and we can drop the map flags
> entirely, unless I missed something here. Unfortunately you'd still
> need the locking changes in zsmalloc to make zram reads fully
> preemptible.

Agreed, the interface part is less of a problem; the atomicity of
zsmalloc is a much bigger issue.
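For illustration, the read_start/read_end shape sketched above can be
modeled in plain user-space C roughly as follows. This is a hedged
sketch: `struct model_zspage`, its two-page layout, and everything
except the zs_obj_read_start/zs_obj_read_end names are invented for the
example and are not the real zsmalloc data structures or locking.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Invented model: an object lives at `off` in pages[0] and may spill
 * into pages[1]; this stands in for a zspage, nothing more. */
struct model_zspage {
	unsigned char *pages[2];  /* backing pages */
	size_t off;               /* object offset in the first page */
	size_t size;              /* object size */
};

static void *zs_obj_read_start(struct model_zspage *zp, void *buf)
{
	size_t first = PAGE_SIZE - zp->off;

	/* contained in one page: hand back a direct pointer, no copy
	 * (the kernel version would return the kmapped object) */
	if (zp->size <= first)
		return zp->pages[0] + zp->off;

	/* spans a page boundary: assemble it in the caller's buffer */
	memcpy(buf, zp->pages[0] + zp->off, first);
	memcpy((unsigned char *)buf + first, zp->pages[1],
	       zp->size - first);
	return buf;
}

static void zs_obj_read_end(struct model_zspage *zp, void *ptr)
{
	size_t first = PAGE_SIZE - zp->off;

	if (zp->size <= first) {
		/* contained in one page: the kernel would kunmap/unlock
		 * here; the model has nothing to undo */
		(void)ptr;
	}
}
```

The point of the model is the control flow: the single-page case hands
back a direct pointer with no copy, and only the page-spanning case
touches the caller's buffer, which is why a WO caller could pass NULL.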
We, technically, only need to mark the zspage as "being used, don't
free" so that zs_free()/compaction/migration don't mess with it, but
this is only "technically". In practice we then have

CPU0                                  CPU1

zs_map_object
 set READ bit                         migrate
 schedule                              pool rwlock
                                       size class spin-lock
                                       wait for READ bit to clear
                                       ...
                                       set WRITE bit
 clear READ bit

and the whole thing collapses like a house of cards. I wasn't able to
trigger a watchdog in my tests, but the pattern is there and it's
enough. Maybe we can teach compaction and migration to try-WRITE and
bail out if the page is locked, but I don't know.

> I am not suggesting that we have to go this way, just throwing out
> ideas.

Sure, load+store is still an option. While that zs_map_object()
optimization is nice, it may have two sides [in the zram case]. On one
hand, we save a memcpy() [but only for certain objects]; on the other
hand, we keep the page locked for the entire decompression duration,
which can be quite a while (e.g. when the algorithm is configured with
a very high compression level):

CPU0                                  CPU1

zs_map_object
 read lock page rwlock
                                      write lock page rwlock
                                       spin
 decompress()
 ...                                   spin a lot
 read unlock page rwlock

Maybe copy-in is just an okay thing to do. Let me try to measure.

> BTW, are we positive that the locking changes made in this series are
> not introducing regressions?

Cannot claim that with confidence. Our workloads don't match; we don't
even use zsmalloc in the same way :) Here be dragons.
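For what it's worth, the "try-WRITE and bail out" idea above can be
modeled with C11 atomics along these lines. Again a sketch under
invented names and an invented lock encoding (a reader count plus a
high writer bit), not the actual zsmalloc locking; the only point is
that the writer attempts the lock once and gives up instead of spinning
while a reader may be scheduled out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Invented encoding: low bits count readers, the top bit marks an
 * exclusive writer (migration/compaction in the discussion above). */
#define ZS_WRITE_BIT (1u << 31)

static bool model_read_lock(atomic_uint *lk)
{
	unsigned int old = atomic_load(lk);

	do {
		if (old & ZS_WRITE_BIT)
			return false;  /* writer active, caller retries */
	} while (!atomic_compare_exchange_weak(lk, &old, old + 1));
	return true;
}

static void model_read_unlock(atomic_uint *lk)
{
	atomic_fetch_sub(lk, 1);
}

/* try-WRITE: succeed only when there are no readers; never spin, so a
 * preempted reader cannot stall a writer that holds spinlocks */
static bool model_try_write_lock(atomic_uint *lk)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(lk, &expected, ZS_WRITE_BIT);
}

static void model_write_unlock(atomic_uint *lk)
{
	atomic_store(lk, 0);
}
```

With this shape, the race diagram above degrades gracefully: CPU1 sees
a set READ bit, fails the trylock, drops its locks, and retries later,
rather than spinning under the size-class spin-lock.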