From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API
Date: Tue, 28 Jan 2025 14:29:56 +0900
Message-ID: <m7xv5v46sowhm7xwutv5zqxpfygcg2r3asfd3npakmacap6wq2@prtxaiwzzgjn>
In-Reply-To: <Z5g0gITP8psSdqwj@google.com>
On (25/01/28 01:36), Yosry Ahmed wrote:
> > Yes, for (a) mentioned above.
> >
> > > I guess in the WO case the buffer is not needed and we can just pass
> > > NULL?
> >
> > Yes.
>
> Perhaps we want to document this and enforce it (make sure that the
> NULL-ness of the buffer matches the access type).
Right.
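A minimal sketch of what such a check could look like. The enum, the function name and the exact rule (write-only takes NULL, every mode that reads takes a buffer) are assumptions for illustration, not what the series implements:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical access modes; the names are illustrative only. */
enum zs_access { ZS_ACCESS_RO, ZS_ACCESS_WO, ZS_ACCESS_RW };

/*
 * Return 0 when the NULL-ness of the caller's buffer matches the access
 * mode, -EINVAL otherwise. Assumed rule: a write-only mapping takes no
 * buffer, any mapping that reads must supply one.
 */
int zs_check_buf(enum zs_access mode, const void *buf)
{
	int needs_buf;

	assert(mode == ZS_ACCESS_RO || mode == ZS_ACCESS_WO ||
	       mode == ZS_ACCESS_RW);

	needs_buf = (mode != ZS_ACCESS_WO);
	if (needs_buf != (buf != NULL))
		return -EINVAL;
	return 0;
}
```

In the kernel this would more likely be a WARN_ON_ONCE() at the top of the mapping function; the return code here just keeps the sketch testable in user space.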
> > But, and it's a big but. And it's (b) from the above. I wasn't brave
> > enough to just drop (b) optimization and replace it with memcpy(),
> > especially when we work with relatively large objects (say size-class
> > 3600 bytes and above). This certainly would not make battery powered
> > devices happier. Maybe in zswap the page is only read once (is that
> > correct?), but in zram page can be read multiple times (e.g. when zram
> > is used as a raw block-dev, or has a mounted fs on it) which means
> > multiple extra memcpy()-s.
>
> In zswap, because we use the crypto_acomp API, when we cannot sleep with
> the object mapped (which is true for zsmalloc), we just copy the
> compressed object into a preallocated buffer anyway. So having a
> zs_obj_load() interface would move that copy inside zsmalloc.
Yeah, I saw zpool_can_sleep_mapped() and had the same thought. zram,
as of now, doesn't support algos that can/need to schedule internally
for whatever reason - kmalloc, mutex, H/W wait, etc.
> With your series, zswap can drop the memcpy and save some cycles on the
> compress side. I didn't realize that zram does not perform any copies on the
> read/decompress side.
>
> Maybe the load interface can still provide a buffer to avoid the copy
> where possible? I suspect with that we don't need the state and can
> just pass a pointer. We'd need another call to potentially unmap, so
> maybe load_start/load_end, or read_start/read_end.
>
> Something like:
>
> zs_obj_read_start(.., buf)
> {
>         if (contained in one page)
>                 return kmapped obj
>         else
>                 memcpy to buf
>                 return buf
> }
>
> zs_obj_read_end(.., buf)
> {
>         if (contained in one page)
>                 kunmap
> }
>
> The interface is more straightforward and we can drop the map flags
> entirely, unless I missed something here. Unfortunately you'd still need
> the locking changes in zsmalloc to make zram reads fully preemptible.
Agreed, the interface part is less of a problem, the atomicity of zsmalloc
is a much bigger issue. We, technically, only need to mark zspage as "being
used, don't free" so that zsfree/compaction/migration don't mess with it,
but this is only "technically". In practice we then have

        CPU0                            CPU1

        zs_map_object
        set READ bit                    migrate
        schedule                        pool rwlock
                                        size class spin-lock
                                        wait for READ bit to clear
                                        ... set WRITE bit
        clear READ bit

and the whole thing collapses like a house of cards. I wasn't able
to trigger a watchdog in my tests, but the pattern is there and it's
enough. Maybe we can teach compaction and migration to try-WRITE and
bail out if the page is locked, but I don't know.
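To make the try-WRITE idea concrete, here is a rough user-space sketch using C11 atomics. All names (zspage_lock, migrate_one, etc.) are hypothetical and this is not the locking from the series; it only shows a writer that never spins behind a reader that may have scheduled:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ZSPAGE_WRITE_LOCKED	(-1)

/* state: 0 = free, >0 = number of readers, -1 = write-locked */
struct zspage_lock {
	atomic_int state;
};

void zspage_lock_init(struct zspage_lock *l)
{
	atomic_init(&l->state, 0);
}

/* Reader side: the lock may then be held across a sleeping decompress. */
bool zspage_read_trylock(struct zspage_lock *l)
{
	int old = atomic_load(&l->state);

	while (old >= 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&l->state, &old, old + 1))
			return true;
	}
	return false;
}

void zspage_read_unlock(struct zspage_lock *l)
{
	atomic_fetch_sub(&l->state, 1);
}

/* Writer side: succeeds only when there are no readers and no writer. */
bool zspage_write_trylock(struct zspage_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected,
					      ZSPAGE_WRITE_LOCKED);
}

void zspage_write_unlock(struct zspage_lock *l)
{
	atomic_store(&l->state, 0);
}

/* Hypothetical migration step: skip a busy zspage instead of waiting. */
int migrate_one(struct zspage_lock *l)
{
	if (!zspage_write_trylock(l))
		return -EBUSY;
	/* ... move objects here ... */
	zspage_write_unlock(l);
	return 0;
}
```

The obvious downside is that a steady stream of readers can starve compaction and migration indefinitely, so this is a sketch of the bail-out pattern, not an answer to whether it is good enough.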
> I am not suggesting that we have to go this way, just throwing out
> ideas.
Sure, load+store is still an option. While that zs_map_object()
optimization is nice, it has two sides [in the zram case]. On one
hand, we save a memcpy() [but only for certain objects]; on the
other hand, we keep the page locked for the entire decompression
duration, which can be quite a while (e.g. when the algorithm is
configured with a very high compression level):

        CPU0                            CPU1

        zs_map_object
        read lock page rwlock           write lock page rwlock
                                        spin
        decompress()                    ... spin a lot
        read unlock page rwlock

Maybe copy-in is just an okay thing to do. Let me try to measure.
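For reference, the two read-side options can be put side by side in a toy user-space model. Everything here (toy_zspage, the toy_* functions, a two-page zspage) is an assumption made for illustration; kmap and the page lock are only mentioned in comments:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	4096u

/* Toy zspage: two physically separate pages holding consecutive bytes. */
struct toy_zspage {
	unsigned char *pages[2];
};

/* Copy len bytes starting at off, crossing into the second page if needed. */
void toy_copy_out(const struct toy_zspage *zp, size_t off, size_t len,
		  void *buf)
{
	size_t first = TOY_PAGE_SIZE - off;

	if (first > len)
		first = len;
	memcpy(buf, zp->pages[0] + off, first);
	if (len > first)
		memcpy((unsigned char *)buf + first, zp->pages[1], len - first);
}

/*
 * Option 1 (read_start/read_end): return the object in place when it sits
 * in one page, copy to buf only when it spans two. The in-place path is
 * the one that keeps the page mapped and locked during decompression.
 */
const void *toy_obj_read_start(const struct toy_zspage *zp, size_t off,
			       size_t len, void *buf)
{
	assert(off < TOY_PAGE_SIZE && len <= TOY_PAGE_SIZE);

	if (off + len <= TOY_PAGE_SIZE)
		return zp->pages[0] + off;	/* kernel: kmap the page */
	toy_copy_out(zp, off, len, buf);
	return buf;
}

void toy_obj_read_end(const void *obj, const void *buf)
{
	/* Kernel: kunmap and unlock only if obj != buf. Nothing to do here. */
	(void)obj;
	(void)buf;
}

/*
 * Option 2 (copy-in): always copy, so the page only needs to stay locked
 * for the duration of the memcpy(), not for the whole decompression.
 */
void toy_obj_load(const struct toy_zspage *zp, size_t off, size_t len,
		  void *buf)
{
	assert(off < TOY_PAGE_SIZE && len <= TOY_PAGE_SIZE);
	toy_copy_out(zp, off, len, buf);
}
```

Option 1 saves the copy for single-page objects at the price of a longer lock hold time; option 2 pays the copy every time but shrinks the critical section. Which one wins is exactly what needs to be measured.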
> BTW, are we positive that the locking changes made in this series are
> not introducing regressions?
Cannot claim that with confidence. Our workloads don't match, we don't
even use zsmalloc in the same way :) Here be dragons.
Thread overview: 24+ messages
2025-01-27 7:59 [RFC PATCH 0/6] zsmalloc: make zsmalloc preemptible Sergey Senozhatsky
2025-01-27 7:59 ` [RFC PATCH 1/6] zram: deffer slot free notification Sergey Senozhatsky
2025-01-27 7:59 ` [RFC PATCH 2/6] zsmalloc: make zspage lock preemptible Sergey Senozhatsky
2025-01-27 20:23 ` Uros Bizjak
2025-01-28 0:29 ` Sergey Senozhatsky
2025-01-27 7:59 ` [RFC PATCH 3/6] zsmalloc: convert to sleepable pool lock Sergey Senozhatsky
2025-01-27 7:59 ` [RFC PATCH 4/6] zsmalloc: make class lock sleepable Sergey Senozhatsky
2025-01-27 7:59 ` [RFC PATCH 5/6] zsmalloc: introduce handle mapping API Sergey Senozhatsky
2025-01-27 21:26 ` Yosry Ahmed
2025-01-28 0:37 ` Sergey Senozhatsky
2025-01-28 0:49 ` Yosry Ahmed
2025-01-28 1:13 ` Sergey Senozhatsky
2025-01-27 21:58 ` Yosry Ahmed
2025-01-28 0:59 ` Sergey Senozhatsky
2025-01-28 1:36 ` Yosry Ahmed
2025-01-28 5:29 ` Sergey Senozhatsky [this message]
2025-01-28 9:38 ` Sergey Senozhatsky
2025-01-28 17:21 ` Yosry Ahmed
2025-01-29 3:32 ` Sergey Senozhatsky
2025-01-28 11:10 ` Sergey Senozhatsky
2025-01-28 17:22 ` Yosry Ahmed
2025-01-28 23:01 ` Sergey Senozhatsky
2025-01-29 5:40 ` Sergey Senozhatsky
2025-01-27 7:59 ` [RFC PATCH 6/6] zram: switch over to zshandle " Sergey Senozhatsky