From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Nhat Pham <nphamcs@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API
Date: Tue, 28 Jan 2025 14:29:56 +0900
Message-ID: <m7xv5v46sowhm7xwutv5zqxpfygcg2r3asfd3npakmacap6wq2@prtxaiwzzgjn>
In-Reply-To: <Z5g0gITP8psSdqwj@google.com>

On (25/01/28 01:36), Yosry Ahmed wrote:
> > Yes, for (a) mentioned above.
> > 
> > > I guess in the WO case the buffer is not needed and we can just pass
> > > NULL?
> > 
> > Yes.
> 
> Perhaps we want to document this and enforce it (make sure that the
> NULL-ness of the buffer matches the access type).

Right.
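
Something along these lines, I suppose (just a sketch; the entry point
and the zs_mapmode reuse here are illustrative, not necessarily what
the series ends up with):

#include <linux/bug.h>
#include <linux/errno.h>
#include <linux/zsmalloc.h>

/*
 * Illustrative only: reject callers whose local (bounce) buffer
 * doesn't match the requested access type.
 */
static int zs_validate_map_args(enum zs_mapmode mm, const void *local_buf)
{
	/* WO never reads the object back, a buffer here is a caller bug */
	if (WARN_ON_ONCE(mm == ZS_MM_WO && local_buf))
		return -EINVAL;
	/* RO/RW may need the buffer when the object spans two pages */
	if (WARN_ON_ONCE(mm != ZS_MM_WO && !local_buf))
		return -EINVAL;
	return 0;
}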

> > But, and it's a big but, it's (b) from the above.  I wasn't brave
> > enough to just drop the (b) optimization and replace it with memcpy(),
> > especially when we work with relatively large objects (say size-class
> > 3600 bytes and above).  This certainly would not make battery-powered
> > devices happier.  Maybe in zswap the page is only read once (is that
> > correct?), but in zram a page can be read multiple times (e.g. when
> > zram is used as a raw block-dev, or has a mounted fs on it), which
> > means multiple extra memcpy()-s.
> 
> In zswap, because we use the crypto_acomp API, when we cannot sleep with
> the object mapped (which is true for zsmalloc), we just copy the
> compressed object into a preallocated buffer anyway. So having a
> zs_obj_load() interface would move that copy inside zsmalloc.

Yeah, I saw zpool_can_sleep_mapped() and had the same thought.  zram,
as of now, doesn't support algorithms that can/need to schedule
internally for whatever reason (kmalloc, mutex, H/W wait, etc.).
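
For reference, IIUC the dance zswap currently does on the read side is
roughly the following (heavily simplified and from memory, so details
may be off):

#include <linux/string.h>
#include <linux/zpool.h>

/*
 * The shape of zswap's current read side when the backend cannot
 * sleep with an object mapped (not verbatim mm/zswap.c; @bounce_buf
 * stands in for the preallocated per-CPU acomp buffer).
 */
static void *map_for_decompress(struct zpool *zpool, unsigned long handle,
				unsigned int len, void *bounce_buf)
{
	void *src = zpool_map_handle(zpool, handle, ZPOOL_MM_RO);

	if (!zpool_can_sleep_mapped(zpool)) {
		/* copy out so we can unmap before crypto_acomp sleeps */
		memcpy(bounce_buf, src, len);
		zpool_unmap_handle(zpool, handle);
		src = bounce_buf;
	}
	return src;	/* caller decompresses from here */
}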

> With your series, zswap can drop the memcpy and save some cycles on the
> compress side. I didn't realize that zram does not perform any copies on the
> read/decompress side.
> 
> Maybe the load interface can still provide a buffer to avoid the copy
> where possible? I suspect with that we don't need the state and can
> just pass a pointer. We'd need another call to potentially unmap, so
> maybe load_start/load_end, or read_start/read_end.
> 
> Something like:
> 
> zs_obj_read_start(.., buf)
> {
> 	if (contained in one page)
> 		return kmapped obj
> 	else
> 		memcpy to buf
> 		return buf
> }
> 
> zs_obj_read_end(.., buf)
> {
> 	if (contained in one page)
> 		kunmap
> }
> 
> The interface is more straightforward and we can drop the map flags
> entirely, unless I missed something here. Unfortunately you'd still need
> the locking changes in zsmalloc to make zram reads fully preemptible.
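
FWIW, fleshing that out in C, just to make sure I read it the same way
(handle resolution and zspage pinning are left out, and all the names
are made up):

#include <linux/highmem.h>

/*
 * Illustrative only.  Assume the caller already resolved the handle
 * into (first page, next page, offset, object size) and pinned the
 * zspage.
 */
static void *obj_read_start(struct page *page, struct page *next,
			    unsigned int off, size_t size,
			    void *local_buf)
{
	if (off + size <= PAGE_SIZE)
		/* object sits in one page: no copy, map it directly */
		return kmap_local_page(page) + off;

	/* object crosses the page boundary: bounce it into @local_buf */
	memcpy_from_page(local_buf, page, off, PAGE_SIZE - off);
	memcpy_from_page(local_buf + (PAGE_SIZE - off), next, 0,
			 size - (PAGE_SIZE - off));
	return local_buf;
}

static void obj_read_end(void *ptr, void *local_buf)
{
	/* only the single-page case handed out a kmap'ed pointer */
	if (ptr != local_buf)
		kunmap_local(ptr);
}

The caller would pass a buffer sized for the largest size class, so
the bounce path can never overflow it.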

Agreed, the interface part is less of a problem; the atomicity of
zsmalloc is a much bigger issue.  Technically, we only need to mark the
zspage as "in use, don't free" so that zs_free()/compaction/migration
don't mess with it, but that is only "technically".  In practice we
then have:

	CPU0							CPU1

	zs_map_object
	set READ bit					migrate
	schedule						pool rwlock
									size class spin-lock
									wait for READ bit to clear
	...								set WRITE bit
	clear READ bit

and the whole thing collapses like a house of cards.  I wasn't able
to trigger a watchdog in my tests, but the pattern is there and that's
enough.  Maybe we can teach compaction and migration to try-WRITE and
bail out if the zspage is locked, but I don't know.
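
If we go that way, the migration/compaction side would look roughly
like this (zspage_try_write_lock()/zspage_write_unlock() are
hypothetical, the point is only the control flow):

#include <linux/errno.h>
#include <linux/types.h>

struct zspage;
/* hypothetical lock primitives, for illustration only */
bool zspage_try_write_lock(struct zspage *zspage);
void zspage_write_unlock(struct zspage *zspage);

static int migrate_zspage_trylock(struct zspage *zspage)
{
	/*
	 * Never wait for readers while holding pool/class locks: if a
	 * reader has the zspage (and is possibly scheduled out), tell
	 * the caller to retry the migration later.
	 */
	if (!zspage_try_write_lock(zspage))
		return -EAGAIN;

	/* ... move objects under the write lock ... */

	zspage_write_unlock(zspage);
	return 0;
}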

> I am not suggesting that we have to go this way, just throwing out
> ideas.

Sure, load+store is still an option.  While that zs_map_object()
optimization is nice, it cuts both ways [in the zram case].  On one
hand, we save a memcpy() [but only for certain objects]; on the other
hand, we keep the page locked for the entire decompression, which can
take quite a while (e.g. when the algorithm is configured with a very
high compression level):

	CPU0							CPU1

	zs_map_object
	read lock page rwlock			write lock page rwlock
									spin
	decompress()					... spin a lot
	read unlock page rwlock

Maybe copy-in is just an okay thing to do.  Let me try to measure.
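
IOW, what I want to benchmark is roughly the following
(zs_obj_read_start/end() as per your sketch, signatures made up;
decompress() stands in for zcomp):

#include <linux/string.h>

struct zs_pool;
/* per the read_start/read_end sketch above; not a real API */
void *zs_obj_read_start(struct zs_pool *pool, unsigned long handle,
			void *buf);
void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
		     void *ptr);

/*
 * Copy-in variant: the zspage is locked only for the duration of a
 * memcpy(), and the (potentially slow) decompression runs with no
 * zsmalloc locks held.
 */
static int read_object_copyin(struct zs_pool *pool, unsigned long handle,
			      void *bounce_buf, size_t obj_len, void *dst,
			      int (*decompress)(void *dst, const void *src,
						size_t len))
{
	void *obj = zs_obj_read_start(pool, handle, bounce_buf);
	const void *src = obj;

	if (obj != bounce_buf) {
		/* fast path returned a kmapped object: copy it out so
		 * the zspage can be unlocked right away */
		memcpy(bounce_buf, obj, obj_len);
		src = bounce_buf;
	}
	zs_obj_read_end(pool, handle, obj);

	return decompress(dst, src, obj_len);
}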

> BTW, are we positive that the locking changes made in this series are
> not introducing regressions?

I cannot claim that with confidence.  Our workloads don't match; we
don't even use zsmalloc in the same way :)  Here be dragons.

