From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Wed, 5 Feb 2025 11:43:16 +0900 [thread overview]
Message-ID: <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv> (raw)
In-Reply-To: <Z6JMIKlO9U5P-5gE@google.com>
On (25/02/04 17:19), Yosry Ahmed wrote:
> > sizeof(struct zs_page) change is one thing. Another thing is that
> > zspage->lock is taken from atomic sections, pretty much everywhere.
> > compaction/migration write-lock it under pool rwlock and class spinlock,
> > but both compaction and migration now EAGAIN if the lock is locked
> > already, so that is sorted out.
> >
> > The remaining problem is map(), which takes zspage read-lock under pool
> > rwlock. RFC series (which you hated with passion :P) converted all zsmalloc
> > into preemptible ones because of this - zspage->lock is a nested leaf-lock,
> > so it cannot schedule unless locks it's nested under permit it (needless to
> > say neither rwlock nor spinlock permit it).
>
> Hmm, so we want the lock to be preemtible, but we don't want to use an
> existing preemtible lock because it may be held it from atomic context.
>
> I think one problem here is that the lock you are introducing is a
> spinning lock but the lock holder can be preempted. This is why spinning
> locks do not allow preemption. Others waiting for the lock can spin
> waiting for a process that is scheduled out.
>
> For example, the compaction/migration code could be sleeping holding the
> write lock, and a map() call would spin waiting for that sleeping task.
write-lock holders cannot sleep, that's the key part.
So the rules are:
1) writer cannot sleep
- migration/compaction runs in atomic context and grabs
write-lock only from atomic context
- write-locking function disables preemption before lock(), just to be
safe, and enables it after unlock()
2) writer does not spin waiting
- that's why there is only write_try_lock function
- compaction and migration bail out when they cannot lock the
zspage
3) readers can sleep and can spin waiting for a lock
- other (even preempted) readers don't block new readers
- writers don't sleep, they always unlock
> I wonder if there's a way to rework the locking instead to avoid the
> nesting. It seems like sometimes we lock the zspage with the pool lock
> held, sometimes with the class lock held, and sometimes with no lock
> held.
>
> What are the rules here for acquiring the zspage lock?
Most of that code is not written by me, but I think the rule is to disable
"migration" be it via pool lock or class lock.
> Do we need to hold another lock just to make sure the zspage does not go
> away from under us?
Yes, the page cannot go away via "normal" path:
zs_free(last object) -> zspage becomes empty -> free zspage
so when we have active mapping() it's only migration and compaction
that can free zspage (its content is migrated and so it becomes empty).
> Can we use RCU or something similar to do that instead?
Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
patterns the clients have. I suspect we'd need to synchronize RCU every
time a zspage is freed: zs_free() [this one is complicated], or migration,
or compaction? Sounds like anti-pattern for RCU?
next prev parent reply other threads:[~2025-02-05 2:43 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-31 9:05 [PATCHv4 00/17] zsmalloc/zram: there be preemption Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 01/17] zram: switch to non-atomic entry locking Sergey Senozhatsky
2025-01-31 11:41 ` Hillf Danton
2025-02-03 3:21 ` Sergey Senozhatsky
2025-02-03 3:52 ` Sergey Senozhatsky
2025-02-03 12:39 ` Sergey Senozhatsky
2025-01-31 22:55 ` Andrew Morton
2025-02-03 3:26 ` Sergey Senozhatsky
2025-02-03 7:11 ` Sergey Senozhatsky
2025-02-03 7:33 ` Sergey Senozhatsky
2025-02-04 0:19 ` Andrew Morton
2025-02-04 4:22 ` Sergey Senozhatsky
2025-02-06 7:01 ` Sergey Senozhatsky
2025-02-06 7:38 ` Sebastian Andrzej Siewior
2025-02-06 7:47 ` Sergey Senozhatsky
2025-02-06 8:13 ` Sebastian Andrzej Siewior
2025-02-06 8:17 ` Sergey Senozhatsky
2025-02-06 8:26 ` Sebastian Andrzej Siewior
2025-02-06 8:29 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 02/17] zram: do not use per-CPU compression streams Sergey Senozhatsky
2025-02-01 9:21 ` Kairui Song
2025-02-03 3:49 ` Sergey Senozhatsky
2025-02-03 21:00 ` Yosry Ahmed
2025-02-06 12:26 ` Sergey Senozhatsky
2025-02-06 6:55 ` Kairui Song
2025-02-06 7:22 ` Sergey Senozhatsky
2025-02-06 8:22 ` Sergey Senozhatsky
2025-02-06 16:16 ` Yosry Ahmed
2025-02-07 2:56 ` Sergey Senozhatsky
2025-02-07 6:12 ` Sergey Senozhatsky
2025-02-07 21:07 ` Yosry Ahmed
2025-02-08 16:20 ` Sergey Senozhatsky
2025-02-08 16:41 ` Sergey Senozhatsky
2025-02-09 6:22 ` Sergey Senozhatsky
2025-02-09 7:42 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 03/17] zram: remove crypto include Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 04/17] zram: remove max_comp_streams device attr Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 05/17] zram: remove two-staged handle allocation Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 06/17] zram: permit reclaim in zstd custom allocator Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 07/17] zram: permit reclaim in recompression handle allocation Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 08/17] zram: remove writestall zram_stats member Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 09/17] zram: limit max recompress prio to num_active_comps Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 10/17] zram: filter out recomp targets based on priority Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 11/17] zram: unlock slot during recompression Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 12/17] zsmalloc: factor out pool locking helpers Sergey Senozhatsky
2025-01-31 15:46 ` Yosry Ahmed
2025-02-03 4:57 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 13/17] zsmalloc: factor out size-class " Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 14/17] zsmalloc: make zspage lock preemptible Sergey Senozhatsky
2025-01-31 15:51 ` Yosry Ahmed
2025-02-03 3:13 ` Sergey Senozhatsky
2025-02-03 4:56 ` Sergey Senozhatsky
2025-02-03 21:11 ` Yosry Ahmed
2025-02-04 6:59 ` Sergey Senozhatsky
2025-02-04 17:19 ` Yosry Ahmed
2025-02-05 2:43 ` Sergey Senozhatsky [this message]
2025-02-05 19:06 ` Yosry Ahmed
2025-02-06 3:05 ` Sergey Senozhatsky
2025-02-06 3:28 ` Sergey Senozhatsky
2025-02-06 16:19 ` Yosry Ahmed
2025-02-07 2:48 ` Sergey Senozhatsky
2025-02-07 21:09 ` Yosry Ahmed
2025-02-12 5:00 ` Sergey Senozhatsky
2025-02-12 15:35 ` Yosry Ahmed
2025-02-13 2:18 ` Sergey Senozhatsky
2025-02-13 2:57 ` Yosry Ahmed
2025-02-13 7:21 ` Sergey Senozhatsky
2025-02-13 8:22 ` Sergey Senozhatsky
2025-02-13 15:25 ` Yosry Ahmed
2025-02-14 3:33 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 15/17] zsmalloc: introduce new object mapping API Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 16/17] zram: switch to new zsmalloc " Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 17/17] zram: add might_sleep to zcomp API Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv \
--to=senozhatsky@chromium.org \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox