From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Thu, 6 Feb 2025 16:19:36 +0000 [thread overview]
Message-ID: <Z6ThGFt6wyNpx9xi@google.com> (raw)
In-Reply-To: <6uhsj4bckhursiblkxe54azfgyqal6tq2de3lpkxw6omkised6@uylodcjruuei>
On Thu, Feb 06, 2025 at 12:05:55PM +0900, Sergey Senozhatsky wrote:
> On (25/02/05 19:06), Yosry Ahmed wrote:
> > > > For example, the compaction/migration code could be sleeping holding the
> > > > write lock, and a map() call would spin waiting for that sleeping task.
> > >
> > > write-lock holders cannot sleep, that's the key part.
> > >
> > > So the rules are:
> > >
> > > 1) writer cannot sleep
> > > - migration/compaction runs in atomic context and grabs
> > > write-lock only from atomic context
> > > - write-locking function disables preemption before lock(), just to be
> > > safe, and enables it after unlock()
> > >
> > > 2) writer does not spin waiting
> > > - that's why there is only write_try_lock function
> > > - compaction and migration bail out when they cannot lock the
> > > zspage
> > >
> > > 3) readers can sleep and can spin waiting for a lock
> > > - other (even preempted) readers don't block new readers
> > > - writers don't sleep, they always unlock
> >
> > That's useful, thanks. If we go with custom locking we need to document
> > this clearly and add debug checks where possible.
>
> Sure. That's what it currently looks like (can always improve)
>
> ---
> /*
> * zspage lock permits preemption on the reader-side (there can be multiple
> * readers). Writers (exclusive zspage ownership), on the other hand, are
> * always run in atomic context and cannot spin waiting for a (potentially
> * preempted) reader to unlock zspage. This, basically, means that writers
> * can only call write-try-lock and must bail out if it didn't succeed.
> *
> * At the same time, writers cannot reschedule under zspage write-lock,
> * so readers can spin waiting for the writer to unlock zspage.
> */
> static void zspage_read_lock(struct zspage *zspage)
> {
> atomic_t *lock = &zspage->lock;
> int old = atomic_read_acquire(lock);
>
> do {
> if (old == ZS_PAGE_WRLOCKED) {
> cpu_relax();
> old = atomic_read_acquire(lock);
> continue;
> }
> } while (!atomic_try_cmpxchg_acquire(lock, &old, old + 1));
>
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> rwsem_acquire_read(&zspage->lockdep_map, 0, 0, _RET_IP_);
> #endif
> }
>
> static void zspage_read_unlock(struct zspage *zspage)
> {
> atomic_dec_return_release(&zspage->lock);
>
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> rwsem_release(&zspage->lockdep_map, _RET_IP_);
> #endif
> }
>
> static bool zspage_try_write_lock(struct zspage *zspage)
> {
> atomic_t *lock = &zspage->lock;
> int old = ZS_PAGE_UNLOCKED;
>
> preempt_disable();
> if (atomic_try_cmpxchg_acquire(lock, &old, ZS_PAGE_WRLOCKED)) {
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> rwsem_acquire(&zspage->lockdep_map, 0, 0, _RET_IP_);
> #endif
> return true;
> }
>
> preempt_enable();
> return false;
> }
>
> static void zspage_write_unlock(struct zspage *zspage)
> {
> atomic_set_release(&zspage->lock, ZS_PAGE_UNLOCKED);
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> rwsem_release(&zspage->lockdep_map, _RET_IP_);
> #endif
> preempt_enable();
> }
> ---
>
> Maybe I'll just copy-paste the locking rules list, a list is always cleaner.
Thanks. I think it would be nice if we could also get someone with
locking expertise to take a look at this.
>
> > > > I wonder if there's a way to rework the locking instead to avoid the
> > > > nesting. It seems like sometimes we lock the zspage with the pool lock
> > > > held, sometimes with the class lock held, and sometimes with no lock
> > > > held.
> > > >
> > > > What are the rules here for acquiring the zspage lock?
> > >
> > > Most of that code is not written by me, but I think the rule is to disable
> > > "migration" be it via pool lock or class lock.
> >
> > It seems like we're not holding either of these locks in
> > async_free_zspage() when we call lock_zspage(). Is it safe for a
> > different reason?
>
> I think we hold size class lock there. async-free is only for pages that
> reached 0 usage ratio (empty fullness group), so they don't hold any
> objects any more and from her such zspages either get freed or
> find_get_zspage() recovers them from fullness 0 and allocates an object.
> Both are synchronized by size class lock.
>
> > > Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
> > > patterns the clients have. I suspect we'd need to synchronize RCU every
> > > time a zspage is freed: zs_free() [this one is complicated], or migration,
> > > or compaction? Sounds like anti-pattern for RCU?
> >
> > Can't we use kfree_rcu() instead of synchronizing? Not sure if this
> > would still be an antipattern tbh.
>
> Yeah, I don't know. The last time I wrongly used kfree_rcu() it caused a
> 27% performance drop (some internal code). This zspage thingy maybe will
> be better, but still has a potential to generate high numbers of RCU calls,
> depends on the clients. Probably the chances are too high. Apart from
> that, kvfree_rcu() can sleep, as far as I understand, so zram might have
> some extra things to deal with, namely slot-free notifications which can
> be called from softirq, and always called under spinlock:
>
> mm slot-free -> zram slot-free -> zs_free -> empty zspage -> kfree_rcu
>
> > It just seems like the current locking scheme is really complicated :/
>
> That's very true.
Seems like we have to compromise either way, custom locking or we enter
into a new complexity realm with RCU freeing.
next prev parent reply other threads:[~2025-02-06 16:19 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-31 9:05 [PATCHv4 00/17] zsmalloc/zram: there be preemption Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 01/17] zram: switch to non-atomic entry locking Sergey Senozhatsky
2025-01-31 11:41 ` Hillf Danton
2025-02-03 3:21 ` Sergey Senozhatsky
2025-02-03 3:52 ` Sergey Senozhatsky
2025-02-03 12:39 ` Sergey Senozhatsky
2025-01-31 22:55 ` Andrew Morton
2025-02-03 3:26 ` Sergey Senozhatsky
2025-02-03 7:11 ` Sergey Senozhatsky
2025-02-03 7:33 ` Sergey Senozhatsky
2025-02-04 0:19 ` Andrew Morton
2025-02-04 4:22 ` Sergey Senozhatsky
2025-02-06 7:01 ` Sergey Senozhatsky
2025-02-06 7:38 ` Sebastian Andrzej Siewior
2025-02-06 7:47 ` Sergey Senozhatsky
2025-02-06 8:13 ` Sebastian Andrzej Siewior
2025-02-06 8:17 ` Sergey Senozhatsky
2025-02-06 8:26 ` Sebastian Andrzej Siewior
2025-02-06 8:29 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 02/17] zram: do not use per-CPU compression streams Sergey Senozhatsky
2025-02-01 9:21 ` Kairui Song
2025-02-03 3:49 ` Sergey Senozhatsky
2025-02-03 21:00 ` Yosry Ahmed
2025-02-06 12:26 ` Sergey Senozhatsky
2025-02-06 6:55 ` Kairui Song
2025-02-06 7:22 ` Sergey Senozhatsky
2025-02-06 8:22 ` Sergey Senozhatsky
2025-02-06 16:16 ` Yosry Ahmed
2025-02-07 2:56 ` Sergey Senozhatsky
2025-02-07 6:12 ` Sergey Senozhatsky
2025-02-07 21:07 ` Yosry Ahmed
2025-02-08 16:20 ` Sergey Senozhatsky
2025-02-08 16:41 ` Sergey Senozhatsky
2025-02-09 6:22 ` Sergey Senozhatsky
2025-02-09 7:42 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 03/17] zram: remove crypto include Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 04/17] zram: remove max_comp_streams device attr Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 05/17] zram: remove two-staged handle allocation Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 06/17] zram: permit reclaim in zstd custom allocator Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 07/17] zram: permit reclaim in recompression handle allocation Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 08/17] zram: remove writestall zram_stats member Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 09/17] zram: limit max recompress prio to num_active_comps Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 10/17] zram: filter out recomp targets based on priority Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 11/17] zram: unlock slot during recompression Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 12/17] zsmalloc: factor out pool locking helpers Sergey Senozhatsky
2025-01-31 15:46 ` Yosry Ahmed
2025-02-03 4:57 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 13/17] zsmalloc: factor out size-class " Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 14/17] zsmalloc: make zspage lock preemptible Sergey Senozhatsky
2025-01-31 15:51 ` Yosry Ahmed
2025-02-03 3:13 ` Sergey Senozhatsky
2025-02-03 4:56 ` Sergey Senozhatsky
2025-02-03 21:11 ` Yosry Ahmed
2025-02-04 6:59 ` Sergey Senozhatsky
2025-02-04 17:19 ` Yosry Ahmed
2025-02-05 2:43 ` Sergey Senozhatsky
2025-02-05 19:06 ` Yosry Ahmed
2025-02-06 3:05 ` Sergey Senozhatsky
2025-02-06 3:28 ` Sergey Senozhatsky
2025-02-06 16:19 ` Yosry Ahmed [this message]
2025-02-07 2:48 ` Sergey Senozhatsky
2025-02-07 21:09 ` Yosry Ahmed
2025-02-12 5:00 ` Sergey Senozhatsky
2025-02-12 15:35 ` Yosry Ahmed
2025-02-13 2:18 ` Sergey Senozhatsky
2025-02-13 2:57 ` Yosry Ahmed
2025-02-13 7:21 ` Sergey Senozhatsky
2025-02-13 8:22 ` Sergey Senozhatsky
2025-02-13 15:25 ` Yosry Ahmed
2025-02-14 3:33 ` Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 15/17] zsmalloc: introduce new object mapping API Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 16/17] zram: switch to new zsmalloc " Sergey Senozhatsky
2025-01-31 9:06 ` [PATCHv4 17/17] zram: add might_sleep to zcomp API Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6ThGFt6wyNpx9xi@google.com \
--to=yosry.ahmed@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=senozhatsky@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox