linux-mm.kvack.org archive mirror
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Kairui Song <ryncsn@gmail.com>, Minchan Kim <minchan@kernel.org>,
	 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 12/18] zsmalloc: make zspage lock preemptible
Date: Thu, 13 Feb 2025 10:20:26 +0900
Message-ID: <etumn4tax7g5c3wygn2aazmo5m7f4ydfji7ehno5i6jckkf27e@mu3fisrw5jcc>
In-Reply-To: <Z6zXEktee8OS51hg@google.com>

On (25/02/12 17:14), Yosry Ahmed wrote:
> On Wed, Feb 12, 2025 at 03:27:10PM +0900, Sergey Senozhatsky wrote:
> > Switch over from rwlock_t to an atomic_t variable that takes a negative
> > value when the page is under migration, or positive values when the
> > page is used by zsmalloc users (object map, etc.).  Using a rwsem
> > per-zspage is a little too memory heavy, a simple atomic_t should
> > suffice.
> 
> We should also explain that rwsem cannot be used due to the locking
> context (we need to hold it in an atomic context). Basically what you
> explained to me before :)
>
> > zspage lock is a leaf lock for zs_map_object(), where it's read-acquired.
> > Since this lock now permits preemption extra care needs to be taken when
> > it is write-acquired - all writers grab it in atomic context, so they
> > cannot spin and wait for (potentially preempted) reader to unlock zspage.
> > There are only two writers at this moment - migration and compaction.  In
> > both cases we use write-try-lock and bail out if zspage is read locked.
> > Writers, on the other hand, never get preempted, so readers can spin
> > waiting for the writer to unlock zspage.
> 
> The details are important, but I think we want to concisely state the
> problem statement either before or after. Basically we want a lock that
> we *never* sleep while acquiring but *can* sleep while holding in read
> mode. This allows holding the lock from any context, but also being
> preemptible if the context allows it.

Ack.

[..]
> > +/*
> > + * zspage locking rules:
> 
> Also here we need to state our key rule:
> Never sleep while acquiring, preemptible while holding (if possible). The
> following rules are basically how we make sure we keep this true.
> 
> > + *
> > + * 1) writer-lock is exclusive
> > + *
> > + * 2) writer-lock owner cannot sleep
> > + *
> > + * 3) writer-lock owner cannot spin waiting for the lock
> > + *   - caller (e.g. compaction and migration) must check return value and
> > + *     handle locking failures
> > + *   - there is only TRY variant of writer-lock function
> > + *
> > + * 4) reader-lock owners (multiple) can sleep
> > + *
> > + * 5) reader-lock owners can spin waiting for the lock, in any context
> > + *   - existing readers (even preempted ones) don't block new readers
> > + *   - writer-lock owners never sleep, always unlock at some point
> 
> 
> May I suggest something more concise and to the point?
> 
> /*
>  * The zspage lock can be held from atomic contexts, but it needs to remain
>  * preemptible when held for reading because it remains held outside of those
>  * atomic contexts, otherwise we unnecessarily lose preemptibility.
>  *
>  * To achieve this, the following rules are enforced on readers and writers:
>  *
>  * - Writers are blocked by both writers and readers, while readers are only
>  *   blocked by writers (i.e. normal rwlock semantics).
>  *
>  * - Writers are always atomic (to allow readers to spin waiting for them).
>  *
>  * - Writers always use trylock (as the lock may be held by sleeping readers).
>  *
>  * - Readers may spin on the lock (as they can only wait for atomic writers).
>  *
>  * - Readers may sleep while holding the lock (as writers only use trylock).
>  */

Looks good, thanks.
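
For readers following along, the rules above can be modelled in user space
with C11 atomics.  This is only a rough sketch, not the patch itself: the
model_* names are hypothetical, and the encoding (0 = unlocked, -1 =
write-locked, N > 0 = N readers) is an assumption based on the commit
message.  There is no cpu_relax() or lockdep here, and user space cannot
enforce the "writers are always atomic" rule, so that part is left to the
caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: 0 = unlocked, N > 0 = N readers, -1 = write-locked. */
#define MODEL_UNLOCKED	0
#define MODEL_WRLOCKED	(-1)

struct model_lock {
	atomic_int state;
};

static inline void model_lock_init(struct model_lock *l)
{
	atomic_init(&l->state, MODEL_UNLOCKED);
}

/* Readers may spin: the only thing they ever wait for is a writer. */
static inline void model_read_lock(struct model_lock *l)
{
	int old = atomic_load_explicit(&l->state, memory_order_acquire);

	for (;;) {
		if (old == MODEL_WRLOCKED) {
			/* Writer active: re-read and re-check before trying. */
			old = atomic_load_explicit(&l->state,
						   memory_order_acquire);
			continue;
		}
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&l->state, &old,
							  old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;
	}
}

static inline void model_read_unlock(struct model_lock *l)
{
	int prev = atomic_fetch_sub_explicit(&l->state, 1,
					     memory_order_release);

	assert(prev > 0);	/* must have been read-locked */
	(void)prev;
}

/* Writers never wait: either the lock is free right now, or we fail. */
static inline bool model_write_trylock(struct model_lock *l)
{
	int old = MODEL_UNLOCKED;

	return atomic_compare_exchange_strong_explicit(&l->state, &old,
						       MODEL_WRLOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static inline void model_write_unlock(struct model_lock *l)
{
	atomic_store_explicit(&l->state, MODEL_UNLOCKED,
			      memory_order_release);
}
```

The read side uses a for (;;) loop so that, after seeing a write-locked
value, the state is checked again before any compare-and-exchange is
attempted.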

> > + */
> > +static void zspage_read_lock(struct zspage *zspage)
> > +{
> > +	atomic_t *lock = &zspage->lock;
> > +	int old = atomic_read_acquire(lock);
> > +
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	rwsem_acquire_read(&zspage->lockdep_map, 0, 0, _RET_IP_);
> > +#endif
> > +
> > +	do {
> > +		if (old == ZS_PAGE_WRLOCKED) {
> > +			cpu_relax();
> > +			old = atomic_read_acquire(lock);
> > +			continue;
> > +		}
> > +	} while (!atomic_try_cmpxchg_acquire(lock, &old, old + 1));
> > +}
> > +
> > +static void zspage_read_unlock(struct zspage *zspage)
> > +{
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	rwsem_release(&zspage->lockdep_map, _RET_IP_);
> > +#endif
> > +	atomic_dec_return_release(&zspage->lock);
> > +}
> > +
> > +static __must_check bool zspage_try_write_lock(struct zspage *zspage)
> 
> I believe zspage_write_trylock() would be closer to the normal rwlock
> naming.

It derived its name from rwsem "age".  Can rename.

> > +{
> > +	atomic_t *lock = &zspage->lock;
> > +	int old = ZS_PAGE_UNLOCKED;
> > +
> > +	WARN_ON_ONCE(preemptible());
> 
> Hmm I know I may have been the one suggesting this, but do we actually
> need it? We disable preemption explicitly anyway before holding the
> lock.

This is just to make sure that the precondition for
"writer is always atomic" is satisfied.  But I can drop it.

> >  	size_class_lock(class);
> > -	/* the migrate_write_lock protects zpage access via zs_map_object */
> > -	migrate_write_lock(zspage);
> > +	/* the zspage write_lock protects zpage access via zs_map_object */
> > +	if (!zspage_try_write_lock(zspage)) {
> > +		size_class_unlock(class);
> > +		pool_write_unlock(pool);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* We're committed, tell the world that this is a Zsmalloc page. */
> > +	__zpdesc_set_zsmalloc(newzpdesc);
> 
> We used to do this earlier on, before any locks are held. Why is it
> moved here?

I want to do that only if the zspage write-trylock has succeeded (we didn't
have any error-out paths before).
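
To illustrate the ordering, here is a small user-space sketch of "trylock
first, publish only once committed".  All demo_* names are hypothetical,
the published flag merely stands in for __zpdesc_set_zsmalloc(), and the
outer pool and size-class locks are only indicated by comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_page {
	atomic_int lock;	/* 0 = unlocked, -1 = write-locked, N > 0 = readers */
	bool published;		/* stand-in for "this is a zsmalloc page" */
};

static inline bool demo_write_trylock(struct demo_page *p)
{
	int old = 0;

	return atomic_compare_exchange_strong_explicit(&p->lock, &old, -1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static inline void demo_write_unlock(struct demo_page *p)
{
	atomic_store_explicit(&p->lock, 0, memory_order_release);
}

/* Returns 0 on success, -EINVAL if the source page is read-locked. */
static inline int demo_migrate(struct demo_page *src, struct demo_page *dst)
{
	/* The outer (pool, size-class) locks would be taken here. */

	if (!demo_write_trylock(src)) {
		/*
		 * Bail out: the outer locks would be dropped in reverse
		 * order, and dst has not been touched yet, so there is
		 * nothing to undo on it.
		 */
		return -EINVAL;
	}

	/* Committed: only now make the new page visible. */
	dst->published = true;

	demo_write_unlock(src);
	/* The outer locks would be dropped here as well. */
	return 0;
}
```

With the flag set only after the trylock, the failure path leaves the new
page exactly as the caller handed it in.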

