From: Minchan Kim <minchan@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	linux-mm <linux-mm@kvack.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 7/8] zsmalloc: replace per zpage lock with pool->migrate_lock
Date: Thu, 11 Nov 2021 15:11:34 -0800
Message-ID: <YY2jJgMxJHPNs9ES@google.com>
In-Reply-To: <20211111090727.eq67hxfpux23dagd@linutronix.de>

On Thu, Nov 11, 2021 at 10:07:27AM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-11-10 10:54:32 [-0800], Minchan Kim wrote:
> > zsmalloc has used a bit in the zpage handle as a spinlock to keep
> > the zpage object alive during several operations. However, it causes
> > problems for PREEMPT_RT and adds too much complication.
> > 
> > This patch replaces the bit spinlock with the pool->migrate_lock
> > rwlock. It makes the code simpler and lets zsmalloc work under
> > PREEMPT_RT.
> > 
> > The drawback is that pool->migrate_lock has coarser granularity than
> > the per-zpage lock, so contention would be higher than before when
> > both IO-related operations (i.e., zs_malloc, zs_free, zs_[map|unmap])
> > and compaction (page/zpage migration) run in parallel. (Note that
> > migrate_lock is an rwlock and the IO-related functions all take the
> > read side, so there is no contention among them.) However, the write
> > side is fast enough (the dominant overhead is just the page copy) so
> > it shouldn't matter much. If the lock granularity becomes a problem
> > later, we could introduce table locks keyed by hashing the handle.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> …
> > index b8b098be92fa..5d4c4d254679 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1789,6 +1767,11 @@ static void migrate_write_lock(struct zspage *zspage)
> >  	write_lock(&zspage->lock);
> >  }
> >  
> > +static void migrate_write_lock_nested(struct zspage *zspage)
> > +{
> > +	write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING);
> 
> I don't have this in my tree. 

I forgot it. I've appended it at the tail of this mail.
I will also include it in the next revision.

> 
> > +}
> > +
> >  static void migrate_write_unlock(struct zspage *zspage)
> >  {
> >  	write_unlock(&zspage->lock);
> …
> > @@ -2077,8 +2043,13 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> >  	struct zspage *dst_zspage = NULL;
> >  	unsigned long pages_freed = 0;
> >  
> > +	/* protect the race between zpage migration and zs_free */
> > +	write_lock(&pool->migrate_lock);
> > +	/* protect zpage allocation/free */
> >  	spin_lock(&class->lock);
> >  	while ((src_zspage = isolate_zspage(class, true))) {
> > +		/* protect someone accessing the zspage(i.e., zs_map_object) */
> > +		migrate_write_lock(src_zspage);
> >  
> >  		if (!zs_can_compact(class))
> >  			break;
> > @@ -2087,6 +2058,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> >  		cc.s_page = get_first_page(src_zspage);
> >  
> >  		while ((dst_zspage = isolate_zspage(class, false))) {
> > +			migrate_write_lock_nested(dst_zspage);
> > +
> >  			cc.d_page = get_first_page(dst_zspage);
> >  			/*
> >  			 * If there is no more space in dst_page, resched
> 
> Looking at these two chunks, the page here comes from a list; you
> remove that page from that list, and this ensures that you can't lock
> the very same pages in reverse order as in:
> 
>    migrate_write_lock(dst_zspage);
>    …
>    	migrate_write_lock(src_zspage);
> 
> right?

Sure.
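
For illustration, a minimal sketch of why the _nested annotation is
still needed even though that ordering rules out a real deadlock: every
zspage->lock is initialized at the same call site, so lockdep puts them
all in a single class, and a second plain write_lock() would trip its
recursion check. (struct zspage_like and lock_pair() are made up for
the example; this is not the zsmalloc code.)

#include <linux/spinlock.h>

struct zspage_like {
	rwlock_t lock;
};

static void obj_init(struct zspage_like *z)
{
	/* every lock initialized here shares one lockdep class */
	rwlock_init(&z->lock);
}

static void lock_pair(struct zspage_like *src, struct zspage_like *dst)
{
	write_lock(&src->lock);
	/*
	 * dst->lock is in the same lockdep class as src->lock, so a
	 * plain write_lock() would be flagged as recursive locking.
	 * SINGLE_DEPTH_NESTING tells lockdep this second acquisition
	 * is intentional and ordered.
	 */
	write_lock_nested(&dst->lock, SINGLE_DEPTH_NESTING);

	/* ... move objects from src to dst ... */

	write_unlock(&dst->lock);
	write_unlock(&src->lock);
}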

From e0bfc5185bbd15c651a7a367b6d053b8c88b1e01 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 19 Oct 2021 15:34:09 -0700
Subject: [PATCH] locking/rwlocks: introduce write_lock_nested

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/rwlock.h          | 6 ++++++
 include/linux/rwlock_api_smp.h  | 9 +++++++++
 include/linux/spinlock_api_up.h | 1 +
 kernel/locking/spinlock.c       | 6 ++++++
 4 files changed, 22 insertions(+)

diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 7ce9a51ae5c0..93086de7bf9e 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -70,6 +70,12 @@ do {								\
 #define write_lock(lock)	_raw_write_lock(lock)
 #define read_lock(lock)		_raw_read_lock(lock)
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define write_lock_nested(lock, subclass)	_raw_write_lock_nested(lock, subclass)
+#else
+#define write_lock_nested(lock, subclass)	_raw_write_lock(lock)
+#endif
+
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 
 #define read_lock_irqsave(lock, flags)			\
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index abfb53ab11be..e0c866177c03 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -17,6 +17,7 @@
 
 void __lockfunc _raw_read_lock(rwlock_t *lock)		__acquires(lock);
 void __lockfunc _raw_write_lock(rwlock_t *lock)		__acquires(lock);
+void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass)	__acquires(lock);
 void __lockfunc _raw_read_lock_bh(rwlock_t *lock)	__acquires(lock);
 void __lockfunc _raw_write_lock_bh(rwlock_t *lock)	__acquires(lock);
 void __lockfunc _raw_read_lock_irq(rwlock_t *lock)	__acquires(lock);
@@ -46,6 +47,7 @@ _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 
 #ifdef CONFIG_INLINE_WRITE_LOCK
 #define _raw_write_lock(lock) __raw_write_lock(lock)
+#define _raw_write_lock_nested(lock, subclass) __raw_write_lock_nested(lock, subclass)
 #endif
 
 #ifdef CONFIG_INLINE_READ_LOCK_BH
@@ -211,6 +213,13 @@ static inline void __raw_write_lock(rwlock_t *lock)
 	LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
 }
 
+static inline void __raw_write_lock_nested(rwlock_t *lock, int subclass)
+{
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
+}
+
 #endif /* !CONFIG_GENERIC_LOCKBREAK || CONFIG_DEBUG_LOCK_ALLOC */
 
 static inline void __raw_write_unlock(rwlock_t *lock)
diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h
index d0d188861ad6..b8ba00ccccde 100644
--- a/include/linux/spinlock_api_up.h
+++ b/include/linux/spinlock_api_up.h
@@ -59,6 +59,7 @@
 #define _raw_spin_lock_nested(lock, subclass)	__LOCK(lock)
 #define _raw_read_lock(lock)			__LOCK(lock)
 #define _raw_write_lock(lock)			__LOCK(lock)
+#define _raw_write_lock_nested(lock, subclass)	__LOCK(lock)
 #define _raw_spin_lock_bh(lock)			__LOCK_BH(lock)
 #define _raw_read_lock_bh(lock)			__LOCK_BH(lock)
 #define _raw_write_lock_bh(lock)		__LOCK_BH(lock)
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index c5830cfa379a..22969ec69288 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -300,6 +300,12 @@ void __lockfunc _raw_write_lock(rwlock_t *lock)
 	__raw_write_lock(lock);
 }
 EXPORT_SYMBOL(_raw_write_lock);
+
+void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass)
+{
+	__raw_write_lock_nested(lock, subclass);
+}
+EXPORT_SYMBOL(_raw_write_lock_nested);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
-- 
2.34.0.rc1.387.gb447b232ab-goog
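
For context, a rough sketch of the locking hierarchy that patch 7/8
builds on top of this primitive (pieced together from the quoted hunks
and the commit message, not the exact code):

/*
 * IO paths (zs_free, zs_map_object, ...) take the read side, so they
 * never contend with each other, only with migration/compaction.
 */
read_lock(&pool->migrate_lock);
/* ... safely resolve the zspage from the handle ... */
read_unlock(&pool->migrate_lock);

/* __zs_compact(): write side first, then per-class, then per-zspage */
write_lock(&pool->migrate_lock);
spin_lock(&class->lock);
migrate_write_lock(src_zspage);
migrate_write_lock_nested(dst_zspage);	/* same lockdep class as src */
/* ... migrate objects; the page copy dominates the write-side cost ... */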


