* [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Sebastian Andrzej Siewior @ 2022-08-17 16:26 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, Christoph Lameter, David Rientjes,
Joonsoo Kim, Pekka Enberg, Vlastimil Babka, linux-mm,
Sebastian Andrzej Siewior
From: Thomas Gleixner <tglx@linutronix.de>
The slub code already has a few helpers depending on PREEMPT_RT. Add a few
more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
mm/slub.c | 66 +++++++++++++++++++++++++------------------------------
1 file changed, 30 insertions(+), 36 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 862dbd9af4f52..5f7c5b5bd49f9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -100,9 +100,11 @@
* except the stat counters. This is a percpu structure manipulated only by
* the local cpu, so the lock protects against being preempted or interrupted
* by an irq. Fast path operations rely on lockless operations instead.
- * On PREEMPT_RT, the local lock does not actually disable irqs (and thus
- * prevent the lockless operations), so fastpath operations also need to take
- * the lock and are no longer lockless.
+ *
+ * On PREEMPT_RT, the local lock neither disables interrupts nor preemption
+ * which means the lockless fastpath cannot be used as it might interfere with
+ * an in-progress slow path operations. In this case the local lock is always
+ * taken but it still utilizes the freelist for the common operations.
*
* lockless fastpaths
*
@@ -163,8 +165,11 @@
* function call even on !PREEMPT_RT, use inline preempt_disable() there.
*/
#ifndef CONFIG_PREEMPT_RT
-#define slub_get_cpu_ptr(var) get_cpu_ptr(var)
-#define slub_put_cpu_ptr(var) put_cpu_ptr(var)
+#define slub_get_cpu_ptr(var) get_cpu_ptr(var)
+#define slub_put_cpu_ptr(var) put_cpu_ptr(var)
+#define use_lockless_fast_path() (true)
+#define slub_local_irq_save(flags) local_irq_save(flags)
+#define slub_local_irq_restore(flags) local_irq_restore(flags)
#else
#define slub_get_cpu_ptr(var) \
({ \
@@ -176,6 +181,9 @@ do { \
(void)(var); \
migrate_enable(); \
} while (0)
+#define use_lockless_fast_path() (false)
+#define slub_local_irq_save(flags) do { } while (0)
+#define slub_local_irq_restore(flags) do { } while (0)
#endif
#ifdef CONFIG_SLUB_DEBUG
@@ -460,16 +468,14 @@ static __always_inline void __slab_unlock(struct slab *slab)
static __always_inline void slab_lock(struct slab *slab, unsigned long *flags)
{
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- local_irq_save(*flags);
+ slub_local_irq_save(*flags);
__slab_lock(slab);
}
static __always_inline void slab_unlock(struct slab *slab, unsigned long *flags)
{
__slab_unlock(slab);
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- local_irq_restore(*flags);
+ slub_local_irq_restore(*flags);
}
/*
@@ -482,7 +488,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab
void *freelist_new, unsigned long counters_new,
const char *n)
{
- if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ if (use_lockless_fast_path())
lockdep_assert_irqs_disabled();
#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
@@ -3197,14 +3203,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
object = c->freelist;
slab = c->slab;
- /*
- * We cannot use the lockless fastpath on PREEMPT_RT because if a
- * slowpath has taken the local_lock_irqsave(), it is not protected
- * against a fast path operation in an irq handler. So we need to take
- * the slow path which uses local_lock. It is still relatively fast if
- * there is a suitable cpu freelist.
- */
- if (IS_ENABLED(CONFIG_PREEMPT_RT) ||
+
+ if (!use_lockless_fast_path() ||
unlikely(!object || !slab || !node_match(slab, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
} else {
@@ -3463,6 +3463,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
void *tail_obj = tail ? : head;
struct kmem_cache_cpu *c;
unsigned long tid;
+ void **freelist;
redo:
/*
@@ -3477,9 +3478,13 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
/* Same with comment on barrier() in slab_alloc_node() */
barrier();
- if (likely(slab == c->slab)) {
-#ifndef CONFIG_PREEMPT_RT
- void **freelist = READ_ONCE(c->freelist);
+ if (unlikely(slab != c->slab)) {
+ __slab_free(s, slab, head, tail_obj, cnt, addr);
+ return;
+ }
+
+ if (use_lockless_fast_path()) {
+ freelist = READ_ONCE(c->freelist);
set_freepointer(s, tail_obj, freelist);
@@ -3491,16 +3496,8 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
note_cmpxchg_failure("slab_free", s, tid);
goto redo;
}
-#else /* CONFIG_PREEMPT_RT */
- /*
- * We cannot use the lockless fastpath on PREEMPT_RT because if
- * a slowpath has taken the local_lock_irqsave(), it is not
- * protected against a fast path operation in an irq handler. So
- * we need to take the local_lock. We shouldn't simply defer to
- * __slab_free() as that wouldn't use the cpu freelist at all.
- */
- void **freelist;
-
+ } else {
+ /* Update the free list under the local lock */
local_lock(&s->cpu_slab->lock);
c = this_cpu_ptr(s->cpu_slab);
if (unlikely(slab != c->slab)) {
@@ -3515,11 +3512,8 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
c->tid = next_tid(tid);
local_unlock(&s->cpu_slab->lock);
-#endif
- stat(s, FREE_FASTPATH);
- } else
- __slab_free(s, slab, head, tail_obj, cnt, addr);
-
+ }
+ stat(s, FREE_FASTPATH);
}
static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
--
2.37.2
* [PATCH 4/9] mm/vmstat: Use preempt_[dis|en]able_nested()
From: Sebastian Andrzej Siewior @ 2022-08-17 16:26 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, linux-mm,
Sebastian Andrzej Siewior
From: Thomas Gleixner <tglx@linutronix.de>
Replace the open coded CONFIG_PREEMPT_RT conditional
preempt_enable/disable() pairs with the new helper functions which hide
the underlying implementation details.
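For readers without the earlier patches of this series in front of them: the
new helpers are introduced in a preceding preempt.h patch (not part of this
excerpt) and are expected to boil down to roughly the sketch below, i.e. a
no-op on !PREEMPT_RT and a plain preempt_disable()/preempt_enable() pair on
PREEMPT_RT:

	/* Sketch only; see the preempt.h change earlier in the series */
	static __always_inline void preempt_disable_nested(void)
	{
		if (IS_ENABLED(CONFIG_PREEMPT_RT))
			preempt_disable();
	}

	static __always_inline void preempt_enable_nested(void)
	{
		if (IS_ENABLED(CONFIG_PREEMPT_RT))
			preempt_enable();
	}

That keeps the per-cpu counter updates preemption-safe on RT while the call
sites below stay free of IS_ENABLED() checks.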
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
mm/vmstat.c | 36 ++++++++++++------------------------
1 file changed, 12 insertions(+), 24 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 373d2730fcf21..d514fe7f90af0 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -355,8 +355,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
* CPU migrations and preemption potentially corrupts a counter so
* disable preemption.
*/
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_disable();
+ preempt_disable_nested();
x = delta + __this_cpu_read(*p);
@@ -368,8 +367,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
}
__this_cpu_write(*p, x);
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_enable();
+ preempt_enable_nested();
}
EXPORT_SYMBOL(__mod_zone_page_state);
@@ -393,8 +391,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
}
/* See __mod_node_page_state */
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_disable();
+ preempt_disable_nested();
x = delta + __this_cpu_read(*p);
@@ -406,8 +403,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
}
__this_cpu_write(*p, x);
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_enable();
+ preempt_enable_nested();
}
EXPORT_SYMBOL(__mod_node_page_state);
@@ -441,8 +437,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
s8 v, t;
/* See __mod_node_page_state */
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_disable();
+ preempt_disable_nested();
v = __this_cpu_inc_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
@@ -453,8 +448,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
__this_cpu_write(*p, -overstep);
}
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_enable();
+ preempt_enable_nested();
}
void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -466,8 +460,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
/* See __mod_node_page_state */
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_disable();
+ preempt_disable_nested();
v = __this_cpu_inc_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
@@ -478,8 +471,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
__this_cpu_write(*p, -overstep);
}
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_enable();
+ preempt_enable_nested();
}
void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -501,8 +493,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
s8 v, t;
/* See __mod_node_page_state */
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_disable();
+ preempt_disable_nested();
v = __this_cpu_dec_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
@@ -513,8 +504,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
__this_cpu_write(*p, overstep);
}
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_enable();
+ preempt_enable_nested();
}
void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -526,8 +516,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
/* See __mod_node_page_state */
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_disable();
+ preempt_disable_nested();
v = __this_cpu_dec_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
@@ -538,8 +527,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
__this_cpu_write(*p, overstep);
}
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- preempt_enable();
+ preempt_enable_nested();
}
void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
--
2.37.2
* [PATCH 5/9] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
From: Sebastian Andrzej Siewior @ 2022-08-17 16:26 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, linux-mm,
Sebastian Andrzej Siewior
From: Thomas Gleixner <tglx@linutronix.de>
Some places in the VM code expect interrupts to be disabled, which is a
valid expectation on non-PREEMPT_RT kernels, but which does not always hold
on RT kernels because the RT spinlock substitution does not disable
interrupts.
To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places,
provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and
PREEMPT_RT=n.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
include/linux/mmdebug.h | 6 ++++++
lib/Kconfig.debug | 3 +++
2 files changed, 9 insertions(+)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 15ae78cd28536..b8728d11c9490 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -94,6 +94,12 @@ void dump_mm(const struct mm_struct *mm);
#define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
#endif
+#ifdef CONFIG_DEBUG_VM_IRQSOFF
+#define VM_WARN_ON_IRQS_ENABLED() WARN_ON_ONCE(!irqs_disabled())
+#else
+#define VM_WARN_ON_IRQS_ENABLED() do { } while (0)
+#endif
+
#ifdef CONFIG_DEBUG_VIRTUAL
#define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
#else
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 072e4b289c13e..c96fc6820544c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -803,6 +803,9 @@ config ARCH_HAS_DEBUG_VM_PGTABLE
An architecture should select this when it can successfully
build and run DEBUG_VM_PGTABLE.
+config DEBUG_VM_IRQSOFF
+ def_bool DEBUG_VM && !PREEMPT_RT
+
config DEBUG_VM
bool "Debug VM"
depends on DEBUG_KERNEL
--
2.37.2
* [PATCH 6/9] mm/memcontrol: Replace the PREEMPT_RT conditionals
From: Sebastian Andrzej Siewior @ 2022-08-17 16:27 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, cgroups, linux-mm,
Sebastian Andrzej Siewior
From: Thomas Gleixner <tglx@linutronix.de>
Use VM_WARN_ON_IRQS_ENABLED() and preempt_disable/enable_nested() to
replace the CONFIG_PREEMPT_RT #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
mm/memcontrol.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b69979c9ced5c..d35b6fa560f0a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -597,25 +597,18 @@ static u64 flush_next_time;
*/
static void memcg_stats_lock(void)
{
-#ifdef CONFIG_PREEMPT_RT
- preempt_disable();
-#else
- VM_BUG_ON(!irqs_disabled());
-#endif
+ preempt_disable_nested();
+ VM_WARN_ON_IRQS_ENABLED();
}
static void __memcg_stats_lock(void)
{
-#ifdef CONFIG_PREEMPT_RT
- preempt_disable();
-#endif
+ preempt_disable_nested();
}
static void memcg_stats_unlock(void)
{
-#ifdef CONFIG_PREEMPT_RT
- preempt_enable();
-#endif
+ preempt_enable_nested();
}
static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
@@ -715,7 +708,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
* interrupt context while other caller need to have disabled interrupt.
*/
__memcg_stats_lock();
- if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ if (IS_ENABLED(CONFIG_DEBUG_VM)) {
switch (idx) {
case NR_ANON_MAPPED:
case NR_FILE_MAPPED:
@@ -725,7 +718,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
WARN_ON_ONCE(!in_task());
break;
default:
- WARN_ON_ONCE(!irqs_disabled());
+ VM_WARN_ON_IRQS_ENABLED();
}
}
--
2.37.2
* [PATCH 7/9] mm/compaction: Get rid of RT ifdeffery
From: Sebastian Andrzej Siewior @ 2022-08-17 16:27 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, Nick Terrell, linux-mm,
Sebastian Andrzej Siewior
From: Thomas Gleixner <tglx@linutronix.de>
Move the RT dependency for the initial value of
sysctl_compact_unevictable_allowed into Kconfig.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
mm/Kconfig | 5 +++++
mm/compaction.c | 6 +-----
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index 0331f1461f81c..a0506a54a4f3f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -579,6 +579,11 @@ config COMPACTION
it and then we would be really interested to hear about that at
linux-mm@kvack.org.
+config COMPACT_UNEVICTABLE_DEFAULT
+ int
+ default 0 if PREEMPT_RT
+ default 1
+
#
# support for free page reporting
config PAGE_REPORTING
diff --git a/mm/compaction.c b/mm/compaction.c
index 640fa76228dd9..10561cb1aaad9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1727,11 +1727,7 @@ typedef enum {
* Allow userspace to control policy on scanning the unevictable LRU for
* compactable pages.
*/
-#ifdef CONFIG_PREEMPT_RT
-int sysctl_compact_unevictable_allowed __read_mostly = 0;
-#else
-int sysctl_compact_unevictable_allowed __read_mostly = 1;
-#endif
+int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
static inline void
update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
--
2.37.2
* Re: [PATCH 6/9] mm/memcontrol: Replace the PREEMPT_RT conditionals
From: Johannes Weiner @ 2022-08-17 16:59 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, cgroups, linux-mm
On Wed, Aug 17, 2022 at 06:27:00PM +0200, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Use VM_WARN_ON_IRQS_ENABLED() and preempt_disable/enable_nested() to
> replace the CONFIG_PREEMPT_RT #ifdeffery.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH 6/9] mm/memcontrol: Replace the PREEMPT_RT conditionals
From: Muchun Song @ 2022-08-18 2:45 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, cgroups, Linux MM
> On Aug 18, 2022, at 00:27, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Use VM_WARN_ON_IRQS_ENABLED() and preempt_disable/enable_nested() to
> replace the CONFIG_PREEMPT_RT #ifdeffery.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Simpler.
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Thanks.
* Re: [PATCH 7/9] mm/compaction: Get rid of RT ifdeffery
From: Rasmus Villemoes @ 2022-08-18 8:55 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, Nick Terrell, linux-mm
On 17/08/2022 18.27, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Move the RT dependency for the initial value of
> sysctl_compact_unevictable_allowed into Kconfig.
>
>
> +config COMPACT_UNEVICTABLE_DEFAULT
> + int
> + default 0 if PREEMPT_RT
> + default 1
> +
> #
> # support for free page reporting
> config PAGE_REPORTING
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 640fa76228dd9..10561cb1aaad9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1727,11 +1727,7 @@ typedef enum {
> * Allow userspace to control policy on scanning the unevictable LRU for
> * compactable pages.
> */
> -#ifdef CONFIG_PREEMPT_RT
> -int sysctl_compact_unevictable_allowed __read_mostly = 0;
> -#else
> -int sysctl_compact_unevictable_allowed __read_mostly = 1;
> -#endif
> +int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
Why introduce a Kconfig symbol for this, and not just spell the
initializer "IS_ENABLED(CONFIG_PREEMPT_RT) ? 0 : 1" or simply
"!IS_ENABLED(CONFIG_PREEMPT_RT)"?
And if you do keep it in Kconfig, shouldn't the symbol be "depends on
COMPACTION" so it doesn't needlessly appear in .config when
!CONFIG_COMPACTION.
Rasmus
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Christoph Lameter @ 2022-08-18 9:42 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Andrew Morton, David Rientjes,
Joonsoo Kim, Pekka Enberg, Vlastimil Babka, linux-mm
On Wed, 17 Aug 2022, Sebastian Andrzej Siewior wrote:
> + * On PREEMPT_RT, the local lock neither disables interrupts nor preemption
> + * which means the lockless fastpath cannot be used as it might interfere with
> + * an in-progress slow path operations. In this case the local lock is always
> + * taken but it still utilizes the freelist for the common operations.
The slub fastpath does not interfere with slow path operations and the
fastpath does not require disabling preemption or interrupts if the
processor supports local RMW operations. So you should be able to use the
fastpath on PREEMPT_RT.
If the fastpath is not possible, then you need to disable preemption and
eventually take locks etc., and then things may get a bit more complicated.
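For context, the !RT lockless allocation fastpath being referred to looks
roughly like the following (simplified sketch of the slab_alloc_node() hot
path, not the verbatim mm/slub.c code):

	redo:
		/* Snapshot the per-cpu transaction id and freelist, no lock held */
		tid = READ_ONCE(c->tid);
		barrier();
		object = c->freelist;
		next = get_freepointer_safe(s, object);
		/*
		 * Commit the allocation with a single local cmpxchg-double on
		 * (freelist, tid); it fails and the whole operation is retried
		 * if anything changed in between, e.g. due to an interrupt.
		 */
		if (unlikely(!this_cpu_cmpxchg_double(s->cpu_slab->freelist,
						      s->cpu_slab->tid,
						      object, tid,
						      next, next_tid(tid))))
			goto redo;

The point made above is that this sequence needs neither preemption nor
interrupt disabling as long as the cmpxchg-double is a local atomic
operation; the follow-ups discuss whether it may run concurrently with a
slow path holding the local lock on PREEMPT_RT.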
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Vlastimil Babka @ 2022-08-18 14:37 UTC (permalink / raw)
To: Christoph Lameter, Sebastian Andrzej Siewior
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Andrew Morton, David Rientjes,
Joonsoo Kim, Pekka Enberg, linux-mm
On 8/18/22 11:42, Christoph Lameter wrote:
> On Wed, 17 Aug 2022, Sebastian Andrzej Siewior wrote:
>
>> + * On PREEMPT_RT, the local lock neither disables interrupts nor preemption
>> + * which means the lockless fastpath cannot be used as it might interfere with
>> + * an in-progress slow path operations. In this case the local lock is always
>> + * taken but it still utilizes the freelist for the common operations.
>
> The slub fastpath does not interfere with slow path operations and the
That's true on !PREEMPT_RT because a slowpath operation under
local_lock_irqsave() will disable interrupts, so there can't be a
fastpath operation in an interrupt handler appearing in the middle of a
slowpath operation.
On PREEMPT_RT local_lock_irqsave() doesn't actually disable interrupts,
so that can happen. IIRC we learned that the hard way when Mike
Galbraith was testing early versions of my PREEMPT_RT changes for SLUB.
> fastpath does not require disabling preemption or interrupts if the
> processor supports local RMW operations. So you should be able to use the
> fastpath on PREEMPT_RT.
>
> If the fastpath is not possible then you need to disable preemption and
> eventually take locks etc and then things may get a bit more complicated.
Yeah that's basically the solution for PREEMPT_RT, take the local_lock.
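To make the interference concrete, the problematic interleaving on
PREEMPT_RT is roughly the following (hypothetical timeline, both tasks on
the same CPU):

	/*
	 * task A (slow path)                      task B (fast path)
	 *
	 * local_lock_irqsave(&s->cpu_slab->lock, flags);
	 *   // on RT this disables neither
	 *   // interrupts nor preemption
	 *   ... updating c->freelist / c->tid ...
	 *      <-- A is preempted -->              object = c->freelist;
	 *                                          this_cpu_cmpxchg_double(...);
	 *                                          // succeeds against per-cpu state
	 *                                          // that A is still rewriting
	 *      <-- A resumes -->
	 * local_unlock_irqrestore(&s->cpu_slab->lock, flags);
	 */

Hence the patch keeps the lockless fastpath disabled on RT and routes those
operations through the local lock instead.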
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Sebastian Andrzej Siewior @ 2022-08-18 15:22 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Lameter, linux-kernel, Thomas Gleixner, Peter Zijlstra,
Steven Rostedt, Linus Torvalds, Matthew Wilcox, Andrew Morton,
David Rientjes, Joonsoo Kim, Pekka Enberg, linux-mm
On 2022-08-18 16:37:06 [+0200], Vlastimil Babka wrote:
> > The slub fastpath does not interfere with slow path operations and the
>
> That's true on !PREEMPT_RT because a slowpath operation under
> local_lock_irqsave() will disable interrupts, so there can't be a
> fastpath operation in an interrupt handler appearing in the middle of a
> slowpath operation.
>
> On PREEMPT_RT local_lock_irqsave() doesn't actually disable interrupts,
> so that can happen. IIRC we learned that the hard way when Mike
> Galbraith was testing early versions of my PREEMPT_RT changes for SLUB.
I think the point is that local_lock_irqsave() does not disable
preemption. So the lock owner (within the local_lock_irqsave() section)
can be preempted and another task can use the fast path which does not
involve the lock and things go boom from here.
Sebastian
* Re: [PATCH 7/9] mm/compaction: Get rid of RT ifdeffery
From: Sebastian Andrzej Siewior @ 2022-08-18 15:51 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Andrew Morton, Nick Terrell,
linux-mm
On 2022-08-18 10:55:28 [+0200], Rasmus Villemoes wrote:
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1727,11 +1727,7 @@ typedef enum {
> > * Allow userspace to control policy on scanning the unevictable LRU for
> > * compactable pages.
> > */
> > -#ifdef CONFIG_PREEMPT_RT
> > -int sysctl_compact_unevictable_allowed __read_mostly = 0;
> > -#else
> > -int sysctl_compact_unevictable_allowed __read_mostly = 1;
> > -#endif
> > +int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
>
> Why introduce a Kconfig symbol for this, and not just spell the
> initializer "IS_ENABLED(CONFIG_PREEMPT_RT) ? 0 : 1" or simply
> "!IS_ENABLED(CONFIG_PREEMPT_RT)"?
The idea was to remove the CONFIG_PREEMPT_RT reference here entirely.
However, if the IS_ENABLED() form is preferred, we can certainly do that.
> And if you do keep it in Kconfig, shouldn't the symbol be "depends on
> COMPACTION" so it doesn't needlessly appear in .config when
> !CONFIG_COMPACTION.
Sure, if we keep the Kconfig.
> Rasmus
Sebastian
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Linus Torvalds @ 2022-08-18 17:34 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Matthew Wilcox, Andrew Morton, Christoph Lameter, David Rientjes,
Joonsoo Kim, Pekka Enberg, Vlastimil Babka, linux-mm
On Wed, Aug 17, 2022 at 9:27 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The slub code already has a few helpers depending on PREEMPT_RT. Add a few
> more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place.
>
> No functional change.
Looks like a fine cleanup, but I'd prefer that
#define use_lockless_fast_path() {(true)/(false)}
look much less like it's some function call. On first read-through it
looked like some very dynamic thing to me.
Just doing
#define USE_LOCKLESS_FAST_PATH ..
would make it much more visually obvious that it's not some kind of
complex run-time decision.
Linus
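Concretely, the rename being asked for would turn the helper into something
like the following (sketch; the exact spelling is left to the follow-up
patch):

	#ifndef CONFIG_PREEMPT_RT
	#define USE_LOCKLESS_FAST_PATH()	(true)
	#else
	#define USE_LOCKLESS_FAST_PATH()	(false)
	#endif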
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Christoph Lameter @ 2022-08-19 15:04 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Sebastian Andrzej Siewior, linux-kernel, Thomas Gleixner,
Peter Zijlstra, Steven Rostedt, Linus Torvalds, Matthew Wilcox,
Andrew Morton, David Rientjes, Joonsoo Kim, Pekka Enberg,
linux-mm
On Thu, 18 Aug 2022, Vlastimil Babka wrote:
> On 8/18/22 11:42, Christoph Lameter wrote:
> > On Wed, 17 Aug 2022, Sebastian Andrzej Siewior wrote:
> >
> >> + * On PREEMPT_RT, the local lock neither disables interrupts nor preemption
> >> + * which means the lockless fastpath cannot be used as it might interfere with
> >> + * an in-progress slow path operations. In this case the local lock is always
> >> + * taken but it still utilizes the freelist for the common operations.
> >
> > The slub fastpath does not interfere with slow path operations and the
>
> That's true on !PREEMPT_RT because a slowpath operation under
> local_lock_irqsave() will disable interrupts, so there can't be a
> fastpath operation in an interrupt handler appearing in the middle of a
> slowpath operation.
>
> On PREEMPT_RT local_lock_irqsave() doesn't actually disable interrupts,
> so that can happen. IIRC we learned that the hard way when Mike
> Galbraith was testing early versions of my PREEMPT_RT changes for SLUB.
Well yes, if you enable interrupts during the slowpath then interrupt
handlers may use the fastpath. That is a basic design change to the way
concurrency is handled in the allocators.
There needs to be some fix here to restore the exclusion of the fastpath
during slow path processing. This could be
A) Exclude the fastpath during slowpath operations
This can be accomplished by setting things up like in the debug mode
that also excludes the fastpath.
or
B) Force interrupt allocations to the slowpath.
Check some flag that indicates an interrupt allocation is occurring and
then bypass the fastpath.
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Vlastimil Babka @ 2022-08-23 17:15 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, Christoph Lameter, David Rientjes,
Joonsoo Kim, Pekka Enberg, linux-mm
On 8/17/22 18:26, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The slub code already has a few helpers depending on PREEMPT_RT. Add a few
> more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place.
>
> No functional change.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> mm/slub.c | 66 +++++++++++++++++++++++++------------------------------
> 1 file changed, 30 insertions(+), 36 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 862dbd9af4f52..5f7c5b5bd49f9 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -100,9 +100,11 @@
> * except the stat counters. This is a percpu structure manipulated only by
> * the local cpu, so the lock protects against being preempted or interrupted
> * by an irq. Fast path operations rely on lockless operations instead.
> - * On PREEMPT_RT, the local lock does not actually disable irqs (and thus
> - * prevent the lockless operations), so fastpath operations also need to take
> - * the lock and are no longer lockless.
> + *
> + * On PREEMPT_RT, the local lock neither disables interrupts nor preemption
> + * which means the lockless fastpath cannot be used as it might interfere with
> + * an in-progress slow path operations. In this case the local lock is always
> + * taken but it still utilizes the freelist for the common operations.
> *
> * lockless fastpaths
> *
> @@ -163,8 +165,11 @@
> * function call even on !PREEMPT_RT, use inline preempt_disable() there.
> */
> #ifndef CONFIG_PREEMPT_RT
> -#define slub_get_cpu_ptr(var) get_cpu_ptr(var)
> -#define slub_put_cpu_ptr(var) put_cpu_ptr(var)
> +#define slub_get_cpu_ptr(var) get_cpu_ptr(var)
> +#define slub_put_cpu_ptr(var) put_cpu_ptr(var)
> +#define use_lockless_fast_path() (true)
> +#define slub_local_irq_save(flags) local_irq_save(flags)
> +#define slub_local_irq_restore(flags) local_irq_restore(flags)
Note these won't be necessary anymore after
https://lore.kernel.org/linux-mm/20220823170400.26546-6-vbabka@suse.cz/T/#u
> #else
> #define slub_get_cpu_ptr(var) \
> ({ \
> @@ -176,6 +181,9 @@ do { \
> (void)(var); \
> migrate_enable(); \
> } while (0)
> +#define use_lockless_fast_path() (false)
> +#define slub_local_irq_save(flags) do { } while (0)
> +#define slub_local_irq_restore(flags) do { } while (0)
> #endif
>
> #ifdef CONFIG_SLUB_DEBUG
> @@ -460,16 +468,14 @@ static __always_inline void __slab_unlock(struct slab *slab)
>
> static __always_inline void slab_lock(struct slab *slab, unsigned long *flags)
> {
> - if (IS_ENABLED(CONFIG_PREEMPT_RT))
> - local_irq_save(*flags);
> + slub_local_irq_save(*flags);
> __slab_lock(slab);
> }
>
> static __always_inline void slab_unlock(struct slab *slab, unsigned long *flags)
> {
> __slab_unlock(slab);
> - if (IS_ENABLED(CONFIG_PREEMPT_RT))
> - local_irq_restore(*flags);
> + slub_local_irq_restore(*flags);
> }
Ditto.
> /*
> @@ -482,7 +488,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab
> void *freelist_new, unsigned long counters_new,
> const char *n)
> {
> - if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> + if (use_lockless_fast_path())
> lockdep_assert_irqs_disabled();
This test would stay after the patch I referenced above. But while this
change will keep testing the technically correct thing, the name would be
IMHO misleading here, as this is semantically not about the lockless fast
path, but about whether we need to have irqs disabled to avoid a deadlock
due to an irq arriving while we hold the bit_spin_lock() and its handler
then trying to acquire it as well.
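In other words, the assertion documents the !PREEMPT_RT deadlock that the
irq disabling prevents (hypothetical interleaving, same CPU):

	/*
	 * __cmpxchg_double_slab() caller           hard interrupt handler
	 *
	 * slab_lock(slab);   // bit_spin_lock
	 *      <-- irq arrives -->                  slab free/alloc path ends up in
	 *                                           slab_lock(slab);
	 *                                           // spins forever: the interrupted
	 *                                           // owner can never run again here
	 */

On PREEMPT_RT this cannot happen because slab operations are not performed
from hard interrupt context, which is what the discussion below converges on.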
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Sebastian Andrzej Siewior @ 2022-08-24 13:25 UTC (permalink / raw)
To: Vlastimil Babka
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Andrew Morton, Christoph Lameter,
David Rientjes, Joonsoo Kim, Pekka Enberg, linux-mm
On 2022-08-23 19:15:43 [+0200], Vlastimil Babka wrote:
> > +#define slub_local_irq_save(flags) local_irq_save(flags)
> > +#define slub_local_irq_restore(flags) local_irq_restore(flags)
>
>> Note these won't be necessary anymore after
> https://lore.kernel.org/linux-mm/20220823170400.26546-6-vbabka@suse.cz/T/#u
Okay, let me postpone that one and rebase what is left on top.
> > @@ -482,7 +488,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab
> > void *freelist_new, unsigned long counters_new,
> > const char *n)
> > {
> > - if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > + if (use_lockless_fast_path())
> > lockdep_assert_irqs_disabled();
>
> This test would stay after the patch I referenced above. But while this
> change will keep testing the technically correct thing, the name would be
> IMHO misleading here, as this is semantically not about the lockless fast
>> path, but about whether we need to have irqs disabled to avoid a deadlock
>> due to an irq arriving while we hold the bit_spin_lock() and its handler
>> then trying to acquire it as well.
Color me confused. Memory is never allocated in-IRQ context on
PREEMPT_RT. Therefore I don't understand why interrupts must be disabled
for the fast path (unless that comment only applied to !RT).
It could be about preemption since spinlock, local_lock don't disable
preemption and so another allocation on the same CPU is possible. But
then you say "we hold the bit_spin_lock()" and this one disables
preemption. This means nothing can stop the bit_spin_lock() owner from
making progress and since there is no memory allocation in-IRQ, we can't
block on the same bit_spin_lock() on the same CPU.
Sebastian
* Re: [PATCH 7/9] mm/compaction: Get rid of RT ifdeffery
From: Thomas Gleixner @ 2022-08-24 13:50 UTC (permalink / raw)
To: Rasmus Villemoes, Sebastian Andrzej Siewior, linux-kernel
Cc: Peter Zijlstra, Steven Rostedt, Linus Torvalds, Matthew Wilcox,
Andrew Morton, Nick Terrell, linux-mm
On Thu, Aug 18 2022 at 10:55, Rasmus Villemoes wrote:
> On 17/08/2022 18.27, Sebastian Andrzej Siewior wrote:
>> -#ifdef CONFIG_PREEMPT_RT
>> -int sysctl_compact_unevictable_allowed __read_mostly = 0;
>> -#else
>> -int sysctl_compact_unevictable_allowed __read_mostly = 1;
>> -#endif
>> +int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
>
> Why introduce a Kconfig symbol for this, and not just spell the
> initializer "IS_ENABLED(CONFIG_PREEMPT_RT) ? 0 : 1" or simply
> "!IS_ENABLED(CONFIG_PREEMPT_RT)"?
The reason for the config symbol is that Linus requested to have
semantically obvious constructs which can be utilized even without RT
and clearly spell out what the construct does. When RT selects this then
it's a documented requirement/dependency.
> And if you do keep it in Kconfig, shouldn't the symbol be "depends on
> COMPACTION" so it doesn't needlessly appear in .config when
> !CONFIG_COMPACTION.
Sure.
Thanks,
tglx
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Vlastimil Babka @ 2022-08-24 13:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Andrew Morton, Christoph Lameter,
David Rientjes, Joonsoo Kim, Pekka Enberg, linux-mm
On 8/24/22 15:25, Sebastian Andrzej Siewior wrote:
> On 2022-08-23 19:15:43 [+0200], Vlastimil Babka wrote:
>>> +#define slub_local_irq_save(flags) local_irq_save(flags)
>>> +#define slub_local_irq_restore(flags) local_irq_restore(flags)
>>
>> Note these won't be neccessary anymore after
>> https://lore.kernel.org/linux-mm/20220823170400.26546-6-vbabka@suse.cz/T/#u
>
> Okay, let me postpone that one and rebase what is left on top.
>
>>> @@ -482,7 +488,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab
>>> void *freelist_new, unsigned long counters_new,
>>> const char *n)
>>> {
>>> - if (!IS_ENABLED(CONFIG_PREEMPT_RT))
>>> + if (use_lockless_fast_path())
>>> lockdep_assert_irqs_disabled();
>>
>> This test would stay after the patch I referenced above. But while this
>> change will keep testing the technically correct thing, the name would be
>> IMHO misleading here, as this is semantically not about the lockless fast
>> path, but whether we need to have irqs disabled to avoid a deadlock due to
>> irq incoming when we hold the bit_spin_lock() and its handler trying to
>> acquire it as well.
>
> Color me confused. Memory is never allocated in-IRQ context on
> PREEMPT_RT. Therefore I don't understand why interrupts must be disabled
> for the fast path (unless that comment only applied to !RT).
Yes, that only applied to !RT. That's why the assert is there only for !RT.
> It could be about preemption since spinlock, local_lock don't disable
> preemption and so another allocation on the same CPU is possible. But
> then you say "we hold the bit_spin_lock()" and this one disables
> preemption. This means nothing can stop the bit_spin_lock() owner from
> making progress and since there is no memory allocation in-IRQ, we can't
> block on the same bit_spin_lock() on the same CPU.
Yeah, realizing that this is true on RT led to the recent patch I
referenced. Initially when converting SLUB to RT last year I didn't
realize this detail, and instead replaced the irq disabling done (only
on !RT) by e.g. local_lock_irqsave with the manual local_irq_save().
> Sebastian
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Sebastian Andrzej Siewior @ 2022-08-24 13:57 UTC (permalink / raw)
To: Vlastimil Babka
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
Linus Torvalds, Matthew Wilcox, Andrew Morton, Christoph Lameter,
David Rientjes, Joonsoo Kim, Pekka Enberg, linux-mm
On 2022-08-24 15:54:54 [+0200], Vlastimil Babka wrote:
> Yeah, realizing that this is true on RT led to the recent patch I
> referenced. Initially when converting SLUB to RT last year I didn't realize
> this detail, and instead replaced the irq disabling done (only on !RT) by
> e.g. local_lock_irqsave with the manual local_irq_save().
Ah, okay. Let me get to the other series then ;)
Sebastian
* Re: [PATCH 1/9] slub: Make PREEMPT_RT support less convoluted
From: Hyeonggon Yoo @ 2022-08-25 5:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: Vlastimil Babka, Sebastian Andrzej Siewior, linux-kernel,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
Matthew Wilcox, Andrew Morton, David Rientjes, Joonsoo Kim,
Pekka Enberg, linux-mm
On Fri, Aug 19, 2022 at 05:04:31PM +0200, Christoph Lameter wrote:
> On Thu, 18 Aug 2022, Vlastimil Babka wrote:
>
> > On 8/18/22 11:42, Christoph Lameter wrote:
> > > On Wed, 17 Aug 2022, Sebastian Andrzej Siewior wrote:
> > >
> > >> + * On PREEMPT_RT, the local lock neither disables interrupts nor preemption
> > >> + * which means the lockless fastpath cannot be used as it might interfere with
> > >> + * an in-progress slow path operations. In this case the local lock is always
> > >> + * taken but it still utilizes the freelist for the common operations.
> > >
> > > The slub fastpath does not interfere with slow path operations and the
> >
> > That's true on !PREEMPT_RT because a slowpath operation under
> > local_lock_irqsave() will disable interrupts, so there can't be a
> > fastpath operation in an interrupt handler appearing in the middle of a
> > slowpath operation.
> >
> > On PREEMPT_RT local_lock_irqsave() doesn't actually disable interrupts,
> > so that can happen. IIRC we learned that the hard way when Mike
> > Galbraith was testing early versions of my PREEMPT_RT changes for SLUB.
>
> Well yes if you enable interrupts during the slowpath then interrupts may
> use the fastpath. That is a basic design change to the way concurrency is
> handled in the allocators.
>
> There needs to be some fix here to restore the exclusion of the fastpath
> during slow path processing. This could be
>
> A) Exclude the fastpath during slowpath operations
>
> This can be accomplished by setting things up like in the debug mode
> that also excludes the fastpath.
I think we can do that by disabling preemption (for a short period, I think)
in the slowpath on RT (like disabling irq in non-RT).
But I wonder if the RT guys would prefer that?
> B) Force interrupt allocations to the slowpath.
>
> Check some flag that indicates an interrupt allocation is occurring and
> then bypass the fastpath.
There is nothing special about interrupt allocation on RT.
All users of SLUB on RT must not be in hardirq context.
So I don't think it is possible to distinguish between a thread being preempted
and another thread that preempts it.
--
Thanks,
Hyeonggon