* [PATCH v2 0/2] execve scalability issues, part 1
From: Mateusz Guzik @ 2023-08-22 18:41 UTC
To: linux-kernel; +Cc: dennis, tj, cl, akpm, shakeelb, linux-mm, Mateusz Guzik
To start, I figured I would benchmark about as friendly a case as it gets
-- statically linked *separate* binaries, all doing execve in a loop.
I borrowed the bench from here:
http://apollo.backplane.com/DFlyMisc/doexec.c
$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)
It prints a result every second (warning: first line is garbage).
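For reference, the core of the bench goes something like this (a
paraphrased sketch from memory, not the actual doexec.c source -- see the
URL above for the real thing):

#include <sys/wait.h>
#include <unistd.h>

/*
 * Each of the $(nproc) workers forks and execs a statically linked
 * binary in a tight loop; results get tallied and printed elsewhere.
 */
static void worker(const char *path)
{
	char *argv[] = { (char *)path, NULL };
	char *envp[] = { NULL };

	for (;;) {
		pid_t pid = fork();

		if (pid == 0) {
			execve(path, argv, envp);
			_exit(1);	/* only reached if execve fails */
		}
		if (pid > 0)
			waitpid(pid, NULL, 0);
	}
}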
My test box temporarily has only 26 cores, and even at this scale I run
into massive lock contention stemming from back-to-back calls to
percpu_counter_init (and percpu_counter_destroy later).
While not a panacea, one simple thing to do here is to batch these ops.
Since the term "batching" is already used in the file, I decided to
refer to it as "grouping" instead.
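To illustrate with the mm_struct conversion from patch 2, a call site goes
from one init (and one global-lock round-trip) per counter to a single
grouped call:

	/* before: 4 back-to-back inits, each serializing on global locks */
	for (i = 0; i < NR_MM_COUNTERS; i++)
		if (percpu_counter_init(&mm->rss_stat[i], 0, GFP_KERNEL_ACCOUNT))
			goto fail_pcpu;

	/* after: one allocation covering all NR_MM_COUNTERS counters */
	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT,
				     NR_MM_COUNTERS))
		goto fail_pcpu;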
Even if this code could be patched to dodge these counters, I would argue
that a high-traffic alloc/free consumer is only a matter of time, so it
makes sense to facilitate one.
With the fix I get an OK win; to quote from the commit:
> Even at a very modest scale of 26 cores (ops/s):
> before: 133543.63
> after: 186061.81 (+39%)
> While with the patch these allocations remain a significant problem,
> the primary bottleneck shifts to:
>
> __pv_queued_spin_lock_slowpath+1
> _raw_spin_lock_irqsave+57
> folio_lruvec_lock_irqsave+91
> release_pages+590
> tlb_batch_pages_flush+61
> tlb_finish_mmu+101
> exit_mmap+327
> __mmput+61
> begin_new_exec+1245
> load_elf_binary+712
> bprm_execve+644
> do_execveat_common.isra.0+429
> __x64_sys_execve+50
> do_syscall_64+46
> entry_SYSCALL_64_after_hwframe+110
I intend to do more work in this area to mostly sort it out, but I would
not mind if someone else took the hammer to the folio code. :)
With this out of the way I'll be looking at some form of caching to
eliminate these allocs as a problem.
Thoughts?
v2:
- force bigger alignment on alloc
- rename "counters" to "nr_counters" and pass prior to lock key
- drop {}'s for single-statement loops
Mateusz Guzik (2):
pcpcntr: add group allocation/free
fork: group allocation of per-cpu counters for mm struct
 include/linux/percpu_counter.h | 20 ++++++++---
 kernel/fork.c                  | 14 ++------
 lib/percpu_counter.c           | 61 +++++++++++++++++++++++-----------
 3 files changed, 60 insertions(+), 35 deletions(-)
--
2.39.2
* [PATCH v2 1/2] pcpcntr: add group allocation/free
From: Mateusz Guzik @ 2023-08-22 18:41 UTC
To: linux-kernel; +Cc: dennis, tj, cl, akpm, shakeelb, linux-mm, Mateusz Guzik
Allocations and frees are globally serialized on the pcpu lock (and the
CPU hotplug lock if enabled, which is the case on Debian).
At least one frequent consumer allocates 4 back-to-back counters (and
frees them in the same manner), exacerbating the problem.
While this does not fully remedy scalability issues, it is a step
towards that goal and provides immediate relief.
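The grouped variant carves a single percpu allocation into evenly spaced
slots, which is also why the destroy side only has to free fbc[0].counters.
The key lines from the diff below:

	counter_size = ALIGN(sizeof(*counters), __alignof__(*counters));
	counters = __alloc_percpu_gfp(nr_counters * counter_size,
				      __alignof__(*counters), gfp);
	...
	/* slot i starts counter_size bytes past slot i - 1 */
	fbc[i].counters = (void *)counters + (i * counter_size);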
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 include/linux/percpu_counter.h | 20 ++++++++---
 lib/percpu_counter.c           | 61 +++++++++++++++++++++++-----------
 2 files changed, 57 insertions(+), 24 deletions(-)
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 75b73c83bc9d..518a4088b964 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -30,17 +30,27 @@ struct percpu_counter {
 
 extern int percpu_counter_batch;
 
-int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
-			  struct lock_class_key *key);
+int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
+			       u32 nr_counters, struct lock_class_key *key);
 
-#define percpu_counter_init(fbc, value, gfp)				\
+#define percpu_counter_init_many(fbc, value, gfp, nr_counters)		\
 	({								\
 		static struct lock_class_key __key;			\
 									\
-		__percpu_counter_init(fbc, value, gfp, &__key);		\
+		__percpu_counter_init_many(fbc, value, gfp, nr_counters,\
+					   &__key);			\
 	})
 
-void percpu_counter_destroy(struct percpu_counter *fbc);
+
+#define percpu_counter_init(fbc, value, gfp)				\
+	percpu_counter_init_many(fbc, value, gfp, 1)
+
+void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters);
+static inline void percpu_counter_destroy(struct percpu_counter *fbc)
+{
+	percpu_counter_destroy_many(fbc, 1);
+}
+
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 5004463c4f9f..9338b27f1cdd 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -151,48 +151,71 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
-int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
-			  struct lock_class_key *key)
+int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
+			       u32 nr_counters, struct lock_class_key *key)
 {
 	unsigned long flags __maybe_unused;
-
-	raw_spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
-	fbc->counters = alloc_percpu_gfp(s32, gfp);
-	if (!fbc->counters)
+	size_t counter_size;
+	s32 __percpu *counters;
+	u32 i;
+
+	counter_size = ALIGN(sizeof(*counters), __alignof__(*counters));
+	counters = __alloc_percpu_gfp(nr_counters * counter_size,
+				      __alignof__(*counters), gfp);
+	if (!counters) {
+		fbc[0].counters = NULL;
 		return -ENOMEM;
+	}
 
-	debug_percpu_counter_activate(fbc);
+	for (i = 0; i < nr_counters; i++) {
+		raw_spin_lock_init(&fbc[i].lock);
+		lockdep_set_class(&fbc[i].lock, key);
+#ifdef CONFIG_HOTPLUG_CPU
+		INIT_LIST_HEAD(&fbc[i].list);
+#endif
+		fbc[i].count = amount;
+		fbc[i].counters = (void *)counters + (i * counter_size);
+
+		debug_percpu_counter_activate(&fbc[i]);
+	}
 
 #ifdef CONFIG_HOTPLUG_CPU
-	INIT_LIST_HEAD(&fbc->list);
 	spin_lock_irqsave(&percpu_counters_lock, flags);
-	list_add(&fbc->list, &percpu_counters);
+	for (i = 0; i < nr_counters; i++)
+		list_add(&fbc[i].list, &percpu_counters);
 	spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
-EXPORT_SYMBOL(__percpu_counter_init);
+EXPORT_SYMBOL(__percpu_counter_init_many);
 
-void percpu_counter_destroy(struct percpu_counter *fbc)
+void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters)
{
 	unsigned long flags __maybe_unused;
+	u32 i;
+
+	if (WARN_ON_ONCE(!fbc))
+		return;
 
-	if (!fbc->counters)
+	if (!fbc[0].counters)
 		return;
 
-	debug_percpu_counter_deactivate(fbc);
+	for (i = 0; i < nr_counters; i++)
+		debug_percpu_counter_deactivate(&fbc[i]);
 
 #ifdef CONFIG_HOTPLUG_CPU
 	spin_lock_irqsave(&percpu_counters_lock, flags);
-	list_del(&fbc->list);
+	for (i = 0; i < nr_counters; i++)
+		list_del(&fbc[i].list);
 	spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
-	free_percpu(fbc->counters);
-	fbc->counters = NULL;
+
+	free_percpu(fbc[0].counters);
+
+	for (i = 0; i < nr_counters; i++)
+		fbc[i].counters = NULL;
 }
-EXPORT_SYMBOL(percpu_counter_destroy);
+EXPORT_SYMBOL(percpu_counter_destroy_many);
 
 int percpu_counter_batch __read_mostly = 32;
 EXPORT_SYMBOL(percpu_counter_batch);
--
2.39.2
* [PATCH v2 2/2] fork: group allocation of per-cpu counters for mm struct
From: Mateusz Guzik @ 2023-08-22 18:41 UTC
To: linux-kernel; +Cc: dennis, tj, cl, akpm, shakeelb, linux-mm, Mateusz Guzik
A trivial execve scalability test which tries to be very friendly
(statically linked binaries, all separate) is predominantly bottlenecked
by back-to-back per-cpu counter allocations which serialize on global
locks.
Ease the pain by allocating and freeing them in one go.
Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c
$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)
Even at a very modest scale of 26 cores (ops/s):
before: 133543.63
after: 186061.81 (+39%)
While with the patch these allocations remain a significant problem,
the primary bottleneck shifts to:
__pv_queued_spin_lock_slowpath+1
_raw_spin_lock_irqsave+57
folio_lruvec_lock_irqsave+91
release_pages+590
tlb_batch_pages_flush+61
tlb_finish_mmu+101
exit_mmap+327
__mmput+61
begin_new_exec+1245
load_elf_binary+712
bprm_execve+644
do_execveat_common.isra.0+429
__x64_sys_execve+50
do_syscall_64+46
entry_SYSCALL_64_after_hwframe+110
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/fork.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index d2e12b6d2b18..4f0ada33457e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -909,8 +909,6 @@ static void cleanup_lazy_tlbs(struct mm_struct *mm)
  */
 void __mmdrop(struct mm_struct *mm)
 {
-	int i;
-
 	BUG_ON(mm == &init_mm);
 	WARN_ON_ONCE(mm == current->mm);
 
@@ -925,9 +923,8 @@ void __mmdrop(struct mm_struct *mm)
 	put_user_ns(mm->user_ns);
 	mm_pasid_drop(mm);
 	mm_destroy_cid(mm);
+	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
 
-	for (i = 0; i < NR_MM_COUNTERS; i++)
-		percpu_counter_destroy(&mm->rss_stat[i]);
 	free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1252,8 +1249,6 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	struct user_namespace *user_ns)
 {
-	int i;
-
 	mt_init_flags(&mm->mm_mt, MM_MT_FLAGS);
 	mt_set_external_lock(&mm->mm_mt, &mm->mmap_lock);
 	atomic_set(&mm->mm_users, 1);
@@ -1301,17 +1296,14 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	if (mm_alloc_cid(mm))
 		goto fail_cid;
 
-	for (i = 0; i < NR_MM_COUNTERS; i++)
-		if (percpu_counter_init(&mm->rss_stat[i], 0, GFP_KERNEL_ACCOUNT))
-			goto fail_pcpu;
+	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT, NR_MM_COUNTERS))
+		goto fail_pcpu;
 
 	mm->user_ns = get_user_ns(user_ns);
 	lru_gen_init_mm(mm);
 	return mm;
 
 fail_pcpu:
-	while (i > 0)
-		percpu_counter_destroy(&mm->rss_stat[--i]);
 	mm_destroy_cid(mm);
 fail_cid:
 	destroy_context(mm);
--
2.39.2
* Re: [PATCH v2 1/2] pcpcntr: add group allocation/free
From: Mateusz Guzik @ 2023-08-23 1:14 UTC
To: linux-kernel; +Cc: dennis, tj, cl, akpm, shakeelb, linux-mm, Mateusz Guzik
On 8/22/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> Allocations and frees are globally serialized on the pcpu lock (and the
> CPU hotplug lock if enabled, which is the case on Debian).
>
> At least one frequent consumer allocates 4 back-to-back counters (and
> frees them in the same manner), exacerbating the problem.
>
> While this does not fully remedy scalability issues, it is a step
> towards that goal and provides immediate relief.
>
I just found out I'm going to have to send a v3 to handle !CONFIG_SMP.
At the moment I can't even compile-test that configuration, so for now
I'm just asking whether this v2 looks fine (modulo the !SMP problem).
Sorry for the spam.
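For completeness, what's missing is the !CONFIG_SMP side of
percpu_counter.h. A minimal sketch of what v3 could add there, mirroring
the existing UP stubs (hypothetical, not the actual v3):

/* Hypothetical !CONFIG_SMP stubs; on UP a percpu_counter is just fbc->count. */
static inline int percpu_counter_init_many(struct percpu_counter *fbc,
					   s64 amount, gfp_t gfp,
					   u32 nr_counters)
{
	u32 i;

	for (i = 0; i < nr_counters; i++)
		fbc[i].count = amount;

	return 0;
}

static inline void percpu_counter_destroy_many(struct percpu_counter *fbc,
					       u32 nr_counters)
{
}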
> [... v2 patch quoted in full; snipped ...]
--
Mateusz Guzik <mjguzik gmail.com>
* Re: [PATCH v2 2/2] fork: group allocation of per-cpu counters for mm struct
From: kernel test robot @ 2023-08-23 2:22 UTC
To: Mateusz Guzik, linux-kernel
Cc: llvm, oe-kbuild-all, dennis, tj, cl, akpm, shakeelb, linux-mm,
Mateusz Guzik
Hi Mateusz,
kernel test robot noticed the following build errors:
[auto build test ERROR on dennis-percpu/for-next]
[also build test ERROR on linus/master v6.5-rc7 next-20230822]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mateusz-Guzik/pcpcntr-add-group-allocation-free/20230823-024312
base: https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-next
patch link: https://lore.kernel.org/r/20230822184152.2194558-3-mjguzik%40gmail.com
patch subject: [PATCH v2 2/2] fork: group allocation of per-cpu counters for mm struct
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20230823/202308231004.tg0zQ8e9-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce: (https://download.01.org/0day-ci/archive/20230823/202308231004.tg0zQ8e9-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308231004.tg0zQ8e9-lkp@intel.com/
All errors (new ones prefixed by >>):
   [... unrelated -Wnull-pointer-arithmetic warnings from include/asm-generic/io.h snipped ...]
>> kernel/fork.c:926:2: error: implicit declaration of function 'percpu_counter_destroy_many' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
^
kernel/fork.c:926:2: note: did you mean 'percpu_counter_destroy'?
include/linux/percpu_counter.h:136:20: note: 'percpu_counter_destroy' declared here
static inline void percpu_counter_destroy(struct percpu_counter *fbc)
^
>> kernel/fork.c:1299:6: error: implicit declaration of function 'percpu_counter_init_many' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT, NR_MM_COUNTERS))
^
kernel/fork.c:1299:6: note: did you mean 'percpu_counter_init'?
include/linux/percpu_counter.h:129:19: note: 'percpu_counter_init' declared here
static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount,
^
12 warnings and 2 errors generated.
vim +/percpu_counter_destroy_many +926 kernel/fork.c
904
905 /*
906 * Called when the last reference to the mm
907 * is dropped: either by a lazy thread or by
908 * mmput. Free the page directory and the mm.
909 */
910 void __mmdrop(struct mm_struct *mm)
911 {
912 BUG_ON(mm == &init_mm);
913 WARN_ON_ONCE(mm == current->mm);
914
915 /* Ensure no CPUs are using this as their lazy tlb mm */
916 cleanup_lazy_tlbs(mm);
917
918 WARN_ON_ONCE(mm == current->active_mm);
919 mm_free_pgd(mm);
920 destroy_context(mm);
921 mmu_notifier_subscriptions_destroy(mm);
922 check_mm(mm);
923 put_user_ns(mm->user_ns);
924 mm_pasid_drop(mm);
925 mm_destroy_cid(mm);
> 926 percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
927
928 free_mm(mm);
929 }
930 EXPORT_SYMBOL_GPL(__mmdrop);
931
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 2/2] fork: group allocation of per-cpu counters for mm struct
From: kernel test robot @ 2023-08-23 4:05 UTC
To: Mateusz Guzik, linux-kernel
Cc: oe-kbuild-all, dennis, tj, cl, akpm, shakeelb, linux-mm, Mateusz Guzik
Hi Mateusz,
kernel test robot noticed the following build errors:
[auto build test ERROR on dennis-percpu/for-next]
[also build test ERROR on linus/master v6.5-rc7 next-20230822]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mateusz-Guzik/pcpcntr-add-group-allocation-free/20230823-024312
base: https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-next
patch link: https://lore.kernel.org/r/20230822184152.2194558-3-mjguzik%40gmail.com
patch subject: [PATCH v2 2/2] fork: group allocation of per-cpu counters for mm struct
config: arm-randconfig-r005-20230823 (https://download.01.org/0day-ci/archive/20230823/202308231154.SM8fedb1-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230823/202308231154.SM8fedb1-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308231154.SM8fedb1-lkp@intel.com/
All errors (new ones prefixed by >>):
kernel/fork.c: In function '__mmdrop':
>> kernel/fork.c:926:9: error: implicit declaration of function 'percpu_counter_destroy_many'; did you mean 'percpu_counter_destroy'? [-Werror=implicit-function-declaration]
926 | percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
| percpu_counter_destroy
kernel/fork.c: In function 'mm_init':
>> kernel/fork.c:1299:13: error: implicit declaration of function 'percpu_counter_init_many'; did you mean 'percpu_counter_init'? [-Werror=implicit-function-declaration]
1299 | if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT, NR_MM_COUNTERS))
| ^~~~~~~~~~~~~~~~~~~~~~~~
| percpu_counter_init
cc1: some warnings being treated as errors
vim +926 kernel/fork.c
904
905 /*
906 * Called when the last reference to the mm
907 * is dropped: either by a lazy thread or by
908 * mmput. Free the page directory and the mm.
909 */
910 void __mmdrop(struct mm_struct *mm)
911 {
912 BUG_ON(mm == &init_mm);
913 WARN_ON_ONCE(mm == current->mm);
914
915 /* Ensure no CPUs are using this as their lazy tlb mm */
916 cleanup_lazy_tlbs(mm);
917
918 WARN_ON_ONCE(mm == current->active_mm);
919 mm_free_pgd(mm);
920 destroy_context(mm);
921 mmu_notifier_subscriptions_destroy(mm);
922 check_mm(mm);
923 put_user_ns(mm->user_ns);
924 mm_pasid_drop(mm);
925 mm_destroy_cid(mm);
> 926 percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
927
928 free_mm(mm);
929 }
930 EXPORT_SYMBOL_GPL(__mmdrop);
931
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki