* [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
@ 2024-07-22 7:30 kernel test robot
2024-07-22 17:47 ` Boqun Feng
0 siblings, 1 reply; 13+ messages in thread
From: kernel test robot @ 2024-07-22 7:30 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Kent Overstreet,
Kees Cook, Alexander Viro, Alex Gaynor, Alice Ryhl,
Andreas Hindborg, Benno Lossin, Björn Roy Baron, Boqun Feng,
Christoph Lameter, Dennis Zhou, Gary Guo, Miguel Ojeda,
Pasha Tatashin, Peter Zijlstra, Tejun Heo, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, oliver.sang
Hello,
kernel test robot noticed "BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc" on:
commit: 24e44cc22aa3112082f2ee23137d048c73ca96d5 ("mm: percpu: enable per-cpu allocation tagging")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/mater 68b59730459e5d1fe4e0bbeb04ceb9df0f002270]
[test failed on linux-next/master 73399b58e5e5a1b28a04baf42e321cfcfc663c2f]
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:
runtime: 300s
group: group-04
nr_groups: 5
compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
as we understand, this commit is not the root-cause of KCSAN issue, just make
the issue change from form (1) to (2). furthermore, we don't understand why
the issue happens randomly. but we failed to bisect the issue (1). so we just
make out this report FYI, not sure if it could supply some hints to any real
issues.
60fa4a9e23231721 24e44cc22aa3112082f2ee23137
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
281:999 -28% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_alloc <--- (1)
296:999 -30% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_free
25:999 -3% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_chunk_populated
:999 29% 292:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc <---(2)
:999 27% 269:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_free
:999 4% 44:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_chunk_populated
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407191651.f24e499d-oliver.sang@intel.com
[ 117.357897][ T3648] ==================================================================
[ 117.358568][ T3648] BUG: KCSAN: data-race in pcpu_alloc_noprof / pcpu_block_update_hint_alloc
[ 117.359222][ T3648]
[ 117.359424][ T3648] write to 0xffffffff8559e600 of 4 bytes by task 3653 on cpu 1:
[ 117.360048][ T3648] pcpu_block_update_hint_alloc (mm/percpu.c:602 mm/percpu.c:598 mm/percpu.c:923)
[ 117.360554][ T3648] pcpu_alloc_area (mm/percpu.c:1260)
[ 117.360960][ T3648] pcpu_alloc_noprof (mm/percpu.c:1835)
[ 117.361393][ T3648] bpf_map_alloc_percpu (kernel/bpf/syscall.c:466)
[ 117.361825][ T3648] prealloc_init (kernel/bpf/hashtab.c:338)
[ 117.362211][ T3648] htab_map_alloc (kernel/bpf/hashtab.c:573)
[ 117.362618][ T3648] map_create (kernel/bpf/syscall.c:1320)
[ 117.362987][ T3648] __sys_bpf (kernel/bpf/syscall.c:5642)
[ 117.363348][ T3648] __ia32_sys_bpf (kernel/bpf/syscall.c:5765)
[ 117.363738][ T3648] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-013-20240713/./arch/x86/include/generated/asm/syscalls_32.h:358)
[ 117.364151][ T3648] do_int80_emulation (arch/x86/entry/common.c:165 (discriminator 1) arch/x86/entry/common.c:253 (discriminator 1))
[ 117.364578][ T3648] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
[ 117.365004][ T3648]
[ 117.365203][ T3648] read to 0xffffffff8559e600 of 4 bytes by task 3648 on cpu 0:
[ 117.365797][ T3648] pcpu_alloc_noprof (mm/percpu.c:1894)
[ 117.366210][ T3648] bpf_map_alloc_percpu (kernel/bpf/syscall.c:466)
[ 117.366627][ T3648] prealloc_init (kernel/bpf/hashtab.c:338)
[ 117.367000][ T3648] htab_map_alloc (kernel/bpf/hashtab.c:573)
[ 117.367400][ T3648] map_create (kernel/bpf/syscall.c:1320)
[ 117.367766][ T3648] __sys_bpf (kernel/bpf/syscall.c:5642)
[ 117.368120][ T3648] __ia32_sys_bpf (kernel/bpf/syscall.c:5765)
[ 117.368503][ T3648] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-013-20240713/./arch/x86/include/generated/asm/syscalls_32.h:358)
[ 117.368907][ T3648] do_int80_emulation (arch/x86/entry/common.c:165 (discriminator 1) arch/x86/entry/common.c:253 (discriminator 1))
[ 117.369341][ T3648] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
[ 117.369761][ T3648]
[ 117.369968][ T3648] value changed: 0x00000003 -> 0x00000000
[ 117.370454][ T3648]
[ 117.370662][ T3648] Reported by Kernel Concurrency Sanitizer on:
[ 117.371179][ T3648] CPU: 0 PID: 3648 Comm: trinity-c2 Not tainted 6.9.0-rc4-00089-g24e44cc22aa3 #1
[ 117.371943][ T3648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 117.372811][ T3648] ==================================================================
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240719/202407191651.f24e499d-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 7:30 [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc kernel test robot
@ 2024-07-22 17:47 ` Boqun Feng
2024-07-22 17:52 ` Tejun Heo
0 siblings, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2024-07-22 17:47 UTC (permalink / raw)
To: kernel test robot
Cc: Suren Baghdasaryan, oe-lkp, lkp, linux-kernel, Andrew Morton,
Kent Overstreet, Kees Cook, Alexander Viro, Alex Gaynor,
Alice Ryhl, Andreas Hindborg, Benno Lossin, Björn Roy Baron,
Christoph Lameter, Dennis Zhou, Gary Guo, Miguel Ojeda,
Pasha Tatashin, Peter Zijlstra, Tejun Heo, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Mon, Jul 22, 2024 at 03:30:01PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc" on:
>
> commit: 24e44cc22aa3112082f2ee23137d048c73ca96d5 ("mm: percpu: enable per-cpu allocation tagging")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/mater 68b59730459e5d1fe4e0bbeb04ceb9df0f002270]
> [test failed on linux-next/master 73399b58e5e5a1b28a04baf42e321cfcfc663c2f]
>
> in testcase: trinity
> version: trinity-i386-abe9de86-1_20230429
> with following parameters:
>
> runtime: 300s
> group: group-04
> nr_groups: 5
>
>
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> as we understand, this commit is not the root-cause of KCSAN issue, just make
> the issue change from form (1) to (2). furthermore, we don't understand why
> the issue happens randomly. but we failed to bisect the issue (1). so we just
> make out this report FYI, not sure if it could supply some hints to any real
> issues.
>
> 60fa4a9e23231721 24e44cc22aa3112082f2ee23137
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 281:999 -28% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_alloc <--- (1)
> 296:999 -30% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_free
> 25:999 -3% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_chunk_populated
> :999 29% 292:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc <---(2)
> :999 27% 269:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_free
> :999 4% 44:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_chunk_populated
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407191651.f24e499d-oliver.sang@intel.com
>
>
>
> [ 117.357897][ T3648] ==================================================================
> [ 117.358568][ T3648] BUG: KCSAN: data-race in pcpu_alloc_noprof / pcpu_block_update_hint_alloc
> [ 117.359222][ T3648]
> [ 117.359424][ T3648] write to 0xffffffff8559e600 of 4 bytes by task 3653 on cpu 1:
> [ 117.360048][ T3648] pcpu_block_update_hint_alloc (mm/percpu.c:602 mm/percpu.c:598 mm/percpu.c:923)
> [ 117.360554][ T3648] pcpu_alloc_area (mm/percpu.c:1260)
> [ 117.360960][ T3648] pcpu_alloc_noprof (mm/percpu.c:1835)
> [ 117.361393][ T3648] bpf_map_alloc_percpu (kernel/bpf/syscall.c:466)
> [ 117.361825][ T3648] prealloc_init (kernel/bpf/hashtab.c:338)
> [ 117.362211][ T3648] htab_map_alloc (kernel/bpf/hashtab.c:573)
> [ 117.362618][ T3648] map_create (kernel/bpf/syscall.c:1320)
> [ 117.362987][ T3648] __sys_bpf (kernel/bpf/syscall.c:5642)
> [ 117.363348][ T3648] __ia32_sys_bpf (kernel/bpf/syscall.c:5765)
> [ 117.363738][ T3648] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-013-20240713/./arch/x86/include/generated/asm/syscalls_32.h:358)
> [ 117.364151][ T3648] do_int80_emulation (arch/x86/entry/common.c:165 (discriminator 1) arch/x86/entry/common.c:253 (discriminator 1))
> [ 117.364578][ T3648] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
> [ 117.365004][ T3648]
> [ 117.365203][ T3648] read to 0xffffffff8559e600 of 4 bytes by task 3648 on cpu 0:
> [ 117.365797][ T3648] pcpu_alloc_noprof (mm/percpu.c:1894)
> [ 117.366210][ T3648] bpf_map_alloc_percpu (kernel/bpf/syscall.c:466)
> [ 117.366627][ T3648] prealloc_init (kernel/bpf/hashtab.c:338)
> [ 117.367000][ T3648] htab_map_alloc (kernel/bpf/hashtab.c:573)
> [ 117.367400][ T3648] map_create (kernel/bpf/syscall.c:1320)
> [ 117.367766][ T3648] __sys_bpf (kernel/bpf/syscall.c:5642)
> [ 117.368120][ T3648] __ia32_sys_bpf (kernel/bpf/syscall.c:5765)
> [ 117.368503][ T3648] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-013-20240713/./arch/x86/include/generated/asm/syscalls_32.h:358)
> [ 117.368907][ T3648] do_int80_emulation (arch/x86/entry/common.c:165 (discriminator 1) arch/x86/entry/common.c:253 (discriminator 1))
> [ 117.369341][ T3648] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
> [ 117.369761][ T3648]
> [ 117.369968][ T3648] value changed: 0x00000003 -> 0x00000000
> [ 117.370454][ T3648]
> [ 117.370662][ T3648] Reported by Kernel Concurrency Sanitizer on:
> [ 117.371179][ T3648] CPU: 0 PID: 3648 Comm: trinity-c2 Not tainted 6.9.0-rc4-00089-g24e44cc22aa3 #1
> [ 117.371943][ T3648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 117.372811][ T3648] ==================================================================
>
This looks like a data race because we read pcpu_nr_empty_pop_pages out
of the lock for a best effort checking, @Tejun, maybe you could confirm
on this?
If so, a fix could be as the follow, i.e. telling KCSAN that the data
race is expected. Another fix is making all accesses to
pcpu_nr_empty_pop_pages atomics (via WRITE_ONCE() and READ_ONCE()).
Regards,
Boqun
(is there a way to tell the bot to issue a rerun with a diff? ;-))
------------------------>8
diff --git a/mm/percpu.c b/mm/percpu.c
index 20d91af8c033..0626ef12099b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1891,7 +1891,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
mutex_unlock(&pcpu_alloc_mutex);
}
- if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
+ /*
+ * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
+ * occur but this is just a best-effort checking, everything is synced
+ * in pcpu_balance_work.
+ */
+ if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
pcpu_schedule_balance_work();
/* clear the areas and return address relative to base address */
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 17:47 ` Boqun Feng
@ 2024-07-22 17:52 ` Tejun Heo
2024-07-22 18:03 ` Boqun Feng
0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2024-07-22 17:52 UTC (permalink / raw)
To: Boqun Feng
Cc: kernel test robot, Suren Baghdasaryan, oe-lkp, lkp, linux-kernel,
Andrew Morton, Kent Overstreet, Kees Cook, Alexander Viro,
Alex Gaynor, Alice Ryhl, Andreas Hindborg, Benno Lossin,
Björn Roy Baron, Christoph Lameter, Dennis Zhou, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> This looks like a data race because we read pcpu_nr_empty_pop_pages out
> of the lock for a best effort checking, @Tejun, maybe you could confirm
> on this?
That does sound plausible.
> - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> + /*
> + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> + * occur but this is just a best-effort checking, everything is synced
> + * in pcpu_balance_work.
> + */
> + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> pcpu_schedule_balance_work();
Would it be better to use READ/WRITE_ONCE() for the variable?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 17:52 ` Tejun Heo
@ 2024-07-22 18:03 ` Boqun Feng
2024-07-22 18:15 ` Tejun Heo
2024-07-22 18:27 ` Dennis Zhou
0 siblings, 2 replies; 13+ messages in thread
From: Boqun Feng @ 2024-07-22 18:03 UTC (permalink / raw)
To: Tejun Heo
Cc: kernel test robot, Suren Baghdasaryan, oe-lkp, lkp, linux-kernel,
Andrew Morton, Kent Overstreet, Kees Cook, Alexander Viro,
Alex Gaynor, Alice Ryhl, Andreas Hindborg, Benno Lossin,
Björn Roy Baron, Christoph Lameter, Dennis Zhou, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > on this?
>
> That does sound plausible.
>
> > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > + /*
> > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > + * occur but this is just a best-effort checking, everything is synced
> > + * in pcpu_balance_work.
> > + */
> > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > pcpu_schedule_balance_work();
>
> Would it be better to use READ/WRITE_ONCE() for the variable?
>
For READ/WRITE_ONCE(), we will need to replace all write accesses and
all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
It's better in the sense that it doesn't rely on compiler behaviors on
data races, not sure about the performance impact though.
Regards,
Boqun
----->8
diff --git a/mm/percpu.c b/mm/percpu.c
index 20d91af8c033..729e8188238b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -570,7 +570,8 @@ static void pcpu_isolate_chunk(struct pcpu_chunk *chunk)
if (!chunk->isolated) {
chunk->isolated = true;
- pcpu_nr_empty_pop_pages -= chunk->nr_empty_pop_pages;
+ WRITE_ONCE(pcpu_nr_empty_pop_pages,
+ pcpu_nr_empty_pop_pages - chunk->nr_empty_pop_pages);
}
list_move(&chunk->list, &pcpu_chunk_lists[pcpu_to_depopulate_slot]);
}
@@ -581,7 +582,8 @@ static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
if (chunk->isolated) {
chunk->isolated = false;
- pcpu_nr_empty_pop_pages += chunk->nr_empty_pop_pages;
+ WRITE_ONCE(pcpu_nr_empty_pop_pages,
+ pcpu_nr_empty_pop_pages + chunk->nr_empty_pop_pages);
pcpu_chunk_relocate(chunk, -1);
}
}
@@ -599,7 +601,8 @@ static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr)
{
chunk->nr_empty_pop_pages += nr;
if (chunk != pcpu_reserved_chunk && !chunk->isolated)
- pcpu_nr_empty_pop_pages += nr;
+ WRITE_ONCE(pcpu_nr_empty_pop_pages,
+ pcpu_nr_empty_pop_pages + nr);
}
/*
@@ -1891,7 +1894,7 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
mutex_unlock(&pcpu_alloc_mutex);
}
- if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
+ if (READ_ONCE(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
pcpu_schedule_balance_work();
/* clear the areas and return address relative to base address */
@@ -2754,7 +2757,7 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
tmp_addr = (unsigned long)base_addr + static_size + ai->reserved_size;
pcpu_first_chunk = pcpu_alloc_first_chunk(tmp_addr, dyn_size);
- pcpu_nr_empty_pop_pages = pcpu_first_chunk->nr_empty_pop_pages;
+ WRITE_ONCE(pcpu_nr_empty_pop_pages, pcpu_first_chunk->nr_empty_pop_pages);
pcpu_chunk_relocate(pcpu_first_chunk, -1);
/* include all regions of the first chunk */
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 18:03 ` Boqun Feng
@ 2024-07-22 18:15 ` Tejun Heo
2024-07-22 18:27 ` Dennis Zhou
1 sibling, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2024-07-22 18:15 UTC (permalink / raw)
To: Boqun Feng
Cc: kernel test robot, Suren Baghdasaryan, oe-lkp, lkp, linux-kernel,
Andrew Morton, Kent Overstreet, Kees Cook, Alexander Viro,
Alex Gaynor, Alice Ryhl, Andreas Hindborg, Benno Lossin,
Björn Roy Baron, Christoph Lameter, Dennis Zhou, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
Hello,
On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
...
> For READ/WRITE_ONCE(), we will need to replace all write accesses and
> all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> It's better in the sense that it doesn't rely on compiler behaviors on
> data races, not sure about the performance impact though.
The patch looks great and I doubt this would impact performance in any
noticeable way.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 18:03 ` Boqun Feng
2024-07-22 18:15 ` Tejun Heo
@ 2024-07-22 18:27 ` Dennis Zhou
2024-07-22 20:53 ` Boqun Feng
1 sibling, 1 reply; 13+ messages in thread
From: Dennis Zhou @ 2024-07-22 18:27 UTC (permalink / raw)
To: Boqun Feng, Tejun Heo
Cc: kernel test robot, Suren Baghdasaryan, oe-lkp, lkp, linux-kernel,
Andrew Morton, Kent Overstreet, Kees Cook, Alexander Viro,
Alex Gaynor, Alice Ryhl, Andreas Hindborg, Benno Lossin,
Björn Roy Baron, Christoph Lameter, Dennis Zhou, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
Hello,
On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > on this?
> >
> > That does sound plausible.
> >
> > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > + /*
> > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > + * occur but this is just a best-effort checking, everything is synced
> > > + * in pcpu_balance_work.
> > > + */
> > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > pcpu_schedule_balance_work();
> >
> > Would it be better to use READ/WRITE_ONCE() for the variable?
> >
>
> For READ/WRITE_ONCE(), we will need to replace all write accesses and
> all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> It's better in the sense that it doesn't rely on compiler behaviors on
> data races, not sure about the performance impact though.
>
I think a better alternative is we can move it up into the lock under
area_found. The value gets updated as part of pcpu_alloc_area() as the
code above populates percpu memory that is already allocated.
We should probably annotate pcpu_update_empty_pages() with:
lockdep_assert_held(&pcpu_lock);
Thanks,
Dennis
> Regards,
> Boqun
>
> ----->8
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 20d91af8c033..729e8188238b 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -570,7 +570,8 @@ static void pcpu_isolate_chunk(struct pcpu_chunk *chunk)
>
> if (!chunk->isolated) {
> chunk->isolated = true;
> - pcpu_nr_empty_pop_pages -= chunk->nr_empty_pop_pages;
> + WRITE_ONCE(pcpu_nr_empty_pop_pages,
> + pcpu_nr_empty_pop_pages - chunk->nr_empty_pop_pages);
> }
> list_move(&chunk->list, &pcpu_chunk_lists[pcpu_to_depopulate_slot]);
> }
> @@ -581,7 +582,8 @@ static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
>
> if (chunk->isolated) {
> chunk->isolated = false;
> - pcpu_nr_empty_pop_pages += chunk->nr_empty_pop_pages;
> + WRITE_ONCE(pcpu_nr_empty_pop_pages,
> + pcpu_nr_empty_pop_pages + chunk->nr_empty_pop_pages);
> pcpu_chunk_relocate(chunk, -1);
> }
> }
> @@ -599,7 +601,8 @@ static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr)
> {
> chunk->nr_empty_pop_pages += nr;
> if (chunk != pcpu_reserved_chunk && !chunk->isolated)
> - pcpu_nr_empty_pop_pages += nr;
> + WRITE_ONCE(pcpu_nr_empty_pop_pages,
> + pcpu_nr_empty_pop_pages + nr);
> }
>
> /*
> @@ -1891,7 +1894,7 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> mutex_unlock(&pcpu_alloc_mutex);
> }
>
> - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> + if (READ_ONCE(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> pcpu_schedule_balance_work();
>
> /* clear the areas and return address relative to base address */
> @@ -2754,7 +2757,7 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
> tmp_addr = (unsigned long)base_addr + static_size + ai->reserved_size;
> pcpu_first_chunk = pcpu_alloc_first_chunk(tmp_addr, dyn_size);
>
> - pcpu_nr_empty_pop_pages = pcpu_first_chunk->nr_empty_pop_pages;
> + WRITE_ONCE(pcpu_nr_empty_pop_pages, pcpu_first_chunk->nr_empty_pop_pages);
> pcpu_chunk_relocate(pcpu_first_chunk, -1);
>
> /* include all regions of the first chunk */
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 18:27 ` Dennis Zhou
@ 2024-07-22 20:53 ` Boqun Feng
2024-07-23 5:50 ` Dennis Zhou
0 siblings, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2024-07-22 20:53 UTC (permalink / raw)
To: Dennis Zhou
Cc: Tejun Heo, kernel test robot, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> Hello,
>
> On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > on this?
> > >
> > > That does sound plausible.
> > >
> > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > + /*
> > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > + * occur but this is just a best-effort checking, everything is synced
> > > > + * in pcpu_balance_work.
> > > > + */
> > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > pcpu_schedule_balance_work();
> > >
> > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > >
> >
> > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > It's better in the sense that it doesn't rely on compiler behaviors on
> > data races, not sure about the performance impact though.
> >
>
> I think a better alternative is we can move it up into the lock under
> area_found. The value gets updated as part of pcpu_alloc_area() as the
> code above populates percpu memory that is already allocated.
>
Not sure I followed what exactly you suggested here because I'm not
familiar with the logic, but a simpler version would be:
diff --git a/mm/percpu.c b/mm/percpu.c
index 20d91af8c033..fc54d27e5786 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1891,8 +1891,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
mutex_unlock(&pcpu_alloc_mutex);
}
- if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
- pcpu_schedule_balance_work();
+ scoped_guard(spinlock_irqsave, &pcpu_lock) {
+ if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
+ pcpu_schedule_balance_work();
+ }
/* clear the areas and return address relative to base address */
for_each_possible_cpu(cpu)
I.e. just locking while checking.
Regards,
Boqun
> We should probably annotate pcpu_update_empty_pages() with:
> lockdep_assert_held(&pcpu_lock);
>
> Thanks,
> Dennis
>
> > Regards,
> > Boqun
> >
> > ----->8
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 20d91af8c033..729e8188238b 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -570,7 +570,8 @@ static void pcpu_isolate_chunk(struct pcpu_chunk *chunk)
> >
> > if (!chunk->isolated) {
> > chunk->isolated = true;
> > - pcpu_nr_empty_pop_pages -= chunk->nr_empty_pop_pages;
> > + WRITE_ONCE(pcpu_nr_empty_pop_pages,
> > + pcpu_nr_empty_pop_pages - chunk->nr_empty_pop_pages);
> > }
> > list_move(&chunk->list, &pcpu_chunk_lists[pcpu_to_depopulate_slot]);
> > }
> > @@ -581,7 +582,8 @@ static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
> >
> > if (chunk->isolated) {
> > chunk->isolated = false;
> > - pcpu_nr_empty_pop_pages += chunk->nr_empty_pop_pages;
> > + WRITE_ONCE(pcpu_nr_empty_pop_pages,
> > + pcpu_nr_empty_pop_pages + chunk->nr_empty_pop_pages);
> > pcpu_chunk_relocate(chunk, -1);
> > }
> > }
> > @@ -599,7 +601,8 @@ static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr)
> > {
> > chunk->nr_empty_pop_pages += nr;
> > if (chunk != pcpu_reserved_chunk && !chunk->isolated)
> > - pcpu_nr_empty_pop_pages += nr;
> > + WRITE_ONCE(pcpu_nr_empty_pop_pages,
> > + pcpu_nr_empty_pop_pages + nr);
> > }
> >
> > /*
> > @@ -1891,7 +1894,7 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> > mutex_unlock(&pcpu_alloc_mutex);
> > }
> >
> > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > + if (READ_ONCE(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > pcpu_schedule_balance_work();
> >
> > /* clear the areas and return address relative to base address */
> > @@ -2754,7 +2757,7 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
> > tmp_addr = (unsigned long)base_addr + static_size + ai->reserved_size;
> > pcpu_first_chunk = pcpu_alloc_first_chunk(tmp_addr, dyn_size);
> >
> > - pcpu_nr_empty_pop_pages = pcpu_first_chunk->nr_empty_pop_pages;
> > + WRITE_ONCE(pcpu_nr_empty_pop_pages, pcpu_first_chunk->nr_empty_pop_pages);
> > pcpu_chunk_relocate(pcpu_first_chunk, -1);
> >
> > /* include all regions of the first chunk */
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-22 20:53 ` Boqun Feng
@ 2024-07-23 5:50 ` Dennis Zhou
2024-07-23 6:09 ` Oliver Sang
2024-07-23 21:14 ` Boqun Feng
0 siblings, 2 replies; 13+ messages in thread
From: Dennis Zhou @ 2024-07-23 5:50 UTC (permalink / raw)
To: Boqun Feng
Cc: Tejun Heo, kernel test robot, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > Hello,
> >
> > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > > on this?
> > > >
> > > > That does sound plausible.
> > > >
> > > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > + /*
> > > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > + * occur but this is just a best-effort checking, everything is synced
> > > > > + * in pcpu_balance_work.
> > > > > + */
> > > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > pcpu_schedule_balance_work();
> > > >
> > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > >
> > >
> > > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > It's better in the sense that it doesn't rely on compiler behaviors on
> > > data races, not sure about the performance impact though.
> > >
> >
> > I think a better alternative is we can move it up into the lock under
> > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > code above populates percpu memory that is already allocated.
> >
>
> Not sure I followed what exactly you suggested here because I'm not
> familiar with the logic, but a simpler version would be:
>
>
I believe that's the only naked access of pcpu_nr_empty_pop_pages. So
I was thinking this'll fix this problem.
I also don't know how to rerun this CI tho..
---
diff --git a/mm/percpu.c b/mm/percpu.c
index 20d91af8c033..325fb8412e90 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
area_found:
pcpu_stats_area_alloc(chunk, size);
+
+ if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
+ pcpu_schedule_balance_work();
+
spin_unlock_irqrestore(&pcpu_lock, flags);
/* populate if not all pages are already there */
@@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
mutex_unlock(&pcpu_alloc_mutex);
}
- if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
- pcpu_schedule_balance_work();
-
/* clear the areas and return address relative to base address */
for_each_possible_cpu(cpu)
memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-23 5:50 ` Dennis Zhou
@ 2024-07-23 6:09 ` Oliver Sang
2024-07-23 6:13 ` Dennis Zhou
2024-07-23 21:14 ` Boqun Feng
1 sibling, 1 reply; 13+ messages in thread
From: Oliver Sang @ 2024-07-23 6:09 UTC (permalink / raw)
To: Dennis Zhou
Cc: Boqun Feng, Tejun Heo, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm, oliver.sang
hi, Dennis Zhou,
On Mon, Jul 22, 2024 at 10:50:53PM -0700, Dennis Zhou wrote:
> On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> > On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > > Hello,
> > >
> > > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > > > on this?
> > > > >
> > > > > That does sound plausible.
> > > > >
> > > > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > + /*
> > > > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > > + * occur but this is just a best-effort checking, everything is synced
> > > > > > + * in pcpu_balance_work.
> > > > > > + */
> > > > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > pcpu_schedule_balance_work();
> > > > >
> > > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > > >
> > > >
> > > > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > > It's better in the sense that it doesn't rely on compiler behaviors on
> > > > data races, not sure about the performance impact though.
> > > >
> > >
> > > I think a better alternative is we can move it up into the lock under
> > > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > > code above populates percpu memory that is already allocated.
> > >
> >
> > Not sure I followed what exactly you suggested here because I'm not
> > familiar with the logic, but a simpler version would be:
> >
> >
>
> I believe that's the only naked access of pcpu_nr_empty_pop_pages. So
> I was thinking this'll fix this problem.
>
> I also don't know how to rerun this CI tho..
we could test this patch. what's the base? could we apply it directly upon
24e44cc22a?
BTW, our bot is not so clever so far to auto test fix patches, so this is kind
of manual effort. due to resource constraint, it will be hard for us to test
each patch (we saw several patches in this thread already) or test very fast.
>
> ---
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 20d91af8c033..325fb8412e90 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
>
> area_found:
> pcpu_stats_area_alloc(chunk, size);
> +
> + if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> + pcpu_schedule_balance_work();
> +
> spin_unlock_irqrestore(&pcpu_lock, flags);
>
> /* populate if not all pages are already there */
> @@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> mutex_unlock(&pcpu_alloc_mutex);
> }
>
> - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> - pcpu_schedule_balance_work();
> -
> /* clear the areas and return address relative to base address */
> for_each_possible_cpu(cpu)
> memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-23 6:09 ` Oliver Sang
@ 2024-07-23 6:13 ` Dennis Zhou
2024-07-24 7:10 ` Oliver Sang
0 siblings, 1 reply; 13+ messages in thread
From: Dennis Zhou @ 2024-07-23 6:13 UTC (permalink / raw)
To: Oliver Sang
Cc: Boqun Feng, Tejun Heo, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
Hi Oliver,
On Tue, Jul 23, 2024 at 02:09:38PM +0800, Oliver Sang wrote:
> hi, Dennis Zhou,
>
> On Mon, Jul 22, 2024 at 10:50:53PM -0700, Dennis Zhou wrote:
> > On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> > > On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > > > Hello,
> > > >
> > > > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > > > > on this?
> > > > > >
> > > > > > That does sound plausible.
> > > > > >
> > > > > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > + /*
> > > > > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > > > + * occur but this is just a best-effort checking, everything is synced
> > > > > > > + * in pcpu_balance_work.
> > > > > > > + */
> > > > > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > pcpu_schedule_balance_work();
> > > > > >
> > > > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > > > >
> > > > >
> > > > > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > > > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > > > It's better in the sense that it doesn't rely on compiler behaviors on
> > > > > data races, not sure about the performance impact though.
> > > > >
> > > >
> > > > I think a better alternative is we can move it up into the lock under
> > > > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > > > code above populates percpu memory that is already allocated.
> > > >
> > >
> > > Not sure I followed what exactly you suggested here because I'm not
> > > familiar with the logic, but a simpler version would be:
> > >
> > >
> >
> > I believe that's the only naked access of pcpu_nr_empty_pop_pages. So
> > I was thinking this'll fix this problem.
> >
> > I also don't know how to rerun this CI tho..
>
> we could test this patch. what's the base? could we apply it directly upon
> 24e44cc22a?
>
> BTW, our bot is not so clever so far to auto test fix patches, so this is kind
> of manual effort. due to resource constraint, it will be hard for us to test
> each patch (we saw several patches in this thread already) or test very fast.
>
Ah yeah that makes sense. If you don't mind testing the last one I sent,
the one below, that applies cleanly to 24e44cc22a.
Thanks,
Dennis
> >
> > ---
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 20d91af8c033..325fb8412e90 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> >
> > area_found:
> > pcpu_stats_area_alloc(chunk, size);
> > +
> > + if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > + pcpu_schedule_balance_work();
> > +
> > spin_unlock_irqrestore(&pcpu_lock, flags);
> >
> > /* populate if not all pages are already there */
> > @@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> > mutex_unlock(&pcpu_alloc_mutex);
> > }
> >
> > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > - pcpu_schedule_balance_work();
> > -
> > /* clear the areas and return address relative to base address */
> > for_each_possible_cpu(cpu)
> > memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-23 5:50 ` Dennis Zhou
2024-07-23 6:09 ` Oliver Sang
@ 2024-07-23 21:14 ` Boqun Feng
2024-07-27 3:15 ` Dennis Zhou
1 sibling, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2024-07-23 21:14 UTC (permalink / raw)
To: Dennis Zhou
Cc: Tejun Heo, kernel test robot, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Mon, Jul 22, 2024 at 10:50:53PM -0700, Dennis Zhou wrote:
> On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> > On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > > Hello,
> > >
> > > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > > > on this?
> > > > >
> > > > > That does sound plausible.
> > > > >
> > > > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > + /*
> > > > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > > + * occur but this is just a best-effort checking, everything is synced
> > > > > > + * in pcpu_balance_work.
> > > > > > + */
> > > > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > pcpu_schedule_balance_work();
> > > > >
> > > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > > >
> > > >
> > > > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > > It's better in the sense that it doesn't rely on compiler behaviors on
> > > > data races, not sure about the performance impact though.
> > > >
> > >
> > > I think a better alternative is we can move it up into the lock under
> > > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > > code above populates percpu memory that is already allocated.
> > >
> >
> > Not sure I followed what exactly you suggested here because I'm not
> > familiar with the logic, but a simpler version would be:
> >
> >
>
> I believe that's the only naked access of pcpu_nr_empty_pop_pages. So
> I was thinking this'll fix this problem.
>
> I also don't know how to rerun this CI tho..
>
> ---
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 20d91af8c033..325fb8412e90 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
>
> area_found:
> pcpu_stats_area_alloc(chunk, size);
> +
> + if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> + pcpu_schedule_balance_work();
> +
But the pcpu_chunk_populated() afterwards could modify the
pcpu_nr_empty_pop_pages again, wouldn't this be a behavior changing?
Regards,
Boqun
> spin_unlock_irqrestore(&pcpu_lock, flags);
>
> /* populate if not all pages are already there */
> @@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> mutex_unlock(&pcpu_alloc_mutex);
> }
>
> - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> - pcpu_schedule_balance_work();
> -
> /* clear the areas and return address relative to base address */
> for_each_possible_cpu(cpu)
> memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-23 6:13 ` Dennis Zhou
@ 2024-07-24 7:10 ` Oliver Sang
0 siblings, 0 replies; 13+ messages in thread
From: Oliver Sang @ 2024-07-24 7:10 UTC (permalink / raw)
To: Dennis Zhou
Cc: Boqun Feng, Tejun Heo, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm, oliver.sang
hi, Dannis,
On Mon, Jul 22, 2024 at 11:13:10PM -0700, Dennis Zhou wrote:
> Hi Oliver,
>
> On Tue, Jul 23, 2024 at 02:09:38PM +0800, Oliver Sang wrote:
> > hi, Dennis Zhou,
> >
> > On Mon, Jul 22, 2024 at 10:50:53PM -0700, Dennis Zhou wrote:
> > > On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> > > > On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > > > > Hello,
> > > > >
> > > > > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > > > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > > > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > > > > > on this?
> > > > > > >
> > > > > > > That does sound plausible.
> > > > > > >
> > > > > > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > > + /*
> > > > > > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > > > > + * occur but this is just a best-effort checking, everything is synced
> > > > > > > > + * in pcpu_balance_work.
> > > > > > > > + */
> > > > > > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > > pcpu_schedule_balance_work();
> > > > > > >
> > > > > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > > > > >
> > > > > >
> > > > > > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > > > > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > > > > It's better in the sense that it doesn't rely on compiler behaviors on
> > > > > > data races, not sure about the performance impact though.
> > > > > >
> > > > >
> > > > > I think a better alternative is we can move it up into the lock under
> > > > > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > > > > code above populates percpu memory that is already allocated.
> > > > >
> > > >
> > > > Not sure I followed what exactly you suggested here because I'm not
> > > > familiar with the logic, but a simpler version would be:
> > > >
> > > >
> > >
> > > I believe that's the only naked access of pcpu_nr_empty_pop_pages. So
> > > I was thinking this'll fix this problem.
> > >
> > > I also don't know how to rerun this CI tho..
> >
> > we could test this patch. what's the base? could we apply it directly upon
> > 24e44cc22a?
> >
> > BTW, our bot is not so clever so far to auto test fix patches, so this is kind
> > of manual effort. due to resource constraint, it will be hard for us to test
> > each patch (we saw several patches in this thread already) or test very fast.
> >
>
> Ah yeah that makes sense. If you don't mind testing the last one I sent,
> the one below, that applies cleanly to 24e44cc22a.
in our tests, you patch could solve the KCSAN issues. thanks
=========================================================================================
compiler/group/kconfig/nr_groups/rootfs/runtime/tbox_group/testcase:
gcc-13/group-04/x86_64-randconfig-013-20240713/5/debian-11.1-i386-20220923.cgz/300s/vm-snb/trinity
commit:
60fa4a9e23231 ("mm: percpu: add codetag reference into pcpuobj_ext")
24e44cc22aa31 ("mm: percpu: enable per-cpu allocation tagging")
dcfbb68202759 <--- your patch
60fa4a9e23231721 24e44cc22aa3112082f2ee23137 dcfbb6820275994e92a9dcf309e
---------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
281:999 -28% :998 -28% :999 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_alloc
296:999 -30% :998 -30% :999 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_free
25:999 -3% :998 -3% :999 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_chunk_populated
:999 29% 292:998 0% :999 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
:999 27% 269:998 0% :999 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_free
:999 4% 44:998 0% :999 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_chunk_populated
>
> Thanks,
> Dennis
>
> > >
> > > ---
> > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > index 20d91af8c033..325fb8412e90 100644
> > > --- a/mm/percpu.c
> > > +++ b/mm/percpu.c
> > > @@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> > >
> > > area_found:
> > > pcpu_stats_area_alloc(chunk, size);
> > > +
> > > + if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > + pcpu_schedule_balance_work();
> > > +
> > > spin_unlock_irqrestore(&pcpu_lock, flags);
> > >
> > > /* populate if not all pages are already there */
> > > @@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> > > mutex_unlock(&pcpu_alloc_mutex);
> > > }
> > >
> > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > - pcpu_schedule_balance_work();
> > > -
> > > /* clear the areas and return address relative to base address */
> > > for_each_possible_cpu(cpu)
> > > memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
2024-07-23 21:14 ` Boqun Feng
@ 2024-07-27 3:15 ` Dennis Zhou
0 siblings, 0 replies; 13+ messages in thread
From: Dennis Zhou @ 2024-07-27 3:15 UTC (permalink / raw)
To: Boqun Feng
Cc: Tejun Heo, kernel test robot, Suren Baghdasaryan, oe-lkp, lkp,
linux-kernel, Andrew Morton, Kent Overstreet, Kees Cook,
Alexander Viro, Alex Gaynor, Alice Ryhl, Andreas Hindborg,
Benno Lossin, Björn Roy Baron, Christoph Lameter, Gary Guo,
Miguel Ojeda, Pasha Tatashin, Peter Zijlstra, Vlastimil Babka,
Wedson Almeida Filho, linux-mm, lkmm
On Tue, Jul 23, 2024 at 02:14:00PM -0700, Boqun Feng wrote:
> On Mon, Jul 22, 2024 at 10:50:53PM -0700, Dennis Zhou wrote:
> > On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> > > On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > > > Hello,
> > > >
> > > > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages out
> > > > > > > of the lock for a best effort checking, @Tejun, maybe you could confirm
> > > > > > > on this?
> > > > > >
> > > > > > That does sound plausible.
> > > > > >
> > > > > > > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > + /*
> > > > > > > + * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > > > + * occur but this is just a best-effort checking, everything is synced
> > > > > > > + * in pcpu_balance_work.
> > > > > > > + */
> > > > > > > + if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > pcpu_schedule_balance_work();
> > > > > >
> > > > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > > > >
> > > > >
> > > > > For READ/WRITE_ONCE(), we will need to replace all write accesses and
> > > > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > > > It's better in the sense that it doesn't rely on compiler behaviors on
> > > > > data races, not sure about the performance impact though.
> > > > >
> > > >
> > > > I think a better alternative is we can move it up into the lock under
> > > > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > > > code above populates percpu memory that is already allocated.
> > > >
> > >
> > > Not sure I followed what exactly you suggested here because I'm not
> > > familiar with the logic, but a simpler version would be:
> > >
> > >
> >
> > I believe that's the only naked access of pcpu_nr_empty_pop_pages. So
> > I was thinking this'll fix this problem.
> >
> > I also don't know how to rerun this CI tho..
> >
> > ---
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 20d91af8c033..325fb8412e90 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> >
> > area_found:
> > pcpu_stats_area_alloc(chunk, size);
> > +
> > + if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > + pcpu_schedule_balance_work();
> > +
>
> But the pcpu_chunk_populated() afterwards could modify the
> pcpu_nr_empty_pop_pages again, wouldn't this be a behavior changing?
>
It does, but really at this point it's a mixed bag because the lock
isn't permanently held at all while we do all these operations. The
value is read at best effort.
Ultimately the code below is populating backing pages for non-atomic
allocations. At this point the ideal situation is we're using an already
populated page. There are caveats but I can't say the prior is any
better than this version.
The code you mentioned pairs with the comment on line 916 below.
/*
* If the allocation is not atomic, some blocks may not be
* populated with pages, while we account it here. The number
* of pages will be added back with pcpu_chunk_populated()
* when populating pages.
*/
Thanks,
Dennis
> Regards,
> Boqun
>
> > spin_unlock_irqrestore(&pcpu_lock, flags);
> >
> > /* populate if not all pages are already there */
> > @@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> > mutex_unlock(&pcpu_alloc_mutex);
> > }
> >
> > - if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > - pcpu_schedule_balance_work();
> > -
> > /* clear the areas and return address relative to base address */
> > for_each_possible_cpu(cpu)
> > memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-07-27 3:15 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-22 7:30 [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc kernel test robot
2024-07-22 17:47 ` Boqun Feng
2024-07-22 17:52 ` Tejun Heo
2024-07-22 18:03 ` Boqun Feng
2024-07-22 18:15 ` Tejun Heo
2024-07-22 18:27 ` Dennis Zhou
2024-07-22 20:53 ` Boqun Feng
2024-07-23 5:50 ` Dennis Zhou
2024-07-23 6:09 ` Oliver Sang
2024-07-23 6:13 ` Dennis Zhou
2024-07-24 7:10 ` Oliver Sang
2024-07-23 21:14 ` Boqun Feng
2024-07-27 3:15 ` Dennis Zhou
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox