linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mark Brown <broonie@kernel.org>
To: chengming.zhou@linux.dev
Cc: vbabka@suse.cz, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, roman.gushchin@linux.dev,
	42.hyeyoo@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Chengming Zhou <zhouchengming@bytedance.com>
Subject: Re: [PATCH v5 6/9] slub: Delay freezing of partial slabs
Date: Mon, 20 Nov 2023 18:49:58 +0000	[thread overview]
Message-ID: <4f3bc1bd-ea87-465d-b58a-0ed57b15187b@sirena.org.uk> (raw)
In-Reply-To: <20231102032330.1036151-7-chengming.zhou@linux.dev>

[-- Attachment #1: Type: text/plain, Size: 6629 bytes --]

On Thu, Nov 02, 2023 at 03:23:27AM +0000, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Now we will freeze slabs when moving them out of node partial list to
> cpu partial list, this method needs two cmpxchg_double operations:
> 
> 1. freeze slab (acquire_slab()) under the node list_lock
> 2. get_freelist() when pick used in ___slab_alloc()

Recently -next has been failing to boot on a Raspberry Pi 3 with an arm
multi_v7_defconfig and a NFS rootfs, a bisect appears to point to this
patch (in -next as c8d312e039030edab25836a326bcaeb2a3d4db14) as having
introduced the issue.  I've included the full bisect log below.

When we see problems we see RCU stalls while logging in, for example:

debian-testing-armhf login: root (automatic login)
Linux debian-testing-armhf 6.7.0-rc1-00006-gc8d312e03903 #1 SMP @1699864348 armv7l
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
[   46.453323] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   46.459361] rcu: 	3-...0: (1 GPs behind) idle=def4/1/0x40000000 softirq=1304/1304 fqs=951
[   46.467669] rcu: 	(detected by 0, t=2103 jiffies, g=1161, q=499 ncpus=4)
[   46.474472] Sending NMI from CPU 0 to CPUs 3:
[   56.478894] rcu: rcu_sched kthread timer wakeup didn't happen for 1002 jiffies! g1161 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[   56.490195] rcu: 	Possible timer handling issue on cpu=0 timer-softirq=1650
[   56.497259] rcu: rcu_sched kthread starved for 1005 jiffies! g1161 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[   56.507589] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   56.516681] rcu: RCU grace-period kthread stack dump:
[   56.521803] task:rcu_sched       state:I stack:0     pid:13    tgid:13    ppid:2      flags:0x00000000
[   56.531267]  __schedule from schedule+0x20/0xe8
[   56.535883]  schedule from schedule_timeout+0xa0/0x158
[   56.541111]  schedule_timeout from rcu_gp_fqs_loop+0x104/0x594
[   56.547048]  rcu_gp_fqs_loop from rcu_gp_kthread+0x14c/0x1c0
[   56.552801]  rcu_gp_kthread from kthread+0xe0/0xfc
[   56.557674]  kthread from ret_from_fork+0x14/0x28
[   56.562457] Exception stack(0xf084dfb0 to 0xf084dff8)
[   56.567584] dfa0:                                     00000000 00000000 00000000 00000000
[   56.575886] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   56.584191] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   56.590907] rcu: Stack dump where RCU GP kthread last ran:
[   56.596474] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc1-00006-gc8d312e03903 #1
[   56.604515] Hardware name: BCM2835
[   56.607965] PC is at default_idle_call+0x1c/0xb0
[   56.612654] LR is at ct_kernel_enter.constprop.0+0x48/0x11c
[   56.618311] pc : [<c1197054>]    lr : [<c1195c98>]    psr: 60010013
[   56.624672] sp : c1b01f70  ip : c1d5af7c  fp : 00000000
[   56.629974] r10: c19cda60  r9 : 00000000  r8 : 00000000
[   56.635277] r7 : c1b04d50  r6 : c1b04d18  r5 : c1d5b684  r4 : c1b09740
[   56.641902] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 : 002a3114
[   56.648528] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   56.655774] Control: 10c5383d  Table: 0237406a  DAC: 00000051
[   56.661605]  default_idle_call from do_idle+0x208/0x290
[   56.666920]  do_idle from cpu_startup_entry+0x28/0x2c
[   56.672059]  cpu_startup_entry from rest_init+0xac/0xb0
[   56.677371]  rest_init from arch_post_acpi_subsys_init+0x0/0x8
Login ti

A full log for that run can be seen at:

   https://validation.linaro.org/scheduler/job/4017095

Boots to initramfs with the same kernel image seem fine.  Other systems,
including other 32 bit arm ones, don't seem to be having similar issues
with this userspace.  I've not investigated beyond running the bisect,
the log for which is below:

git bisect start
# good: [64e6d94bfb47ed0732ad06aedf8ec6af5dd2ab84] Merge branch 'for-linux-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc
git bisect good 64e6d94bfb47ed0732ad06aedf8ec6af5dd2ab84
# bad: [5a82d69d48c82e89aef44483d2a129f869f3506a] Add linux-next specific files for 20231120
git bisect bad 5a82d69d48c82e89aef44483d2a129f869f3506a
# good: [ce252a92a867da8a6622463bff637e5f7b904a46] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect good ce252a92a867da8a6622463bff637e5f7b904a46
# good: [c22e026efad504e3b056d4436920d173a09c580e] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git
git bisect good c22e026efad504e3b056d4436920d173a09c580e
# good: [b7fc58ffb105470cb339163cc2b04e3f59387a45] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga.git
git bisect good b7fc58ffb105470cb339163cc2b04e3f59387a45
# good: [26f89f0614f03e4016578a992fc2e86b048a5cb4] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git
git bisect good 26f89f0614f03e4016578a992fc2e86b048a5cb4
# good: [602bf18307981f3bfd9ebf19921791a4256d3fd1] Merge branch 'for-6.7' into for-next
git bisect good 602bf18307981f3bfd9ebf19921791a4256d3fd1
# good: [9f16a68069822b1df6bfb8a9ef7258a1e32b25e7] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
git bisect good 9f16a68069822b1df6bfb8a9ef7258a1e32b25e7
# good: [3ff57db6f6569ebc2cc333437e7e949749e59424] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git
git bisect good 3ff57db6f6569ebc2cc333437e7e949749e59424
# bad: [dd374e220ba492f95344a638b1efe5b2744fdd73] slub: Update frozen slabs documentations in the source
git bisect bad dd374e220ba492f95344a638b1efe5b2744fdd73
# good: [a3058965bb35490454953aa2c87ea51004839f2f] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
git bisect good a3058965bb35490454953aa2c87ea51004839f2f
# bad: [c8d312e039030edab25836a326bcaeb2a3d4db14] slub: Delay freezing of partial slabs
git bisect bad c8d312e039030edab25836a326bcaeb2a3d4db14
# good: [00b15a19ee543f0117cb217fcbab8b7b3fd50677] slub: Introduce freeze_slab()
git bisect good 00b15a19ee543f0117cb217fcbab8b7b3fd50677
# first bad commit: [c8d312e039030edab25836a326bcaeb2a3d4db14] slub: Delay freezing of partial slabs

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  parent reply	other threads:[~2023-11-20 18:50 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-02  3:23 [PATCH v5 0/9] slub: Delay freezing of CPU " chengming.zhou
2023-11-02  3:23 ` [PATCH v5 1/9] slub: Reflow ___slab_alloc() chengming.zhou
2023-11-22  0:26   ` Hyeonggon Yoo
2023-11-02  3:23 ` [PATCH v5 2/9] slub: Change get_partial() interfaces to return slab chengming.zhou
2023-11-22  1:09   ` Hyeonggon Yoo
2023-11-02  3:23 ` [PATCH v5 3/9] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
2023-11-22  1:21   ` Hyeonggon Yoo
2023-11-02  3:23 ` [PATCH v5 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
2023-12-03  6:01   ` Hyeonggon Yoo
2023-11-02  3:23 ` [PATCH v5 5/9] slub: Introduce freeze_slab() chengming.zhou
2023-11-02  3:23 ` [PATCH v5 6/9] slub: Delay freezing of partial slabs chengming.zhou
2023-11-14  5:44   ` kernel test robot
2023-11-20 18:49   ` Mark Brown [this message]
2023-11-21  0:58     ` Chengming Zhou
2023-11-21  1:29       ` Mark Brown
2023-11-21 15:47         ` Chengming Zhou
2023-11-21 18:21           ` Mark Brown
2023-11-22  8:52             ` Vlastimil Babka
2023-11-22  9:37     ` Vlastimil Babka
2023-11-22 11:27       ` Mark Brown
2023-11-22 11:35       ` Chengming Zhou
2023-11-22 11:40         ` Vlastimil Babka
2023-11-22 11:54           ` Chengming Zhou
2023-11-22 13:19             ` Vlastimil Babka
2023-11-22 14:28               ` Chengming Zhou
2023-11-22 14:32                 ` Vlastimil Babka
2023-12-03  6:53   ` Hyeonggon Yoo
2023-12-03 10:15     ` Chengming Zhou
2023-12-04 16:58       ` Vlastimil Babka
2023-11-02  3:23 ` [PATCH v5 7/9] slub: Optimize deactivate_slab() chengming.zhou
2023-12-03  9:23   ` Hyeonggon Yoo
2023-12-03 10:26     ` Chengming Zhou
2023-12-03 11:19       ` Hyeonggon Yoo
2023-12-03 11:47         ` Chengming Zhou
2023-12-04 17:55     ` Vlastimil Babka
2023-12-05  0:20       ` Hyeonggon Yoo
2023-11-02  3:23 ` [PATCH v5 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials* chengming.zhou
2023-12-03  9:27   ` Hyeonggon Yoo
2023-11-02  3:23 ` [PATCH v5 9/9] slub: Update frozen slabs documentations in the source chengming.zhou
2023-12-03  9:47   ` Hyeonggon Yoo
2023-12-04 21:41   ` Christoph Lameter (Ampere)
2023-12-05  6:06     ` Chengming Zhou
2023-12-05  9:39       ` Vlastimil Babka
2023-11-13  8:36 ` [PATCH v5 0/9] slub: Delay freezing of CPU partial slabs Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4f3bc1bd-ea87-465d-b58a-0ed57b15187b@sirena.org.uk \
    --to=broonie@kernel.org \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=vbabka@suse.cz \
    --cc=zhouchengming@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox