* WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62

From: Andrei Vagin @ 2019-06-19 2:08 UTC
To: Roman Gushchin, linux-mm

Hello,

We run CRIU tests on linux-next kernels, and today we found this
warning in the kernel log:

[ 381.345960] WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62 page_counter_cancel+0x26/0x30
[ 381.345992] Modules linked in:
[ 381.345998] CPU: 0 PID: 11655 Comm: kworker/0:8 Not tainted 5.2.0-rc5-next-20190618+ #1
[ 381.346001] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 381.346010] Workqueue: memcg_kmem_cache kmemcg_workfn
[ 381.346013] RIP: 0010:page_counter_cancel+0x26/0x30
[ 381.346017] Code: 1f 44 00 00 0f 1f 44 00 00 48 89 f0 53 48 f7 d8 f0 48 0f c1 07 48 29 f0 48 89 c3 48 89 c6 e8 61 ff ff ff 48 85 db 78 02 5b c3 <0f> 0b 5b c3 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 ff 74 41 41 55
[ 381.346019] RSP: 0018:ffffb3b34319f990 EFLAGS: 00010086
[ 381.346022] RAX: fffffffffffffffc RBX: fffffffffffffffc RCX: 0000000000000004
[ 381.346024] RDX: 0000000000000000 RSI: fffffffffffffffc RDI: ffff9c2cd7165270
[ 381.346026] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000001
[ 381.346028] R10: 00000000000000c8 R11: ffff9c2cd684e660 R12: 00000000fffffffc
[ 381.346030] R13: 0000000000000002 R14: 0000000000000006 R15: ffff9c2c8ce1f200
[ 381.346033] FS: 0000000000000000(0000) GS:ffff9c2cd8200000(0000) knlGS:0000000000000000
[ 381.346039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 381.346041] CR2: 00000000007be000 CR3: 00000001cdbfc005 CR4: 00000000001606f0
[ 381.346043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 381.346045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 381.346047] Call Trace:
[ 381.346054] page_counter_uncharge+0x1d/0x30
[ 381.346065] __memcg_kmem_uncharge_memcg+0x39/0x60
[ 381.346071] __free_slab+0x34c/0x460
[ 381.346079] deactivate_slab.isra.80+0x57d/0x6d0
[ 381.346088] ? add_lock_to_list.isra.36+0x9c/0xf0
[ 381.346095] ? __lock_acquire+0x252/0x1410
[ 381.346106] ? cpumask_next_and+0x19/0x20
[ 381.346110] ? slub_cpu_dead+0xd0/0xd0
[ 381.346113] flush_cpu_slab+0x36/0x50
[ 381.346117] ? slub_cpu_dead+0xd0/0xd0
[ 381.346125] on_each_cpu_mask+0x51/0x70
[ 381.346131] ? ksm_migrate_page+0x60/0x60
[ 381.346134] on_each_cpu_cond_mask+0xab/0x100
[ 381.346143] __kmem_cache_shrink+0x56/0x320
[ 381.346150] ? ret_from_fork+0x3a/0x50
[ 381.346157] ? unwind_next_frame+0x73/0x480
[ 381.346176] ? __lock_acquire+0x252/0x1410
[ 381.346188] ? kmemcg_workfn+0x21/0x50
[ 381.346196] ? __mutex_lock+0x99/0x920
[ 381.346199] ? kmemcg_workfn+0x21/0x50
[ 381.346205] ? kmemcg_workfn+0x21/0x50
[ 381.346216] __kmemcg_cache_deactivate_after_rcu+0xe/0x40
[ 381.346220] kmemcg_cache_deactivate_after_rcu+0xe/0x20
[ 381.346223] kmemcg_workfn+0x31/0x50
[ 381.346230] process_one_work+0x23c/0x5e0
[ 381.346241] worker_thread+0x3c/0x390
[ 381.346248] ? process_one_work+0x5e0/0x5e0
[ 381.346252] kthread+0x11d/0x140
[ 381.346255] ? kthread_create_on_node+0x60/0x60
[ 381.346261] ret_from_fork+0x3a/0x50
[ 381.346275] irq event stamp: 10302
[ 381.346278] hardirqs last enabled at (10301): [<ffffffffb2c1a0b9>] _raw_spin_unlock_irq+0x29/0x40
[ 381.346282] hardirqs last disabled at (10302): [<ffffffffb2182289>] on_each_cpu_mask+0x49/0x70
[ 381.346287] softirqs last enabled at (10262): [<ffffffffb2191f4a>] cgroup_idr_replace+0x3a/0x50
[ 381.346290] softirqs last disabled at (10260): [<ffffffffb2191f2d>] cgroup_idr_replace+0x1d/0x50
[ 381.346293] ---[ end trace b324ba73eb3659f0 ]---

All logs are here:
https://travis-ci.org/avagin/linux/builds/546601278

The problem is probably in the "[PATCH v7 00/10] mm: reparent slab
memory on cgroup removal" series.

Thanks,
Andrei
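[Note: the check firing at mm/page_counter.c:62 is the underflow guard in
page_counter_cancel(). In v5.2-era sources it reads approximately as below
(paraphrased from memory; verify against the tree). RAX/RBX =
fffffffffffffffc in the register dump above is the new counter value as a
signed long, i.e. -4 pages: one order-2 slab page uncharged past zero.

    void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages)
    {
            long new;

            /* drop nr_pages from this level of the hierarchy only */
            new = atomic_long_sub_return(nr_pages, &counter->usage);
            propagate_protected_usage(counter, new);
            /* more uncharges than charges? this is the WARN seen above */
            WARN_ON_ONCE(new < 0);
    }
]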
* Re: WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62

From: Roman Gushchin @ 2019-06-19 3:41 UTC
To: Andrei Vagin; +Cc: linux-mm

Hi Andrei!

Thank you for the report!

I guess the problem is caused by a race between drain_all_stock() in
mem_cgroup_css_offline() and kmem_cache reparenting: some portion of the
charge isn't propagated to the parent level in time, causing the
imbalance. If so, it's not a huge problem, but definitely something to
fix.

I'm on PTO and traveling this week without a reliable internet
connection, so I will send out a fix on Sunday or early next week.

Thanks!

Sent from my iPhone

> On Jun 18, 2019, at 19:08, Andrei Vagin <avagin@gmail.com> wrote:
>
> Hello,
>
> We run CRIU tests on linux-next kernels, and today we found this
> warning in the kernel log:
>
> [ 381.345960] WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62
> page_counter_cancel+0x26/0x30
>
> [... full register dump and call trace trimmed; see the report above ...]
>
> All logs are here:
> https://travis-ci.org/avagin/linux/builds/546601278
>
> The problem is probably in the "[PATCH v7 00/10] mm: reparent slab
> memory on cgroup removal" series.
>
> Thanks,
> Andrei
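[Note: the imbalance described above can be illustrated with a toy
userspace model (all names hypothetical; this is not kernel code): if the
charge side skips a given level of the hierarchy but, after reparenting,
an uncharge lands on that level anyway, its counter underflows -- matching
the -4 in the register dump above.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct counter {
            atomic_long usage;
            bool is_root;
    };

    static void charge(struct counter *c, long pages)
    {
            if (c->is_root)         /* kmem charging skips the root memcg */
                    return;
            atomic_fetch_add(&c->usage, pages);
    }

    static void uncharge(struct counter *c, long pages)
    {
            /* unconditional uncharge, i.e. without the fix below */
            long new = atomic_fetch_sub(&c->usage, pages) - pages;

            if (new < 0)            /* analogue of the WARN at page_counter.c:62 */
                    fprintf(stderr, "underflow: %ld\n", new);
    }

    int main(void)
    {
            struct counter child = { .is_root = false };
            struct counter root  = { .is_root = true };

            charge(&child, 4);      /* order-2 slab page charged to a child memcg */
            /* the child memcg dies; its kmem_cache is reparented to root */
            uncharge(&root, 4);     /* the uncharge now hits root: usage goes to -4 */
            return 0;
    }
]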
* Re: WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62

From: Roman Gushchin @ 2019-06-19 21:19 UTC
To: Andrei Vagin; +Cc: linux-mm

On Tue, Jun 18, 2019 at 07:08:26PM -0700, Andrei Vagin wrote:
> Hello,
>
> We run CRIU tests on linux-next kernels, and today we found this
> warning in the kernel log:

Hello, Andrei!

Can you, please, check if the following patch fixes the problem?

Thanks a lot!

--

diff --git a/mm/slab.h b/mm/slab.h
index a4c9b9d042de..7667dddb6492 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -326,7 +326,8 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 	memcg = READ_ONCE(s->memcg_params.memcg);
 	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
-	memcg_kmem_uncharge_memcg(page, order, memcg);
+	if (!mem_cgroup_is_root(memcg))
+		memcg_kmem_uncharge_memcg(page, order, memcg);
 	rcu_read_unlock();
 
 	percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
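[Note: why a single mem_cgroup_is_root() check is enough: the kmem charge
path already bails out for the root memcg, so root's page counters never
see kmem charges. After a dying cgroup's kmem_cache is reparented to
root_mem_cgroup, an unconditional uncharge therefore hits a counter that
was never charged and drives it negative. The v5.2-era charge side looks
approximately like this (paraphrased from memory; verify against the
tree):

    int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
    {
            struct mem_cgroup *memcg;
            int ret = 0;

            if (memcg_kmem_bypass())
                    return 0;

            memcg = get_mem_cgroup_from_current();
            /* the root memcg is skipped on the charge side ... */
            if (!mem_cgroup_is_root(memcg)) {
                    ret = __memcg_kmem_charge_memcg(page, gfp, order, memcg);
                    if (!ret)
                            __SetPageKmemcg(page);
            }
            css_put(&memcg->css);
            return ret;
    }

The patch makes memcg_uncharge_slab() symmetric with this.]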
* Re: WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62

From: Andrei Vagin @ 2019-06-19 23:41 UTC
To: Roman Gushchin; +Cc: linux-mm

On Wed, Jun 19, 2019 at 2:19 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Tue, Jun 18, 2019 at 07:08:26PM -0700, Andrei Vagin wrote:
> > Hello,
> >
> > We run CRIU tests on linux-next kernels, and today we found this
> > warning in the kernel log:
>
> Hello, Andrei!
>
> Can you, please, check if the following patch fixes the problem?

All my tests passed: https://travis-ci.org/avagin/linux/builds/547940031

Tested-by: Andrei Vagin <avagin@gmail.com>

Thanks,
Andrei

> Thanks a lot!
>
> [... patch trimmed; see the previous message ...]
* Re: WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62

From: Roman Gushchin @ 2019-06-20 1:32 UTC
To: Andrei Vagin; +Cc: linux-mm

On Wed, Jun 19, 2019 at 04:41:05PM -0700, Andrei Vagin wrote:
> On Wed, Jun 19, 2019 at 2:19 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Tue, Jun 18, 2019 at 07:08:26PM -0700, Andrei Vagin wrote:
> > > Hello,
> > >
> > > We run CRIU tests on linux-next kernels, and today we found this
> > > warning in the kernel log:
> >
> > Hello, Andrei!
> >
> > Can you, please, check if the following patch fixes the problem?
>
> All my tests passed: https://travis-ci.org/avagin/linux/builds/547940031
>
> Tested-by: Andrei Vagin <avagin@gmail.com>

Thank you very much! I'll send the proper patch soon. It's a bit
different from the one you've tested (I realized that vmstats for
root_mem_cgroup should be handled differently too), so I won't add your
Tested-by for now; let's wait for the tests to pass with the actual
patch.

Thank you!

Roman