linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Qian Cai <cai@lca.pw>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Thomas Gleixner <tglx@linutronix.de>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	Borislav Petkov <bp@alien8.de>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
Date: Mon, 21 Oct 2019 16:12:28 +0200	[thread overview]
Message-ID: <20191021141228.GR9379@dhcp22.suse.cz> (raw)
In-Reply-To: <85A7E76A-0839-4A43-86F3-6847639F9F92@lca.pw>

On Mon 21-10-19 10:01:24, Qian Cai wrote:
> 
> 
> > On Oct 21, 2019, at 5:48 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
i[...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c0b2e0306720..f972076d0f6b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1947,6 +1947,14 @@ void __init page_alloc_init_late(void)
> > 	/* Block until all are initialised */
> > 	wait_for_completion(&pgdat_init_all_done_comp);
> > 
> > +	/*
> > +	 * The number of managed pages has changed due to the initialisation
> > +	 * so the pcpu batch and high limits needs to be updated or the limits
> > +	 * will be artificially small.
> > +	 */
> > +	for_each_populated_zone(zone)
> > +		zone_pcp_update(zone);
> > +
> > 	/*
> > 	 * We initialized the rest of the deferred pages.  Permanently disable
> > 	 * on-demand struct page initialization.
> > -- 
> > 2.16.4
> > 
> > 
> 
> Warnings from linux-next,
> 
> [   14.265911][  T659] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
> [   14.265992][  T659] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 659, name: pgdatinit8
> [   14.266044][  T659] 1 lock held by pgdatinit8/659:
> [   14.266075][  T659]  #0: c000201ffca87b40 (&(&pgdat->node_size_lock)->rlock){....}, at: deferred_init_memmap+0xc4/0x26c

This is really surprising to say the least. I do not see any spinlock
held here. Besides that we do sleep in wait_for_completion already.
Is it possible that the patch has been misplaced? zone_pcp_update is
called from page_alloc_init_late which is a different context than
deferred_init_memmap which runs in a separate kthread.

> [   14.266160][  T659] irq event stamp: 26
> [   14.266194][  T659] hardirqs last  enabled at (25): [<c000000000950584>] _raw_spin_unlock_irq+0x44/0x80
> [   14.266246][  T659] hardirqs last disabled at (26): [<c0000000009502ec>] _raw_spin_lock_irqsave+0x3c/0xa0
> [   14.266299][  T659] softirqs last  enabled at (0): [<c0000000000ff8d0>] copy_process+0x720/0x19b0
> [   14.266339][  T659] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   14.266400][  T659] CPU: 64 PID: 659 Comm: pgdatinit8 Not tainted 5.4.0-rc4-next-20191021 #1
> [   14.266462][  T659] Call Trace:
> [   14.266494][  T659] [c00000003d8efae0] [c000000000921cf4] dump_stack+0xe8/0x164 (unreliable)
> [   14.266538][  T659] [c00000003d8efb30] [c000000000157c54] ___might_sleep+0x334/0x370
> [   14.266577][  T659] [c00000003d8efbb0] [c00000000094a784] __mutex_lock+0x84/0xb20
> [   14.266627][  T659] [c00000003d8efcc0] [c000000000954038] zone_pcp_update+0x34/0x64
> [   14.266677][  T659] [c00000003d8efcf0] [c000000000b9e6bc] deferred_init_memmap+0x1b8/0x26c
> [   14.266740][  T659] [c00000003d8efdb0] [c000000000149528] kthread+0x1a8/0x1b0
> [   14.266792][  T659] [c00000003d8efe20] [c00000000000b748] ret_from_kernel_thread+0x5c/0x74
> [   14.268288][  T659] node 8 initialised, 1879186 pages in 12200ms
> [   14.268527][  T659] pgdatinit8 (659) used greatest stack depth: 27984 bytes left
> [   15.589983][  T658] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
> [   15.590041][  T658] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 658, name: pgdatinit0
> [   15.590078][  T658] 1 lock held by pgdatinit0/658:
> [   15.590108][  T658]  #0: c000001fff5c7b40 (&(&pgdat->node_size_lock)->rlock){....}, at: deferred_init_memmap+0xc4/0x26c
> [   15.590192][  T658] irq event stamp: 18
> [   15.590224][  T658] hardirqs last  enabled at (17): [<c000000000950654>] _raw_spin_unlock_irqrestore+0x94/0xd0
> [   15.590283][  T658] hardirqs last disabled at (18): [<c0000000009502ec>] _raw_spin_lock_irqsave+0x3c/0xa0
> [   15.590332][  T658] softirqs last  enabled at (0): [<c0000000000ff8d0>] copy_process+0x720/0x19b0
> [   15.590379][  T658] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   15.590414][  T658] CPU: 8 PID: 658 Comm: pgdatinit0 Tainted: G        W         5.4.0-rc4-next-20191021 #1
> [   15.590460][  T658] Call Trace:
> [   15.590491][  T658] [c00000003d8cfae0] [c000000000921cf4] dump_stack+0xe8/0x164 (unreliable)
> [   15.590541][  T658] [c00000003d8cfb30] [c000000000157c54] ___might_sleep+0x334/0x370
> [   15.590588][  T658] [c00000003d8cfbb0] [c00000000094a784] __mutex_lock+0x84/0xb20
> [   15.590643][  T658] [c00000003d8cfcc0] [c000000000954038] zone_pcp_update+0x34/0x64
> [   15.590689][  T658] [c00000003d8cfcf0] [c000000000b9e6bc] deferred_init_memmap+0x1b8/0x26c
> [   15.590739][  T658] [c00000003d8cfdb0] [c000000000149528] kthread+0x1a8/0x1b0
> [   15.590790][  T658] [c00000003d8cfe20] [c00000000000b748] ret_from_kernel_thread+0x5c/0x74

-- 
Michal Hocko
SUSE Labs


  reply	other threads:[~2019-10-21 14:12 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-21  9:48 [PATCH 0/3] Recalculate per-cpu page allocator batch and high limits after deferred meminit v2 Mel Gorman
2019-10-21  9:48 ` [PATCH 1/3] mm, meminit: Recalculate pcpu batch and high limits after init completes Mel Gorman
2019-10-21 10:27   ` Michal Hocko
2019-10-21 11:42   ` Vlastimil Babka
2019-10-21 14:01   ` Qian Cai
2019-10-21 14:12     ` Michal Hocko [this message]
2019-10-21 14:25     ` Mel Gorman
2019-10-21 19:39   ` [PATCH] mm, meminit: Recalculate pcpu batch and high limits after init completes -fix Mel Gorman
2019-10-21  9:48 ` [PATCH 2/3] mm, pcp: Share common code between memory hotplug and percpu sysctl handler Mel Gorman
2019-10-21 11:44   ` Vlastimil Babka
2019-10-21  9:48 ` [PATCH 3/3] mm, pcpu: Make zone pcp updates and reset internal to the mm Mel Gorman
2019-10-21 11:42   ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191021141228.GR9379@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=cai@lca.pw \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matt@codeblueprint.co.uk \
    --cc=mgorman@techsingularity.net \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox