Date: Tue, 7 Feb 2017 10:53:20 +0100
From: Michal Hocko
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc
Message-ID: <20170207095320.GF5065@dhcp22.suse.cz>
References: <20170206220530.apvuknbagaf2rdlw@techsingularity.net>
 <20170207084855.GC5065@dhcp22.suse.cz>
 <614e9873-c894-de42-a38a-1798fc0be039@suse.cz>
In-Reply-To: <614e9873-c894-de42-a38a-1798fc0be039@suse.cz>
To: Vlastimil Babka
Cc: Mel Gorman, Dmitry Vyukov, Tejun Heo, Christoph Lameter,
 "linux-mm@kvack.org", LKML, Thomas Gleixner, Ingo Molnar,
 Peter Zijlstra, syzkaller, Andrew Morton

On Tue 07-02-17 10:23:31, Vlastimil Babka wrote:
> On 02/07/2017 09:48 AM, Michal Hocko wrote:
> > On Mon 06-02-17 22:05:30, Mel Gorman wrote:
> >>> Unfortunately it does not seem to help.
> >>
> >> I'm a little stuck on how to best handle this. get_online_cpus() can
> >> halt forever if the hotplug operation is holding the mutex when calling
> >> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
> >> trylocks the mutex. However, given that drain is so unlikely to actually
> >> make that make a difference when racing against parallel allocations,
> >> I think this should be acceptable.
> >>
> >> Any objections?
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 3b93879990fd..a3192447e906 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> >>  	 */
> >>  	if (!page && !drained) {
> >>  		unreserve_highatomic_pageblock(ac, false);
> >> -		drain_all_pages(NULL);
> >> +
> >> +		/*
> >> +		 * Only drain from contexts allocating for user allocations.
> >> +		 * Kernel allocations could be holding a CPU hotplug-related
> >> +		 * mutex, particularly hot-add allocating per-cpu structures
> >> +		 * while hotplug-related mutex's are held which would prevent
> >> +		 * get_online_cpus ever returning.
> >> +		 */
> >> +		if (gfp_mask & __GFP_HARDWALL)
> >> +			drain_all_pages(NULL);
> >> +
> >
> > This wouldn't work AFAICS. If you look at the lockdep splat, the path
> > which reverses the locking order (takes pcpu_alloc_mutex prior to
> > cpu_hotplug.lock) is bpf_array_alloc_percpu which is GFP_USER and thus
> > __GFP_HARDWALL.
> >
> > I believe we shouldn't pull any dependency on the hotplug locks inside
> > the allocator. This is just too fragile! Can we simply drop the
> > get_online_cpus()? Why do we need it, anyway? Say we are racing with the
>
> It was added after I noticed in review that queue_work_on() has a
> comment that caller must ensure that cpu can't go away, and wondered
> about it.

Oh, I hadn't noticed the comment. Thanks for pointing it out. I still
do not see what a missing get_online_cpus() would mean for queuing.

> Also noted that a similar lru_add_drain_all() does it too.
>
> > cpu offlining. I have to check the code but my impression was that WQ
> > code will ignore the cpu requested by the work item when the cpu is
> > going offline. If the offline happens while the worker function already
> > executes then it has to wait as we run with preemption disabled so we
> > should be safe here. Or am I missing something obvious?
>
> Tejun suggested an alternative solution to avoiding get_online_cpus() in
> this thread:
> https://lkml.kernel.org/r/<20170123170329.GA7820@htj.duckdns.org>

OK, so we have page_alloc_cpu_notify which also does drain_pages so all
we have to do to make sure they do not race is to synchronize there.

--
Michal Hocko
SUSE Labs
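
For reference, the try_get_online_cpus() idea Mel mentions above can be
sketched in userspace as follows. This is only an illustration under
stated assumptions, not kernel code and not what anybody in the thread
proposed to merge: a pthread mutex stands in for the hotplug lock, and
every name ending in _sketch (as well as hotplug_lock and drains_done) is
made up for the example. The point is just that a best-effort drain can
give up instead of blocking when the lock is already held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the CPU hotplug mutex (hypothetical name). */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of drains that actually ran, for illustration only. */
static int drains_done;

/* Hypothetical analogue of try_get_online_cpus(): never blocks. */
static bool try_get_online_cpus_sketch(void)
{
	/* pthread_mutex_trylock() returns 0 on success, EBUSY if held. */
	return pthread_mutex_trylock(&hotplug_lock) == 0;
}

/* Hypothetical analogue of put_online_cpus(). */
static void put_online_cpus_sketch(void)
{
	int rc = pthread_mutex_unlock(&hotplug_lock);

	assert(rc == 0);
	(void)rc;
}

/*
 * Best-effort drain: if the hotplug lock is busy, skip the drain rather
 * than wait for it, so the caller can never deadlock on that lock.
 * Returns true if the drain ran, false if it was skipped.
 */
static bool drain_all_pages_sketch(void)
{
	if (!try_get_online_cpus_sketch())
		return false;

	drains_done++;		/* the real per-cpu drain would go here */
	put_online_cpus_sketch();
	return true;
}
```

With the lock free, drain_all_pages_sketch() runs the drain and returns
true. While hotplug_lock is held it returns false immediately. This still
leaves a hotplug lock dependency inside the allocator, which is the
fragility objected to above.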