From: Shile Zhang
Subject: Re: [PATCH v2 1/1] mm: fix interrupt disabled long time inside deferred_init_memmap()
To: Kirill Tkhai, Andrew Morton, Pavel Tatashin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka, Michal Hocko
References: <20200303161551.132263-1-shile.zhang@linux.alibaba.com> <20200303161551.132263-2-shile.zhang@linux.alibaba.com> <386d7d5f-a57d-f5b1-acee-131ce23d35ec@linux.alibaba.com> <2d4defb7-8816-3447-3d65-f5d80067a9fd@virtuozzo.com>
Message-ID: <1856c956-858f-82d4-f3b3-05b2d0e5641c@linux.alibaba.com>
Date: Wed, 11 Mar 2020 09:44:10 +0800
In-Reply-To: <2d4defb7-8816-3447-3d65-f5d80067a9fd@virtuozzo.com>

Hi Kirill,

Sorry for the late reply!
I don't fully understand the whole deferred page init mechanism, so I
focused on the jiffies update issue itself. Maybe I'm on the wrong path,
but deferred page init seems to make little sense on a 1-CPU system,
since the memory cannot be initialised in parallel there.
It might be better to disable deferred page init in 'deferred_init' in
the case of 1 CPU (or only one memory node).
In other words, it seems the better way to solve this issue is to not
bind the 'pgdatinit' thread to the boot CPU.

I also refactored the patch based on your comment, please help to check,
thanks!

On 2020/3/4 18:47, Kirill Tkhai wrote:
> On 04.03.2020 05:34, Shile Zhang wrote:
>> Hi Kirill,
>>
>> Thanks for your quick reply!
>>
>> On 2020/3/4 00:52, Kirill Tkhai wrote:
>>> On 03.03.2020 19:15, Shile Zhang wrote:
>>>> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, the 'pgdatinit' kthread will
>>>> initialise the deferred pages with local interrupts disabled. This was
>>>> introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while
>>>> initializing deferred pages").
>>>>
>>>> The local interrupts will be disabled for a long time inside
>>>> deferred_init_memmap(), depending on the memory size.
>>>> On machines with NCPUS <= 2, the 'pgdatinit' kthread can be pinned on the
>>>> boot CPU; then the tick timer is stuck for a long time, which makes the
>>>> system wall time inaccurate.
>>>>
>>>> For example, the dmesg showed:
>>>>
>>>>    [    0.197975] node 0 initialised, 32170688 pages in 1ms
>>>>
>>>> Obviously, 1ms is unreasonable.
>>>> Now, fix it by restoring the pending interrupts inside the while loop.
>>>> The resulting dmesg looks reasonable:
>>>>
>>>> [    1.069306] node 0 initialised, 32203456 pages in 894ms
>>> The way I understand the original problem that Pavel fixed:
>>>
>>> we need to disable irqs in deferred_init_memmap() since this function may be called
>>> in parallel with deferred_grow_zone() called from an interrupt handler. So, Pavel
>>> added the lock to fix the race.
>>>
>>> In case we temporarily unlock the lock, the interrupt is still possible,
>>> so my previous proposition brings the problem back.
>>>
>>> Having thought about it again, I think we just have to add:
>>>
>>>     pgdat_resize_unlock();
>>>     pgdat_resize_lock();
>>>
>>> instead of releasing interrupts, since if we just release them with the lock held,
>>> a call of interrupt->deferred_grow_zone() brings us to a deadlock.
>>>
>>> So, unlocking the lock is a must.
>> Yes, you're right! I missed this point.
>> Thanks for your comment!
>>
>>>> Signed-off-by: Shile Zhang
>>>> ---
>>>>  mm/page_alloc.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 3c4eb750a199..d3f337f2e089 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1809,8 +1809,12 @@ static int __init deferred_init_memmap(void *data)
>>>>        * that we can avoid introducing any issues with the buddy
>>>>        * allocator.
>>>>        */
>>>> -     while (spfn < epfn)
>>>> +     while (spfn < epfn) {
>>>>               nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
>>>> +             /* let in any pending interrupts */
>>>> +             local_irq_restore(flags);
>>>> +             local_irq_save(flags);
>>>> +     }
>>>>  zone_empty:
>>>>       pgdat_resize_unlock(pgdat, &flags);
>>> I think we need here something like below (untested):
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 79e950d76ffc..323afa9a4db5 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1828,7 +1828,7 @@ static int __init deferred_init_memmap(void *data)
>>>  {
>>>      pg_data_t *pgdat = data;
>>>      const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
>>> -    unsigned long spfn = 0, epfn = 0, nr_pages = 0;
>>> +    unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
>>>      unsigned long first_init_pfn, flags;
>>>      unsigned long start = jiffies;
>>>      struct zone *zone;
>>> @@ -1869,8 +1869,18 @@ static int __init deferred_init_memmap(void *data)
>>>       * that we can avoid introducing any issues with the buddy
>>>       * allocator.
>>>       */
>>> -    while (spfn < epfn)
>>> +    while (spfn < epfn) {
>>>          nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
>>> +        /*
>>> +         * Release interrupts every 1Gb to give a possibility
>>> +         * for a timer to advance jiffies.
>>> +         */
>>> +        if (nr_pages - prev_nr_pages > (1UL << (30 - PAGE_SHIFT))) {
>>> +            prev_nr_pages = nr_pages;
>>> +            pgdat_resize_unlock(pgdat, &flags);
>>> +            pgdat_resize_lock(pgdat, &flags);
>>> +        }
>>> +    }
>>>  zone_empty:
>>>      pgdat_resize_unlock(pgdat, &flags);
>>>
>>> (I believe the comment may be improved more).
>> Yeah, your patch is better!
>> I tested your code and it works!
>> But it seems that 1G still holds the interrupts too long, about 40ms in my env
>> (with an Intel(R) Xeon(R) 2.5GHz). I tried other sizes; 1024 pages (4MB),
>> which Andrew suggested before, works fine.
>>
>> Could you please help to review it again?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3c4eb750a199..5def66d3ffcd 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1768,7 +1768,7 @@ static int __init deferred_init_memmap(void *data)
>>  {
>>      pg_data_t *pgdat = data;
>>      const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
>> -    unsigned long spfn = 0, epfn = 0, nr_pages = 0;
>> +    unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
>>      unsigned long first_init_pfn, flags;
>>      unsigned long start = jiffies;
>>      struct zone *zone;
>> @@ -1809,8 +1809,17 @@ static int __init deferred_init_memmap(void *data)
>>       * that we can avoid introducing any issues with the buddy
>>       * allocator.
>>       */
>> -    while (spfn < epfn)
>> +    while (spfn < epfn) {
>>          nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
>> +        /*
>> +         * Restore pending interrupts every 1024 pages to give
>> +         * the tick timer a chance to advance jiffies.
>> +         */
>> +        if (nr_pages - prev_nr_pages > 1024) {
>> +            pgdat_resize_unlock(pgdat, &flags);
>> +            pgdat_resize_lock(pgdat, &flags);
> Here is a problem: prev_nr_pages must be updated.
>
> Anyway, releasing every 4M looks wrong to me, since you remove the fix that Pavel introduced.
> He protected against big allocations made from interrupt context. But in case we unlock
> the lock after 4Mb, only 4Mb will be available for allocations from interrupts. pgdat->first_deferred_pfn
> is updated at the start of the function, so interrupt allocations won't be able to initialize
> more for themselves.

Yes, you're right. I missed this point since I hadn't fully understood
the code before. Thanks for your advice!

> In case you want to unlock interrupts very often, you should make some creativity with first_deferred_pfn.
> We should update it sequentially. Something like below (untested):

I got your point now, thanks!
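Just as a back-of-the-envelope check of the sizes discussed in this thread
(it assumes the usual 4 KiB pages, i.e. PAGE_SHIFT == 12; plain user-space
arithmetic, not kernel code):

```c
#include <stdio.h>

/* Assumption for this estimate only: 4 KiB pages, as on x86-64. */
#define PAGE_SHIFT 12UL

int main(void)
{
	/* Threshold from Kirill's first sketch: drop the lock every 1 GiB. */
	unsigned long gib_pages = 1UL << (30 - PAGE_SHIFT);
	/* Threshold Andrew suggested earlier: 1024 pages. */
	unsigned long small_pages = 1024UL;

	printf("1 GiB      = %lu pages\n", gib_pages);                /* 262144 */
	printf("1024 pages = %lu MiB\n",
	       (small_pages << PAGE_SHIFT) >> 20);                    /* 4 */
	/* TICK_PAGE_COUNT used in the updated patch further below. */
	printf("32k pages  = %lu MiB\n",
	       ((32UL * 1024) << PAGE_SHIFT) >> 20);                  /* 128 */
	return 0;
}
```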
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 79e950d76ffc..be09d158baeb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1828,7 +1828,7 @@ static int __init deferred_init_memmap(void *data)
>  {
>      pg_data_t *pgdat = data;
>      const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> -    unsigned long spfn = 0, epfn = 0, nr_pages = 0;
> +    unsigned long spfn = 0, epfn = 0, nr_pages;
>      unsigned long first_init_pfn, flags;
>      unsigned long start = jiffies;
>      struct zone *zone;
> @@ -1838,7 +1838,7 @@ static int __init deferred_init_memmap(void *data)
>      /* Bind memory initialisation thread to a local node if possible */
>      if (!cpumask_empty(cpumask))
>          set_cpus_allowed_ptr(current, cpumask);
> -
> +again:
>      pgdat_resize_lock(pgdat, &flags);
>      first_init_pfn = pgdat->first_deferred_pfn;
>      if (first_init_pfn == ULONG_MAX) {
> @@ -1850,7 +1850,6 @@ static int __init deferred_init_memmap(void *data)
>      /* Sanity check boundaries */
>      BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
>      BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
> -    pgdat->first_deferred_pfn = ULONG_MAX;
>
>      /* Only the highest zone is deferred so find it */
>      for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> @@ -1864,14 +1863,30 @@ static int __init deferred_init_memmap(void *data)
>                          first_init_pfn))
>          goto zone_empty;
>
> +    nr_pages = 0;

'nr_pages' was used to count the total initialised pages before, so it cannot be
zeroed on each round. It seems we need one more variable to count the pages
initialised per round.

> +
>      /*
>       * Initialize and free pages in MAX_ORDER sized increments so
>       * that we can avoid introducing any issues with the buddy
>       * allocator.
>       * Final iteration marker is: spfn=ULONG_MAX and epfn=0.
>       */
> -    while (spfn < epfn)
> +    while (spfn < epfn) {
>          nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> +        if (!epfn)
> +            break;

It seems 'epfn' never goes to 0, since it is the "end page frame number", right?
So this check is needless.

> +        pgdat->first_deferred_pfn = epfn;

I think first_deferred_pfn gets the wrong value here; it seems it should be
spfn, the start pfn, right?

> +        /*
> +         * Restore pending interrupts every 128Mb to give
> +         * the tick timer a chance to advance jiffies.
> +         */
> +        if (nr_pages > (1UL << 27 - PAGE_SHIFT)) {
> +            pgdat_resize_unlock(pgdat, &flags);
> +            goto again;
> +        }
> +    }
>  zone_empty:
> +    pgdat->first_deferred_pfn = ULONG_MAX;
>      pgdat_resize_unlock(pgdat, &flags);
>
>      /* Sanity check that the next zone really is unpopulated */
>

I updated the patch based on your comment, and it passed the test.
Could you please help to review it again? Thanks!

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..841c902d4509 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1763,12 +1763,17 @@ deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
        return nr_pages;
 }

+/*
+ * Release the tick timer interrupts for every TICK_PAGE_COUNT pages.
+ */
+#define TICK_PAGE_COUNT        (32 * 1024)
+
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
        pg_data_t *pgdat = data;
        const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-       unsigned long spfn = 0, epfn = 0, nr_pages = 0;
+       unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
        unsigned long first_init_pfn, flags;
        unsigned long start = jiffies;
        struct zone *zone;
@@ -1779,6 +1784,7 @@ static int __init deferred_init_memmap(void *data)
        if (!cpumask_empty(cpumask))
                set_cpus_allowed_ptr(current, cpumask);
+again:
        pgdat_resize_lock(pgdat, &flags);
        first_init_pfn = pgdat->first_deferred_pfn;
        if (first_init_pfn == ULONG_MAX) {
@@ -1790,7 +1796,6 @@ static int __init deferred_init_memmap(void *data)
        /* Sanity check boundaries */
        BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
        BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
-       pgdat->first_deferred_pfn = ULONG_MAX;

        /* Only the highest zone is deferred so find it */
        for (zid = 0; zid < MAX_NR_ZONES; zid++) {
@@ -1809,9 +1814,23 @@ static int __init deferred_init_memmap(void *data)
         * that we can avoid introducing any issues with the buddy
         * allocator.
         */
-       while (spfn < epfn)
+       while (spfn < epfn) {
                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+               /*
+                * Release the interrupts for every TICK_PAGE_COUNT pages
+                * (128MB) to give the tick timer a chance to advance
+                * jiffies.
+                */
+               if ((nr_pages - prev_nr_pages) > TICK_PAGE_COUNT) {
+                       prev_nr_pages = nr_pages;
+                       pgdat->first_deferred_pfn = spfn;
+                       pgdat_resize_unlock(pgdat, &flags);
+                       goto again;
+               }
+       }
+
 zone_empty:
+       pgdat->first_deferred_pfn = ULONG_MAX;
        pgdat_resize_unlock(pgdat, &flags);

        /* Sanity check that the next zone really is unpopulated */
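For what it's worth, here is a minimal user-space model of the pattern the
patch uses (checkpoint the progress, drop the lock, re-take it so a waiter
can run). It uses pthreads instead of pgdat_resize_lock()/local irqs, and
all the names here are made up for illustration only; it is not kernel code:

```c
#include <pthread.h>
#include <stdio.h>

#define TOTAL_PAGES      (1UL << 20)   /* pretend node size            */
#define TICK_PAGE_COUNT  (32 * 1024)   /* release the lock this often  */

static pthread_mutex_t resize_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long first_deferred_pfn;   /* init progress checkpoint */
static unsigned long timer_ticks;          /* stands in for jiffies    */

/* Stands in for the tick timer (or deferred_grow_zone()): it can only
 * make progress while the init thread is not holding the lock. */
static void *timer_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100; i++) {
		pthread_mutex_lock(&resize_lock);
		timer_ticks++;
		pthread_mutex_unlock(&resize_lock);
	}
	return NULL;
}

/* Stands in for deferred_init_memmap(): a long loop under the lock, but
 * every TICK_PAGE_COUNT pages it records how far it got and drops the
 * lock briefly so the other thread gets a chance to run. */
static void *init_thread(void *arg)
{
	unsigned long nr_pages = 0, prev_nr_pages = 0;

	(void)arg;
	pthread_mutex_lock(&resize_lock);
	while (first_deferred_pfn < TOTAL_PAGES) {
		first_deferred_pfn++;           /* "initialise" one page */
		nr_pages++;
		if (nr_pages - prev_nr_pages > TICK_PAGE_COUNT) {
			prev_nr_pages = nr_pages;
			pthread_mutex_unlock(&resize_lock);
			pthread_mutex_lock(&resize_lock);
		}
	}
	pthread_mutex_unlock(&resize_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, init_thread, NULL);
	pthread_create(&t2, NULL, timer_thread, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("initialised %lu pages, timer advanced %lu ticks\n",
	       first_deferred_pfn, timer_ticks);
	return 0;
}
```

Of course, in the kernel, releasing the resize lock re-enables local
interrupts, so a pending tick fires immediately; a mutex gives no such
guarantee, so the model above is only a rough analogy of the idea.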