On 04/15/2015 09:38 AM, Mel Gorman wrote:
>> However, there were 2 bootup problems in the dmesg log that needed
>> to be addressed.
>> 1. There were 2 vmalloc allocation failures:
>> [    2.284686] vmalloc: allocation failure, allocated 16578404352 of
>> 17179873280 bytes
>> [   10.399938] vmalloc: allocation failure, allocated 7970922496 of
>> 8589938688 bytes
>>
>> 2. There were 2 soft lockup warnings:
>> [   57.319453] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
>> [swapper/0:1]
>> [   85.409263] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
>> [swapper/0:1]
>>
>> Once those problems are fixed, the patch should be in a pretty good
>> shape. I have attached the dmesg log for your reference.
>>
> The obvious conclusion is that initialising 1G per node is not enough for
> really large machines. Can you try this on top? It's untested but should
> work. The low value was chosen because it happened to work and I wanted
> to get test coverage on common hardware but broke is broke.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f2c96d02662f..6b3bec304e35 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -276,9 +276,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
>   	if (pgdat->first_deferred_pfn != ULONG_MAX)
>   		return false;
>
> -	/* Initialise at least 1G per zone */
> +	/* Initialise at least 32G per node */
>   	(*nr_initialised)++;
> -	if (*nr_initialised > (1UL << (30 - PAGE_SHIFT)) &&
> +	if (*nr_initialised > (32UL << (30 - PAGE_SHIFT)) &&
>   	(pfn & (PAGES_PER_SECTION - 1)) == 0) {
>   		pgdat->first_deferred_pfn = pfn;
>   		return false;
>
>
I applied the patch and the boot time was 299s instead of 298s, so
practically the same. Both issues that I reported previously are now
gone. The new dmesg log is attached for your reference.
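
For anyone following the arithmetic in the hunk above, here is a rough
userspace sketch of what the threshold works out to. It assumes 4K pages
(PAGE_SHIFT = 12), which is not stated anywhere in this thread, and the
helper names are made up for illustration; they are not from
mm/page_alloc.c.

```c
#include <stdbool.h>

/* Assumption: 4K pages as on x86-64; the thread does not give the page size. */
#define PAGE_SHIFT 12

/*
 * Hypothetical helper: number of pages in `gib` GiB.
 * Mirrors the (N << (30 - PAGE_SHIFT)) expression in the patch.
 */
static unsigned long gib_to_pages(unsigned long gib)
{
	return gib << (30 - PAGE_SHIFT);
}

/*
 * Hypothetical helper: true once strictly more than `gib` GiB worth of
 * pages has been counted, matching the '>' comparison in the patch.
 * The real check also requires pfn to be section-aligned before
 * deferring; that part is left out here.
 */
static bool past_threshold(unsigned long nr_initialised, unsigned long gib)
{
	return nr_initialised > gib_to_pages(gib);
}
```

Under that assumption, the old limit is gib_to_pages(1) = 262144 pages
and the new one is gib_to_pages(32) = 8388608 pages per node. The
larger of the two failed vmalloc requests was just over 16G, so it
could not fit within the old 1G-per-node limit.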

Cheers,
Longman