From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DEAB6C43331 for ; Wed, 1 Apr 2020 15:42:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9293A212CC for ; Wed, 1 Apr 2020 15:42:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9293A212CC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3D3168E0005; Wed, 1 Apr 2020 11:42:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 383E58E0001; Wed, 1 Apr 2020 11:42:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 299E78E0005; Wed, 1 Apr 2020 11:42:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0226.hostedemail.com [216.40.44.226]) by kanga.kvack.org (Postfix) with ESMTP id 0E8D38E0001 for ; Wed, 1 Apr 2020 11:42:22 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id C13942C94 for ; Wed, 1 Apr 2020 15:42:21 +0000 (UTC) X-FDA: 76659702882.16.limit43_6be3d78d08309 X-HE-Tag: limit43_6be3d78d08309 X-Filterd-Recvd-Size: 6557 Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Wed, 1 Apr 2020 15:42:21 +0000 (UTC) Received: by mail-wm1-f65.google.com with SMTP id z7so202472wmk.1 for ; Wed, 01 Apr 2020 08:42:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Qh51pqUcxSK1vy6HHAY3PK3fGGZoLe1l0UQDBNVaGW8=; b=fvHxlPWMnKrQo6WzMfuZKvheD55BU3YFCtA3znr0vQ5D42yqB/ckGkeaNdVgJV+8Rw TvhgGAYSqh7c58LTGBhuy8Zj2Vbn59zpU6sDs/u+lq9pAUp6y0bnMOLYfnZBWgZekDn7 bM/iNCKcBzcTy69q8lZ8lhqxpTkqZ/pXmxSIzwSU74ndPgxsp5k9Yldm9Gj6ugWQiS1/ Jf4he/kOzUdLK4gvm1adq27aySd81GFefZAhT9UbIWKlsjxH0f7ljvC+h16fmwCdzfxN 7QZ/XwCPP6rPB0Pcjt5hgAzlJkVCj8aKPJsFghaUohJPs3sU9nF2wXfJ2SsomQwn1n32 jaWA== X-Gm-Message-State: AGi0PuYCT7arW4f2oQ9M1sFWQ0DpjD0fi/C/h//BONYRx3g1iw450XP9 J/VoVfgxX4ojBKdZ1qdQ3+w= X-Google-Smtp-Source: APiQypKfZZzePopjzy6XhMXAehJzcOCmIFHyJEDfDiqCyC+Lo4Awo+YvRWIG0adATkzNq+8IjdVjZw== X-Received: by 2002:a7b:c050:: with SMTP id u16mr5284608wmc.68.1585755740307; Wed, 01 Apr 2020 08:42:20 -0700 (PDT) Received: from localhost (ip-37-188-180-223.eurotel.cz. [37.188.180.223]) by smtp.gmail.com with ESMTPSA id g2sm3322729wrs.42.2020.04.01.08.42.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Apr 2020 08:42:19 -0700 (PDT) Date: Wed, 1 Apr 2020 17:42:17 +0200 From: Michal Hocko To: Shile Zhang Cc: Andrew Morton , Kirill Tkhai , Pavel Tatashin , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] mm: fix tick timer stall during deferred page init Message-ID: <20200401154217.GQ22681@dhcp22.suse.cz> References: <20200311123848.118638-1-shile.zhang@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200311123848.118638-1-shile.zhang@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: I am sorry but I have completely missed this patch. On Wed 11-03-20 20:38:48, Shile Zhang wrote: > When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will > initialise the deferred pages with local interrupts disabled. It is > introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while > initializing deferred pages"). > > On machine with NCPUS <= 2, the 'pgdatinit' kthread could be bound to > the boot CPU, which could caused the tick timer long time stall, system > jiffies not be updated in time. > > The dmesg shown that: > > [ 0.197975] node 0 initialised, 32170688 pages in 1ms > > Obviously, 1ms is unreasonable. > > Now, fix it by restore in the pending interrupts for every 32*1204 pages > (128MB) initialized, give the chance to update the systemd jiffies. > The reasonable demsg shown likes: > > [ 1.069306] node 0 initialised, 32203456 pages in 894ms > > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages"). I dislike this solution TBH. It effectivelly conserves the current code and just works around the problem. Why do we hold the IRQ lock here in the first place? This is an early init code and a very limited code is running at this stage. Certainly nothing memory hotplug related which should be the only path really interested in the resize lock AFAIR. This needs a double checking but I strongly believe that the lock can be simply dropped in this path. > Co-developed-by: Kirill Tkhai > Signed-off-by: Kirill Tkhai > Signed-off-by: Shile Zhang > --- > mm/page_alloc.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..a3a47845e150 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1763,12 +1763,17 @@ deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn, > return nr_pages; > } > > +/* > + * Release the pending interrupts for every TICK_PAGE_COUNT pages. > + */ > +#define TICK_PAGE_COUNT (32 * 1024) > + > /* Initialise remaining memory on a node */ > static int __init deferred_init_memmap(void *data) > { > pg_data_t *pgdat = data; > const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); > - unsigned long spfn = 0, epfn = 0, nr_pages = 0; > + unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0; > unsigned long first_init_pfn, flags; > unsigned long start = jiffies; > struct zone *zone; > @@ -1779,6 +1784,7 @@ static int __init deferred_init_memmap(void *data) > if (!cpumask_empty(cpumask)) > set_cpus_allowed_ptr(current, cpumask); > > +again: > pgdat_resize_lock(pgdat, &flags); > first_init_pfn = pgdat->first_deferred_pfn; > if (first_init_pfn == ULONG_MAX) { > @@ -1790,7 +1796,6 @@ static int __init deferred_init_memmap(void *data) > /* Sanity check boundaries */ > BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn); > BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); > - pgdat->first_deferred_pfn = ULONG_MAX; > > /* Only the highest zone is deferred so find it */ > for (zid = 0; zid < MAX_NR_ZONES; zid++) { > @@ -1809,9 +1814,23 @@ static int __init deferred_init_memmap(void *data) > * that we can avoid introducing any issues with the buddy > * allocator. > */ > - while (spfn < epfn) > + while (spfn < epfn) { > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > + /* > + * Release the interrupts for every TICK_PAGE_COUNT pages > + * (128MB) to give tick timer the chance to update the > + * system jiffies. > + */ > + if ((nr_pages - prev_nr_pages) > TICK_PAGE_COUNT) { > + prev_nr_pages = nr_pages; > + pgdat->first_deferred_pfn = spfn; > + pgdat_resize_unlock(pgdat, &flags); > + goto again; > + } > + } > + > zone_empty: > + pgdat->first_deferred_pfn = ULONG_MAX; > pgdat_resize_unlock(pgdat, &flags); > > /* Sanity check that the next zone really is unpopulated */ > -- > 2.24.0.rc2 -- Michal Hocko SUSE Labs