Date: Mon, 23 Nov 2020 12:04:32 -0800
From: Andrew Morton
To: Lin Feng
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, mgorman@techsingularity.net
Subject: Re: [PATCH] [RFC] init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set
Message-Id: <20201123120432.3c0cb9b7e2f46150f132d592@linux-foundation.org>
In-Reply-To: <20201123110500.103523-1-linf@wangsu.com>
References: <20201123110500.103523-1-linf@wangsu.com>

On Mon, 23 Nov 2020 19:05:00 +0800 Lin Feng wrote:

> In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
> we have following callchain:
>
> start_kernel
>   ...
>   mm_init
>     mem_init
>       memblock_free_all
>         reset_all_zones_managed_pages
>         free_low_memory_core_early
>   ...
>   buffer_init
>     nr_free_buffer_pages
>       zone->managed_pages
>   ...
>   rest_init
>     kernel_init
>       kernel_init_freeable
>         page_alloc_init_late
>           kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
>           wait_for_completion(&pgdat_init_all_done_comp);
>           ...
>           files_maxfiles_init
>
> It's clear that buffer_init depends on zone->managed_pages, but it's reset
> in reset_all_zones_managed_pages after that pages are readded into
> zone->managed_pages, but when buffer_init runs this process is half done
> and most of them will finally be added till deferred_init_memmap done.
> In large memory counting of nr_free_buffer_pages drifts too much, also
> drifting from kernels to kernels on same hardware.
>
> Fix is simple, it delays buffer_init run till deferred_init_memmap all done.
>
> But as corrected by this patch, max_buffer_heads becomes very large,
> the value is roughly as many as 4 times of totalram_pages, formula:
> max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head));
>
> Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads
> turns out to be roughly 67,108,864.
> In common cases, should a buffer_head be mapped to one page/block(4KB)?
> So max_buffer_heads never exceeds totalram_pages.
> IMO it's likely to make buffer_heads_over_limit bool value always false,
> then make codes 'if (buffer_heads_over_limit)' test in vmscan unnecessary.
> Correct me if it's not true.

I agree - seems that on such a system we'll allow enough buffer_heads to
manage about 250GB worth of pagecache, for a 4kb filesystem blocksize.

Perhaps this code is all a remnant of highmem systems, where ZONE_NORMAL
is considerably smaller than ZONE_HIGHMEM, and we don't want to be
consuming all of ZONE_NORMAL for highmem-attached buffer_heads.

I'm not sure that it's all very harmful - we don't *need* to be trimming
away at the buffer_heads on a 64GB 64-bit system so the code is really
only functional on highmem machines. And as far as I know, it works OK
on such machines.