From: "Huang, Ying"
To: Baolin Wang
Subject: Re: [PATCH v4] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
References: <98fa0a22-77d1-cdb3-1ce2-48a00c3ed5a9@linux.alibaba.com> <5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com>
Date: Wed, 26 Apr 2023 09:23:25 +0800
In-Reply-To: <5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com> (Baolin Wang's message of "Tue, 25 Apr 2023 20:44:53 +0800")
Message-ID: <87sfcne8s2.fsf@yhuang6-desk2.ccr.corp.intel.com>

Baolin Wang writes:

> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> checks whether the given zone contains holes, and uses pfn_to_online_page()
> to validate if the start pfn is online and valid, as well as using pfn_valid()
> to validate the end pfn.
>
> However, the __pageblock_pfn_to_page() function may return non-NULL even
> if the end pfn of a pageblock is in a memory hole in some situations. For
> example, if the pageblock order is MAX_ORDER, which will fall into 2
> sub-sections, and the end pfn of the pageblock may be hole even though
> the start pfn is online and valid.
>
> See below memory layout as an example and suppose the pageblock order
> is MAX_ORDER.
>
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000100000000-0x0000001fa7ffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000040000000-0x0000001fa3c7ffff]
> [    0.000000]   node   0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
> [    0.000000]   node   0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
> [    0.000000]   node   0: [mem 0x0000001fa4030000-0x0000001fa40effff]
> [    0.000000]   node   0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
> [    0.000000]   node   0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
> [    0.000000]   node   0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
> [    0.000000]   node   0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
> [    0.000000]   node   0: [mem 0x0000001fa7590000-0x0000001fa7dfffff]
>
> Focus on the last memory range, and there is a hole for the range [mem
> 0x0000001fa7590000-0x0000001fa7dfffff]. That means the last pageblock
> will contain the range from 0x1fa7c00000 to 0x1fa7ffffff, since the
> pageblock must be 4M aligned. And in this pageblock, these pfns will
> fall into 2 sub-section (the sub-section size is 2M aligned).
>
> So, the 1st sub-section (indicates pfn range: 0x1fa7c00000 -
> 0x1fa7dfffff ) in this pageblock is valid by calling subsection_map_init()
> in free_area_init(), but the 2nd sub-section (indicates pfn range:
> 0x1fa7e00000 - 0x1fa7ffffff ) in this pageblock is not valid.
>
> This did not break anything until now, but the zone continuous is fragile
> in this possible scenario. So as previous discussion[1], it is better to
> add some comments to explain this possible issue in case there are some
> future pfn walkers that rely on this.
>
> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
>
> Signed-off-by: Baolin Wang
> Acked-by: Michal Hocko

Reviewed-by: "Huang, Ying"

> ---
> Changes from v3:
>  - Update the comments to make it clear.
>  - Add acked tag from Michal.
> Changes from v2:
>  - Update the commit log and comments per Michal, thanks.
> Changes from v1:
>  - Update the comments per Ying and Mike, thanks.
> ---
>  mm/page_alloc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6457b64fe562..af9c995d3c1e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1502,6 +1502,15 @@ void __free_pages_core(struct page *page, unsigned int order)
>   * interleaving within a single pageblock. It is therefore sufficient to check
>   * the first and last page of a pageblock and avoid checking each individual
>   * page in a pageblock.
> + *
> + * Note: the function may return non-NULL struct page even for a page block
> + * which contains a memory hole (i.e. there is no physical memory for a subset
> + * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
> + * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
> + * even though the start pfn is online and valid. This should be safe most of
> + * the time because struct pages are still initialized via init_unavailable_range()
> + * and pfn walkers shouldn't touch any physical memory range for which they do
> + * not recognize any specific metadata in struct pages.
>   */
>  struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  				unsigned long end_pfn, struct zone *zone)