From: "Huang, Ying"
To: Baolin Wang
Subject: Re: [PATCH v3 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
Date: Tue, 25 Apr 2023 08:22:55 +0800
In-Reply-To: <50b5e05dbb007e3a969ac946bc9ee0b2b77b185f.1682342634.git.baolin.wang@linux.alibaba.com> (Baolin Wang's message of "Mon, 24 Apr 2023 21:45:40 +0800")
Message-ID: <87zg6wkdy8.fsf@yhuang6-desk2.ccr.corp.intel.com>

Baolin Wang writes:

> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> checks whether the given zone contains holes, and uses pfn_to_online_page()
> to validate if the start pfn is online and valid, as well as using pfn_valid()
> to validate the end pfn.
>
> However, the __pageblock_pfn_to_page() function may return non-NULL even
> if the end pfn of a pageblock is in a memory hole in some situations. For
> example, if the pageblock order is MAX_ORDER, which will fall into 2
> sub-sections, and the end pfn of the pageblock may be hole even though
> the start pfn is online and valid.
>
> See below memory layout as an example and suppose the pageblock order
> is MAX_ORDER.
>
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000100000000-0x0000001fa7ffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000040000000-0x0000001fa3c7ffff]
> [    0.000000]   node   0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
> [    0.000000]   node   0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
> [    0.000000]   node   0: [mem 0x0000001fa4030000-0x0000001fa40effff]
> [    0.000000]   node   0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
> [    0.000000]   node   0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
> [    0.000000]   node   0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
> [    0.000000]   node   0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
> [    0.000000]   node   0: [mem 0x0000001fa7590000-0x0000001fa7dfffff]
>
> Focus on the last memory range, and there is a hole for the range [mem
> 0x0000001fa7590000-0x0000001fa7dfffff]. That means the last pageblock
> will contain the range from 0x1fa7c00000 to 0x1fa7ffffff, since the
> pageblock must be 4M aligned. And in this pageblock, these pfns will
> fall into 2 sub-section (the sub-section size is 2M aligned).
>
> So, the 1st sub-section (indicates pfn range: 0x1fa7c00000 -
> 0x1fa7dfffff ) in this pageblock is valid by calling subsection_map_init()
> in free_area_init(), but the 2nd sub-section (indicates pfn range:
> 0x1fa7e00000 - 0x1fa7ffffff ) in this pageblock is not valid.
>
> This did not break anything until now, but the zone continuous is fragile
> in this possible scenario. So as previous discussion[1], it is better to
> add some comments to explain this possible issue in case there are some
> future pfn walkers that rely on this.
>
> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
>
> Signed-off-by: Baolin Wang
> ---
> Changes from v2:
>  - Update the commit log and comments per Michal, thanks.
> Changes from v1:
>  - Update the comments per Ying and Mike, thanks.
>
> Note, I did not add Huang Ying's reviewed tag, since there are some
> updates per Michal's suggestion. Ying, please review the v3. Thanks.
> ---
>  mm/page_alloc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6457b64fe562..bd124390c79b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1502,6 +1502,15 @@ void __free_pages_core(struct page *page, unsigned int order)
>   * interleaving within a single pageblock. It is therefore sufficient to check
>   * the first and last page of a pageblock and avoid checking each individual
>   * page in a pageblock.
> + *
> + * Note: the function may return non-NULL struct page even for a page block
> + * which contains a memory hole (i.e. there is no physical memory for a subset
> + * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
> + * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
> + * even though the start pfn is online and valid. This should be safe most of
> + * the time because struct pages are still zero pre-filled and pfn walkers

I don't think the struct page is just zero-filled even if the pfn is in a
hole. Can you confirm that? In memmap_init() and memmap_init_zone_range(),
init_unavailable_range() is called to initialize the struct pages.

Best Regards,
Huang, Ying

> + * shouldn't touch any physical memory range for which they do not recognize
> + * any specific metadata in struct pages.
>   */
>  struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  				     unsigned long end_pfn, struct zone *zone)
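
For reference, the address arithmetic in the quoted commit log can be
checked with a small user-space sketch. This is not kernel code: the
helper names are hypothetical, the 4M pageblock and 2M sub-section sizes
are the ones stated in the commit log, and the values are the physical
addresses from the boot log rather than pfns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes as stated in the commit log: 4M pageblock, 2M sub-section. */
#define PAGEBLOCK_BYTES  (4ULL << 20)
#define SUBSECTION_BYTES (2ULL << 20)

/* Last byte of the final "Early memory node ranges" entry in the log. */
#define LAST_VALID_ADDR  0x1fa7dfffffULL

static_assert(PAGEBLOCK_BYTES == 2 * SUBSECTION_BYTES,
	      "a pageblock spans exactly two sub-sections here");

/* Hypothetical helper: round addr down to a power-of-two boundary. */
static inline uint64_t align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

/* Hypothetical helper: first byte of the pageblock containing addr. */
static inline uint64_t pageblock_start(uint64_t addr)
{
	return align_down(addr, PAGEBLOCK_BYTES);
}

/* Hypothetical helper: last byte of the pageblock containing addr. */
static inline uint64_t pageblock_end(uint64_t addr)
{
	return pageblock_start(addr) + PAGEBLOCK_BYTES - 1;
}

/*
 * Hypothetical helper: true if the sub-section starting at sub_start
 * begins beyond the last byte of memory reported in the log.
 */
static inline bool subsection_is_hole(uint64_t sub_start)
{
	return sub_start > LAST_VALID_ADDR;
}
```

With these definitions, pageblock_start(LAST_VALID_ADDR) is 0x1fa7c00000
and pageblock_end(LAST_VALID_ADDR) is 0x1fa7ffffff. The sub-section at
0x1fa7c00000 is backed by memory, while the one at 0x1fa7e00000 is not,
which matches the layout described in the commit log.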