From: "Huang, Ying" <ying.huang@intel.com>
To: Baolin Wang
Cc: David Hildenbrand, akpm@linux-foundation.org, mgorman@techsingularity.net,
 vbabka@suse.cz, mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_alloc: consider pfn holes after pfn_valid() in __pageblock_pfn_to_page()
Date: Thu, 20 Apr 2023 15:22:57 +0800
Message-ID: <87cz3zt3u6.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: (Baolin Wang's message of "Wed, 12 Apr 2023 20:16:27 +0800")
References: <62e231a8f2e50c04dcadc7a0cfaa6dea5ce1ec05.1681296022.git.baolin.wang@linux.alibaba.com>
 <94bfa3cc-674e-25b0-e7e2-d74c970acef7@redhat.com>

Baolin Wang writes:

> On 4/12/2023 7:25 PM, David Hildenbrand wrote:
>> On 12.04.23 12:45, Baolin Wang wrote:
>>> Now __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
>>> checks whether the given zone contains holes, and uses pfn_valid() to
>>> check if the end pfn is valid.  However, pfn_valid() cannot make sure
>>> the end pfn is not a hole if the size of a pageblock is larger than
>>> the size of a sub-mem_section, since the struct page obtained by
>>> pfn_to_page() may represent a hole or an unusable page frame, which
>>> may cause zone contiguity to be set incorrectly.
>>>
>>> Though another user of pageblock_pfn_to_page() in compaction seems to
>>> work well now, it is better to avoid scanning or touching these
>>> offline pfns.  So, like commit 2d070eab2e82 ("mm: consider zone which
>>> is not fully populated to have holes"), we should also use
>>> pfn_to_online_page() for the end pfn to make sure it is a valid pfn
>>> with a usable page frame.  Meanwhile, the pfn_valid() for the end pfn
>>> can be dropped now.
>>>
>>> Moreover, we've already used pfn_to_online_page() for the start pfn to
>>> make sure it is online and valid, so the pfn_valid() for the start pfn
>>> is unnecessary; drop it.
>>
>> pageblocks are supposed to fall into a single memory section, so in
>> most cases, if the start is online, so is the end.
>
> Yes, the granularity of memory hotplug is a mem_section.
>
> However, suppose the pageblock order is MAX_ORDER-1 and the size of a
> sub-section is 2M.  That means a pageblock will fall into 2 sub
> mem-sections, and if there is a hole in the zone, the 2nd sub
> mem-section can be invalid without the subsection_map bitmap being set.
>
> So the start pfn being online can make sure the end pfn of a pageblock
> is online, but a valid start pfn cannot make sure the end pfn is valid
> in the bitmap of ms->usage->subsection_map.

arch_add_memory
  add_pages
    __add_pages
      sparse_add_section /* set subsection_map */

arch_add_memory() is only called by add_memory_resource() and
pagemap_range() (which calls add_pages() directly, too).  In
add_memory_resource(), check_hotplug_memory_range() will enforce a
strict hotplug range alignment requirement (128 MB on x86_64).
pagemap_range() is used for ZONE_DEVICE only.  That is, for normal
memory, the hotplug granularity is much larger than 2MB.  IIUC, the
situation you mentioned above is impossible.  Or do I miss something?
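Just to make the arithmetic concrete, here is a small user-space sketch
of the granularity argument.  The constants and the hotplug_range_ok()
helper are illustrative stand-ins for memory_block_size_bytes() and
check_hotplug_memory_range(), not the kernel implementation:

#include <stdbool.h>
#include <stdio.h>

#define SUBSECTION_SIZE    (2UL << 20)   /* 2 MB sub-section on x86_64 */
#define PAGEBLOCK_SIZE     (4UL << 20)   /* MAX_ORDER-1 pageblock with 4 KB pages */
#define MEMORY_BLOCK_SIZE  (128UL << 20) /* hotplug alignment on x86_64 */

/* Rough model of check_hotplug_memory_range(): reject unaligned ranges. */
static bool hotplug_range_ok(unsigned long start, unsigned long size)
{
        return size && !(start % MEMORY_BLOCK_SIZE) && !(size % MEMORY_BLOCK_SIZE);
}

int main(void)
{
        printf("sub-sections per pageblock:    %lu\n",
               PAGEBLOCK_SIZE / SUBSECTION_SIZE);
        printf("sub-sections per hotplug unit: %lu\n",
               MEMORY_BLOCK_SIZE / SUBSECTION_SIZE);
        /* A single 2 MB sub-section cannot be hot-added on its own ... */
        printf("2 MB hot-add allowed?   %d\n",
               hotplug_range_ok(0, SUBSECTION_SIZE));
        /* ... normal memory always arrives in whole 128 MB blocks. */
        printf("128 MB hot-add allowed? %d\n",
               hotplug_range_ok(0, MEMORY_BLOCK_SIZE));
        return 0;
}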
Best Regards,
Huang, Ying

>> The exception to this rule is when we have a mixture of ZONE_DEVICE
>> and ZONE_* within the same section.
>> Then, indeed, the end might not be online.
>> BUT, if the end is valid (-> ZONE_DEVICE), then the zone_id will
>> differ.  [let's ignore any races for now, up to this point they are
>> mostly of theoretical nature]
>> So I don't think this change actually fixes something.
>>
>> Getting rid of the pfn_valid(start_pfn) makes sense.  Replacing the
>
> Yes, my motivation is to try to optimize __pageblock_pfn_to_page(),
> which is hot when doing compaction, and I saw these pfn_valid() calls
> can be dropped.
>
>> pfn_valid(end_pfn) by a pfn_to_online_page(end_pfn) could make that
>> function less efficient.
>>
>>>
>>> Signed-off-by: Baolin Wang
>>> ---
>>>  mm/page_alloc.c | 7 +++----
>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index d0eb280ec7e4..8076f519c572 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1512,9 +1512,6 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>>          /* end_pfn is one past the range we are checking */
>>>          end_pfn--;
>>>
>>> -        if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>>> -                return NULL;
>>> -
>>>          start_page = pfn_to_online_page(start_pfn);
>>>          if (!start_page)
>>>                  return NULL;
>>> @@ -1522,7 +1519,9 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>>          if (page_zone(start_page) != zone)
>>>                  return NULL;
>>>
>>> -        end_page = pfn_to_page(end_pfn);
>>> +        end_page = pfn_to_online_page(end_pfn);
>>> +        if (!end_page)
>>> +                return NULL;
>>>
>>>          /* This gives a shorter code than deriving page_zone(end_page) */
>>>          if (page_zone_id(start_page) != page_zone_id(end_page))
>>
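For reference, a sketch of __pageblock_pfn_to_page() as it would read
with the hunks above applied.  It is reconstructed from the quoted diff;
the tail after the page_zone_id() check (its return NULL and the final
return of end_page) is assumed from the existing function, since the
quote is cut off there:

struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
                                     unsigned long end_pfn, struct zone *zone)
{
        struct page *start_page;
        struct page *end_page;

        /* end_pfn is one past the range we are checking */
        end_pfn--;

        /* The old pfn_valid() checks on start_pfn/end_pfn are gone. */
        start_page = pfn_to_online_page(start_pfn);
        if (!start_page)
                return NULL;

        if (page_zone(start_page) != zone)
                return NULL;

        /* With the patch, the end pfn must be online (and thus valid) too. */
        end_page = pfn_to_online_page(end_pfn);
        if (!end_page)
                return NULL;

        /* This gives a shorter code than deriving page_zone(end_page) */
        if (page_zone_id(start_page) != page_zone_id(end_page))
                return NULL;

        return end_page;
}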