From: "Huang, Ying" To: "Aneesh Kumar K.V" Cc: Vishal Verma , Andrew Morton , David Hildenbrand , Oscar Salvador , Dan Williams , Dave Jiang , linux-kernel@vger.kernel.org, linux-mm@kvack.org, nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, Dave Hansen , Jonathan Cameron , Jeff Moyer Subject: Re: [PATCH v2 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks References: <20230720-vv-kmem_memmap-v2-0-88bdaab34993@intel.com> <20230720-vv-kmem_memmap-v2-2-88bdaab34993@intel.com> <87a5vmadcw.fsf@linux.ibm.com> Date: Mon, 24 Jul 2023 11:16:28 +0800 In-Reply-To: <87a5vmadcw.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Sun, 23 Jul 2023 20:23:19 +0530") Message-ID: <87351e2e43.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspamd-Queue-Id: 1AA794000B X-Rspam-User: X-Stat-Signature: jsio3iysdih1ce3bid4foj65m3fxtb1j X-Rspamd-Server: rspam03 X-HE-Tag: 1690168702-787244 X-HE-Meta: 
"Aneesh Kumar K.V" writes:

> Vishal Verma writes:
>
>> The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is currently
>> restricted to 'memblock_size' chunks of memory being added. Adding a
>> larger span of memory precludes memmap_on_memory semantics.
>>
>> For users of hotplug such as kmem, large amounts of memory might get
>> added from the CXL subsystem. In some cases, this amount may exceed the
>> available 'main memory' to store the memmap for the memory being added.
>> In this case, it is useful to have a way to place the memmap on the
>> memory being added, even if it means splitting the addition into
>> memblock-sized chunks.
>>
>> Change add_memory_resource() to loop over memblock-sized chunks of
>> memory if the caller requested memmap_on_memory, and if the other
>> conditions for it are met. Teach try_remove_memory() to also expect
>> that a memory range being removed might have been split up into
>> memblock-sized chunks, and to loop through those as needed.
>>
>> Cc: Andrew Morton
>> Cc: David Hildenbrand
>> Cc: Oscar Salvador
>> Cc: Dan Williams
>> Cc: Dave Jiang
>> Cc: Dave Hansen
>> Cc: Huang Ying
>> Suggested-by: David Hildenbrand
>> Signed-off-by: Vishal Verma
>> ---
>>  mm/memory_hotplug.c | 154 +++++++++++++++++++++++++++++++---------------------
>>  1 file changed, 91 insertions(+), 63 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e9bcacbcbae2..20456f0d28e6 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1286,6 +1286,35 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
>>  }
>>  EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
>>
>> +static int add_memory_create_devices(int nid, struct memory_group *group,
>> +				     u64 start, u64 size, mhp_t mhp_flags)
>> +{
>> +	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>> +	struct vmem_altmap mhp_altmap = {};
>> +	int ret;
>> +
>> +	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
>> +		mhp_altmap.free = PHYS_PFN(size);
>> +		mhp_altmap.base_pfn = PHYS_PFN(start);
>> +		params.altmap = &mhp_altmap;
>> +	}
>> +
>> +	/* call arch's memory hotadd */
>> +	ret = arch_add_memory(nid, start, size, &params);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	/* create memory block devices after memory was added */
>> +	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
>> +					  group);
>> +	if (ret) {
>> +		arch_remove_memory(start, size, NULL);
>> +		return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  /*
>>   * NOTE:
The caller must call lock_device_hotplug() to serialize hotplug
>>   * and online/offline operations (triggered e.g. by sysfs).
>> @@ -1294,11 +1323,10 @@ EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
>>   */
>>  int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>  {
>> -	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>> +	unsigned long memblock_size = memory_block_size_bytes();
>>  	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>> -	struct vmem_altmap mhp_altmap = {};
>>  	struct memory_group *group = NULL;
>> -	u64 start, size;
>> +	u64 start, size, cur_start;
>>  	bool new_node = false;
>>  	int ret;
>>
>> @@ -1339,27 +1367,20 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>  	/*
>>  	 * Self hosted memmap array
>>  	 */
>> -	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
>> -		if (!mhp_supports_memmap_on_memory(size)) {
>> -			ret = -EINVAL;
>> +	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
>> +	    mhp_supports_memmap_on_memory(memblock_size)) {
>> +		for (cur_start = start; cur_start < start + size;
>> +		     cur_start += memblock_size) {
>> +			ret = add_memory_create_devices(nid, group, cur_start,
>> +							memblock_size,
>> +							mhp_flags);
>> +			if (ret)
>> +				goto error;
>> +		}
>
> We should handle the error cases below here.
>
> 1) If we hit an error after some blocks got added, should we iterate
>    over the rest of the dev_dax->nr_range?
> 2) With some blocks added, if we return a failure here, we remove the
>    resource in dax_kmem. Is that ok?
>
> IMHO, error handling with partial creation of memory blocks in a
> resource range should be documented with this change.

Or, should we remove all added memory blocks upon error?

--
Best Regards,
Huang, Ying