From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64F59EB64D9 for ; Fri, 7 Jul 2023 12:17:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C69098D0002; Fri, 7 Jul 2023 08:17:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C19288D0001; Fri, 7 Jul 2023 08:17:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ABA528D0002; Fri, 7 Jul 2023 08:17:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 9CEF68D0001 for ; Fri, 7 Jul 2023 08:17:58 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 4F843160D77 for ; Fri, 7 Jul 2023 12:17:58 +0000 (UTC) X-FDA: 80984717436.12.94D4DE2 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf29.hostedemail.com (Postfix) with ESMTP id E83A6120004 for ; Fri, 7 Jul 2023 12:17:55 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=jKQUVqHe; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf29.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1688732276; a=rsa-sha256; cv=none; b=Vn3AYubuRtqaHW0VKq11frCDizISLZg+Wr1FXdhOEniBqTxT/w3ydPzQ6mXM9hSeSh6rSH +aUb683useUjv5L/SDtF3mvLCsbo+468lFq4vKA7zfrOOR0fS0Z0CeVUHYMk/PU9GZXwOj x5TPd9qo9PUTinLQtPaq1fsqyacNJHQ= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=jKQUVqHe; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf29.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1688732276; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3PKweTvYl5LiGyKuKPuX7PKGhw8DHWGK7bc17UzyC90=; b=BAconbdeOPx4vjWK86hJrjProVR5VJHFdhRtLFaW2oqByf/TGwxv4TCxK/fHJ1zW6GeA40 S8yYpo3MglI3iQHTi6ZP1stOZenkdA/oXGgM1BlTyiR+m7oOJHrZUDZ+G6VmIjGWM03/fb K086uiXpVhYKMwT1Xfs1rZiGbey+cLo= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1688732275; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3PKweTvYl5LiGyKuKPuX7PKGhw8DHWGK7bc17UzyC90=; b=jKQUVqHeZa6Ya76OM3Oq+nYHCRlDinU3rC/D9N8l+KD0atmO24/QHNCrs0rokXxCbeF6tZ 9USRHBCDCLSCW8Yl8lqkNzYS4dVAMXstyNOG3VN+qenWS3hPLWWQ2py/IaJ9VKnZO5EdrS hK1BD909W0w2aiHiSax/qcuByBjg5nM= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-107-KmmiMql7M_CryK01_R0XNA-1; Fri, 07 Jul 2023 08:17:53 -0400 X-MC-Unique: KmmiMql7M_CryK01_R0XNA-1 Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-3fbdf34184eso10981995e9.1 for ; Fri, 07 Jul 2023 05:17:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688732273; x=1691324273; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=3PKweTvYl5LiGyKuKPuX7PKGhw8DHWGK7bc17UzyC90=; b=XJJma+B8J9OphR5roDXku5jbg7DDdUnqfavuWB2B/FmC1TBWDGXILPR9jw7Obxp+ZH PFNDqv6GXloKLH7DgdE2TGE7IETy73hw+chg7N2W13gSJ356XS1ogRjmLwnHmPLTEOEq ofw+zcx45l8jziou31/XByOqJHf4/RHNXHgF+nD52/buWpjhriUL4981mr05xpmp8XVi ZaxiKtJEE3vufqmmndMTJYQNc/E7ZFcC+LJlwTu0YUg8ovR7URyO4r4zVIH1gN0FXq2h wXEUKlRNOR50zR+rkWr3cFrZLhwp4UAYxduzzfS30GSIpjAtc2QP66mAtNcz2DTIM054 GoRg== X-Gm-Message-State: ABy/qLb2qG/aSEafaFKEXNY31+tbiVLJ45TZocxSYup+/T1uqz+NDleN 8Y/RR1J2CQemULGJ4MYPXuhqpx5EXkgr5E7kIzPOWN6bQ+2fjfyiMxzM33/iTHzAXBZ9EjI7fQa K9bB2SE4dtv0= X-Received: by 2002:a7b:ce0f:0:b0:3f9:b87c:10db with SMTP id m15-20020a7bce0f000000b003f9b87c10dbmr3690957wmc.3.1688732272809; Fri, 07 Jul 2023 05:17:52 -0700 (PDT) X-Google-Smtp-Source: APBJJlHkVxtlC56TbBBzJDTyYz9QgBl2sPl89eMkHaXchfxXCDoV9DdYAzH3zqFfKdeoqyDcP7djkg== X-Received: by 2002:a7b:ce0f:0:b0:3f9:b87c:10db with SMTP id m15-20020a7bce0f000000b003f9b87c10dbmr3690937wmc.3.1688732272381; Fri, 07 Jul 2023 05:17:52 -0700 (PDT) Received: from ?IPV6:2003:d8:2f04:3c00:248f:bf5b:b03e:aac7? (p200300d82f043c00248fbf5bb03eaac7.dip0.t-ipconnect.de. [2003:d8:2f04:3c00:248f:bf5b:b03e:aac7]) by smtp.gmail.com with ESMTPSA id 24-20020a05600c021800b003fbd597bccesm2272010wmi.41.2023.07.07.05.17.51 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 07 Jul 2023 05:17:51 -0700 (PDT) Message-ID: <87f1854d-5e91-2aaa-6c22-23be61529200@redhat.com> Date: Fri, 7 Jul 2023 14:17:50 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 To: Aneesh Kumar K V , linux-mm@kvack.org, akpm@linux-foundation.org, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com, christophe.leroy@csgroup.eu Cc: Oscar Salvador , Michal Hocko , Vishal Verma References: <20230706085041.826340-1-aneesh.kumar@linux.ibm.com> <20230706085041.826340-2-aneesh.kumar@linux.ibm.com> <72488b8a-8f1e-c652-ab48-47e38290441f@redhat.com> <996e226a-2835-5b53-2255-2005c6335f98@linux.ibm.com> <9ca978e7-5c09-6d92-7983-03a731549b25@linux.ibm.com> <256bd2f0-1b77-26dc-6393-b26dd363912f@redhat.com> <1a35cb1c-5be5-3fba-d59f-132b36863312@linux.ibm.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block In-Reply-To: <1a35cb1c-5be5-3fba-d59f-132b36863312@linux.ibm.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: E83A6120004 X-Stat-Signature: 6spmu15xhzmogipjkjpioqembfiuouft X-HE-Tag: 1688732275-707319 X-HE-Meta: U2FsdGVkX1/bEtCQujzkmFF0Yy5ElX5SGvQuTVQY1HAibWqlHDK0fTRHKOQ11BiGVWit/Ln6yMVj29tb0lAR++4poq0mY+3mq42rscqRoGl66izPCc25CvZJkMRS5B8hL/MP+EJvB7K56Mb6WBAO/BWKlFxX7kuj1y8rwdaugHIbUAkPJZfz3BBRsD1rvALhopbZI7BrCTDIq6Xe3LMVGM9qKTkAxlIKsZaZncjDkfq80181FJC1cXSBMRaO4Nqx3ggVXVc10+1uylazJKEb7u0aLdtTnRVMAVbvAhY6+g9gQkdzcsDTJgjIwsUxlufGRMD1RV6KaZzKiVypvi7CXQIe07tlTlG/YzsMx1cTDVSGfRgzhzzvTykrCtFTUxQ+s7qsn/0KOMrfG5CrfyfNlxcAsmVY//9enFBseu2z7jXEGMQpLr9HfEvCkL1VcJnXm3z31TeCXSwJJ8j1P9mrbBJjdDCpTvZdg5HjBbDT9iVJmcquyaaZ2+SdNBB8zG6TMcPssLyLaevjGeblF36QgveH9n/qLj8TXxrwMVdcrJGUaSuPTmgQ3QZoDoQKrCALyZRBlL5VcltwMogPNxbwDQHtL5NhnDInhPs7ULKYE9yjvSMfBw0Qsw4MDSqseM8f7kfch30SpadvbpYyHDchbDzDFq9wtzeBRM99u0iMpVGxWDvq/W3Kul5UcAVRU4/PUkYu/FJYE7K81h+kWDlTxML+1Oi0HHUqusbuZKhoD1ToUy3Y5ddG9nvv+F4BuMCYOVjtTtlqWb1ZZqAtJ4nPCGeQ5SBMlOfpzFJjniz7Bmds+ovfNoH2IDgucnPD3YY0iCS79HMFQsOg3O6pc+jKP3XPc+Ljiu6utBBmNTw9UXC0qpdoZycKq7Kes72ldZdhAR+strK9HVXZ6TNmUVitOASZ3d+loBVY/NnhS6OKkrJnTuVea2XI5RDjw4K0QKwFVeAT7lYLliZbf/ILhpg hYD7K1fB kG9oxwhURHvDKRZ5y921uIAgbcyfQ49S/QgI0vb7tLd1+fl4871ah3r06cWQd2NZo8NO/8UqN2VnTQHmbd7wkV30/XiyPVvHyS2byWGTvzD6Y3t9jb4EEEJUWebCY3yX+yWU0EHmLIo5aZ7+Ox6a9p0xVjzQ2kF0fb+Kr2CP6jRoVaJ7vsiL1avsleqRFwQHDABfaOViMv0CSQMtLQYIvLAVAzuAYaySqcGOpn7rmd2SARtRKRSXDgfCKnkmiSzNpUmoArxL71las6T59XmlPNA4kr8aiLXoxETQHF+wvCoUijg8qtWMXOghhXeqi0Yph7WA9JAUlMHFAbuAmfTWa75jG1GLB9NitdH2vp5rz/8lpPaySFbevmLn+xha0D6cZPL0sqzlFh/1vAGVW5jQ0tzNvpgVJ6ts7nL9PsNPzFpr4kh7IbChtRj740g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 06.07.23 18:06, Aneesh Kumar K V wrote: > On 7/6/23 6:29 PM, David Hildenbrand wrote: >> On 06.07.23 14:32, Aneesh Kumar K V wrote: >>> On 7/6/23 4:44 PM, David Hildenbrand wrote: >>>> On 06.07.23 11:36, Aneesh Kumar K V wrote: >>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote: >>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote: >>>>>>> With memmap on memory, some architecture needs more details w.r.t altmap >>>>>>> such as base_pfn, end_pfn, etc to unmap vmemmap memory. >>>>>> >>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't? >>>>>> >>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn? >>>>>> >>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full apgeblocks, memory onlining/offlining would be broken. >>>>>> >>>>>> [...] >>>>> >>>>> >>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because >>>>> a single page vmemmap can cover 1024 pages (64 *1024/sizeof(struct page)). and that can point to pages outside the dev_pagemap range. >>>>> So on free we  check >>>> >>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sound wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap). >>>> >>>> (64 * 1024) / sizeof(struct page) -> 1024 pages >>>> >>>> 1024 pages * 64k = 64 MiB. >>>> >>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine. >>>> >>>> Smells like you want to disable the feature on a 64k system. >>>> >>> >>> But that part of vmemmap_free is common for both dax,dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free have checks which require >>> a full altmap structure with all the details in. So for memmap on memmory to work on ppc64 we do require similar altmap struct. Hence the idea >>> of adding vmemmap_altmap to  struct memory_block >> >> I'd suggest making sure that for the memmap_on_memory case your really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it). >> > > Sure. How about? > > bool mhp_supports_memmap_on_memory(unsigned long size) > { > > unsigned long nr_pages = size >> PAGE_SHIFT; > unsigned long vmemmap_size = nr_pages * sizeof(struct page); > > if (!radix_enabled()) > return false; > /* > * memmap on memory only supported with memory block size add/remove > */ > if (size != memory_block_size_bytes()) > return false; > /* > * Also make sure the vmemmap allocation is fully contianed > * so that we always allocate vmemmap memory from altmap area. > */ > if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE)) > return false; > /* > * The pageblock alignment requirement is met by using > * reserve blocks in altmap. > */ > return true; > } Better, but the PAGE_SIZE that could be added to common code as well. ... but, the pageblock check in common code implies a PAGE_SIZE check, so why do we need any other check besides the radix_enabled() check for arm64 and just keep all the other checks in common code as they are? If your vmemmap does not cover full pageblocks (which implies full pages), the feature cannot be used *unless* we'd waste altmap space in the vmemmap to cover one pageblock. Wasting hotplugged memory certainly sounds wrong? So I appreciate if you could explain why the pageblock check should not be had for ppc64? > > > > >> Then, you can reconstruct the altmap layout trivially >> >> base_pfn: start of the range to unplug >> end_pfn: base_pfn + nr_vmemmap_pages >> >> and pass that to the removal code, which will do the right thing, no? >> >> >> Sure, remembering the altmap might be a potential cleanup (eventually?), but the basic reasoning why this is required as patch #1 IMHO is wrong: if you say you support memmap_on_memory for a configuration, then you should also properly support it (allocate from the hotplugged memory), not silently fall back to something else. > > I guess you want to keep the altmap introduction as a later patch in the series and not the preparatory patch? Or are you ok with just adding the additional check I mentioned above w.r.t size value and keep this patch as patch 1 as a generic cleanup (avoiding > the recomputation of altmap->alloc/base_pfn/end_pfn? Yes, if it's not required better remove it completely from this patchset. We can alter discuss if keeping the altmap around is actually a cleanup or rather unnecessary. -- Cheers, David / dhildenb