Message-ID: <26e9bd4b-965a-4aaa-6ae9-b1600c7ef52d@redhat.com>
Date: Fri, 7 Jul 2023 17:42:55 +0200
To: Aneesh Kumar K V, linux-mm@kvack.org, akpm@linux-foundation.org,
 mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com,
 christophe.leroy@csgroup.eu
Cc: Oscar Salvador, Michal Hocko, Vishal Verma
References: <20230706085041.826340-1-aneesh.kumar@linux.ibm.com>
 <20230706085041.826340-2-aneesh.kumar@linux.ibm.com>
 <72488b8a-8f1e-c652-ab48-47e38290441f@redhat.com>
 <996e226a-2835-5b53-2255-2005c6335f98@linux.ibm.com>
 <9ca978e7-5c09-6d92-7983-03a731549b25@linux.ibm.com>
 <256bd2f0-1b77-26dc-6393-b26dd363912f@redhat.com>
 <1a35cb1c-5be5-3fba-d59f-132b36863312@linux.ibm.com>
 <87f1854d-5e91-2aaa-6c22-23be61529200@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block

On 07.07.23 15:30, Aneesh Kumar K V wrote:
> On 7/7/23 5:47 PM, David Hildenbrand wrote:
>> On 06.07.23 18:06, Aneesh Kumar K V wrote:
>>> On 7/6/23 6:29 PM, David Hildenbrand wrote:
>>>> On 06.07.23 14:32, Aneesh Kumar K V wrote:
>>>>> On 7/6/23 4:44 PM, David Hildenbrand wrote:
>>>>>> On 06.07.23 11:36, Aneesh Kumar K V wrote:
>>>>>>> On 7/6/23 2:48 PM, David Hildenbrand wrote:
>>>>>>>> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>>>>>>>>> With memmap on memory, some architecture needs more details w.r.t
>>>>>>>>> altmap such as base_pfn, end_pfn, etc. to unmap vmemmap memory.
>>>>>>>>
>>>>>>>> Can you elaborate why ppc64 needs that and x86-64 + aarch64 don't?
>>>>>>>>
>>>>>>>> IOW, why can't ppc64 simply allocate the vmemmap from the start of the memblock (-> base_pfn) and use the stored number of vmemmap pages to calculate the end_pfn?
>>>>>>>>
>>>>>>>> To rephrase: if the vmemmap is not at the beginning and doesn't cover full pageblocks, memory onlining/offlining would be broken.
>>>>>>>>
>>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>> With ppc64 and 64K pagesize and different memory block sizes, we can end up allocating vmemmap backing memory from outside altmap because
>>>>>>> a single page vmemmap can cover 1024 pages (64 * 1024 / sizeof(struct page)), and that can point to pages outside the dev_pagemap range.
>>>>>>> So on free we check
>>>>>>
>>>>>> So you end up with a mixture of altmap and ordinarily-allocated vmemmap pages? That sounds wrong (and is counter-intuitive to the feature in general, where we *don't* want to allocate the vmemmap from outside the altmap).
>>>>>>
>>>>>> (64 * 1024) / sizeof(struct page) -> 1024 pages
>>>>>>
>>>>>> 1024 pages * 64k = 64 MiB.
>>>>>>
>>>>>> What's the memory block size on these systems? If it's >= 64 MiB the vmemmap of a single memory block fits into a single page and we should be fine.
>>>>>>
>>>>>> Smells like you want to disable the feature on a 64k system.
>>>>>>
>>>>>
>>>>> But that part of vmemmap_free is common for both dax, dax kmem and the new memmap on memory feature. ie, ppc64 vmemmap_free has checks which require
>>>>> a full altmap structure with all the details in. So for memmap on memory to work on ppc64 we do require similar altmap struct.
>>>>> Hence the idea of adding vmem_altmap to struct memory_block
>>>>
>>>> I'd suggest making sure that for the memmap_on_memory case you really *always* allocate from the altmap (that's what the feature is about after all), and otherwise block the feature (i.e., arch_mhp_supports_... should reject it).
>>>>
>>>
>>> Sure. How about?
>>>
>>> bool mhp_supports_memmap_on_memory(unsigned long size)
>>> {
>>>
>>>     unsigned long nr_pages = size >> PAGE_SHIFT;
>>>     unsigned long vmemmap_size = nr_pages * sizeof(struct page);
>>>
>>>     if (!radix_enabled())
>>>         return false;
>>>     /*
>>>      * memmap on memory only supported with memory block size add/remove
>>>      */
>>>     if (size != memory_block_size_bytes())
>>>         return false;
>>>     /*
>>>      * Also make sure the vmemmap allocation is fully contained
>>>      * so that we always allocate vmemmap memory from altmap area.
>>>      */
>>>     if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
>>>         return false;
>>>     /*
>>>      * The pageblock alignment requirement is met by using
>>>      * reserve blocks in altmap.
>>>      */
>>>     return true;
>>> }
>>
>> Better, but the PAGE_SIZE check could be added to common code as well.
>>
>> ... but, the pageblock check in common code implies a PAGE_SIZE check, so why do we need any other check besides the radix_enabled() check for ppc64 and just keep all the other checks in common code as they are?
>>
>> If your vmemmap does not cover full pageblocks (which implies full pages), the feature cannot be used *unless* we'd waste altmap space in the vmemmap to cover one pageblock.
>>
>> Wasting hotplugged memory certainly sounds wrong?
>>
>>
>> So I'd appreciate it if you could explain why the pageblock check should not apply to ppc64?
>>
>
> If we want things to be aligned to pageblock (2M) we will have to use 2M vmemmap space and that implies a memory block of 2G with 64K page size.
> That requirement makes the feature not useful at all
> on power. The compromise I came to was what I mentioned in the commit message for enabling the feature on ppc64.

As we'll always handle a 2M pageblock, you'll end up wasting memory.

Assume a 64MiB memory block:

With 64k: 1024 pages -> 64k vmemmap, almost 2 MiB wasted. ~3.1%
With 4k: 16384 pages -> 1 MiB vmemmap, 1 MiB wasted. ~1.5%

It gets worse with smaller memory block sizes.

>
> We use altmap.reserve feature to align things correctly at pageblock granularity. We can end up losing some pages in memory with this. For ex: with 256MB memory block
> size, we require 4 pages to map vmemmap pages. In order to align things correctly we end up adding a reserve of 28 pages. ie, for every 4096 pages
> 28 pages get reserved.

You can simply align the nr_vmemmap_pages up to pageblocks in the memory hotplug code (e.g., depending on a config/arch knob whether wasting memory is supported).

Because the pageblock granularity is a memory onlining/offlining limitation and should be checked+handled exactly there.

-- 
Cheers,

David / dhildenb
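
The page counts quoted in this thread can be reproduced with a small standalone C sketch. It is not kernel code. It assumes sizeof(struct page) is 64 bytes and a 2 MiB pageblock, the values implied by the numbers above. The names vmemmap_cost, compute_vmemmap_cost and round_up_to are hypothetical helpers introduced only for illustration. The sketch computes how many pages the vmemmap of one memory block needs, rounds that up to whole pageblocks as suggested for the memory hotplug code, and reports the difference as wasted (reserved) pages.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch, not taken from kernel headers. */
#define STRUCT_PAGE_SIZE 64ULL                  /* assumed sizeof(struct page) */
#define PAGEBLOCK_BYTES  (2ULL * 1024 * 1024)   /* 2 MiB pageblock, as in the thread */

struct vmemmap_cost {
    uint64_t block_pages;    /* pages in the memory block */
    uint64_t vmemmap_pages;  /* pages needed to hold the block's struct pages */
    uint64_t aligned_pages;  /* vmemmap_pages rounded up to whole pageblocks */
    uint64_t wasted_pages;   /* aligned_pages - vmemmap_pages */
};

/* Round x up to the next multiple of step (step must be non-zero). */
static uint64_t round_up_to(uint64_t x, uint64_t step)
{
    return (x + step - 1) / step * step;
}

/* Cost of placing the vmemmap of one memory block inside that block. */
static struct vmemmap_cost compute_vmemmap_cost(uint64_t block_bytes,
                                                uint64_t page_bytes)
{
    struct vmemmap_cost c;
    uint64_t pageblock_pages;

    assert(page_bytes != 0 && page_bytes <= PAGEBLOCK_BYTES);
    pageblock_pages = PAGEBLOCK_BYTES / page_bytes;

    c.block_pages = block_bytes / page_bytes;
    /* Bytes of struct page metadata, rounded up to whole pages. */
    c.vmemmap_pages =
        round_up_to(c.block_pages * STRUCT_PAGE_SIZE, page_bytes) / page_bytes;
    /* Align the vmemmap up to pageblock granularity. */
    c.aligned_pages = round_up_to(c.vmemmap_pages, pageblock_pages);
    c.wasted_pages = c.aligned_pages - c.vmemmap_pages;
    return c;
}
```

Under these assumptions, compute_vmemmap_cost(256ULL << 20, 64ULL << 10) gives 4 vmemmap pages and 28 wasted pages, matching the 256MB example quoted above. For a 64 MiB block it gives 31 wasted 64k pages (1984 KiB, i.e. "almost 2 MiB") and 256 wasted 4k pages (1 MiB).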