From: David Hildenbrand
Organization: Red Hat
To: Mike Rapoport
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Michal Hocko, Oscar Salvador, Jianyong Wu, "Aneesh Kumar K . V", Vineet Gupta, Geert Uytterhoeven, Huacai Chen, Jiaxun Yang, Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik, Christian Borntraeger, Eric Biederman, Arnd Bergmann, linux-snps-arc@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-s390@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [PATCH v1 3/4] memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED
Date: Fri, 1 Oct 2021 10:04:24 +0200
Message-ID: <0d6c86ba-076b-5d4b-33a8-da267f951a85@redhat.com>
References: <20210927150518.8607-1-david@redhat.com> <20210927150518.8607-4-david@redhat.com> <830c1670-378b-0fb6-bd5e-208e545fa126@redhat.com>
On 30.09.21 23:21, Mike Rapoport wrote:
> On Wed, Sep 29, 2021 at 06:54:01PM +0200, David Hildenbrand wrote:
>> On 29.09.21 18:39, Mike Rapoport wrote:
>>> Hi,
>>>
>>> On Mon, Sep 27, 2021 at 05:05:17PM +0200, David Hildenbrand wrote:
>>>> Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED.
>>>> Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such memory
>>>> like ordinary MEMBLOCK_NONE memory -- for example, when selecting memory
>>>> regions to add to the vmcore for dumping in the crashkernel via
>>>> for_each_mem_range().
>>> Can you please elaborate on the difference in semantics of MEMBLOCK_HOTPLUG
>>> and MEMBLOCK_DRIVER_MANAGED?
>>> Unless I'm missing something they both mark memory that can be unplugged
>>> anytime and so it should not be used in certain cases. Why is there a need
>>> for a new flag?
>>
>> In the cover letter I have "Alternative B: Reuse MEMBLOCK_HOTPLUG.
>> MEMBLOCK_HOTPLUG serves a different purpose, though.", but looking into the
>> details it won't work as is.
>>
>> MEMBLOCK_HOTPLUG is used to mark memory early during boot that can later get
>> hotunplugged again and should be placed into ZONE_MOVABLE if the
>> "movable_node" kernel parameter is set.
>>
>> The confusing part is that we talk about "hotpluggable" but really mean
>> "hotunpluggable": the reason is that HW flags DIMM slots that can later be
>> hotplugged as "hotpluggable" even though there is already something
>> hotplugged.
>
> MEMBLOCK_HOTPLUG name is indeed somewhat confusing, but still its core
> meaning "this memory may be removed" which does not differ from what
> IORESOURCE_SYSRAM_DRIVER_MANAGED means.
>
> MEMBLOCK_HOTPLUG regions are indeed placed into ZONE_MOVABLE, but more
> importantly, they are avoided when we allocate memory from memblock.
>
> So, in my view, both flags mean that the memory may be removed and it
> should not be used for certain types of allocations.
The semantics are different:

MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
firmware-provided memory map and added to the system early during boot;
we want this memory to be managed by ZONE_MOVABLE with "movable_node"
set on the kernel command line, because only then do we want it to be
hotpluggable again. kexec *has to* indicate this memory to the second
kernel and can place kexec-images on this memory. After memory
hotunplug, kexec has to be re-armed.

MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in the
firmware-provided memory map; this memory is always detected and added
to the system by a driver; memory might not actually be physically
hotunpluggable and the ZONE selection does not depend on "movable_node".
kexec *must not* indicate this memory to the second kernel and *must
not* place kexec-images on this memory.

I would really advise against mixing concepts here.

What we could do is indicate *all* hotplugged memory (not just
IORESOURCE_SYSRAM_DRIVER_MANAGED memory) as MEMBLOCK_HOTPLUG and make
MEMBLOCK_HOTPLUG less dependent on "movable_node".

MEMBLOCK_HOTPLUG for early boot memory: with "movable_node", place it in
ZONE_MOVABLE. Even without "movable_node", don't place early kernel
allocations on this memory.

MEMBLOCK_HOTPLUG for all memory: don't place kexec images on this
memory, independent of "movable_node".

memblock would then not contain the information "contained in
firmware-provided memory map" vs. "not contained in firmware-provided
memory map"; but I think right now it's not strictly required to have
that information if we went down that path.

>
>> For example, ranges in the ACPI SRAT that are marked as
>> ACPI_SRAT_MEM_HOT_PLUGGABLE will be marked MEMBLOCK_HOTPLUG early during
>> boot (drivers/acpi/numa/srat.c:acpi_numa_memory_affinity_init()).
>> Later, we use that information to size ZONE_MOVABLE
>> (mm/page_alloc.c:find_zone_movable_pfns_for_nodes()). This will make sure
>> that these "hotpluggable" DIMMs can later get hotunplugged.
>>
>> Also, see should_skip_region() how this relates to the "movable_node" kernel
>> parameter:
>>
>>     /* skip hotpluggable memory regions if needed */
>>     if (movable_node_is_enabled() && memblock_is_hotpluggable(m) &&
>>         !(flags & MEMBLOCK_HOTPLUG))
>>         return true;
>
> Hmm, I think that the movable_node_is_enabled() check here is excessive,
> but I suspect we cannot simply remove it without breaking anything.

The reasoning is: without "movable_node" we don't want this memory to be
hotunpluggable; consequently, we don't care if we place kexec-images on
this memory. MEMBLOCK_HOTPLUG is currently only active with "movable_node".

If we remove that check, we will never place early kernel allocations on
that memory, even if we don't care about ZONE_MOVABLE.

>
> I'll take a deeper look at the potential consequences.
>
> BTW, is there anything that prevents putting kexec to hot-unpluggable memory
> that was cold-plugged on boot?

I think it depends on how the platform handles hotunpluggable DIMMs or
hotunpluggable NUMA nodes. If the platform indicates such memory via
MEMBLOCK_HOTPLUG, and "movable_node" is set, memory would be put into
ZONE_MOVABLE and kexec would not place kexec-images on that memory.

-- 
Thanks,

David / dhildenb