Date: Fri, 9 Sep 2022 06:55:43 +0800
From: Baoquan He
To: Ard Biesheuvel, will@kernel.org, catalin.marinas@arm.com, Nicolas Saenz Julienne
Cc: Mike Rapoport, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, guanghuifeng@linux.alibaba.com, mark.rutland@arm.com, linux-mm@kvack.org, thunder.leizhen@huawei.com, wangkefeng.wang@huawei.com, kexec@lists.infradead.org
Subject: Re: [PATCH 1/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end
References: <20220828005545.94389-1-bhe@redhat.com> <20220828005545.94389-2-bhe@redhat.com>

On 09/08/22 at 09:33pm, Baoquan He wrote:
> On 09/06/22 at 03:05pm, Ard Biesheuvel wrote:
> > On Mon, 5 Sept 2022 at 14:08, Baoquan He wrote:
> > >
> > > On 09/05/22 at 01:28pm, Mike Rapoport wrote:
> > > > On Thu, Sep 01, 2022 at 08:25:54PM +0800, Baoquan He wrote:
> > > > > On 09/01/22 at 10:24am, Mike Rapoport wrote:
> > > > >
> > > > > max_zone_phys() only handles cases when CONFIG_ZONE_DMA/DMA32 enabled,
> > > > > the disabled CONFIG_ZONE_DMA/DMA32 case is not included.
> > > > > I can change it like:
> > > > >
> > > > > static phys_addr_t __init crash_addr_low_max(void)
> > > > > {
> > > > > 	phys_addr_t low_mem_mask = U32_MAX;
> > > > > 	phys_addr_t phys_start = memblock_start_of_DRAM();
> > > > >
> > > > > 	if ((!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) ||
> > > > > 	    (phys_start > U32_MAX))
> > > > > 		low_mem_mask = PHYS_ADDR_MAX;
> > > > >
> > > > > 	return low_mem_mask + 1;
> > > > > }
> > > > >
> > > > > or add the disabled CONFIG_ZONE_DMA/DMA32 case into crash_addr_low_max()
> > > > > as you suggested. Which one do you like better?
> > > > >
> > > > > static phys_addr_t __init crash_addr_low_max(void)
> > > > > {
> > > > > 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> > > > > 		return PHYS_ADDR_MAX + 1;
> > > > >
> > > > > 	return max_zone_phys(32);
> > > > > }
> > > >
> > > > I like the second variant better.
> > >
> > > Sure, will change to use the 2nd one. Thanks.
> > >
> >
> > While I appreciate the effort that has gone into solving this problem,
> > I don't think there is any consensus that an elaborate fix is required
> > to ensure that the crash kernel can be unmapped from the linear map at
> > all cost. In fact, I personally think we shouldn't bother, and IIRC,
> > Will made a remark along the same lines back when the Huawei engineers
> > were still driving this effort.
> >
> > So perhaps we could align on that before doing yet another version of this?
>
> Yes, certainly. That can save everybody's effort if there's a different
> opinion. Thanks for looking into this and the suggestion.
>
> About Will's remark, I checked those discussion threads, and guess you
> are referring to the words in link [1]. I copy them at the bottom for
> better reference. Please correct me if I am wrong.
>
> With my understanding, Will said so because the patch is too complex,
> and there's a risk that the page tables the kernel itself is using could
> share the same block/section mapping as the crashkernel region.
> With these two cons, I agree with Will that we would rather take off the
> protection on the crashkernel region, which is done by mapping or
> unmapping the region, even though the protection enhances kdump's
> robustness.
>
> Crashkernel reservation needs to know the low memory end so that the DMA
> buffer can be addressed by the dumping target, e.g. storage disk. On
> current arm64, we have these facts:
> 1) Currently, except for Raspberry Pi 4, all arm64 systems can support
>    32bit DMA addressing. So, except for RPi4, the low memory end can be
>    decided after memblock init is done, namely at the end of
>    arm64_memblock_init(). We don't need to defer the crashkernel
>    reservation until zone_sizes_init() is done. Those cases can be checked
>    in patch code.
> 2) For RPi4, if its storage disk is 30bit DMA addressing, then we can
>    use crashkernel=xM@yM to specify a reservation location under 1G to
>    work around this.
>
> ***
> Based on the above facts, with my patch applied:
> pros:
> 1) Performance issue is resolved;
> 2) As you can see, the code with this patch applied will be much
>    simpler, more straightforward and clearer;
> 3) The protection can be kept;
> 4) Crashkernel reservation can succeed more easily on small memory
>    systems, e.g. virt guest systems. The earlier the reservation is done,
>    the more likely it is to get the whole chunk of memory.
> cons:
> 1) Only RPi4 is put in inconvenience for crashkernel reservation. It
>    needs to use crashkernel=xM@yM to work around.
>
> ***
> Take off the protection which is done by mapping or unmapping the
> crashkernel region, as you and Will suggested:
> pros:
> 1) Performance issue is resolved;
> 2) RPi4 will have the same convenience to set crashkernel;
>
> cons:
> 1) No protection is taken on the crashkernel region;
> 2) Code logic is twisting. There are two places to separately reserve
>    crashkernel, one is at the end of arm64_memblock_init(), one is at
>    the end of bootmem_init().
> 3) Except for the case where both CONFIG_ZONE_DMA|DMA32 are disabled,
>    crashkernel reservation is deferred. On small memory systems, e.g.
>    virt guest systems, it increases the risk that the reservation could
>    fail, very possibly caused by memory fragmentation.
>
> Besides, comparing the above two solutions, I also want to say kdump
> is developed for enterprise level systems. We need to take reality into
> account when considering a reasonable solution. E.g. on x86_64, it has a
> DMA zone of 16M and a DMA32 zone from 16M to 4G always in the normal
> kernel. For kdump, we ignore the DMA zone directly because it's for ISA
> style devices. Kdump hasn't supported ISA style devices with only 24bit
> DMA addressing capability from the beginning, because it doesn't make
> sense, we never hear that an enterprise level x86_64 system needs to
> arm with kdump.

Sorry, here I mean we never hear that an enterprise level x86_64
system owns an ISA storage disk and needs to arm with kdump.

>
> Hi Ard, Will, Catalin and other reviewers,
>
> Above is my understanding and thinking about the encountered issue,
> please help check and point out what's missing or incorrect.
>
> Hi Nicolas,
>
> If it's convenient to you, please help make clear if the storage disk or
> network card can only address 32bit DMA buffer on RPi4. Really
                                ~~30bit, typo
> appreciate that.
>
> ***
> [1]Will's remark on Huawei's patch
> https://lore.kernel.org/all/20220718131005.GA12406@willie-the-truck/T/#u
>
> ====quote Will's remark here
> I do not think that this complexity is justified. As I have stated on
> numerous occasions already, I would prefer that we leave the crashkernel
> mapped when rodata is not "full". That fixes your performance issue and
> matches what we do for module code, so I do not see a security argument
> against it.
>
> I do not plan to merge this patch as-is.
> ===
>
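For readers following the two crash_addr_low_max() variants quoted
above, here is a minimal userspace sketch of the first one. It is only
a model under stated assumptions, not kernel code: the two booleans
stand in for IS_ENABLED(CONFIG_ZONE_DMA) and
IS_ENABLED(CONFIG_ZONE_DMA32), dram_start stands in for
memblock_start_of_DRAM(), and the MODEL_* constants and the
model_crash_addr_low_max() name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types and constants (assumptions). */
typedef uint64_t phys_addr_t;
#define MODEL_U32_MAX       ((phys_addr_t)UINT32_MAX)
#define MODEL_PHYS_ADDR_MAX (~(phys_addr_t)0)

/* Unsigned arithmetic wraps, so "no limit" plus one is 0 in this model. */
static_assert((phys_addr_t)(MODEL_PHYS_ADDR_MAX + 1) == 0,
	      "MODEL_PHYS_ADDR_MAX + 1 wraps to 0");

/*
 * Model of the first variant quoted in the thread: the low memory end is
 * 4G unless both DMA zones are disabled or DRAM starts above 4G.
 */
static phys_addr_t model_crash_addr_low_max(bool zone_dma, bool zone_dma32,
					    phys_addr_t dram_start)
{
	phys_addr_t low_mem_mask = MODEL_U32_MAX;

	if ((!zone_dma && !zone_dma32) || dram_start > MODEL_U32_MAX)
		low_mem_mask = MODEL_PHYS_ADDR_MAX;

	return low_mem_mask + 1;
}
```

With a 64-bit phys_addr_t, the "no limit" branch returns 0 because the
addition wraps, so a caller comparing addresses against the result would
have to treat 0 specially. The thread does not say how the actual patch
deals with that.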