From: Bhupesh Sharma <bhsharma@redhat.com>
Date: Fri, 3 Jul 2020 13:09:23 +0530
Subject: Re: [PATCH 2/2] arm64: Allocate crashkernel always in ZONE_DMA
To: chenzhou
Cc: Will Deacon, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-arm-kernel, Johannes Weiner, Michal Hocko, Vladimir Davydov,
    James Morse, Mark Rutland, Catalin Marinas,
    Linux Kernel Mailing List, kexec mailing list

Hi Chen,

On Fri, Jul 3, 2020 at 10:54 AM chenzhou wrote:
>
> Hi Bhupesh,
>
> On 2020/7/3 3:22, Bhupesh Sharma wrote:
> > Hi Will,
> >
> > On Thu, Jul 2, 2020 at 1:20 PM Will Deacon wrote:
> >> On Thu, Jul 02, 2020 at 03:44:20AM +0530, Bhupesh Sharma wrote:
> >>> commit bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in
> >>> ZONE_DMA32") allocates the crashkernel for arm64 in ZONE_DMA32.
> >>>
> >>> However, as reported by Prabhakar, this breaks kdump kernel booting
> >>> on ThunderX2-like arm64 systems. I have noticed this on another
> >>> Ampere arm64 machine.
> >>> The OOM log in the kdump kernel looks like this:
> >>>
> >>> [    0.240552] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
> >>> [    0.247713] swapper/0: page allocation failure: order:1, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >>> <..snip..>
> >>> [    0.274706] Call trace:
> >>> [    0.277170]  dump_backtrace+0x0/0x208
> >>> [    0.280863]  show_stack+0x1c/0x28
> >>> [    0.284207]  dump_stack+0xc4/0x10c
> >>> [    0.287638]  warn_alloc+0x104/0x170
> >>> [    0.291156]  __alloc_pages_slowpath.constprop.106+0xb08/0xb48
> >>> [    0.296958]  __alloc_pages_nodemask+0x2ac/0x2f8
> >>> [    0.301530]  alloc_page_interleave+0x20/0x90
> >>> [    0.305839]  alloc_pages_current+0xdc/0xf8
> >>> [    0.309972]  atomic_pool_expand+0x60/0x210
> >>> [    0.314108]  __dma_atomic_pool_init+0x50/0xa4
> >>> [    0.318504]  dma_atomic_pool_init+0xac/0x158
> >>> [    0.322813]  do_one_initcall+0x50/0x218
> >>> [    0.326684]  kernel_init_freeable+0x22c/0x2d0
> >>> [    0.331083]  kernel_init+0x18/0x110
> >>> [    0.334600]  ret_from_fork+0x10/0x18
> >>>
> >>> This patch limits the crashkernel allocation to the first 1GB of
> >>> the accessible RAM (ZONE_DMA), as otherwise we might run into OOM
> >>> issues when the crashkernel is executed: it might have been
> >>> originally allocated either from ZONE_DMA32 memory or from a
> >>> mixture of memory chunks belonging to both ZONE_DMA and ZONE_DMA32.
> >>
> >> How does this interact with this ongoing series:
> >>
> >> https://lore.kernel.org/r/20200628083458.40066-1-chenzhou10@huawei.com
> >>
> >> (patch 4, in particular)
> >
> > Many thanks for having a look at this patchset. I was not aware that
> > Chen had sent out a new version.
> > I had noted in the v9 review of the high/low range allocation that I
> > was working on a generic solution (irrespective of the crashkernel
> > low and high range allocation), which resulted in this patchset.
> >
> > The issue is two-fold: an oops in the memcg layer (PATCH 1/2, which
> > has been Acked-by the memcg maintainer) and an OOM in the kdump
> > kernel due to crashkernel allocation in ZONE_DMA32 region(s), which
> > is addressed by this PATCH.
> >
> > I will have a closer look at the v10 patchset Chen shared, but it
> > seems it needs some rework as per the review comments Dave shared
> > today.
> > IMO, in the meanwhile this patchset can be used to fix the existing
> > kdump issue with the upstream kernel.
>
> Thanks for your work.
> There was no progress on the issue for a long time, so I sent my
> solution in the v8 comments and sent v9 recently.

Thanks a lot for your inputs. Well, I was working on the oops seen in
the cgroup layer even when the memory cgroup is disabled via the kdump
command line. As the cgroup maintainer also noted during the review of
PATCH 1/2 of this series, it's quite a corner case and hence hard to
debug. Hence the delay in sending out this series.

> I think directly limiting the crashkernel to ZONE_DMA isn't a good idea:
> 1. For the parameter "crashkernel=Y", reserving the crashkernel in
> the first 1G of memory will increase the probability of memory
> allocation failure.
> Previous discussion, from https://lkml.org/lkml/2019/10/21/725:
> "With ZONE_DMA=y, this config will fail to reserve 512M CMA on a server"

That is correct. However, we have limited options at the moment anyway,
hence the need for the crashkernel high/low support series which you
are already working on. Unfortunately, as I noted in the review of the
v10 series today, it still needs rework to fix the OOM issue seen on
ThunderX2 and Ampere boards with the crashkernel=X kind of format; see
my review comments there for details.
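
To make the failure concrete, the early atomic-pool setup in the kdump
kernel boils down to something like the sketch below (heavily
simplified; the function name and flow are illustrative, not the exact
kernel/dma/pool.c code). The GFP_DMA request can only be satisfied from
ZONE_DMA, so it fails when the whole crashkernel region was carved out
of ZONE_DMA32:

#include <linux/gfp.h>
#include <linux/init.h>

/* Illustrative sketch only -- simplified from kernel/dma/pool.c. */
static int __init dma_atomic_pool_init_sketch(void)
{
	struct page *page;

	/*
	 * GFP_DMA restricts the allocation to ZONE_DMA (the first 1GB
	 * of RAM on arm64). If the crashkernel region was reserved
	 * entirely from ZONE_DMA32, the kdump kernel has no ZONE_DMA
	 * pages at all, so this order-1 allocation fails and produces
	 * the page allocation failure shown in the log above.
	 */
	page = alloc_pages(GFP_KERNEL | GFP_DMA, 1);
	if (!page)
		return -ENOMEM;

	/* ... remap the pages and seed the atomic DMA pool ... */
	return 0;
}
postcore_initcall(dma_atomic_pool_init_sketch);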
So, to work around the issue (while the crashkernel high/low support
series is reworked), the idea is to have the same kdump behaviour on
these boards as we had before the ZONE_DMA32 changes were introduced.

I am also working on fixing the '__dma_atomic_pool_init' behaviour
itself (inside 'kernel/dma/pool.c') to adapt to the ZONE_DMA and
ZONE_DMA32 range availability in the kdump kernel, but this is a
complex implementation and requires thorough checks (especially with
drivers which can only work within ZONE_DMA memory regions in the
kdump kernel). Hence it might take some iterations to share an RFC
patch on the same.

I will send a v2 addressing Will's inputs shortly.

Thanks,
Bhupesh

> 2. For the parameter "crashkernel=Y@X", limiting the crashkernel to
> ZONE_DMA is unreasonable for someone who really wants to reserve the
> crashkernel from a specified start address.
>
> I have sent v10: https://www.spinics.net/lists/arm-kernel/msg819408.html,
> any comments are welcome.
>
> Thanks,
> Chen Zhou
>
> >
> >>> Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
> >>> Cc: Johannes Weiner
> >>> Cc: Michal Hocko
> >>> Cc: Vladimir Davydov
> >>> Cc: James Morse
> >>> Cc: Mark Rutland
> >>> Cc: Will Deacon
> >>> Cc: Catalin Marinas
> >>> Cc: cgroups@vger.kernel.org
> >>> Cc: linux-mm@kvack.org
> >>> Cc: linux-arm-kernel@lists.infradead.org
> >>> Cc: linux-kernel@vger.kernel.org
> >>> Cc: kexec@lists.infradead.org
> >>> Reported-by: Prabhakar Kushwaha
> >>> Signed-off-by: Bhupesh Sharma
> >>> ---
> >>>  arch/arm64/mm/init.c | 16 ++++++++++++++--
> >>>  1 file changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >>> index 1e93cfc7c47a..02ae4d623802 100644
> >>> --- a/arch/arm64/mm/init.c
> >>> +++ b/arch/arm64/mm/init.c
> >>> @@ -91,8 +91,15 @@ static void __init reserve_crashkernel(void)
> >>>       crash_size = PAGE_ALIGN(crash_size);
> >>>
> >>>       if (crash_base == 0) {
> >>> -             /* Current arm64 boot protocol requires 2MB alignment */
> >>> -             crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> >>> +             /* Current arm64 boot protocol requires 2MB alignment.
> >>> +              * Also limit the crashkernel allocation to the first
> >>> +              * 1GB of the RAM accessible (ZONE_DMA), as otherwise we
> >>> +              * might run into OOM issues when crashkernel is executed,
> >>> +              * as it might have been originally allocated from
> >>> +              * either a ZONE_DMA32 memory or mixture of memory
> >>> +              * chunks belonging to both ZONE_DMA and ZONE_DMA32.
> >>> +              */
> >>
> >> This comment needs help. Why does putting the crashkernel in ZONE_DMA
> >> prevent "OOM issues"?
> >
> > Sure, I can work on adding more details in the comment so that it
> > explains the potential OOM issue(s) better.
> >
> > Thanks,
> > Bhupesh
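
P.S. Since the diff quoted above is trimmed just before the replacement
memblock call, here is a sketch of what the reservation path looks like
with the proposed limit applied (based on the v5.8-era
arch/arm64/mm/init.c, where 'arm64_dma_phys_limit' is the ZONE_DMA
ceiling; treat the exact names as approximate rather than as the
literal patch):

#include <linux/memblock.h>
#include <linux/printk.h>
#include <linux/sizes.h>

/* Sketch of the crashkernel search with the ZONE_DMA limit applied. */
static void __init reserve_crashkernel_sketch(unsigned long long crash_size)
{
	unsigned long long crash_base;

	/*
	 * Search only the first 1GB of RAM (arm64_dma_phys_limit,
	 * i.e. ZONE_DMA) instead of the first 4GB
	 * (arm64_dma32_phys_limit), keeping the 2MB alignment that
	 * the arm64 boot protocol requires.
	 */
	crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
					    crash_size, SZ_2M);
	if (crash_base == 0) {
		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
			crash_size);
		return;
	}

	memblock_reserve(crash_base, crash_size);
}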