Date: Tue, 12 May 2020 22:11:48 +0800
From: Baoquan He
To: David Hildenbrand
Cc: "Eric W. Biederman", James Morse, kexec@lists.infradead.org,
 linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, Dave Young
Subject: Re: [PATCH] kexec: Discard loaded image on memory hotplug
Message-ID: <20200512141148.GL5029@MiWiFi-R3L-srv>
References: <20200501165701.24587-1-james.morse@arm.com>
 <40b07632-b044-d1cd-96a2-81eec3da93e7@redhat.com>
 <8736892l92.fsf@x220.int.ebiederm.org>
 <20200511112755.GB4922@MiWiFi-R3L-srv>
 <04c8edd0-5c67-3ba7-5f87-c16a47b2af5c@redhat.com>
 <20200512103402.GK5029@MiWiFi-R3L-srv>
 <265d463c-1b2f-ca97-391d-8d1f9d60f16a@redhat.com>
In-Reply-To: <265d463c-1b2f-ca97-391d-8d1f9d60f16a@redhat.com>

On 05/12/20 at 12:54pm, David Hildenbrand wrote:
> >> kexec_load():
> >>
> >> 1. kexec-tools could have placed kexec images on memory that will be
> >>    removed.
> >>
> >> 2. the memory map of the guest is stale (esp., might still contain
> >>    hotunplugged memory). /sys/firmware/memmap and /proc/iomem will be
> >>    updated, so kexec-tools can fix this up.
> >
> > With my understanding, this is a corner case. Before James's last
> > patchset, I even hadn't realized this is a problem. Because we usually
> > load kexec image, next trigger a kexec rebooting. Wondering if James
> > just found out a potential issue, or he really met this problem. Surely,
>
> Should be as easy as hotplugging a dimm, loading "kexec -c", unplugging
> the dimm, triggering "kexec -e" if I am not wrong.

Hmm, a kexec reboot is still a reboot, and we may not want to hot plug
memory during that time. But yes, just in case.
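
To make point 1 above a bit more concrete, here is a minimal user-space
sketch (Python, not kexec-tools code) of the check being discussed: is a
loaded segment still covered by a "System RAM" entry in /proc/iomem after
the unplug? The two snapshots and the segment address are made up for
illustration, and the helper names are hypothetical.

```python
import re

# One /proc/iomem line looks like "  00100000-7fffffff : System RAM".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def system_ram_ranges(iomem_text):
    """Return inclusive (start, end) tuples for every "System RAM" entry."""
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "System RAM":
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges


def segment_is_backed(segment, ram_ranges):
    """True if the inclusive (start, end) segment lies inside one RAM range."""
    seg_start, seg_end = segment
    return any(start <= seg_start and seg_end <= end for start, end in ram_ranges)


# Hypothetical snapshots taken before and after a DIMM is unplugged.
BEFORE = """\
00001000-0009ffff : System RAM
00100000-7fffffff : System RAM
100000000-13fffffff : System RAM
"""
AFTER = """\
00001000-0009ffff : System RAM
00100000-7fffffff : System RAM
"""

# Hypothetical segment that was placed on the hotplugged DIMM at load time.
SEGMENT = (0x100000000, 0x100ffffff)

if __name__ == "__main__":
    print(segment_is_backed(SEGMENT, system_ram_ranges(BEFORE)))
    print(segment_is_backed(SEGMENT, system_ram_ranges(AFTER)))
```

On a real system the text would come from reading /proc/iomem as root
(unprivileged reads show zeroed addresses). A check like this only tells
user space that a reload is needed; it does nothing about the window in
which "kexec -e" runs before the reload happens.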
>
> > we should fix it when have identified it, even though it's a corner
> > case.
> >
> > And we suggested adding service of loading kexec to fix this. We
> > suggest this because kdump also need to recollect the memory regions
> > so that it can pass them into 2nd kernel and dump the newly added
> > memory region, or not dump the already removed memory region.
> > Kdump kernel won't get problem during boot or running caused by the
> > hot added/removed memory as kexec kernel does, however, on failing to
> > achieve expected result, kdump and kexec have the same problem. I don't
> > see why kdump can be reloaded by memory adding/removing uevent triggering,
> > but kexec can't. If have to unload kexec image, does kdump image need
> > be unloaded?
>
> I think that approach is racy and might easily trigger a crash when
> "kexec -e" is called at the wrong time during memory unplug. See below
> why kdump is different. Triggering unloading in the kernel does not
> conflict with that approach and even seems to fit into the picture, no?
>
> 1. Memory gets hot(un)plugged
> 2. The kernel unloads the kexec image while processing the hot(un)plug
>    to make sure nothing will go wrong.
> 3. User space gets notified and triggers reloading of kexec.
>
> That sounds like a sane approach to me, no? If there is no 3., nothing
> will break. If there is a "kexec -e" before 3 finished, nothing will
> break. As we discussed, we might be able to special-case
> kexec_file_load() and not unload, but simply fixup.
>
> Note that kdump is slightly different. In case memory gets hotplugged
> and kdump is not reloaded, that memory will simply not get dumped. In
> case memory gets hotunplugged and kdump is not reloaded, that memory
> will be skipped by makedumpfile (realizes memory is gone when parsing
> the sparse sections, trying to find the memmap). In contrast to kexec,
> there is no kernel crash.
>
> >
> > Here my main concern is if it will complicate kexec code. While
> > reloading it via systemd service won't. No matter if it's making kexec
> > disable memory hotplug, or making memory hotplug disabling kexec, it
> > seems to couple kexec with other feature/subcomponent. Anyway, we have
> > added a kexec loading service, any memory adding/removing uevent will
> > trigger the reloading. This patch won't impact anything, even though
> > it doesn't make sense to us, so have no objection to this.
>
> I don't consider unloading in the kernel a lot of complexity. And it
> seems to be the right thing to do to avoid crashes, especially if user
> space will not reload itself.
>
> >
> > Another thing is below patch. Another case of complicating kexec because
> > of specific use case, please feel free to help review and add comment.
> > I am wondering if we can make it in user space too. E.g for oracle DB,
> > we limit the memory allocation within the movable nodes for memory
> > hotplugging, we can also add memmap= or mem= to kexec-ed kernel to protect
> > those memory regions inside the nodes, then restore the data from the nodes.
> > Not sure if VM data can be put in MOVABLE zone only.
> >
> > [RFC 00/43] PKRAM: Preserved-over-Kexec RAM
>
> I've seen that patch set and it is on my todo list, not sure when I'll
> have time to look into it. From a quick glimpse, I had the feeling that
> it was not dealing with memory hot(un)plug, most probably because
> concurrent memory hot(un)plug is not the target use case.

No, it's not about hot plug. I hope you can help check whether restoring
VM data in the kexec-ed kernel has to be done like that, from a virt
developer's point of view. Please feel free to add other virt experts you
know who are familiar with this to the list to help review.
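
Coming back to the three steps David listed above (hotplug, kernel
unloads the image, user space reloads), the ordering argument can be
written down as a toy state model. This is only a sketch in Python to
show why a "kexec -e" between steps 2 and 3 is harmless; it is not kernel
code, and the class, its methods and the "generation" counter are all
hypothetical names.

```python
class KexecState:
    """Toy model of a loaded kexec image; not kernel code."""

    def __init__(self):
        self.loaded = False
        self.memory_generation = 0      # bumped on every hot(un)plug
        self.image_generation = None    # memory layout the image was built for

    def load(self):
        # Step 3: user space (re)loads an image for the current layout.
        self.loaded = True
        self.image_generation = self.memory_generation

    def memory_hotplug_event(self):
        # Steps 1 and 2: the layout changes and the kernel drops the image.
        self.memory_generation += 1
        self.loaded = False
        self.image_generation = None

    def execute(self):
        # Stand-in for "kexec -e": refuses to run without a loaded image.
        if not self.loaded:
            return "refused: no image loaded"
        return "booting image built for layout %d" % self.image_generation


if __name__ == "__main__":
    state = KexecState()
    state.load()
    state.memory_hotplug_event()
    print(state.execute())   # image was dropped, so nothing stale can boot
    state.load()
    print(state.execute())
```

In this model an image built for an old layout can never be executed,
whether or not the reload in step 3 ever happens. With reload-only from
user space, the stale image stays loaded until the service has run.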