From: David Hildenbrand <david@redhat.com>
To: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Russell King - ARM Linux admin <linux@armlinux.org.uk>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Bhupesh Sharma <bhsharma@redhat.com>,
kexec@lists.infradead.org, linux-mm@kvack.org,
James Morse <james.morse@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Will Deacon <will@kernel.org>,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
Date: Tue, 14 Apr 2020 11:37:01 +0200
Message-ID: <ad060c8a-8afe-3858-0a4f-27ff54ef4c68@redhat.com>
In-Reply-To: <20200414092201.GD4247@MiWiFi-R3L-srv>

On 14.04.20 11:22, Baoquan He wrote:
> On 04/14/20 at 10:00am, David Hildenbrand wrote:
>> On 14.04.20 08:40, Baoquan He wrote:
>>> On 04/13/20 at 08:15am, Eric W. Biederman wrote:
>>>> Baoquan He <bhe@redhat.com> writes:
>>>>
>>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
>>>>>>
>>>>>> The only benefit of kexec_file_load is that it is simple enough from a
>>>>>> kernel perspective that signatures can be checked.
>>>>>
>>>>> We don't have this restriction any more with below commit:
>>>>>
>>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
>>>>> and KEXEC_SIG_FORCE")
>>>>>
>>>>> With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
>>>>> secure boot and legacy systems for kexec/kdump. Being simple enough is
>>>>> enough to attract and convince us to use it instead. And kexec_file_load
>>>>> has been in use for several years on systems with secure boot, since
>>>>> it was added in 2014, on x86_64.
>>>>
>>>> No. Actually kexec_file_load is the less capable interface, and less
>>>> flexible interface. Which is why it is appropriate for signature
>>>> verification.
>>>
>>> Well, everyone has a stance and the corresponding view. You could have
>>> wider view from long time maintenance and in upstream position, and think
>>> kexec_file_load is horrible. But I can only see from our work as a front
>>> line engineer to maintain/develop kexec/kdump in RHEL, and think
>>> kexec_file_load is easier to maintain.
>>>
>>> Surely except of multiple kernel image format support. No matter it is
>>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage.
>>> This is produced from kernel building by default. We have no way to
>>> support it in our distros and add it into kexec_file_load.
>>>
>>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able
>>> https://lkml.org/lkml/2017/2/15/654
>>>
>>>>
>>>>>> kexec_load in every other respect is the more capable and functional
>>>>>> interface. It makes no sense to get rid of it.
>>>>>>
>>>>>> It does make sense to reload with a loaded kernel on memory hotplug.
>>>>>> That is simple and easy. If we are going to handle something in the
>>>>>> kernel it should simply be an automated unloading of the kernel on memory
>>>>>> hotplug.
>>>>>>
>>>>>>
>>>>>> I think it would be irresponsible to deprecate kexec_load on any
>>>>>> platform.
>>>>>>
>>>>>> I also suspect that kexec_file_load could be taught to copy the dtb
>>>>>> on arm32 if someone wants to deal with signatures.
>>>>>>
>>>>>> We definitely can not even think of deprecating kexec_load until
>>>>>> architecture that supports it also supports kexec_file_load and everyone
>>>>>> is happy with that interface. That is Linus's no regression rule.
>>>>>
>>>>> I should pick a milder word to express our tendency and tell our plan
>>>>> than 'obsolete'. Even though I added 'gradually', seems it doesn't help
>>>>> much. I didn't mean to say 'deprecate' at all when replied.
>>>>>
>>>>> The situation and trend I understand about kexec_load and kexec_file_load
>>>>> are:
>>>>>
>>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
>>>>> have yet, just as x86_64, arm64 and s390 have done;
>>>>>
>>>>> 2) kexec_file_load is suggested to use, and take precedence over
>>>>> kexec_load in the future, if both are supported in one ARCH.
>>>>
>>>> The deep problem is that kexec_file_load is distinctly less expressive
>>>> than kexec_load.
>>>>
>>>>> 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
>>>>> and by ARCHes for back compatibility w/ kexec_file_load support.
>>>>>
>>>>> For 1) and 2), I think the reason is obvious as Eric said,
>>>>> kexec_file_load is simple enough. And currently, whenever we got a bug
>>>>> report, we may need fix them twice, for kexec_load and kexec_file_load.
>>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it
>>>>> in kernel space only, for kexec_file_load. This is what I meant about
>>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too.
>>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the
>>>>> old kexec_load interface in old product.
>>>>
>>>> Maybe. The code that kexec_file_load sucked into the kernel is quite
>>>> stable and rarely needs changes except during a port of kexec to
>>>> another architecture.
>>>>
>>>> Last I looked the real maintenance effort of kexec and kexec on panic was
>>>> in the drivers. So I don't think we can use maintenance to do anything.
>>>
>>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has
>>> been taken to make SEV work well on kexec_file_load. And we have
>>> switched to use kexec_file_load in the newly published Fedora release
>>> on x86_64 by default. Before this, Lianbo has investigated and done many
>>> experiments to make sure the switching is safe. We finally made this
>>> decision. Next we will do the switch in Enterprise distros. Once these
>>> are proved safe, we will suggest customers to use kexec_file_load for
>>> kexec rebooting too. In the future, we will only care about
>>> kexec_file_load if everything is going well. But as I have explained
>>> repeatedly, only caring about kexec_file_load means we will leave
>>> kexec_load as is, we will not add new feature or improvement patches
>>> for it.
>>>
>>> commit 6a20bd54473e11011bf2b47efb52d0759d412854
>>> Author: Lianbo Jiang <lijiang@redhat.com>
>>> Date: Thu Jan 16 13:47:35 2020 +0800
>>>
>>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default
>>>
>>>>
>>>>> For 3), people can still use kexec_load and develop/fix for it, if no
>>>>> kexec_file_load supported. But 32-bit arm should be a different one,
>>>>> more like i386, we will leave it as is, and fix anything which could
>>>>> break it. But people really expects to improve or add feature to it? E.g
>>>>> in this patchset, the mem hotplug issue James raised, I assume James is
>>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
>>>>> another reply, people even don't agree to continue supporting memory
>>>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug
>>>>> bug on i386 with a patch, but people would rather set it as BROKEN.
>>>>
>>>> For memory hotplug just reload. Userspace already gets good events.
>>>
>>> Kexec_file_load is easy to maintain. This is an example.
>>>
>>> Lock the hotplug area where kexec-ed kernel is targeted in this patchset,
>>> it's obviously not right. We can't disable memory hotplug just because
>>> kexec-ed kernel is loaded ahead of time.
>>>
>>> Reloading is also not a good fix. Kexec-ed kernel is targeted at a
>>> movable area, reloading can avoid kexec rebooting corruption if that
>>> area is hot removed. But if that area is not removed, locating kernel
>>> into the hotpluggable area will change the area into unmovable zone.
>>> Unless we decide to not support memory hotplug in kexec-ed kernel, I
>>> guess it's very hard. Now in our distros kexec rebooting has been
>>> supported, the big cloud providers are deploying linux in guest, bugs on
>>> kexec reboot failure has been reported. They need the memory hotplug to
>>> increase/decrease memory.
>>>
>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory
>>> region. Just avoiding the movable area can fix it. In kexec_file_load(),
>>> just checking or picking those unmovable region to put kernel/initrd in
>>> function locate_mem_hole_callback() can fix it. The page or pageblock's
>>> zone is movable or not, it's easy to know. This fix doesn't need to
>>> bother other component.
>>
>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL
>> does not imply that it cannot get offlined and removed e.g., this is
>> heavily used on ppc64, with 16MB sections.
>
> Really? I just know there are two kinds of mem hotplug in ppc, but don't
> know the details. So in this case, is there any flag or a way to know
> those memory block are hotpluggable? I am curious how those kernel data
> is avoided to be put in this area. Or ppc just freely uses it for kernel
> data or user space data, then try to migrate when hot remove?

See
arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()

Under DLPAR, it can remove memory in LMB granularity, which is usually
16MB (== single section on ppc64). DLPAR will directly online all
hotplugged memory (LMBs) from the kernel using device_online(), which
will go to ZONE_NORMAL.

When trying to remove memory, it simply scans for offlineable 16MB
memory blocks (== section == LMB), offlines and removes them. No need for
the movable zone and all the involved issues.

Now, the interesting question is, can we have LMBs added during boot
(not via add_memory()), that will later be removed via remove_memory()?
IIRC, we had BUGs related to that, so I think yes. If a section contains
no unmovable allocations (after boot), it can get removed.

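
To make the scan described above concrete, here is a toy userspace model
in Python. It is only a sketch and does not mirror
dlpar_memory_remove_by_count() line by line. The names Lmb, has_unmovable,
is_offlineable() and remove_by_count() are made up for illustration, and
"offlineable" is simplified to "online and free of unmovable allocations".

```python
from dataclasses import dataclass

LMB_SIZE_MB = 16  # usual LMB (== section) size on ppc64, as stated above


@dataclass
class Lmb:
    """Hypothetical stand-in for one 16MB LMB / memory block."""
    index: int
    zone: str = "ZONE_NORMAL"    # DLPAR-onlined memory lands in ZONE_NORMAL
    has_unmovable: bool = False  # assumption: a single flag models pinned pages
    online: bool = True
    present: bool = True


def is_offlineable(lmb: Lmb) -> bool:
    # Simplified rule: the block can go away if nothing unmovable lives in it.
    return lmb.present and lmb.online and not lmb.has_unmovable


def remove_by_count(lmbs: list, count: int) -> list:
    """Scan the LMBs in order; offline and remove up to `count` of them."""
    removed = []
    for lmb in lmbs:
        if len(removed) == count:
            break
        if is_offlineable(lmb):
            lmb.online = False   # "offline" step
            lmb.present = False  # "remove" step
            removed.append(lmb.index)
    return removed


if __name__ == "__main__":
    blocks = [Lmb(0, has_unmovable=True), Lmb(1), Lmb(2, has_unmovable=True), Lmb(3)]
    print(remove_by_count(blocks, 2))
```

Note that every block in the model sits in ZONE_NORMAL, yet the ones
without unmovable allocations are still removed. The zone alone therefore
does not tell a placement routine which memory is safe from removal.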
--
Thanks,
David / dhildenb