From: David Hildenbrand <david@redhat.com>
To: Alexander Egorenkov <egorenar@linux.ibm.com>
Cc: agordeev@linux.ibm.com, akpm@linux-foundation.org,
borntraeger@linux.ibm.com, cohuck@redhat.com, corbet@lwn.net,
eperezma@redhat.com, frankja@linux.ibm.com, gor@linux.ibm.com,
hca@linux.ibm.com, imbrenda@linux.ibm.com, jasowang@redhat.com,
kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-s390@vger.kernel.org, mcasquer@redhat.com, mst@redhat.com,
svens@linux.ibm.com, thuth@redhat.com,
virtualization@lists.linux.dev, xuanzhuo@linux.alibaba.com,
zaslonko@linux.ibm.com
Subject: Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
Date: Wed, 16 Oct 2024 17:54:56 +0200
Message-ID: <87956f31-472d-4091-8061-1e55fea7a3d7@redhat.com>
In-Reply-To: <76f4ed45-5a40-4ac4-af24-a40effe7725c@redhat.com>
On 16.10.24 17:47, David Hildenbrand wrote:
>>>
>>> When I wrote that code I was rather convinced that the variant in this patch
>>> is the right thing to do.
>>
>> A short explanation about what a stand-alone kdump is.
>>
>> * First, it's not really a _regular_ kdump activated with kexec-tools and
>> executed by Linux itself but a regular stand-alone dump (SCSI) from the
>> FW's perspective (one has to use HMC or dumpconf to execute it and not
>> with kexec-tools like for the _regular_ kdump).
>
> Ah, that makes sense.
>
>> * One has to reserve crashkernel memory region in the old crashed kernel
>> even if it remains unused until the dump starts.
>> * zipl uses regular kdump kernel and initramfs to create stand-alone
>> dumper images and to write them to a dump disk which is used for
>> IPLing the stand-alone dumper.
>> * The zipl bootloader takes care of transferring the old kernel memory
>> saved in HSA by the FW to the crashkernel memory region reserved by the old
>> crashed kernel before it enters the dumper. The HSA memory is released
>> by the zipl bootloader _before_ the dumper image is entered,
>> therefore, we cannot use HSA to read old kernel memory, and instead
>> use memory from crashkernel region, just like the regular kdump.
>> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
>> the dumper like a regular stand-alone dump (e.g. zfcpdump).
>> * In summary, the zipl bootloader prepares the environment expected by
>> the regular kdump for the stand-alone kdump dumper before it is entered.
>
> Thanks for the details!
>
>>
>> In my opinion, the correct version of is_kdump_kernel() would be
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> because the Linux kernel doesn't differentiate between the regular
>> and the stand-alone kdump where it matters while performing dumper
>> operations (e.g. reading saved old kernel memory from crashkernel memory region).
>>
>
> Right, but if we consider "/proc/vmcore is available", a better version
> would IMHO be:
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Because that is mostly (not completely) how is_kdump_kernel() already
> works today *after* elfcorehdr_alloc() has run during the
> fs_initcall.
>
>
>> Furthermore, if I'm not mistaken, the purpose of is_kdump_kernel()
>> is to tell us whether the Linux kernel runs in a kdump-like environment and not
>> whether the current mode is identical to the proper and true kdump,
>> right? And if stand-alone kdump swims like a duck and quacks like one, then it
>> is one, regardless of how it was started, by kexecing or IPLing
>> from a disk.
>
> Same thinking here.
>
>>
>> The stand-alone kdump has a very special use case which most users will
>> never encounter. And usually, one just takes zfcpdump instead which is
>> more robust and much smaller considering how big kdump initrd can get.
>> Stand-alone kdump dumper images cannot exceed the HSA memory limit on a Z machine.
>
> Makes sense, so it boils down to either
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
> available or
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
> available. There is the chance of is_kdump_kernel() being "true" if
> "elfcorehdr_alloc()" fails with -ENODEV.
>
>
> Your call :) Thanks!
>
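To make the difference between the two candidates concrete, here is a small user-space sketch (not kernel code). It assumes that dump_available() is true when either the crashkernel region is set up or the IPL type is a dump; the body of dump_available() is not shown in this thread, so treat that as an assumption. The struct and all function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the boot-time indicators discussed above. */
struct boot_state {
	unsigned long oldmem_start;	/* models oldmem_data.start, 0 if unset */
	bool ipl_type_dump;		/* models is_ipl_type_dump() */
};

/* Candidate 1: only looks at the crashkernel (oldmem) region. */
static bool is_kdump_kernel_oldmem(const struct boot_state *s)
{
	return s->oldmem_start != 0;
}

/*
 * Assumption: dump_available() is true if either indicator is set.
 * This mirrors the three cases listed in the smp.c comment.
 */
static bool model_dump_available(const struct boot_state *s)
{
	return s->oldmem_start != 0 || s->ipl_type_dump;
}

/* Candidate 2: follows "a dump is available", i.e. /proc/vmcore. */
static bool is_kdump_kernel_dump_available(const struct boot_state *s)
{
	return model_dump_available(s);
}
```

Under these assumptions the two candidates only disagree for a standard zfcp/nvme dump (no oldmem region, IPL type is dump): candidate 1 reports "false" although a dump is available, candidate 2 reports "true".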
What I think we should do is the following (improved comment + patch
description), but I'll do whatever you think is better:
From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 4 Sep 2024 14:57:10 +0200
Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel()
s390 currently always results in is_kdump_kernel() == false until
vmcore_init()->elfcorehdr_alloc() ran, because it sets
"elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate
any elfcorehdr= kernel parameter.

Let's follow the powerpc example and implement our own logic. Let's use
"dump_available()", because this is mostly (with one exception when
elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore
and when is_kdump_kernel() would have returned "true" after
vmcore_init().

This is required for virtio-mem to reliably identify a kdump
environment before vmcore_init() was called to not try hotplugging memory.

Update the documentation above dump_available().

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/include/asm/kexec.h | 4 ++++
arch/s390/kernel/crash_dump.c | 6 ++++++
arch/s390/kernel/smp.c | 16 ++++++++--------
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
void arch_kexec_unprotect_crashkres(void);
#define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
#endif
#ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
int arch_kimage_file_post_load_cleanup(struct kimage *image);
#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
#endif
+
#endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 51313ed7e617..43bbaf534dd2 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
prot);
}
+bool is_kdump_kernel(void)
+{
+ return dump_available();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
static const char *nt_name(Elf64_Word type)
{
const char *name = "LINUX";
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..bd41e35a27a0 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -574,7 +574,7 @@ int smp_store_status(int cpu)
/*
* Collect CPU state of the previous, crashed system.
- * There are four cases:
+ * There are three cases:
* 1) standard zfcp/nvme dump
* condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true
* The state for all CPUs except the boot CPU needs to be collected
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
* with sigp stop-and-store-status. The firmware or the boot-loader
* stored the registers of the boot CPU in the absolute lowcore in the
* memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- * or stand-alone kdump for DASD
- * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ * condition: OLDMEM_BASE != NULL && is_ipl_type_dump() == false
* The state for all CPUs except the boot CPU needs to be collected
* with sigp stop-and-store-status. The kexec code or the boot-loader
* stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- * This case does not exist for s390 anymore, setup_arch explicitly
- * deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old kdump mode where the old kernel stored the CPU state
+ * no longer exists: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kdump_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter, and is purely based on dump_available().
*/
static bool dump_available(void)
{
--
2.46.1
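The ordering problem from the patch description can be illustrated with a small user-space model (a sketch, not kernel code). It assumes the generic helper simply compares elfcorehdr_addr against ELFCORE_ADDR_MAX, and that on s390 the address only becomes valid once vmcore_init()->elfcorehdr_alloc() has run. All model_* names, the flag, and the placeholder address are hypothetical.

```c
#include <stdbool.h>

#define MODEL_ELFCORE_ADDR_MAX (~0ULL)

/* Models the global elfcorehdr_addr. */
static unsigned long long model_elfcorehdr_addr = MODEL_ELFCORE_ADDR_MAX;

/* Hypothetical stand-in for what dump_available() would report. */
static bool model_dump_present;

/* Logic of the generic helper: depends on elfcorehdr_addr only. */
static bool generic_is_kdump_kernel(void)
{
	return model_elfcorehdr_addr != MODEL_ELFCORE_ADDR_MAX;
}

/* Logic of the proposed s390 helper: independent of elfcorehdr_addr. */
static bool s390_is_kdump_kernel(void)
{
	return model_dump_present;
}

/* Early boot: the elfcorehdr= parameter is deactivated. */
static void model_setup_arch(bool dump_present)
{
	model_dump_present = dump_present;
	model_elfcorehdr_addr = MODEL_ELFCORE_ADDR_MAX;
}

/* Later initcall: the ELF core header is allocated if a dump exists. */
static void model_vmcore_init(void)
{
	if (model_dump_present)
		model_elfcorehdr_addr = 0x1000;	/* arbitrary placeholder */
}
```

In this model, a caller that runs between model_setup_arch(true) and model_vmcore_init() sees "false" from the generic logic but "true" from the s390 variant, which is the window virtio-mem cares about.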
--
Cheers,
David / dhildenb