* [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
The only "different than everything else" thing about virtio-mem on s390
is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
kdump kernel must detect memory ranges of the crashed kernel to
include via PT_LOAD in the vmcore.
On other architectures, all RAM regions (boot + hotplugged) can easily be
observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
the elfcore hdr.
On s390, information about "ordinary" memory (heh, "storage") can be
obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
that information is stored early during boot in the "physmem" memblock
data structure.
But virtio-mem memory is always detected by its device driver, which is
usually built as a module. So in the crash kernel, this memory can only be
properly detected once the virtio-mem driver has started up.
The virtio-mem driver already supports the "kdump mode", where it won't
hotplug any memory but instead queries the device to implement the
pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
the vmcore.
With this series, if the virtio-mem driver is included in the kdump
initrd -- which dracut already takes care of under Fedora/RHEL -- it will
now detect the device RAM ranges on s390 once it probes the devices, to add
them to the vmcore using the same callback mechanism we already have for
pfn_is_ram().
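For driver writers, the mechanism boils down to one additional vmcore_cb
hook. A minimal sketch (hypothetical foo_* driver, simplified error
handling, using the vmcore_alloc_add_range() helper from patch #6;
patch #11 does this for real for virtio-mem):

    /* Sketch only: a driver reporting its plugged device RAM. */
    static int foo_vmcore_get_device_ram(struct vmcore_cb *cb,
                                         struct list_head *list)
    {
            struct foo_dev *foo = container_of(cb, struct foo_dev,
                                               vmcore_cb);

            /*
             * Ranges must be page-aligned; unplugged holes within them
             * get skipped later via the pfn_is_ram() callback.
             */
            return vmcore_alloc_add_range(list, foo->plugged_start,
                                          foo->plugged_size);
    }

    static void foo_init_kdump(struct foo_dev *foo)
    {
            /* foo_vmcore_pfn_is_ram() assumed to exist already. */
            foo->vmcore_cb.pfn_is_ram = foo_vmcore_pfn_is_ram;
            foo->vmcore_cb.get_device_ram = foo_vmcore_get_device_ram;
            register_vmcore_cb(&foo->vmcore_cb);
    }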
To add these device RAM ranges to the vmcore ("patch the vmcore"), we will
add new PT_LOAD entries that describe these memory ranges, and update
all offsets and the vmcore size so everything remains consistent.
My testing when creating+analyzing crash dumps with hotplugged virtio-mem
memory (incl. holes) did not reveal any surprises.
Patches #1 -- #7 are vmcore preparations and cleanups
Patch #8 adds the infrastructure for drivers to report device RAM
Patch #9 + #10 are virtio-mem preparations
Patch #11 implements virtio-mem support to report device RAM
Patch #12 activates it for s390, implementing a new function to fill a
PT_LOAD entry for device RAM
v1 -> v2:
* "fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex"
-> Extend patch description
* "fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex"
-> Extend patch description
* "fs/proc/vmcore: disallow vmcore modifications while the vmcore is open"
-> Disallow modifications only if it is currently open, but warn if it
was already open and got closed again.
-> Track vmcore_open vs. vmcore_opened
-> Extend patch description
* "fs/proc/vmcore: prefix all pr_* with "vmcore:""
-> Added
* "fs/proc/vmcore: move vmcore definitions out if kcore.h"
-> Call it "vmcore_range"
-> Place vmcoredd_node into vmcore.c
-> Adjust patch subject + description
* "fs/proc/vmcore: factor out allocating a vmcore range and adding it to a
list"
-> Adjust to "vmcore_range"
* "fs/proc/vmcore: factor out freeing a list of vmcore ranges"
-> Adjust to "vmcore_range"
* "fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM
ranges in 2nd kernel"
-> Drop PROVIDE_PROC_VMCORE_DEVICE_RAM for now
-> Simplify Kconfig a bit
-> Drop "Kdump:" from warnings/errors
-> Perform Elf64 check first
-> Add regions also if the vmcore was opened, but got closed again. But
warn in any case, because it is unexpected.
-> Adjust patch description
* "virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM"
-> "depends on VIRTIO_MEM" for PROC_VMCORE_DEVICE_RAM
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand (12):
fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
fs/proc/vmcore: prefix all pr_* with "vmcore:"
fs/proc/vmcore: move vmcore definitions out of kcore.h
fs/proc/vmcore: factor out allocating a vmcore range and adding it to
a list
fs/proc/vmcore: factor out freeing a list of vmcore ranges
fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM
ranges in 2nd kernel
virtio-mem: mark device ready before registering callbacks in kdump
mode
virtio-mem: remember usable region size
virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
arch/s390/Kconfig | 1 +
arch/s390/kernel/crash_dump.c | 39 ++++-
drivers/virtio/virtio_mem.c | 103 ++++++++++++-
fs/proc/Kconfig | 19 +++
fs/proc/vmcore.c | 283 ++++++++++++++++++++++++++--------
include/linux/crash_dump.h | 41 +++++
include/linux/kcore.h | 13 --
7 files changed, 407 insertions(+), 92 deletions(-)
base-commit: feffde684ac29a3b7aec82d2df850fbdbdee55e4
--
2.47.1
* [PATCH v2 01/12] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
We want to protect vmcore modifications from concurrent opening of
the vmcore, and also serialize vmcore modifications.
(a) We can currently modify the vmcore after it was opened. This can happen
if a vmcoredd is added after the vmcore module was initialized and
already opened by user space. We want to fix that and prepare for
new code wanting to serialize against concurrent opening.
(b) To handle it cleanly we need to protect the modifications against
concurrent opening. As the modifications end up allocating memory and
can sleep, we cannot rely on the spinlock.
Let's convert the spinlock into a mutex to prepare for further changes.
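To illustrate (sketch only, not part of this patch): the modification
path we are preparing for allocates memory with GFP_KERNEL, which may
sleep and is therefore invalid under a spinlock, but fine under a mutex:

    /* Invalid: kzalloc(GFP_KERNEL) may sleep while holding a spinlock. */
    spin_lock(&vmcore_cb_lock);
    m = kzalloc(sizeof(*m), GFP_KERNEL);
    spin_unlock(&vmcore_cb_lock);

    /* Fine: sleeping is allowed while holding a mutex. */
    mutex_lock(&vmcore_mutex);
    m = kzalloc(sizeof(*m), GFP_KERNEL);
    mutex_unlock(&vmcore_mutex);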
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b4521b096058..586f84677d2f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -62,7 +62,8 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0);
/* Device Dump Size */
static size_t vmcoredd_orig_sz;
-static DEFINE_SPINLOCK(vmcore_cb_lock);
+static DEFINE_MUTEX(vmcore_mutex);
+
DEFINE_STATIC_SRCU(vmcore_cb_srcu);
/* List of registered vmcore callbacks. */
static LIST_HEAD(vmcore_cb_list);
@@ -72,7 +73,7 @@ static bool vmcore_opened;
void register_vmcore_cb(struct vmcore_cb *cb)
{
INIT_LIST_HEAD(&cb->next);
- spin_lock(&vmcore_cb_lock);
+ mutex_lock(&vmcore_mutex);
list_add_tail(&cb->next, &vmcore_cb_list);
/*
* Registering a vmcore callback after the vmcore was opened is
@@ -80,13 +81,13 @@ void register_vmcore_cb(struct vmcore_cb *cb)
*/
if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback registration\n");
- spin_unlock(&vmcore_cb_lock);
+ mutex_unlock(&vmcore_mutex);
}
EXPORT_SYMBOL_GPL(register_vmcore_cb);
void unregister_vmcore_cb(struct vmcore_cb *cb)
{
- spin_lock(&vmcore_cb_lock);
+ mutex_lock(&vmcore_mutex);
list_del_rcu(&cb->next);
/*
* Unregistering a vmcore callback after the vmcore was opened is
@@ -95,7 +96,7 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
*/
if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback unregistration\n");
- spin_unlock(&vmcore_cb_lock);
+ mutex_unlock(&vmcore_mutex);
synchronize_srcu(&vmcore_cb_srcu);
}
@@ -120,9 +121,9 @@ static bool pfn_is_ram(unsigned long pfn)
static int open_vmcore(struct inode *inode, struct file *file)
{
- spin_lock(&vmcore_cb_lock);
+ mutex_lock(&vmcore_mutex);
vmcore_opened = true;
- spin_unlock(&vmcore_cb_lock);
+ mutex_unlock(&vmcore_mutex);
return 0;
}
--
2.47.1
* [PATCH v2 02/12] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Now that we have a mutex that synchronizes against opening of the vmcore,
let's use that one to replace vmcoredd_mutex: there is no need to have
two separate ones.
This is a preparation for properly preventing vmcore modifications
after the vmcore was opened.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 586f84677d2f..e5a7e302f91f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -53,7 +53,6 @@ static struct proc_dir_entry *proc_vmcore;
#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
/* Device Dump list and mutex to synchronize access to list */
static LIST_HEAD(vmcoredd_list);
-static DEFINE_MUTEX(vmcoredd_mutex);
static bool vmcoredd_disabled;
core_param(novmcoredd, vmcoredd_disabled, bool, 0);
@@ -248,7 +247,7 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
size_t tsz;
char *buf;
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
@@ -269,7 +268,7 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
}
out_unlock:
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
return ret;
}
@@ -283,7 +282,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
size_t tsz;
char *buf;
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
@@ -306,7 +305,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
}
out_unlock:
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
return ret;
}
#endif /* CONFIG_MMU */
@@ -1518,9 +1517,9 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
dump->size = data_size;
/* Add the dump to driver sysfs list */
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
list_add_tail(&dump->list, &vmcoredd_list);
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
vmcoredd_update_size(data_size);
return 0;
@@ -1538,7 +1537,7 @@ EXPORT_SYMBOL(vmcore_add_device_dump);
static void vmcore_free_device_dumps(void)
{
#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
while (!list_empty(&vmcoredd_list)) {
struct vmcoredd_node *dump;
@@ -1548,7 +1547,7 @@ static void vmcore_free_device_dumps(void)
vfree(dump->buf);
vfree(dump);
}
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
}
--
2.47.1
* [PATCH v2 03/12] fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
The vmcoredd_update_size() call and its effects (size/offset changes) are
currently completely unsynchronized, and will cause trouble when
performed concurrently, or when done while someone is already reading the
vmcore.
Let's protect all vmcore modifications by the vmcore_mutex, disallow vmcore
modifications while the vmcore is open, and warn on vmcore
modifications after the vmcore was already opened once: modifications
while the vmcore is open are unsafe, and modifications after the vmcore
was opened indicate trouble. Properly synchronize against concurrent
opening of the vmcore.
No need to grab the mutex during mmap()/read(): after we opened the
vmcore, modifications are impossible.
It's worth noting that modifications after the vmcore was opened are
completely unexpected, so failing if open, and warning if already opened
(+closed again) is good enough.
This change not only handles concurrent adding of device dumps +
concurrent reading of the vmcore properly, it also prepares for other
mechanisms that will modify the vmcore.
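To illustrate the resulting contract (sketch only, not part of the diff
below), any vmcore modifier now does, under vmcore_mutex:

    mutex_lock(&vmcore_mutex);
    if (vmcore_opened)
            /* Was opened (+ closed) before: unexpected, warn once. */
            pr_warn_once("Unexpected vmcore modification\n");
    if (vmcore_open) {
            /* Currently open: modifying would be unsafe, refuse. */
            ret = -EBUSY;
            goto out_unlock;
    }
    /* ... safely modify vmcore ranges/offsets/size here ... */
    out_unlock:
    mutex_unlock(&vmcore_mutex);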
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 57 +++++++++++++++++++++++++++++-------------------
1 file changed, 34 insertions(+), 23 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index e5a7e302f91f..16faabe5ea30 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -68,6 +68,8 @@ DEFINE_STATIC_SRCU(vmcore_cb_srcu);
static LIST_HEAD(vmcore_cb_list);
/* Whether the vmcore has been opened once. */
static bool vmcore_opened;
+/* Whether the vmcore is currently open. */
+static unsigned int vmcore_open;
void register_vmcore_cb(struct vmcore_cb *cb)
{
@@ -122,6 +124,20 @@ static int open_vmcore(struct inode *inode, struct file *file)
{
mutex_lock(&vmcore_mutex);
vmcore_opened = true;
+ if (vmcore_open + 1 == 0) {
+ mutex_unlock(&vmcore_mutex);
+ return -EBUSY;
+ }
+ vmcore_open++;
+ mutex_unlock(&vmcore_mutex);
+
+ return 0;
+}
+
+static int release_vmcore(struct inode *inode, struct file *file)
+{
+ mutex_lock(&vmcore_mutex);
+ vmcore_open--;
mutex_unlock(&vmcore_mutex);
return 0;
@@ -243,33 +259,27 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
{
struct vmcoredd_node *dump;
u64 offset = 0;
- int ret = 0;
size_t tsz;
char *buf;
- mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
buf = dump->buf + start - offset;
- if (copy_to_iter(buf, tsz, iter) < tsz) {
- ret = -EFAULT;
- goto out_unlock;
- }
+ if (copy_to_iter(buf, tsz, iter) < tsz)
+ return -EFAULT;
size -= tsz;
start += tsz;
/* Leave now if buffer filled already */
if (!size)
- goto out_unlock;
+ return 0;
}
offset += dump->size;
}
-out_unlock:
- mutex_unlock(&vmcore_mutex);
- return ret;
+ return 0;
}
#ifdef CONFIG_MMU
@@ -278,20 +288,16 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
{
struct vmcoredd_node *dump;
u64 offset = 0;
- int ret = 0;
size_t tsz;
char *buf;
- mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
buf = dump->buf + start - offset;
if (remap_vmalloc_range_partial(vma, dst, buf, 0,
- tsz)) {
- ret = -EFAULT;
- goto out_unlock;
- }
+ tsz))
+ return -EFAULT;
size -= tsz;
start += tsz;
@@ -299,14 +305,12 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
/* Leave now if buffer filled already */
if (!size)
- goto out_unlock;
+ return 0;
}
offset += dump->size;
}
-out_unlock:
- mutex_unlock(&vmcore_mutex);
- return ret;
+ return 0;
}
#endif /* CONFIG_MMU */
#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
@@ -691,6 +695,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
static const struct proc_ops vmcore_proc_ops = {
.proc_open = open_vmcore,
+ .proc_release = release_vmcore,
.proc_read_iter = read_vmcore,
.proc_lseek = default_llseek,
.proc_mmap = mmap_vmcore,
@@ -1516,12 +1521,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
dump->buf = buf;
dump->size = data_size;
- /* Add the dump to driver sysfs list */
+ /* Add the dump to driver sysfs list and update the elfcore hdr */
mutex_lock(&vmcore_mutex);
- list_add_tail(&dump->list, &vmcoredd_list);
- mutex_unlock(&vmcore_mutex);
+ if (vmcore_opened)
+ pr_warn_once("Unexpected adding of device dump\n");
+ if (vmcore_open) {
+ ret = -EBUSY;
+ goto out_err;
+ }
+ list_add_tail(&dump->list, &vmcoredd_list);
vmcoredd_update_size(data_size);
+ mutex_unlock(&vmcore_mutex);
return 0;
out_err:
--
2.47.1
* [PATCH v2 04/12] fs/proc/vmcore: prefix all pr_* with "vmcore:"
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's use "vmcore: " as a prefix, converting the single "Kdump:
vmcore not initialized" one to effectively be "vmcore: not initialized".
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 16faabe5ea30..13dfc128d07e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -8,6 +8,8 @@
*
*/
+#define pr_fmt(fmt) "vmcore: " fmt
+
#include <linux/mm.h>
#include <linux/kcore.h>
#include <linux/user.h>
@@ -1580,7 +1582,7 @@ static int __init vmcore_init(void)
rc = parse_crash_elf_headers();
if (rc) {
elfcorehdr_free(elfcorehdr_addr);
- pr_warn("Kdump: vmcore not initialized\n");
+ pr_warn("not initialized\n");
return rc;
}
elfcorehdr_free(elfcorehdr_addr);
--
2.47.1
* [PATCH v2 05/12] fs/proc/vmcore: move vmcore definitions out of kcore.h
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
These vmcore defines are not related to /proc/kcore, so move them out.
We'll move "struct vmcoredd_node" to vmcore.c, because it is only used
internally. While "struct vmcore" is only used internally for now,
we're planning on using it from inline functions in crash_dump.h next,
so move it to crash_dump.h.
While at it, rename "struct vmcore" to "struct vmcore_range", which is a
more suitable name and will make the usage of it outside of vmcore.c
clearer.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 26 ++++++++++++++++----------
include/linux/crash_dump.h | 7 +++++++
include/linux/kcore.h | 13 -------------
3 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 13dfc128d07e..8d262017ca11 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -53,6 +53,12 @@ static u64 vmcore_size;
static struct proc_dir_entry *proc_vmcore;
#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+struct vmcoredd_node {
+ struct list_head list; /* List of dumps */
+ void *buf; /* Buffer containing device's dump */
+ unsigned int size; /* Size of the buffer */
+};
+
/* Device Dump list and mutex to synchronize access to list */
static LIST_HEAD(vmcoredd_list);
@@ -322,10 +328,10 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
*/
static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
{
+ struct vmcore_range *m = NULL;
ssize_t acc = 0, tmp;
size_t tsz;
u64 start;
- struct vmcore *m = NULL;
if (!iov_iter_count(iter) || *fpos >= vmcore_size)
return 0;
@@ -580,7 +586,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
{
size_t size = vma->vm_end - vma->vm_start;
u64 start, end, len, tsz;
- struct vmcore *m;
+ struct vmcore_range *m;
start = (u64)vma->vm_pgoff << PAGE_SHIFT;
end = start + size;
@@ -703,16 +709,16 @@ static const struct proc_ops vmcore_proc_ops = {
.proc_mmap = mmap_vmcore,
};
-static struct vmcore* __init get_new_element(void)
+static struct vmcore_range * __init get_new_element(void)
{
- return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
+ return kzalloc(sizeof(struct vmcore_range), GFP_KERNEL);
}
static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
struct list_head *vc_list)
{
+ struct vmcore_range *m;
u64 size;
- struct vmcore *m;
size = elfsz + elfnotesegsz;
list_for_each_entry(m, vc_list, list) {
@@ -1110,11 +1116,11 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
+ struct vmcore_range *new;
int i;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr *phdr_ptr;
loff_t vmcore_off;
- struct vmcore *new;
ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
@@ -1153,11 +1159,11 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
+ struct vmcore_range *new;
int i;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr *phdr_ptr;
loff_t vmcore_off;
- struct vmcore *new;
ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
@@ -1195,8 +1201,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
static void set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
struct list_head *vc_list)
{
+ struct vmcore_range *m;
loff_t vmcore_off;
- struct vmcore *m;
/* Skip ELF header, program headers and ELF note segment. */
vmcore_off = elfsz + elfnotes_sz;
@@ -1605,9 +1611,9 @@ void vmcore_cleanup(void)
/* clear the vmcore list. */
while (!list_empty(&vmcore_list)) {
- struct vmcore *m;
+ struct vmcore_range *m;
- m = list_first_entry(&vmcore_list, struct vmcore, list);
+ m = list_first_entry(&vmcore_list, struct vmcore_range, list);
list_del(&m->list);
kfree(m);
}
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index acc55626afdc..788a45061f35 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -114,6 +114,13 @@ struct vmcore_cb {
extern void register_vmcore_cb(struct vmcore_cb *cb);
extern void unregister_vmcore_cb(struct vmcore_cb *cb);
+struct vmcore_range {
+ struct list_head list;
+ unsigned long long paddr;
+ unsigned long long size;
+ loff_t offset;
+};
+
#else /* !CONFIG_CRASH_DUMP */
static inline bool is_kdump_kernel(void) { return false; }
#endif /* CONFIG_CRASH_DUMP */
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 86c0f1d18998..9a2fa013c91d 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -20,19 +20,6 @@ struct kcore_list {
int type;
};
-struct vmcore {
- struct list_head list;
- unsigned long long paddr;
- unsigned long long size;
- loff_t offset;
-};
-
-struct vmcoredd_node {
- struct list_head list; /* List of dumps */
- void *buf; /* Buffer containing device's dump */
- unsigned int size; /* Size of the buffer */
-};
-
#ifdef CONFIG_PROC_KCORE
void __init kclist_add(struct kcore_list *, void *, size_t, int type);
--
2.47.1
* [PATCH v2 06/12] fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 21 ++-------------------
include/linux/crash_dump.h | 14 ++++++++++++++
2 files changed, 16 insertions(+), 19 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 8d262017ca11..9b72e255dd03 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -709,11 +709,6 @@ static const struct proc_ops vmcore_proc_ops = {
.proc_mmap = mmap_vmcore,
};
-static struct vmcore_range * __init get_new_element(void)
-{
- return kzalloc(sizeof(struct vmcore_range), GFP_KERNEL);
-}
-
static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
struct list_head *vc_list)
{
@@ -1116,7 +1111,6 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
- struct vmcore_range *new;
int i;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr *phdr_ptr;
@@ -1139,13 +1133,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
size = end - start;
- /* Add this contiguous chunk of memory to vmcore list.*/
- new = get_new_element();
- if (!new)
+ if (vmcore_alloc_add_range(vc_list, start, size))
return -ENOMEM;
- new->paddr = start;
- new->size = size;
- list_add_tail(&new->list, vc_list);
/* Update the program header offset. */
phdr_ptr->p_offset = vmcore_off + (paddr - start);
@@ -1159,7 +1148,6 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
- struct vmcore_range *new;
int i;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr *phdr_ptr;
@@ -1182,13 +1170,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
size = end - start;
- /* Add this contiguous chunk of memory to vmcore list.*/
- new = get_new_element();
- if (!new)
+ if (vmcore_alloc_add_range(vc_list, start, size))
return -ENOMEM;
- new->paddr = start;
- new->size = size;
- list_add_tail(&new->list, vc_list);
/* Update the program header offset */
phdr_ptr->p_offset = vmcore_off + (paddr - start);
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 788a45061f35..9717912ce4d1 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -121,6 +121,20 @@ struct vmcore_range {
loff_t offset;
};
+/* Allocate a vmcore range and add it to the list. */
+static inline int vmcore_alloc_add_range(struct list_head *list,
+ unsigned long long paddr, unsigned long long size)
+{
+ struct vmcore_range *m = kzalloc(sizeof(*m), GFP_KERNEL);
+
+ if (!m)
+ return -ENOMEM;
+ m->paddr = paddr;
+ m->size = size;
+ list_add_tail(&m->list, list);
+ return 0;
+}
+
#else /* !CONFIG_CRASH_DUMP */
static inline bool is_kdump_kernel(void) { return false; }
#endif /* CONFIG_CRASH_DUMP */
--
2.47.1
* [PATCH v2 07/12] fs/proc/vmcore: factor out freeing a list of vmcore ranges
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 9 +--------
include/linux/crash_dump.h | 11 +++++++++++
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 9b72e255dd03..e7b3cde44890 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -1592,14 +1592,7 @@ void vmcore_cleanup(void)
proc_vmcore = NULL;
}
- /* clear the vmcore list. */
- while (!list_empty(&vmcore_list)) {
- struct vmcore_range *m;
-
- m = list_first_entry(&vmcore_list, struct vmcore_range, list);
- list_del(&m->list);
- kfree(m);
- }
+ vmcore_free_ranges(&vmcore_list);
free_elfcorebuf();
/* clear vmcore device dump list */
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 9717912ce4d1..5d61c7454fd6 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -135,6 +135,17 @@ static inline int vmcore_alloc_add_range(struct list_head *list,
return 0;
}
+/* Free a list of vmcore ranges. */
+static inline void vmcore_free_ranges(struct list_head *list)
+{
+ struct vmcore_range *m, *tmp;
+
+ list_for_each_entry_safe(m, tmp, list, list) {
+ list_del(&m->list);
+ kfree(m);
+ }
+}
+
#else /* !CONFIG_CRASH_DUMP */
static inline bool is_kdump_kernel(void) { return false; }
#endif /* CONFIG_CRASH_DUMP */
--
2.47.1
* [PATCH v2 08/12] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
the crashed kernel.
RAM provided by memory devices such as virtio-mem can only be detected
using the device driver; when vmcore_init() is called, these device
drivers are usually not loaded yet, or the devices have not been probed
yet. Consequently, on s390 these RAM ranges will not be included in
the crash dump, leaving the dump incomplete.
Instead of deferring the vmcore_init() call to an (unclear?) later point,
let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
the device drivers probe the device and get access to this information.
Then, we'll add these ranges to the vmcore, adding more PT_LOAD
entries and updating the offsets+vmcore size.
Use a separate Kconfig option to be set by an architecture to include this
code only if the arch really needs it. Further, we'll make the config
depend on the relevant drivers (i.e., virtio_mem) once they implement
support (next). The alternative of having a PROVIDE_PROC_VMCORE_DEVICE_RAM
config option was dropped for now for simplicity.
The current target use case is s390, which only creates an elf64
elfcore, so focusing on elf64 is sufficient.
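The resulting flow when a driver registers a callback that implements
the new hook, attempted only while the vmcore is not currently open
(simplified sketch of the code added below):

    register_vmcore_cb(cb)
      -> vmcore_process_device_ram(cb)        /* if cb->get_device_ram */
           -> cb->get_device_ram(cb, &list)   /* driver fills the range list */
           -> vmcore_add_device_ram_elf64()   /* grow elfcore buf, append PT_LOADs */
           -> vmcore_reset_offsets_elf64()    /* recompute all p_offset values */
           -> get_vmcore_size()               /* update the total vmcore size */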
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/Kconfig | 18 +++++
fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
include/linux/crash_dump.h | 9 +++
3 files changed, 183 insertions(+)
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index d80a1431ef7b..5668620ab34d 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -61,6 +61,24 @@ config PROC_VMCORE_DEVICE_DUMP
as ELF notes to /proc/vmcore. You can still disable device
dump using the kernel command line option 'novmcoredd'.
+config NEED_PROC_VMCORE_DEVICE_RAM
+ bool
+
+config PROC_VMCORE_DEVICE_RAM
+ def_bool y
+ depends on PROC_VMCORE && NEED_PROC_VMCORE_DEVICE_RAM
+ help
+ If the elfcore hdr is allocated and prepared by the dump kernel
+ ("2nd kernel") instead of the crashed kernel, RAM provided by memory
+ devices such as virtio-mem will not be included in the dump
+ image, because only the device driver can properly detect them.
+
+ With this config enabled, these RAM ranges will be queried from the
+ device drivers once the device gets probed, so they can be included
+ in the crash dump.
+
+ Relevant architectures should select NEED_PROC_VMCORE_DEVICE_RAM.
+
config PROC_SYSCTL
bool "Sysctl support (/proc/sys)" if EXPERT
depends on PROC_FS
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index e7b3cde44890..6dc293d0eb5b 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -79,6 +79,8 @@ static bool vmcore_opened;
/* Whether the vmcore is currently open. */
static unsigned int vmcore_open;
+static void vmcore_process_device_ram(struct vmcore_cb *cb);
+
void register_vmcore_cb(struct vmcore_cb *cb)
{
INIT_LIST_HEAD(&cb->next);
@@ -90,6 +92,8 @@ void register_vmcore_cb(struct vmcore_cb *cb)
*/
if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback registration\n");
+ if (!vmcore_open && cb->get_device_ram)
+ vmcore_process_device_ram(cb);
mutex_unlock(&vmcore_mutex);
}
EXPORT_SYMBOL_GPL(register_vmcore_cb);
@@ -1535,6 +1539,158 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
EXPORT_SYMBOL(vmcore_add_device_dump);
#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+static int vmcore_realloc_elfcore_buffer_elf64(size_t new_size)
+{
+ char *elfcorebuf_new;
+
+ if (WARN_ON_ONCE(new_size < elfcorebuf_sz))
+ return -EINVAL;
+ if (get_order(elfcorebuf_sz_orig) == get_order(new_size)) {
+ elfcorebuf_sz_orig = new_size;
+ return 0;
+ }
+
+ elfcorebuf_new = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(new_size));
+ if (!elfcorebuf_new)
+ return -ENOMEM;
+ memcpy(elfcorebuf_new, elfcorebuf, elfcorebuf_sz);
+ free_pages((unsigned long)elfcorebuf, get_order(elfcorebuf_sz_orig));
+ elfcorebuf = elfcorebuf_new;
+ elfcorebuf_sz_orig = new_size;
+ return 0;
+}
+
+static void vmcore_reset_offsets_elf64(void)
+{
+ Elf64_Phdr *phdr_start = (Elf64_Phdr *)(elfcorebuf + sizeof(Elf64_Ehdr));
+ loff_t vmcore_off = elfcorebuf_sz + elfnotes_sz;
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfcorebuf;
+ Elf64_Phdr *phdr;
+ int i;
+
+ for (i = 0, phdr = phdr_start; i < ehdr->e_phnum; i++, phdr++) {
+ u64 start, end;
+
+ /*
+ * After merge_note_headers_elf64() we should only have a single
+ * PT_NOTE entry that starts immediately after elfcorebuf_sz.
+ */
+ if (phdr->p_type == PT_NOTE) {
+ phdr->p_offset = elfcorebuf_sz;
+ continue;
+ }
+
+ start = rounddown(phdr->p_offset, PAGE_SIZE);
+ end = roundup(phdr->p_offset + phdr->p_memsz, PAGE_SIZE);
+ phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+ vmcore_off = vmcore_off + end - start;
+ }
+ set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
+}
+
+static int vmcore_add_device_ram_elf64(struct list_head *list, size_t count)
+{
+ Elf64_Phdr *phdr_start = (Elf64_Phdr *)(elfcorebuf + sizeof(Elf64_Ehdr));
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfcorebuf;
+ struct vmcore_range *cur;
+ Elf64_Phdr *phdr;
+ size_t new_size;
+ int rc;
+
+ if ((Elf32_Half)(ehdr->e_phnum + count) != ehdr->e_phnum + count) {
+ pr_err("too many device ram ranges\n");
+ return -ENOSPC;
+ }
+
+ /* elfcorebuf_sz must always cover full pages. */
+ new_size = sizeof(Elf64_Ehdr) +
+ (ehdr->e_phnum + count) * sizeof(Elf64_Phdr);
+ new_size = roundup(new_size, PAGE_SIZE);
+
+ /*
+ * Make sure we have sufficient space to include the new PT_LOAD
+ * entries.
+ */
+ rc = vmcore_realloc_elfcore_buffer_elf64(new_size);
+ if (rc) {
+ pr_err("resizing elfcore failed\n");
+ return rc;
+ }
+
+ /* Modify our used elfcore buffer size to cover the new entries. */
+ elfcorebuf_sz = new_size;
+
+ /* Fill the added PT_LOAD entries. */
+ phdr = phdr_start + ehdr->e_phnum;
+ list_for_each_entry(cur, list, list) {
+ WARN_ON_ONCE(!IS_ALIGNED(cur->paddr | cur->size, PAGE_SIZE));
+ elfcorehdr_fill_device_ram_ptload_elf64(phdr, cur->paddr, cur->size);
+
+ /* p_offset will be adjusted later. */
+ phdr++;
+ ehdr->e_phnum++;
+ }
+ list_splice_tail(list, &vmcore_list);
+
+ /* We changed elfcorebuf_sz and added new entries; reset all offsets. */
+ vmcore_reset_offsets_elf64();
+
+ /* Finally, recalculate the total vmcore size. */
+ vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
+ &vmcore_list);
+ proc_vmcore->size = vmcore_size;
+ return 0;
+}
+
+static void vmcore_process_device_ram(struct vmcore_cb *cb)
+{
+ unsigned char *e_ident = (unsigned char *)elfcorebuf;
+ struct vmcore_range *first, *m;
+ LIST_HEAD(list);
+ int count;
+
+ /* We only support Elf64 dumps for now. */
+ if (WARN_ON_ONCE(e_ident[EI_CLASS] != ELFCLASS64)) {
+ pr_err("device ram ranges only support Elf64\n");
+ return;
+ }
+
+ if (cb->get_device_ram(cb, &list)) {
+ pr_err("obtaining device ram ranges failed\n");
+ return;
+ }
+ count = list_count_nodes(&list);
+ if (!count)
+ return;
+
+ /*
+	 * For some reason these ranges are already known? Might happen
+ * with unusual register->unregister->register sequences; we'll simply
+ * sanity check using the first range.
+ */
+ first = list_first_entry(&list, struct vmcore_range, list);
+ list_for_each_entry(m, &vmcore_list, list) {
+ unsigned long long m_end = m->paddr + m->size;
+ unsigned long long first_end = first->paddr + first->size;
+
+ if (first->paddr < m_end && m->paddr < first_end)
+ goto out_free;
+ }
+
+ /* If adding the mem nodes succeeds, they must not be freed. */
+ if (!vmcore_add_device_ram_elf64(&list, count))
+ return;
+out_free:
+ vmcore_free_ranges(&list);
+}
+#else /* !CONFIG_PROC_VMCORE_DEVICE_RAM */
+static void vmcore_process_device_ram(struct vmcore_cb *cb)
+{
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
+
/* Free all dumps in vmcore device dump list */
static void vmcore_free_device_dumps(void)
{
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 5d61c7454fd6..2f2555e6407c 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -20,6 +20,8 @@ extern int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size);
extern void elfcorehdr_free(unsigned long long addr);
extern ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos);
extern ssize_t elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+void elfcorehdr_fill_device_ram_ptload_elf64(Elf64_Phdr *phdr,
+ unsigned long long paddr, unsigned long long size);
extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
unsigned long from, unsigned long pfn,
unsigned long size, pgprot_t prot);
@@ -99,6 +101,12 @@ static inline void vmcore_unusable(void)
* indicated in the vmcore instead. For example, a ballooned page
* contains no data and reading from such a page will cause high
* load in the hypervisor.
+ * @get_device_ram: query RAM ranges that can only be detected by device
+ * drivers, such as the virtio-mem driver, so they can be included in
+ * the crash dump on architectures that allocate the elfcore hdr in the dump
+ * ("2nd") kernel. Indicated RAM ranges may contain holes to reduce the
+ * total number of ranges; such holes can be detected using the pfn_is_ram
+ * callback just like for other RAM.
* @next: List head to manage registered callbacks internally; initialized by
* register_vmcore_cb().
*
@@ -109,6 +117,7 @@ static inline void vmcore_unusable(void)
*/
struct vmcore_cb {
bool (*pfn_is_ram)(struct vmcore_cb *cb, unsigned long pfn);
+ int (*get_device_ram)(struct vmcore_cb *cb, struct list_head *list);
struct list_head next;
};
extern void register_vmcore_cb(struct vmcore_cb *cb);
--
2.47.1
* [PATCH v2 09/12] virtio-mem: mark device ready before registering callbacks in kdump mode
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
After the callbacks are registered, we may immediately get a callback. So
mark the device ready before registering the callbacks.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index b0b871441578..126f1d669bb0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2648,6 +2648,7 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
if (rc)
goto out_unreg_pm;
+ virtio_device_ready(vm->vdev);
return 0;
out_unreg_pm:
unregister_pm_notifier(&vm->pm_notifier);
@@ -2729,6 +2730,8 @@ static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb,
static int virtio_mem_init_kdump(struct virtio_mem *vm)
{
+ /* We must be prepared to receive a callback immediately. */
+ virtio_device_ready(vm->vdev);
#ifdef CONFIG_PROC_VMCORE
dev_info(&vm->vdev->dev, "memory hot(un)plug disabled in kdump kernel\n");
vm->vmcore_cb.pfn_is_ram = virtio_mem_vmcore_pfn_is_ram;
@@ -2870,8 +2873,6 @@ static int virtio_mem_probe(struct virtio_device *vdev)
if (rc)
goto out_del_vq;
- virtio_device_ready(vdev);
-
/* trigger a config update to start processing the requested_size */
if (!vm->in_kdump) {
atomic_set(&vm->config_changed, 1);
--
2.47.1
* [PATCH v2 10/12] virtio-mem: remember usable region size
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's remember the usable region size, which will be helpful in kdump
mode next.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 126f1d669bb0..73477d5b79cf 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -133,6 +133,8 @@ struct virtio_mem {
uint64_t addr;
/* Maximum region size in bytes. */
uint64_t region_size;
+ /* Usable region size in bytes. */
+ uint64_t usable_region_size;
/* The parent resource for all memory added via this device. */
struct resource *parent_resource;
@@ -2368,7 +2370,7 @@ static int virtio_mem_cleanup_pending_mb(struct virtio_mem *vm)
static void virtio_mem_refresh_config(struct virtio_mem *vm)
{
const struct range pluggable_range = mhp_get_pluggable_range(true);
- uint64_t new_plugged_size, usable_region_size, end_addr;
+ uint64_t new_plugged_size, end_addr;
/* the plugged_size is just a reflection of what _we_ did previously */
virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size,
@@ -2378,8 +2380,8 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
/* calculate the last usable memory block id */
virtio_cread_le(vm->vdev, struct virtio_mem_config,
- usable_region_size, &usable_region_size);
- end_addr = min(vm->addr + usable_region_size - 1,
+ usable_region_size, &vm->usable_region_size);
+ end_addr = min(vm->addr + vm->usable_region_size - 1,
pluggable_range.end);
if (vm->in_sbm) {
@@ -2763,6 +2765,8 @@ static int virtio_mem_init(struct virtio_mem *vm)
virtio_cread_le(vm->vdev, struct virtio_mem_config, addr, &vm->addr);
virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
&vm->region_size);
+ virtio_cread_le(vm->vdev, struct virtio_mem_config, usable_region_size,
+ &vm->usable_region_size);
/* Determine the nid for the device based on the lowest address. */
if (vm->nid == NUMA_NO_NODE)
--
2.47.1
* [PATCH v2 11/12] virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's implement the get_device_ram() vmcore callback, so
architectures that select NEED_PROC_VMCORE_DEVICE_RAM, like s390
soon, can include that memory in a crash dump.
Merge ranges, and process chunks that might contain a mixture of plugged
and unplugged memory, to reduce the total number of ranges: any unplugged
parts of a reported range are simply skipped later via the pfn_is_ram()
callback.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 88 +++++++++++++++++++++++++++++++++++++
fs/proc/Kconfig | 1 +
2 files changed, 89 insertions(+)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 73477d5b79cf..8a294b9cbcf6 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2728,6 +2728,91 @@ static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb,
mutex_unlock(&vm->hotplug_mutex);
return is_ram;
}
+
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+static int virtio_mem_vmcore_add_device_ram(struct virtio_mem *vm,
+ struct list_head *list, uint64_t start, uint64_t end)
+{
+ int rc;
+
+ rc = vmcore_alloc_add_range(list, start, end - start);
+ if (rc)
+ dev_err(&vm->vdev->dev,
+ "Error adding device RAM range: %d\n", rc);
+ return rc;
+}
+
+static int virtio_mem_vmcore_get_device_ram(struct vmcore_cb *cb,
+ struct list_head *list)
+{
+ struct virtio_mem *vm = container_of(cb, struct virtio_mem,
+ vmcore_cb);
+ const uint64_t device_start = vm->addr;
+ const uint64_t device_end = vm->addr + vm->usable_region_size;
+ uint64_t chunk_size, cur_start, cur_end, plugged_range_start = 0;
+ LIST_HEAD(tmp_list);
+ int rc;
+
+ if (!vm->plugged_size)
+ return 0;
+
+ /* Process memory sections, unless the device block size is bigger. */
+ chunk_size = max_t(uint64_t, PFN_PHYS(PAGES_PER_SECTION),
+ vm->device_block_size);
+
+ mutex_lock(&vm->hotplug_mutex);
+
+ /*
+ * We process larger chunks and indicate the complete chunk if any
+ * block in there is plugged. This reduces the number of pfn_is_ram()
+	 * callbacks and mimics what is effectively being done when the old
+ * kernel would add complete memory sections/blocks to the elfcore hdr.
+ */
+ cur_start = device_start;
+ for (cur_start = device_start; cur_start < device_end; cur_start = cur_end) {
+ cur_end = ALIGN_DOWN(cur_start + chunk_size, chunk_size);
+ cur_end = min_t(uint64_t, cur_end, device_end);
+
+ rc = virtio_mem_send_state_request(vm, cur_start,
+ cur_end - cur_start);
+
+ if (rc < 0) {
+ dev_err(&vm->vdev->dev,
+ "Error querying block states: %d\n", rc);
+ goto out;
+ } else if (rc != VIRTIO_MEM_STATE_UNPLUGGED) {
+ /* Merge ranges with plugged memory. */
+ if (!plugged_range_start)
+ plugged_range_start = cur_start;
+ continue;
+ }
+
+ /* Flush any plugged range. */
+ if (plugged_range_start) {
+ rc = virtio_mem_vmcore_add_device_ram(vm, &tmp_list,
+ plugged_range_start,
+ cur_start);
+ if (rc)
+ goto out;
+ plugged_range_start = 0;
+ }
+ }
+
+ /* Flush any plugged range. */
+ if (plugged_range_start)
+ rc = virtio_mem_vmcore_add_device_ram(vm, &tmp_list,
+ plugged_range_start,
+ cur_start);
+out:
+ mutex_unlock(&vm->hotplug_mutex);
+ if (rc < 0) {
+ vmcore_free_ranges(&tmp_list);
+ return rc;
+ }
+ list_splice_tail(&tmp_list, list);
+ return 0;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
#endif /* CONFIG_PROC_VMCORE */
static int virtio_mem_init_kdump(struct virtio_mem *vm)
@@ -2737,6 +2822,9 @@ static int virtio_mem_init_kdump(struct virtio_mem *vm)
#ifdef CONFIG_PROC_VMCORE
dev_info(&vm->vdev->dev, "memory hot(un)plug disabled in kdump kernel\n");
vm->vmcore_cb.pfn_is_ram = virtio_mem_vmcore_pfn_is_ram;
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+ vm->vmcore_cb.get_device_ram = virtio_mem_vmcore_get_device_ram;
+#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
register_vmcore_cb(&vm->vmcore_cb);
return 0;
#else /* CONFIG_PROC_VMCORE */
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 5668620ab34d..6ae966c561e7 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -67,6 +67,7 @@ config NEED_PROC_VMCORE_DEVICE_RAM
config PROC_VMCORE_DEVICE_RAM
def_bool y
depends on PROC_VMCORE && NEED_PROC_VMCORE_DEVICE_RAM
+ depends on VIRTIO_MEM
help
If the elfcore hdr is allocated and prepared by the dump kernel
("2nd kernel") instead of the crashed kernel, RAM provided by memory
--
2.47.1
* [PATCH v2 12/12] s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
From: David Hildenbrand @ 2024-12-04 12:54 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's add support for including virtio-mem device RAM in the crash dump
by selecting NEED_PROC_VMCORE_DEVICE_RAM and implementing
elfcorehdr_fill_device_ram_ptload_elf64().
To avoid code duplication, factor out the code to fill a PT_LOAD entry.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/Kconfig | 1 +
arch/s390/kernel/crash_dump.c | 39 ++++++++++++++++++++++++++++-------
2 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 0077969170e8..c230bad7f5cc 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -240,6 +240,7 @@ config S390
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE if PCI
select NEED_PER_CPU_EMBED_FIRST_CHUNK
+ select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
select NEED_SG_DMA_LENGTH if PCI
select OLD_SIGACTION
select OLD_SIGSUSPEND3
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index cd0c93a8fb8b..f699df2a2b11 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -508,6 +508,19 @@ static int get_mem_chunk_cnt(void)
return cnt;
}
+static void fill_ptload(Elf64_Phdr *phdr, unsigned long paddr,
+ unsigned long vaddr, unsigned long size)
+{
+ phdr->p_type = PT_LOAD;
+ phdr->p_vaddr = vaddr;
+ phdr->p_offset = paddr;
+ phdr->p_paddr = paddr;
+ phdr->p_filesz = size;
+ phdr->p_memsz = size;
+ phdr->p_flags = PF_R | PF_W | PF_X;
+ phdr->p_align = PAGE_SIZE;
+}
+
/*
* Initialize ELF loads (new kernel)
*/
@@ -520,14 +533,8 @@ static void loads_init(Elf64_Phdr *phdr, bool os_info_has_vm)
if (os_info_has_vm)
old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
for_each_physmem_range(idx, &oldmem_type, &start, &end) {
- phdr->p_type = PT_LOAD;
- phdr->p_vaddr = old_identity_base + start;
- phdr->p_offset = start;
- phdr->p_paddr = start;
- phdr->p_filesz = end - start;
- phdr->p_memsz = end - start;
- phdr->p_flags = PF_R | PF_W | PF_X;
- phdr->p_align = PAGE_SIZE;
+ fill_ptload(phdr, start, old_identity_base + start,
+ end - start);
phdr++;
}
}
@@ -537,6 +544,22 @@ static bool os_info_has_vm(void)
return os_info_old_value(OS_INFO_KASLR_OFFSET);
}
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+/*
+ * Fill PT_LOAD for a physical memory range owned by a device and detected by
+ * its device driver.
+ */
+void elfcorehdr_fill_device_ram_ptload_elf64(Elf64_Phdr *phdr,
+ unsigned long long paddr, unsigned long long size)
+{
+ unsigned long old_identity_base = 0;
+
+ if (os_info_has_vm())
+ old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
+ fill_ptload(phdr, paddr, old_identity_base + paddr, size);
+}
+#endif
+
/*
* Prepare PT_LOAD type program header for kernel image region
*/
--
2.47.1
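To make the new arch hook concrete, here is a hypothetical call with
invented example values (a 256 MiB virtio-mem range at 0x200000000):

        Elf64_Phdr phdr;

        elfcorehdr_fill_device_ram_ptload_elf64(&phdr, 0x200000000ULL, SZ_256M);
        /*
         * Per the diff above: p_type == PT_LOAD, p_offset == p_paddr ==
         * 0x200000000, p_vaddr == old_identity_base + 0x200000000 (the
         * base is 0 when os_info_has_vm() is false), p_filesz ==
         * p_memsz == SZ_256M (0x10000000), p_flags == PF_R | PF_W |
         * PF_X, and p_align == PAGE_SIZE.
         */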
* Re: [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390
2024-12-04 12:54 [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (11 preceding siblings ...)
2024-12-04 12:54 ` [PATCH v2 12/12] s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM) David Hildenbrand
@ 2025-01-08 12:04 ` Michael S. Tsirkin
2025-01-08 12:10 ` Heiko Carstens
12 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-01-08 12:04 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Jason Wang, Xuan Zhuo, Eugenio Pérez, Baoquan He,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On Wed, Dec 04, 2024 at 01:54:31PM +0100, David Hildenbrand wrote:
> The only "different than everything else" thing about virtio-mem on s390
> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> kdump kernel must detect memory ranges of the crashed kernel to
> include via PT_LOAD in the vmcore.
>
> On other architectures, all RAM regions (boot + hotplugged) can easily be
> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> the elfcore hdr.
>
> On s390, information about "ordinary" memory (heh, "storage") can be
> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> that information is stored early during boot in the "physmem" memblock
> data structure.
>
> But virtio-mem memory is always detected by as device driver, which is
> usually build as a module. So in the crash kernel, this memory can only be
> properly detected once the virtio-mem driver started up.
>
> The virtio-mem driver already supports the "kdump mode", where it won't
> hotplug any memory but instead queries the device to implement the
> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> the vmcore.
>
> With this series, if the virtio-mem driver is included in the kdump
> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> now detect the device RAM ranges on s390 once it probes the devices, to add
> them to the vmcore using the same callback mechanism we already have for
> pfn_is_ram().
>
> To add these device RAM ranges to the vmcore ("patch the vmcore"), we will
> add new PT_LOAD entries that describe these memory ranges, and update
> all offsets vmcore size so it is all consistent.
>
> My testing when creating+analyzing crash dumps with hotplugged virtio-mem
> memory (incl. holes) did not reveal any surprises.
>
> Patch #1 -- #7 are vmcore preparations and cleanups
> Patch #8 adds the infrastructure for drivers to report device RAM
> Patch #9 + #10 are virtio-mem preparations
> Patch #11 implements virtio-mem support to report device RAM
> Patch #12 activates it for s390, implementing a new function to fill
> PT_LOAD entry for device RAM
Who is merging this?
virtio parts:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
* Re: [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390
2025-01-08 12:04 ` [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390 Michael S. Tsirkin
@ 2025-01-08 12:10 ` Heiko Carstens
2025-01-08 12:14 ` David Hildenbrand
0 siblings, 1 reply; 16+ messages in thread
From: Heiko Carstens @ 2025-01-08 12:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: David Hildenbrand, linux-kernel, linux-mm, linux-s390,
virtualization, kvm, linux-fsdevel, kexec, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Jason Wang, Xuan Zhuo, Eugenio Pérez, Baoquan He,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On Wed, Jan 08, 2025 at 07:04:23AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 04, 2024 at 01:54:31PM +0100, David Hildenbrand wrote:
> > [...]
>
> Who is merging this?
> virtio parts:
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
I guess this series should go via Andrew Morton. Andrew?
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
* Re: [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390
2025-01-08 12:10 ` Heiko Carstens
@ 2025-01-08 12:14 ` David Hildenbrand
0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2025-01-08 12:14 UTC (permalink / raw)
To: Heiko Carstens, Michael S. Tsirkin
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Baoquan He, Vivek Goyal, Dave Young,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Eric Farman, Andrew Morton
On 08.01.25 13:10, Heiko Carstens wrote:
> On Wed, Jan 08, 2025 at 07:04:23AM -0500, Michael S. Tsirkin wrote:
>> On Wed, Dec 04, 2024 at 01:54:31PM +0100, David Hildenbrand wrote:
>>> [...]
>>
>> Who is merging this?
>> virtio parts:
>>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> I guess this series should go via Andrew Morton. Andrew?
>
> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
>
Yes, it has already been in mm-unstable for quite a while.
Thanks for the acks!
--
Cheers,
David / dhildenb
Thread overview: 16+ messages
2024-12-04 12:54 [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 01/12] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 02/12] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 03/12] fs/proc/vmcore: disallow vmcore modifications while the vmcore is open David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 04/12] fs/proc/vmcore: prefix all pr_* with "vmcore:" David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 05/12] fs/proc/vmcore: move vmcore definitions out of kcore.h David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 06/12] fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 07/12] fs/proc/vmcore: factor out freeing a list of vmcore ranges David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 08/12] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 09/12] virtio-mem: mark device ready before registering callbacks in kdump mode David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 10/12] virtio-mem: remember usable region size David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 11/12] virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM David Hildenbrand
2024-12-04 12:54 ` [PATCH v2 12/12] s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM) David Hildenbrand
2025-01-08 12:04 ` [PATCH v2 00/12] fs/proc/vmcore: kdump support for virtio-mem on s390 Michael S. Tsirkin
2025-01-08 12:10 ` Heiko Carstens
2025-01-08 12:14 ` David Hildenbrand