From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEA33C433F5 for ; Tue, 5 Oct 2021 12:19:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 52E1A61409 for ; Tue, 5 Oct 2021 12:19:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 52E1A61409 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E75976B0073; Tue, 5 Oct 2021 08:19:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DFED594000F; Tue, 5 Oct 2021 08:19:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C790D94000E; Tue, 5 Oct 2021 08:19:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0126.hostedemail.com [216.40.44.126]) by kanga.kvack.org (Postfix) with ESMTP id B6D8A6B0073 for ; Tue, 5 Oct 2021 08:19:37 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 60A271819588A for ; Tue, 5 Oct 2021 12:19:37 +0000 (UTC) X-FDA: 78662289594.05.98476E4 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf28.hostedemail.com (Postfix) with ESMTP id 0EA13900135D for ; Tue, 5 Oct 2021 12:19:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1633436376; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iz/bJO8gGfz6X0Xatxi/OuCnSQ7Id+RJEaaGs0FeG00=; b=HXZYheV9k5PnHV40d2MYFcGwJhr25VVMdc9kOybnNm83aIrOyTCbKuIjak83GrRAZFZ0q9 G3h5i3oKkp5wzYgzJB5Qb2iT22glfvuVARdZtf4nFu+7NXOvS+6ypxQ2oJoL5IuEhlnnWG f8d6akgOe5UEoBAFt/9EDXTPJiAHi+E= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-511-Q6nJ3M9xMvO3xKykIuWtCQ-1; Tue, 05 Oct 2021 08:19:35 -0400 X-MC-Unique: Q6nJ3M9xMvO3xKykIuWtCQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B72726D50A; Tue, 5 Oct 2021 12:19:32 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.193.58]) by smtp.corp.redhat.com (Postfix) with ESMTP id 50F061F41E; Tue, 5 Oct 2021 12:18:50 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: David Hildenbrand , Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Boris Ostrovsky , Juergen Gross , Stefano Stabellini , "Michael S. Tsirkin" , Jason Wang , Dave Young , Baoquan He , Vivek Goyal , Michal Hocko , Oscar Salvador , Mike Rapoport , "Rafael J. Wysocki" , x86@kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, kexec@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 9/9] virtio-mem: kdump mode to sanitize /proc/vmcore access Date: Tue, 5 Oct 2021 14:14:30 +0200 Message-Id: <20211005121430.30136-10-david@redhat.com> In-Reply-To: <20211005121430.30136-1-david@redhat.com> References: <20211005121430.30136-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 0EA13900135D X-Stat-Signature: 1gdjkxknbgrkypb9tjmwpwhr38cjzadc Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=HXZYheV9; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf28.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 216.205.24.124) smtp.mailfrom=david@redhat.com X-HE-Tag: 1633436376-230015 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [= 1] Let's register a vmcore callback, to allow vmcore code to check if a PFN belonging to a virtio-mem device is either currently plugged and should be dumped or is currently unplugged and should not be accessed, instead mapping the shared zeropage or returning zeroes when reading. This is important when not capturing /proc/vmcore via tools like "makedumpfile" that can identify logically unplugged virtio-mem memory vi= a PG_offline in the memmap, but simply by e.g., copying the file. Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initr= d and sanitize /proc/vmcore access -- with dracut. With this series, we'll send one virtio-mem state request for every ~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend to read/map. In the future, we might want to allow building virtio-mem for kdump mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could support special stripped-down kdump kernels that have many other config options disabled; we'll tackle that once required. Further, we might want to try sensing bigger blocks (e.g., memory sections) first before falling back to device blocks on demand. Tested with Fedora rawhide, which contains a recent kexec-tools version (considering "System RAM (virtio_mem)" when creating the vmcore header) a= nd a recent dracut version (including the virtio_mem module in the kdump initrd). [1] https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [2] https://github.com/dracutdevs/dracut/pull/1157 Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 136 ++++++++++++++++++++++++++++++++---- 1 file changed, 124 insertions(+), 12 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 76d8aef3cfd2..3573b78f6b05 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -223,6 +223,9 @@ struct virtio_mem { * When this lock is held the pointers can't change, ONLINE and * OFFLINE blocks can't change the state and no subblocks will get * plugged/unplugged. + * + * In kdump mode, used to serialize requests, last_block_addr and + * last_block_plugged. */ struct mutex hotplug_mutex; bool hotplug_active; @@ -230,6 +233,9 @@ struct virtio_mem { /* An error occurred we cannot handle - stop processing requests. */ bool broken; =20 + /* Cached valued of is_kdump_kernel() when the device was probed. */ + bool in_kdump; + /* The driver is being removed. */ spinlock_t removal_lock; bool removing; @@ -243,6 +249,13 @@ struct virtio_mem { /* Memory notifier (online/offline events). */ struct notifier_block memory_notifier; =20 +#ifdef CONFIG_PROC_VMCORE + /* vmcore callback for /proc/vmcore handling in kdump mode */ + struct vmcore_cb vmcore_cb; + uint64_t last_block_addr; + bool last_block_plugged; +#endif /* CONFIG_PROC_VMCORE */ + /* Next device in the list of virtio-mem devices. */ struct list_head next; }; @@ -2293,6 +2306,12 @@ static void virtio_mem_run_wq(struct work_struct *= work) uint64_t diff; int rc; =20 + if (unlikely(vm->in_kdump)) { + dev_warn_once(&vm->vdev->dev, + "unexpected workqueue run in kdump kernel\n"); + return; + } + hrtimer_cancel(&vm->retry_timer); =20 if (vm->broken) @@ -2521,6 +2540,86 @@ static int virtio_mem_init_hotplug(struct virtio_m= em *vm) return rc; } =20 +#ifdef CONFIG_PROC_VMCORE +static int virtio_mem_send_state_request(struct virtio_mem *vm, uint64_t= addr, + uint64_t size) +{ + const uint64_t nb_vm_blocks =3D size / vm->device_block_size; + const struct virtio_mem_req req =3D { + .type =3D cpu_to_virtio16(vm->vdev, VIRTIO_MEM_REQ_STATE), + .u.state.addr =3D cpu_to_virtio64(vm->vdev, addr), + .u.state.nb_blocks =3D cpu_to_virtio16(vm->vdev, nb_vm_blocks), + }; + int rc =3D -ENOMEM; + + dev_dbg(&vm->vdev->dev, "requesting state: 0x%llx - 0x%llx\n", addr, + addr + size - 1); + + switch (virtio_mem_send_request(vm, &req)) { + case VIRTIO_MEM_RESP_ACK: + return virtio16_to_cpu(vm->vdev, vm->resp.u.state.state); + case VIRTIO_MEM_RESP_ERROR: + rc =3D -EINVAL; + break; + default: + break; + } + + dev_dbg(&vm->vdev->dev, "requesting state failed: %d\n", rc); + return rc; +} + +static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb, + unsigned long pfn) +{ + struct virtio_mem *vm =3D container_of(cb, struct virtio_mem, + vmcore_cb); + uint64_t addr =3D PFN_PHYS(pfn); + bool is_ram; + int rc; + + if (!virtio_mem_contains_range(vm, addr, PAGE_SIZE)) + return true; + if (!vm->plugged_size) + return false; + + /* + * We have to serialize device requests and access to the information + * about the block queried last. + */ + mutex_lock(&vm->hotplug_mutex); + + addr =3D ALIGN_DOWN(addr, vm->device_block_size); + if (addr !=3D vm->last_block_addr) { + rc =3D virtio_mem_send_state_request(vm, addr, + vm->device_block_size); + /* On any kind of error, we're going to signal !ram. */ + if (rc =3D=3D VIRTIO_MEM_STATE_PLUGGED) + vm->last_block_plugged =3D true; + else + vm->last_block_plugged =3D false; + vm->last_block_addr =3D addr; + } + + is_ram =3D vm->last_block_plugged; + mutex_unlock(&vm->hotplug_mutex); + return is_ram; +} +#endif /* CONFIG_PROC_VMCORE */ + +static int virtio_mem_init_kdump(struct virtio_mem *vm) +{ +#ifdef CONFIG_PROC_VMCORE + dev_info(&vm->vdev->dev, "memory hot(un)plug disabled in kdump kernel\n= "); + vm->vmcore_cb.pfn_is_ram =3D virtio_mem_vmcore_pfn_is_ram; + register_vmcore_cb(&vm->vmcore_cb); + return 0; +#else /* CONFIG_PROC_VMCORE */ + dev_warn(&vm->vdev->dev, "disabled in kdump kernel without vmcore\n"); + return -EBUSY; +#endif /* CONFIG_PROC_VMCORE */ +} + static int virtio_mem_init(struct virtio_mem *vm) { uint16_t node_id; @@ -2530,15 +2629,6 @@ static int virtio_mem_init(struct virtio_mem *vm) return -EINVAL; } =20 - /* - * We don't want to (un)plug or reuse any memory when in kdump. The - * memory is still accessible (but not mapped). - */ - if (is_kdump_kernel()) { - dev_warn(&vm->vdev->dev, "disabled in kdump kernel\n"); - return -EBUSY; - } - /* Fetch all properties that can't change. */ virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size, &vm->plugged_size); @@ -2562,6 +2652,12 @@ static int virtio_mem_init(struct virtio_mem *vm) if (vm->nid !=3D NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA)) dev_info(&vm->vdev->dev, "nid: %d", vm->nid); =20 + /* + * We don't want to (un)plug or reuse any memory when in kdump. The + * memory is still accessible (but not exposed to Linux). + */ + if (vm->in_kdump) + return virtio_mem_init_kdump(vm); return virtio_mem_init_hotplug(vm); } =20 @@ -2640,6 +2736,7 @@ static int virtio_mem_probe(struct virtio_device *v= dev) hrtimer_init(&vm->retry_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); vm->retry_timer.function =3D virtio_mem_timer_expired; vm->retry_timer_ms =3D VIRTIO_MEM_RETRY_TIMER_MIN_MS; + vm->in_kdump =3D is_kdump_kernel(); =20 /* register the virtqueue */ rc =3D virtio_mem_init_vq(vm); @@ -2654,8 +2751,10 @@ static int virtio_mem_probe(struct virtio_device *= vdev) virtio_device_ready(vdev); =20 /* trigger a config update to start processing the requested_size */ - atomic_set(&vm->config_changed, 1); - queue_work(system_freezable_wq, &vm->wq); + if (!vm->in_kdump) { + atomic_set(&vm->config_changed, 1); + queue_work(system_freezable_wq, &vm->wq); + } =20 return 0; out_del_vq: @@ -2732,11 +2831,21 @@ static void virtio_mem_deinit_hotplug(struct virt= io_mem *vm) } } =20 +static void virtio_mem_deinit_kdump(struct virtio_mem *vm) +{ +#ifdef CONFIG_PROC_VMCORE + unregister_vmcore_cb(&vm->vmcore_cb); +#endif /* CONFIG_PROC_VMCORE */ +} + static void virtio_mem_remove(struct virtio_device *vdev) { struct virtio_mem *vm =3D vdev->priv; =20 - virtio_mem_deinit_hotplug(vm); + if (vm->in_kdump) + virtio_mem_deinit_kdump(vm); + else + virtio_mem_deinit_hotplug(vm); =20 /* reset the device and cleanup the queues */ vdev->config->reset(vdev); @@ -2750,6 +2859,9 @@ static void virtio_mem_config_changed(struct virtio= _device *vdev) { struct virtio_mem *vm =3D vdev->priv; =20 + if (unlikely(vm->in_kdump)) + return; + atomic_set(&vm->config_changed, 1); virtio_mem_retry(vm); } --=20 2.31.1