Date: Thu, 28 Sep 2023 13:45:50 -0600
From: Alex Williamson
Subject: Re: [PATCH v1 4/4] vfio/nvgpu: register device memory for poison handling
Message-ID: <20230928134550.55fd9d8b.alex.williamson@redhat.com>
In-Reply-To: <20230920140210.12663-5-ankita@nvidia.com>
References: <20230920140210.12663-1-ankita@nvidia.com> <20230920140210.12663-5-ankita@nvidia.com>
Organization: Red Hat
Sender: owner-linux-mm@kvack.org

On Wed, 20 Sep 2023 19:32:10 +0530 wrote:

> From: Ankit Agrawal
>
> The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
> (Qemu) using remap_pfn_range() without adding the memory to the kernel.
> The device memory pages are not backed by struct page. Patches 1-3
> implements the mechanism to handle ECC/poison on memory page without
> struct page and expose a registration function. This new mechanism is
> leveraged here.
>
> The module registers its memory region with the kernel MM for ECC handling
> using the register_pfn_address_space() registration API exposed by the
> kernel. It also defines a failure callback function pfn_memory_failure()
> to get the poisoned PFN from the MM.
>
> The module track poisoned PFN as a bitmap with a bit per PFN. The PFN is
> communicated by the kernel MM to the module through the failure function,
> which sets the appropriate bit in the bitmap.
>
> The module also defines a VMA fault ops for the module. It returns
> VM_FAULT_HWPOISON in case the bit for the PFN is set in the bitmap.
>
> [1] https://lore.kernel.org/all/20230915025415.6762-1-ankita@nvidia.com/
>
> Signed-off-by: Ankit Agrawal
> ---
>  drivers/vfio/pci/nvgrace-gpu/main.c | 107 +++++++++++++++++++++++++++-
>  drivers/vfio/vfio.h                 |  11 ---
>  drivers/vfio/vfio_main.c            |   3 +-
>  include/linux/vfio.h                |  15 ++++
>  4 files changed, 123 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index ba323f2d8ea1..1c89ce0cc1cc 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -6,6 +6,10 @@
>  #include
>  #include
>  #include
> +#ifdef CONFIG_MEMORY_FAILURE
> +#include
> +#include
> +#endif
>
>  struct nvgrace_gpu_vfio_pci_core_device {
>  	struct vfio_pci_core_device core_device;
> @@ -13,8 +17,85 @@ struct nvgrace_gpu_vfio_pci_core_device {
>  	size_t memlength;
>  	void *memmap;
>  	struct mutex memmap_lock;
> +#ifdef CONFIG_MEMORY_FAILURE
> +	struct pfn_address_space pfn_address_space;
> +	unsigned long *pfn_bitmap;
> +#endif
>  };
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +void nvgrace_gpu_vfio_pci_pfn_memory_failure(struct pfn_address_space *pfn_space,
> +		unsigned long pfn)
> +{
> +	struct nvgrace_gpu_vfio_pci_core_device *nvdev = container_of(
> +		pfn_space, struct nvgrace_gpu_vfio_pci_core_device, pfn_address_space);
> +	unsigned long mem_offset = pfn - pfn_space->node.start;
> +
> +	if (mem_offset >= nvdev->memlength)
> +		return;
> +
> +	/*
> +	 * MM has called to notify a poisoned page. Track that in the bitmap.
> +	 */
> +	__set_bit(mem_offset, nvdev->pfn_bitmap);
> +}
> +
> +struct pfn_address_space_ops nvgrace_gpu_vfio_pci_pas_ops = {
> +	.failure = nvgrace_gpu_vfio_pci_pfn_memory_failure,
> +};
> +
> +static int
> +nvgrace_gpu_vfio_pci_register_pfn_range(struct nvgrace_gpu_vfio_pci_core_device *nvdev,
> +					struct vm_area_struct *vma)
> +{
> +	unsigned long nr_pages;
> +	int ret = 0;
> +
> +	nr_pages = nvdev->memlength >> PAGE_SHIFT;
> +
> +	nvdev->pfn_address_space.node.start = vma->vm_pgoff;
> +	nvdev->pfn_address_space.node.last = vma->vm_pgoff + nr_pages - 1;
> +	nvdev->pfn_address_space.ops = &nvgrace_gpu_vfio_pci_pas_ops;
> +	nvdev->pfn_address_space.mapping = vma->vm_file->f_mapping;
> +
> +	ret = register_pfn_address_space(&(nvdev->pfn_address_space));
> +
> +	return ret;
> +}
> +
> +static vm_fault_t nvgrace_gpu_vfio_pci_fault(struct vm_fault *vmf)
> +{
> +	unsigned long mem_offset = vmf->pgoff - vmf->vma->vm_pgoff;
> +	struct vfio_device *core_vdev;
> +	struct nvgrace_gpu_vfio_pci_core_device *nvdev;
> +
> +	if (!(vmf->vma->vm_file))
> +		goto error_exit;
> +
> +	core_vdev = vfio_device_from_file(vmf->vma->vm_file);
> +
> +	if (!core_vdev)
> +		goto error_exit;
> +
> +	nvdev = container_of(core_vdev,
> +			struct nvgrace_gpu_vfio_pci_core_device, core_device.vdev);
> +
> +	/*
> +	 * Check if the page is poisoned.
> +	 */
> +	if (mem_offset < (nvdev->memlength >> PAGE_SHIFT) &&
> +		test_bit(mem_offset, nvdev->pfn_bitmap))
> +		return VM_FAULT_HWPOISON;
> +
> +error_exit:
> +	return VM_FAULT_ERROR;
> +}
> +
> +static const struct vm_operations_struct nvgrace_gpu_vfio_pci_mmap_ops = {
> +	.fault = nvgrace_gpu_vfio_pci_fault,
> +};
> +#endif
> +
>  static int nvgrace_gpu_vfio_pci_open_device(struct vfio_device *core_vdev)
>  {
>  	struct vfio_pci_core_device *vdev =
> @@ -46,6 +127,9 @@ static void nvgrace_gpu_vfio_pci_close_device(struct vfio_device *core_vdev)
>
>  	mutex_destroy(&nvdev->memmap_lock);
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +	unregister_pfn_address_space(&(nvdev->pfn_address_space));
> +#endif
>  	vfio_pci_core_close_device(core_vdev);
>  }
>
> @@ -104,8 +188,12 @@ static int nvgrace_gpu_vfio_pci_mmap(struct vfio_device *core_vdev,
>  		return ret;
>
>  	vma->vm_pgoff = start_pfn;
> +#ifdef CONFIG_MEMORY_FAILURE
> +	vma->vm_ops = &nvgrace_gpu_vfio_pci_mmap_ops;
>
> -	return 0;
> +	ret = nvgrace_gpu_vfio_pci_register_pfn_range(nvdev, vma);
> +#endif
> +	return ret;
>  }
>
>  static long
> @@ -406,6 +494,19 @@ nvgrace_gpu_vfio_pci_fetch_memory_property(struct pci_dev *pdev,
>
>  	nvdev->memlength = memlength;
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +	/*
> +	 * A bitmap is maintained to track the pages that are poisoned. Each
> +	 * page is represented by a bit. Allocation size in bytes is
> +	 * determined by shifting the device memory size by PAGE_SHIFT to
> +	 * determine the number of pages; and further shifted by 3 as each
> +	 * byte could track 8 pages.
> +	 */
> +	nvdev->pfn_bitmap
> +		= vzalloc((nvdev->memlength >> PAGE_SHIFT)/BITS_PER_TYPE(char));
> +	if (!nvdev->pfn_bitmap)
> +		ret = -ENOMEM;
> +#endif
>  	return ret;
>  }
>
> @@ -442,6 +543,10 @@ static void nvgrace_gpu_vfio_pci_remove(struct pci_dev *pdev)
>  	struct nvgrace_gpu_vfio_pci_core_device *nvdev = nvgrace_gpu_drvdata(pdev);
>  	struct vfio_pci_core_device *vdev = &nvdev->core_device;
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +	vfree(nvdev->pfn_bitmap);
> +#endif
> +
>  	vfio_pci_core_unregister_device(vdev);
>  	vfio_put_device(&vdev->vdev);
>  }
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 307e3f29b527..747094503909 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -16,17 +16,6 @@ struct iommufd_ctx;
>  struct iommu_group;
>  struct vfio_container;
>
> -struct vfio_device_file {
> -	struct vfio_device *device;
> -	struct vfio_group *group;
> -
> -	u8 access_granted;
> -	u32 devid; /* only valid when iommufd is valid */
> -	spinlock_t kvm_ref_lock; /* protect kvm field */
> -	struct kvm *kvm;
> -	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> -};
> -
>  void vfio_device_put_registration(struct vfio_device *device);
>  bool vfio_device_try_get_registration(struct vfio_device *device);
>  int vfio_df_open(struct vfio_device_file *df);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 40732e8ed4c6..a7dafd7c64a6 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1309,7 +1309,7 @@ const struct file_operations vfio_device_fops = {
>  	.mmap		= vfio_device_fops_mmap,
>  };
>
> -static struct vfio_device *vfio_device_from_file(struct file *file)
> +struct vfio_device *vfio_device_from_file(struct file *file)
>  {
>  	struct vfio_device_file *df = file->private_data;
>
> @@ -1317,6 +1317,7 @@ static struct vfio_device *vfio_device_from_file(struct file *file)
>  		return NULL;
>  	return df->device;
>  }
> +EXPORT_SYMBOL_GPL(vfio_device_from_file);
>
>  /**
>   * vfio_file_is_valid - True if the file is valid vfio file
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 454e9295970c..d88af251e931 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -361,4 +361,19 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *),
>  		 struct virqfd **pvirqfd, int fd);
>  void vfio_virqfd_disable(struct virqfd **pvirqfd);
>
> +/*
> + * VFIO device file.
> + */
> +struct vfio_device_file {
> +	struct vfio_device *device;
> +	struct vfio_group *group;
> +	u8 access_granted;
> +	u32 devid; /* only valid when iommufd is valid */
> +	spinlock_t kvm_ref_lock; /* protect kvm field */
> +	struct kvm *kvm;
> +	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> +};

What here necessitates moving this to the more public header?  Thanks,

Alex

> +
> +struct vfio_device *vfio_device_from_file(struct file *file);
> +
>  #endif /* VFIO_H */
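
For readers following the bit-per-PFN tracking described in the quoted commit
message, here is a rough user-space sketch of that idea. It is not the
driver's code: struct poison_map, the poison_map_*() helpers and
SKETCH_PAGE_SHIFT (fixed at 12, i.e. 4 KiB pages) are hypothetical names made
up for illustration. The sketch indexes the map in pages and rounds the
allocation up to whole unsigned long words; both are choices of this sketch,
not statements about what the patch does.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Assumption for this sketch: 4 KiB pages. The kernel's PAGE_SHIFT is per-arch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical user-space stand-in for a per-device poison bitmap. */
struct poison_map {
	unsigned long nr_pages;	/* pages covered by the device memory */
	unsigned long *bits;	/* one bit per page, set when poisoned */
};

/* Size the bitmap from the memory length in bytes, rounding up to whole words. */
static int poison_map_init(struct poison_map *m, size_t memlength)
{
	size_t nwords;

	m->nr_pages = memlength >> SKETCH_PAGE_SHIFT;
	nwords = (m->nr_pages + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
	m->bits = calloc(nwords ? nwords : 1, sizeof(*m->bits));
	return m->bits ? 0 : -1;
}

/* Record a poisoned page; page indexes outside the region are ignored. */
static void poison_map_set(struct poison_map *m, unsigned long page)
{
	assert(m->bits != NULL);
	if (page >= m->nr_pages)
		return;
	m->bits[page / SKETCH_BITS_PER_LONG] |= 1UL << (page % SKETCH_BITS_PER_LONG);
}

/* Return 1 if the page was recorded as poisoned, 0 otherwise. */
static int poison_map_test(const struct poison_map *m, unsigned long page)
{
	if (page >= m->nr_pages)
		return 0;
	return (m->bits[page / SKETCH_BITS_PER_LONG] >>
		(page % SKETCH_BITS_PER_LONG)) & 1UL;
}

static void poison_map_free(struct poison_map *m)
{
	free(m->bits);
	m->bits = NULL;
}
```

A caller would initialise the map once with the region length in bytes, call
poison_map_set() from whatever plays the role of the failure callback, and
consult poison_map_test() on the fault path before deciding what to return.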