From: Alistair Popple <apopple@nvidia.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-mm@kvack.org, "Andrew Morton" <akpm@linux-foundation.org>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Pan, Xinhui" <Xinhui.Pan@amd.com>,
"David Airlie" <airlied@linux.ie>,
"Daniel Vetter" <daniel@ffwll.ch>,
"Ben Skeggs" <bskeggs@redhat.com>,
"Karol Herbst" <kherbst@redhat.com>,
"Lyude Paul" <lyude@redhat.com>,
"Ralph Campbell" <rcampbell@nvidia.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Alex Sierra" <alex.sierra@amd.com>,
"John Hubbard" <jhubbard@nvidia.com>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
amd-gfx@lists.freedesktop.org, nouveau@lists.freedesktop.org,
dri-devel@lists.freedesktop.org,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Dan Williams" <dan.j.williams@intel.com>
Subject: Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page
Date: Thu, 29 Sep 2022 11:40:32 +1000 [thread overview]
Message-ID: <875yh7osye.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <87fsgbf3gh.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:
> Alistair Popple <apopple@nvidia.com> writes:
>> When the CPU tries to access a device private page, the migrate_to_ram()
>> callback associated with the page's pgmap is called. However, no
>> reference is taken on the faulting page, so a concurrent migration of
>> the device private page can free the page and possibly the underlying
>> pgmap. This results in a race which can crash the kernel due to the
>> migrate_to_ram() function pointer becoming invalid. It also means
>> drivers can't reliably read the zone_device_data field, because the
>> page may have been freed by memunmap_pages().
>>
>> Close the race by taking a reference on the page while holding the ptl,
>> ensuring it has not been freed. Unfortunately, the elevated reference
>> count will cause the migration required to handle the fault to fail. To
>> avoid this failure, pass the faulting page into the migrate_vma
>> functions so that an elevated reference count can be checked to see
>> whether it is expected.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> ---
>> arch/powerpc/kvm/book3s_hv_uvmem.c | 15 ++++++-----
>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++++++------
>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 2 +-
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +++++---
>> include/linux/migrate.h | 8 ++++++-
>> lib/test_hmm.c | 7 ++---
>> mm/memory.c | 16 +++++++++++-
>> mm/migrate.c | 34 ++++++++++++++-----------
>> mm/migrate_device.c | 18 +++++++++----
>> 9 files changed, 87 insertions(+), 41 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> index 5980063..d4eacf4 100644
>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>> static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>> unsigned long start,
>> unsigned long end, unsigned long page_shift,
>> - struct kvm *kvm, unsigned long gpa)
>> + struct kvm *kvm, unsigned long gpa, struct page *fault_page)
>> {
>> unsigned long src_pfn, dst_pfn = 0;
>> - struct migrate_vma mig;
>> + struct migrate_vma mig = { 0 };
>> struct page *dpage, *spage;
>> struct kvmppc_uvmem_page_pvt *pvt;
>> unsigned long pfn;
>> @@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>> mig.dst = &dst_pfn;
>> mig.pgmap_owner = &kvmppc_uvmem_pgmap;
>> mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>> + mig.fault_page = fault_page;
>>
>> /* The requested page is already paged-out, nothing to do */
>> if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
>> @@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>> static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>> unsigned long start, unsigned long end,
>> unsigned long page_shift,
>> - struct kvm *kvm, unsigned long gpa)
>> + struct kvm *kvm, unsigned long gpa,
>> + struct page *fault_page)
>> {
>> int ret;
>>
>> mutex_lock(&kvm->arch.uvmem_lock);
>> - ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
>> + ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
>> + fault_page);
>> mutex_unlock(&kvm->arch.uvmem_lock);
>>
>> return ret;
>> @@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>> bool pagein)
>> {
>> unsigned long src_pfn, dst_pfn = 0;
>> - struct migrate_vma mig;
>> + struct migrate_vma mig = { 0 };
>> struct page *spage;
>> unsigned long pfn;
>> struct page *dpage;
>> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
>>
>> if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>> vmf->address + PAGE_SIZE, PAGE_SHIFT,
>> - pvt->kvm, pvt->gpa))
>> + pvt->kvm, pvt->gpa, vmf->page))
>> return VM_FAULT_SIGBUS;
>> else
>> return 0;
>
> I don't have a UV test system, but as-is it doesn't even compile :)

Ugh, thanks. I did get as far as installing a PPC cross-compiler and
building a kernel. Apparently I did not get as far as enabling
CONFIG_PPC_UV :)
> kvmppc_svm_page_out() is called via some paths other than the
> migrate_to_ram callback.
>
> I think it's correct to just pass fault_page = NULL when it's not called
> from the migrate_to_ram callback?
>
> Incremental diff below.
>
> cheers
>
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index d4eacf410956..965c9e9e500b 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -637,7 +637,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
> pvt->remove_gfn = true;
>
> if (__kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE,
> - PAGE_SHIFT, kvm, pvt->gpa))
> + PAGE_SHIFT, kvm, pvt->gpa, NULL))
> pr_err("Can't page out gpa:0x%lx addr:0x%lx\n",
> pvt->gpa, addr);
> } else {
> @@ -1068,7 +1068,7 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
> if (!vma || vma->vm_start > start || vma->vm_end < end)
> goto out;
>
> - if (!kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa))
> + if (!kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa, NULL))
> ret = H_SUCCESS;
> out:
> mmap_read_unlock(kvm->mm);