From: Felix Kuehling <felix.kuehling@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
amd-gfx@lists.freedesktop.org,
"Christian König" <christian.koenig@amd.com>
Cc: linux-mm@kvack.org, kernel test robot <lkp@intel.com>
Subject: Re: [PATCH] drm/amdkfd: Fix sparse __rcu annotation warnings
Date: Mon, 11 Dec 2023 10:56:13 -0500 [thread overview]
Message-ID: <7b67d4f1-cfcd-47a9-a80e-f4c1eee235a1@amd.com> (raw)
In-Reply-To: <a6411c81-d0ea-4002-bfc6-a725a83eb9bf@gmail.com>
On 2023-12-08 05:11, Christian König wrote:
> Am 07.12.23 um 20:14 schrieb Felix Kuehling:
>>
>> On 2023-12-05 17:20, Felix Kuehling wrote:
>>> Properly mark kfd_process->ef as __rcu and consistently access it with
>>> rcu_dereference_protected.
>>>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>> Closes:
>>> https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>
>> ping.
>>
>> Christian, would you review this patch, please?
>
> Looks a bit suspicious, especially the rcu_dereference_protected() use.
>
> What is the static checker complaining about in the first place?
From
https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/:
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1671:9: sparse: sparse: incompatible types in comparison expression (different address spaces): >> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1671:9: sparse:
struct dma_fence [noderef] __rcu * >>
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1671:9: sparse:
struct dma_fence * ... >>
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2765:36: sparse:
sparse: incompatible types in comparison expression (different address
spaces): >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2765:36:
sparse: struct dma_fence [noderef] __rcu * >>
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2765:36: sparse: struct
dma_fence * >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2765:36:
sparse: sparse: incompatible types in comparison expression (different
address spaces): >>
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2765:36: sparse: struct
dma_fence [noderef] __rcu * >>
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2765:36: sparse: struct
dma_fence *
As far as I can tell, the reason is, that I'm using
dma_fence_get_rcu_safe and rcu_replace_pointer to get and update
kfd_process->ef, without annotating the fence pointers with __rcu. This
patch fixes that by marking kfd_process->ef as an __rcu pointer. The
only place that was dereferencing it directly was
kfd_process_wq_release, where I added rcu_dereference_protected. The
condition I'm using here is, that the process ref count is 0 at that
point, which means nobody else is referencing the process or this fence
pointer at the time.
Regards,
Felix
>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Felix
>>
>>
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
>>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 ++++--
>>> 4 files changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> index f2e920734c98..20cb266dcedd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> @@ -314,7 +314,7 @@ void
>>> amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem);
>>> int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev,
>>> struct amdgpu_bo *bo);
>>> int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
>>> - struct dma_fence **ef);
>>> + struct dma_fence __rcu **ef);
>>> int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct amdgpu_device *adev,
>>> struct kfd_vm_fault_info *info);
>>> int amdgpu_amdkfd_gpuvm_import_dmabuf_fd(struct amdgpu_device
>>> *adev, int fd,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 7d91f99acb59..8ba6f6c8363d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -2806,7 +2806,7 @@ static void
>>> amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>>> put_task_struct(usertask);
>>> }
>>> -static void replace_eviction_fence(struct dma_fence **ef,
>>> +static void replace_eviction_fence(struct dma_fence __rcu **ef,
>>> struct dma_fence *new_ef)
>>> {
>>> struct dma_fence *old_ef = rcu_replace_pointer(*ef, new_ef, true
>>> @@ -2841,7 +2841,7 @@ static void replace_eviction_fence(struct
>>> dma_fence **ef,
>>> * 7. Add fence to all PD and PT BOs.
>>> * 8. Unreserve all BOs
>>> */
>>> -int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct
>>> dma_fence **ef)
>>> +int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct
>>> dma_fence __rcu **ef)
>>> {
>>> struct amdkfd_process_info *process_info = info;
>>> struct amdgpu_vm *peer_vm;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 45366b4ca976..5a24097a9f28 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -917,7 +917,7 @@ struct kfd_process {
>>> * fence will be triggered during eviction and new one will be
>>> created
>>> * during restore
>>> */
>>> - struct dma_fence *ef;
>>> + struct dma_fence __rcu *ef;
>>> /* Work items for evicting and restoring BOs */
>>> struct delayed_work eviction_work;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index 71df51fcc1b0..14b11d61f8dd 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -1110,6 +1110,8 @@ static void kfd_process_wq_release(struct
>>> work_struct *work)
>>> {
>>> struct kfd_process *p = container_of(work, struct kfd_process,
>>> release_work);
>>> + struct dma_fence *ef = rcu_dereference_protected(p->ef,
>>> + kref_read(&p->ref) == 0);
>>> kfd_process_dequeue_from_all_devices(p);
>>> pqm_uninit(&p->pqm);
>>> @@ -1118,7 +1120,7 @@ static void kfd_process_wq_release(struct
>>> work_struct *work)
>>> * destroyed. This allows any BOs to be freed without
>>> * triggering pointless evictions or waiting for fences.
>>> */
>>> - dma_fence_signal(p->ef);
>>> + dma_fence_signal(ef);
>>> kfd_process_remove_sysfs(p);
>>> @@ -1127,7 +1129,7 @@ static void kfd_process_wq_release(struct
>>> work_struct *work)
>>> svm_range_list_fini(p);
>>> kfd_process_destroy_pdds(p);
>>> - dma_fence_put(p->ef);
>>> + dma_fence_put(ef);
>>> kfd_event_free_process(p);
>
next prev parent reply other threads:[~2023-12-11 15:56 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-05 22:20 Felix Kuehling
2023-12-07 19:14 ` Felix Kuehling
2023-12-08 10:11 ` Christian König
2023-12-11 15:56 ` Felix Kuehling [this message]
2023-12-20 16:58 ` Felix Kuehling
2024-01-05 13:32 ` Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7b67d4f1-cfcd-47a9-a80e-f4c1eee235a1@amd.com \
--to=felix.kuehling@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=linux-mm@kvack.org \
--cc=lkp@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox