From: Martin KaFai Lau <martin.lau@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kui-Feng Lee <thinker.li@gmail.com>, bpf <bpf@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Kui-Feng Lee <sinquersw@gmail.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls.
Date: Fri, 6 Sep 2024 21:03:44 -0700 [thread overview]
Message-ID: <769541a2-4009-4035-a327-838ebdfbf258@linux.dev> (raw)
In-Reply-To: <8b61f093-a6a6-4f99-91f8-20f2a7235d76@linux.dev>
On 9/6/24 6:32 PM, Martin KaFai Lau wrote:
> On 9/6/24 4:44 PM, Alexei Starovoitov wrote:
>> On Fri, Sep 6, 2024 at 1:11 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 9/4/24 3:21 PM, Martin KaFai Lau wrote:
>>>> On 8/28/24 4:24 PM, Alexei Starovoitov wrote:
>>>>>> @@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record *rec,
>>>>>> void *obj)
>>>>>> field->kptr.dtor(xchgd_field);
>>>>>> }
>>>>>> break;
>>>>>> + case BPF_UPTR:
>>>>>> + if (*(void **)field_ptr)
>>>>>> + bpf_obj_unpin_uptr(field, *(void
>>>>>> **)field_ptr);
>>>>>> + *(void **)field_ptr = NULL;
>>>>> This one will be called from
>>>>> task_storage_delete->bpf_selem_free->bpf_obj_free_fields
>>>>>
>>>>> and even if upin was safe to do from that context
>>>>> we cannot just do:
>>>>> *(void **)field_ptr = NULL;
>>>>>
>>>>> since bpf prog might be running in parallel,
>>>>> it could have just read that addr and now is using it.
>>>>>
>>>>> The first thought of a way to fix this was to split
>>>>> bpf_obj_free_fields() into the current one plus
>>>>> bpf_obj_free_fields_after_gp()
>>>>> that will do the above unpin bit.
>>>>> and call the later one from bpf_selem_free_rcu()
>>>>> while bpf_obj_free_fields() from bpf_selem_free()
>>>>> will not touch uptr.
>>>>>
>>>>> But after digging further I realized that task_storage
>>>>> already switched to use bpf_ma, so the above won't work.
>>>>>
>>>>> So we need something similar to BPF_KPTR_REF logic:
>>>>> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>>>>> and then delay of uptr unpin for that address into call_rcu.
>>>>>
>>>>> Any better ideas?
>>>>
>>>
>>> I think the existing reuse_now arg in the bpf_selem_free can be used. reuse_now
>>> (renamed from the earlier use_trace_rcu) was added to avoid call_rcu_tasks_trace
>>> for the common case.
>>>
>>> selem (in type "struct bpf_local_storage_elem") is the one exposed to the bpf
>>> prog.
>>>
>>> bpf_selem_free knows whether a selem can be reused immediately based on the
>>> caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...., bool
>>> reuse_now)".
>>>
>>> If a selem cannot be reuse_now (i.e. == false), it is currently going through
>>> "call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do
>>> unpin_user_page() in the rcu callback.
>>>
>>> A selem can be reuse_now (i.e. == true) if the selem is no longer needed because
>>> either its owner (i.e. the task_struct here) is going away in free_task() or the
>>> bpf map is being destructed in bpf_local_storage_map_free(). No bpf prog should
>>> have a hold on the selem at this point. I think for these two cases, the
>>> unpin_user_page() can be directly called in bpf_selem_free().
>>
>> but there is also this path:
>> bpf_task_storage_delete -> task_storage_delete -> bpf_selem_free
>> -> bpf_obj_free_fields
>>
>> In this case bpf prog may still be looking at uptr address
>> and we cannot do unpin right away in bpf_obj_free_fields.
>
> cannot unpin immediately in the bpf_task_storage_delete() path is understood.
> task_storage can be used in sleepable. It needs to wait for the tasks_trace and
> the regular rcu gp before unpin.
>
> I forgot to mention earlier that bpf_task_storage_delete() will have the
> bpf_selem_free(..., reuse_now == false). It will then do the
> "call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);". The unpin could
> happen in bpf_selem_free_trace_rcu() in this case. I am suggesting to unpin in
> bpf_selem_free_trace_rcu together with the selem free.
>
> I just noticed the map and its btf_record are gone in
> bpf_selem_free_trace_rcu()... so won't work. :(
Thought about it more. Adding a rcu_barrier() to bpf_local_storage_map_free()
may be enough. Then bpf_selem_free_rcu() will have the map->record to unpin.
will need to think about it more.
>
>> All other special fields in map value are ok,
>> since they are either relying on bpf_mem_alloc and
>> have rcu/rcu_tasks_trace gp
>> or extra indirection like timer/wq.
>>
>>> One existing bug is, from looking at patch 6, I don't think the free_task() case
>>> can be "reuse_now == true" anymore because of the bpf_task_release kfunc did not
>>> mark the previously obtained task_storage to be invalid:
>>>
>>> data_task = bpf_task_from_pid(parent_pid);
>>> ptr = bpf_task_storage_get(&datamap, data_task, 0, ...);
>>> bpf_task_release(data_task);
>>> if (!ptr)
>>> return 0;
>>> /* The prog still holds a valid task storage ptr. */
>>> udata = ptr->udata;
>>>
>>> It can be fixed by marking the ref_obj_id of the "ptr". Although it is more
>>> correct to make the task storage "ptr" invalid after task_release, it may break
>>> the existing progs.
>>
>> Are you suggesting that bpf_task_release should invalidate all pointers
>> fetched from map value?
>
> I was thinking at least the map value ptr itself needs to be invalidated.
>
>> That will work, but it's not an issue for other special fields in there
>> like kptr.
>> So this invalidation would be need only for uptr which feels
>> weird to special case it and probably will be confusing to users writing
>> such programs.
>
> hmm... I haven't thought about the other pointer fields that read before the
> task_release().
>
> Agreed, it is hard to use if only marks uptr invalid. Thinking about it. Even
> marking the map value ptr invalid while other previously read fields keep
> working is also the same weirdness.
>
>> Above bpf prog example should be ok to use.
>> We only need to delay unpin after rcu/rcu_task_trace gp.
>> Hence my proposal in bpf_obj_free_fields() do:
>> case UPTR:
>> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>> call_rcu(...) to unpin.
>
> Agree that call_rcu() here is the only option. It probably needs to go through
> the tasks_trace gp also.
>
> Can the page->rcu_head be used here?
>
prev parent reply other threads:[~2024-09-07 4:03 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240816191213.35573-1-thinker.li@gmail.com>
2024-08-16 19:12 ` Kui-Feng Lee
2024-08-28 23:24 ` Alexei Starovoitov
2024-09-04 22:21 ` Martin KaFai Lau
2024-09-06 20:11 ` Martin KaFai Lau
2024-09-06 23:44 ` Alexei Starovoitov
2024-09-07 1:32 ` Martin KaFai Lau
2024-09-07 4:03 ` Martin KaFai Lau [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=769541a2-4009-4035-a327-838ebdfbf258@linux.dev \
--to=martin.lau@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=sinquersw@gmail.com \
--cc=thinker.li@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox