From: Kui-Feng Lee <sinquersw@gmail.com>
Date: Mon, 12 Aug 2024 10:24:36 -0700
Message-ID: <62ade560-dd46-4480-8595-250b0264d3a4@gmail.com>
Subject: Re: [RFC bpf-next 3/5] bpf: pin, translate, and unpin __kptr_user from syscalls.
To: Alexei Starovoitov, Kui-Feng Lee
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu, Kernel Team, Andrii Nakryiko, Kui-Feng Lee, linux-mm
References: <20240807235755.1435806-1-thinker.li@gmail.com> <20240807235755.1435806-4-thinker.li@gmail.com>

On 8/12/24 09:45, Alexei Starovoitov wrote:
> On Wed, Aug 7, 2024 at 4:58 PM Kui-Feng Lee wrote:
>>
>> User kptrs are pinned by pin_user_pages_fast() and translated to a
>> kernel address when the value is updated by a user program (that is,
>> when the user program calls bpf_map_update_elem()). The pinned pages
>> are unpinned when the value of a user kptr is overwritten or when the
>> map value is deleted/destroyed.
>>
>> If the memory pointed to by a user kptr spans two or more pages, the
>> pages are mapped through vmap() to get a contiguous address range in
>> the kernel. For the single-page case, page_address() is called to get
>> the kernel address of the page.
>>
>> User kptrs are only supported by task storage maps.
>>
>> One user kptr can pin at most KPTR_USER_MAX_PAGES (16) physical
>> pages. This number was picked arbitrarily as a safety limit; the
>> restriction could actually be removed entirely.
>>
>> User kptrs can only be set by user programs through syscalls. Any
>> attempt to update the value of a map containing a __kptr_user should
>> ignore the user kptr values supplied by BPF programs. The values of
>> user kptrs remain as they were if the new values come from BPF
>> programs rather than from user programs.
>>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: Kui-Feng Lee
>> ---
>>  include/linux/bpf.h               |  35 +++++-
>>  include/linux/bpf_local_storage.h |   2 +-
>>  kernel/bpf/bpf_local_storage.c    |  18 +--
>>  kernel/bpf/helpers.c              |  12 +-
>>  kernel/bpf/local_storage.c        |   2 +-
>>  kernel/bpf/syscall.c              | 177 +++++++++++++++++++++++++++++-
>>  net/core/bpf_sk_storage.c         |   2 +-
>>  7 files changed, 227 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 87d5f98249e2..f4ad0bc183cb 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -30,6 +30,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  struct bpf_verifier_env;
>>  struct bpf_verifier_log;
>> @@ -477,10 +478,12 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
>>          data_race(*ldst++ = *lsrc++);
>>  }
>>
>> +void bpf_obj_unpin_uaddr(const struct btf_field *field, void *addr);
>> +
>>  /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
>>  static inline void bpf_obj_memcpy(struct btf_record *rec,
>>                                    void *dst, void *src, u32 size,
>> -                                  bool long_memcpy)
>> +                                  bool long_memcpy, bool from_user)
>>  {
>>          u32 curr_off = 0;
>>          int i;
>> @@ -496,21 +499,40 @@ static inline void bpf_obj_memcpy(struct btf_record *rec,
>>          for (i = 0; i < rec->cnt; i++) {
>>                  u32 next_off = rec->fields[i].offset;
>>                  u32 sz = next_off - curr_off;
>> +                void *addr;
>>
>>                  memcpy(dst + curr_off, src + curr_off, sz);
>> +                if (from_user && rec->fields[i].type == BPF_KPTR_USER) {
>
> Do not add this to bpf_obj_memcpy(), which is a critical path
> for various map operations.
> This has to be standalone for task storage only.
>
>> +                        /* Unpin old address.
>> +                         *
>> +                         * Alignments are guaranteed by btf_find_field_one().
>> +                         */
>> +                        addr = *(void **)(dst + next_off);
>> +                        if (virt_addr_valid(addr))
>> +                                bpf_obj_unpin_uaddr(&rec->fields[i], addr);
>> +                        else if (addr)
>> +                                WARN_ON_ONCE(1);
>> +
>> +                        *(void **)(dst + next_off) = *(void **)(src + next_off);
>> +                }
>>                  curr_off += rec->fields[i].size + sz;
>>          }
>>          memcpy(dst + curr_off, src + curr_off, size - curr_off);
>>  }
>>
>> +static inline void copy_map_value_user(struct bpf_map *map, void *dst, void *src, bool from_user)
>
> No need for these helpers either.
>
>> +{
>> +        bpf_obj_memcpy(map->record, dst, src, map->value_size, false, from_user);
>> +}
>> +
>>  static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
>>  {
>> -        bpf_obj_memcpy(map->record, dst, src, map->value_size, false);
>> +        bpf_obj_memcpy(map->record, dst, src, map->value_size, false, false);
>>  }
>>
>>  static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src)
>>  {
>> -        bpf_obj_memcpy(map->record, dst, src, map->value_size, true);
>> +        bpf_obj_memcpy(map->record, dst, src, map->value_size, true, false);
>>  }
>>
>>  static inline void bpf_obj_memzero(struct btf_record *rec, void *dst, u32 size)
>> @@ -538,6 +560,8 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
>>          bpf_obj_memzero(map->record, dst, map->value_size);
>>  }
>>
>> +void copy_map_value_locked_user(struct bpf_map *map, void *dst, void *src,
>> +                                bool lock_src, bool from_user);
>>  void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>>                             bool lock_src);
>>  void bpf_timer_cancel_and_free(void *timer);
>> @@ -775,6 +799,11 @@ enum bpf_arg_type {
>>  };
>>  static_assert(__BPF_ARG_TYPE_MAX <= BPF_BASE_TYPE_LIMIT);
>>
>> +#define BPF_MAP_UPDATE_FLAG_BITS 3
>> +enum bpf_map_update_flag {
>> +        BPF_FROM_USER = BIT(0 + BPF_MAP_UPDATE_FLAG_BITS)
>> +};
>> +
>>  /* type of values returned from helper functions */
>>  enum bpf_return_type {
>>          RET_INTEGER,                    /* function returns integer */
>> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
>> index dcddb0aef7d8..d337df68fa23 100644
>> --- a/include/linux/bpf_local_storage.h
>> +++ b/include/linux/bpf_local_storage.h
>> @@ -181,7 +181,7 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
>>
>>  struct bpf_local_storage_elem *
>>  bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
>> -                bool charge_mem, gfp_t gfp_flags);
>> +                bool charge_mem, gfp_t gfp_flags, bool from_user);
>>
>>  void bpf_selem_free(struct bpf_local_storage_elem *selem,
>>                      struct bpf_local_storage_map *smap,
>> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
>> index c938dea5ddbf..c4cf09e27a19 100644
>> --- a/kernel/bpf/bpf_local_storage.c
>> +++ b/kernel/bpf/bpf_local_storage.c
>> @@ -73,7 +73,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
>>
>>  struct bpf_local_storage_elem *
>>  bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
>> -                void *value, bool charge_mem, gfp_t gfp_flags)
>> +                void *value, bool charge_mem, gfp_t gfp_flags, bool from_user)
>>  {
>>          struct bpf_local_storage_elem *selem;
>>
>> @@ -100,7 +100,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
>>
>>          if (selem) {
>>                  if (value)
>> -                        copy_map_value(&smap->map, SDATA(selem)->data, value);
>> +                        copy_map_value_user(&smap->map, SDATA(selem)->data, value, from_user);
>>                  /* No need to call check_and_init_map_value as memory is zero init */
>>                  return selem;
>>          }
>> @@ -530,9 +530,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>>          struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
>>          struct bpf_local_storage *local_storage;
>>          unsigned long flags;
>> +        bool from_user = map_flags & BPF_FROM_USER;
>>          int err;
>>
>>          /* BPF_EXIST and BPF_NOEXIST cannot be both set */
>> +        map_flags &= ~BPF_FROM_USER;
>>          if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
>>              /* BPF_F_LOCK can only be used in a value with spin_lock */
>>              unlikely((map_flags & BPF_F_LOCK) &&
>> @@ -550,7 +552,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>>          if (err)
>>                  return ERR_PTR(err);
>>
>> -        selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
>> +        selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags, from_user);
>>          if (!selem)
>>                  return ERR_PTR(-ENOMEM);
>>
>> @@ -575,8 +577,8 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>>          if (err)
>>                  return ERR_PTR(err);
>>          if (old_sdata && selem_linked_to_storage_lockless(SELEM(old_sdata))) {
>> -                copy_map_value_locked(&smap->map, old_sdata->data,
>> -                                      value, false);
>> +                copy_map_value_locked_user(&smap->map, old_sdata->data,
>> +                                           value, false, from_user);
>>                  return old_sdata;
>>          }
>>  }
>> @@ -584,7 +586,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>>          /* A lookup has just been done before and concluded a new selem is
>>           * needed. The chance of an unnecessary alloc is unlikely.
>>           */
>> -        alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
>> +        alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags, from_user);
>>          if (!alloc_selem)
>>                  return ERR_PTR(-ENOMEM);
>>
>> @@ -607,8 +609,8 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>>          goto unlock;
>>
>>          if (old_sdata && (map_flags & BPF_F_LOCK)) {
>> -                copy_map_value_locked(&smap->map, old_sdata->data, value,
>> -                                      false);
>> +                copy_map_value_locked_user(&smap->map, old_sdata->data, value,
>> +                                           false, from_user);
>>                  selem = SELEM(old_sdata);
>>                  goto unlock;
>>          }
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index d02ae323996b..4aef86209fdd 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -372,8 +372,8 @@ const struct bpf_func_proto bpf_spin_unlock_proto = {
>>          .arg1_btf_id = BPF_PTR_POISON,
>>  };
>>
>> -void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>> -                           bool lock_src)
>> +void copy_map_value_locked_user(struct bpf_map *map, void *dst, void *src,
>> +                                bool lock_src, bool from_user)
>>  {
>>          struct bpf_spin_lock *lock;
>>
>> @@ -383,11 +383,17 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>>          lock = dst + map->record->spin_lock_off;
>>          preempt_disable();
>>          __bpf_spin_lock_irqsave(lock);
>> -        copy_map_value(map, dst, src);
>> +        copy_map_value_user(map, dst, src, from_user);
>>          __bpf_spin_unlock_irqrestore(lock);
>>          preempt_enable();
>>  }
>>
>> +void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>> +                           bool lock_src)
>> +{
>> +        copy_map_value_locked_user(map, dst, src, lock_src, false);
>> +}
>> +
>>  BPF_CALL_0(bpf_jiffies64)
>>  {
>>          return get_jiffies_64();
>> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
>> index 3969eb0382af..62a12fa8ce9e 100644
>> --- a/kernel/bpf/local_storage.c
>> +++ b/kernel/bpf/local_storage.c
>> @@ -147,7 +147,7 @@ static long cgroup_storage_update_elem(struct bpf_map *map, void *key,
>>          struct bpf_cgroup_storage *storage;
>>          struct bpf_storage_buffer *new;
>>
>> -        if (unlikely(flags & ~(BPF_F_LOCK | BPF_EXIST)))
>> +        if (unlikely(flags & ~BPF_F_LOCK))
>>                  return -EINVAL;
>>
>>          if (unlikely((flags & BPF_F_LOCK) &&
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 90a25307480e..eaa2a9d13265 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -155,8 +155,134 @@ static void maybe_wait_bpf_programs(struct bpf_map *map)
>>          synchronize_rcu();
>>  }
>>
>> -static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>> -                                void *key, void *value, __u64 flags)
>> +static void *trans_addr_pages(struct page **pages, int npages)
>> +{
>> +        if (npages == 1)
>> +                return page_address(pages[0]);
>> +        /* For multiple pages, we need to use vmap() to get a contiguous
>> +         * virtual address range.
>> +         */
>> +        return vmap(pages, npages, VM_MAP, PAGE_KERNEL);
>> +}
>
> Don't quite see a need for trans_addr_pages() helper when it's used
> once.
>
>> +
>> +#define KPTR_USER_MAX_PAGES 16
>> +
>> +static int bpf_obj_trans_pin_uaddr(struct btf_field *field, void **addr)
>> +{
>> +        const struct btf_type *t;
>> +        struct page *pages[KPTR_USER_MAX_PAGES];
>> +        void *ptr, *kern_addr;
>> +        u32 type_id, tsz;
>> +        int r, npages;
>> +
>> +        ptr = *addr;
>> +        type_id = field->kptr.btf_id;
>> +        t = btf_type_id_size(field->kptr.btf, &type_id, &tsz);
>> +        if (!t)
>> +                return -EINVAL;
>> +        if (tsz == 0) {
>> +                *addr = NULL;
>> +                return 0;
>> +        }
>> +
>> +        npages = (((intptr_t)ptr + tsz + ~PAGE_MASK) -
>> +                  ((intptr_t)ptr & PAGE_MASK)) >> PAGE_SHIFT;
>> +        if (npages > KPTR_USER_MAX_PAGES)
>> +                return -E2BIG;
>> +        r = pin_user_pages_fast((intptr_t)ptr & PAGE_MASK, npages, 0, pages);
>
> No need to apply the mask on ptr. See pin_user_pages_fast() internals.
>
> It probably should be FOLL_WRITE | FOLL_LONGTERM instead of 0.

Agree!
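Roughly what I have in mind for the next revision (sketch only, not
tested yet; whether a partial pin should keep returning -EINVAL is
still open):

        r = pin_user_pages_fast((unsigned long)ptr, npages,
                                FOLL_WRITE | FOLL_LONGTERM, pages);
        if (r < 0)
                return r;
        if (r != npages) {
                /* Release the pages that did get pinned before bailing out. */
                unpin_user_pages(pages, r);
                return -EINVAL;
        }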
>
>> +        if (r != npages)
>> +                return -EINVAL;
>> +        kern_addr = trans_addr_pages(pages, npages);
>> +        if (!kern_addr)
>> +                return -ENOMEM;
>> +        *addr = kern_addr + ((intptr_t)ptr & ~PAGE_MASK);
>> +        return 0;
>> +}
>> +
>> +void bpf_obj_unpin_uaddr(const struct btf_field *field, void *addr)
>> +{
>> +        struct page *pages[KPTR_USER_MAX_PAGES];
>> +        int npages, i;
>> +        u32 size, type_id;
>> +        void *ptr;
>> +
>> +        type_id = field->kptr.btf_id;
>> +        btf_type_id_size(field->kptr.btf, &type_id, &size);
>> +        if (size == 0)
>> +                return;
>> +
>> +        ptr = (void *)((intptr_t)addr & PAGE_MASK);
>> +        npages = (((intptr_t)addr + size + ~PAGE_MASK) - (intptr_t)ptr) >> PAGE_SHIFT;
>> +        for (i = 0; i < npages; i++) {
>> +                pages[i] = virt_to_page(ptr);
>> +                ptr += PAGE_SIZE;
>> +        }
>> +        if (npages > 1)
>> +                /* Paired with vmap() in trans_addr_pages() */
>> +                vunmap((void *)((intptr_t)addr & PAGE_MASK));
>> +        unpin_user_pages(pages, npages);
>> +}
>> +
>> +static int bpf_obj_trans_pin_uaddrs(struct btf_record *rec, void *src, u32 size)
>> +{
>> +        u32 next_off;
>> +        int i, err;
>> +
>> +        if (IS_ERR_OR_NULL(rec))
>> +                return 0;
>> +
>> +        if (!btf_record_has_field(rec, BPF_KPTR_USER))
>> +                return 0;
>
> imo kptr_user doesn't quite fit as a name.
> 'kptr' means 'kernel pointer'. Here it's user addr.
> Maybe just "uptr" ?

That makes sense.

>
>> +
>> +        for (i = 0; i < rec->cnt; i++) {
>> +                if (rec->fields[i].type != BPF_KPTR_USER)
>> +                        continue;
>> +
>> +                next_off = rec->fields[i].offset;
>> +                if (next_off + sizeof(void *) > size)
>> +                        return -EINVAL;
>> +                err = bpf_obj_trans_pin_uaddr(&rec->fields[i], src + next_off);
>> +                if (!err)
>> +                        continue;
>> +
>> +                /* Rollback */
>> +                for (i--; i >= 0; i--) {
>> +                        if (rec->fields[i].type != BPF_KPTR_USER)
>> +                                continue;
>> +                        next_off = rec->fields[i].offset;
>> +                        bpf_obj_unpin_uaddr(&rec->fields[i], *(void **)(src + next_off));
>> +                        *(void **)(src + next_off) = NULL;
>> +                }
>> +
>> +                return err;
>> +        }
>> +
>> +        return 0;
>> +}
>> +
>> +static void bpf_obj_unpin_uaddrs(struct btf_record *rec, void *src)
>> +{
>> +        u32 next_off;
>> +        int i;
>> +
>> +        if (IS_ERR_OR_NULL(rec))
>> +                return;
>> +
>> +        if (!btf_record_has_field(rec, BPF_KPTR_USER))
>> +                return;
>> +
>> +        for (i = 0; i < rec->cnt; i++) {
>> +                if (rec->fields[i].type != BPF_KPTR_USER)
>> +                        continue;
>> +
>> +                next_off = rec->fields[i].offset;
>> +                bpf_obj_unpin_uaddr(&rec->fields[i], *(void **)(src + next_off));
>> +                *(void **)(src + next_off) = NULL;
>
> This part is pretty much the same as the undo part in
> bpf_obj_trans_pin_uaddrs() and the common helper is warranted.

Sure!
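Something along these lines, I think (sketch only; the
double-underscore name is provisional):

        static void __bpf_obj_unpin_uaddrs(struct btf_record *rec, u32 cnt,
                                           void *src)
        {
                u32 next_off;
                int i;

                /* Unpin and clear the uptr fields among the first cnt fields. */
                for (i = 0; i < cnt; i++) {
                        if (rec->fields[i].type != BPF_KPTR_USER)
                                continue;

                        next_off = rec->fields[i].offset;
                        bpf_obj_unpin_uaddr(&rec->fields[i], *(void **)(src + next_off));
                        *(void **)(src + next_off) = NULL;
                }
        }

The rollback in bpf_obj_trans_pin_uaddrs() would then become
__bpf_obj_unpin_uaddrs(rec, i, src) for the fields pinned so far, and
bpf_obj_unpin_uaddrs() would just call it with rec->cnt.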
>
>> +        }
>> +}
>> +
>> +static int bpf_map_update_value_inner(struct bpf_map *map, struct file *map_file,
>> +                                      void *key, void *value, __u64 flags)
>>  {
>>          int err;
>>
>> @@ -208,6 +334,29 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>>          return err;
>>  }
>>
>> +static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>> +                                void *key, void *value, __u64 flags)
>> +{
>> +        int err;
>> +
>> +        if (flags & BPF_FROM_USER) {
>
> there shouldn't be a need for this extra flag.
> map->record has the info whether uptr is present or not.

The BPF_FROM_USER flag is used to support updating map values from BPF
programs as well. Although BPF programs can update map values, I don't
want the values of uptrs to be changed by the BPF programs. Should we
just forbid BPF programs from updating map values that have uptrs in
them?

>
>> +                /* Pinning user memory can lead to a context switch, so we
>> +                 * need to do it before any potential RCU lock.
>> +                 */
>> +                err = bpf_obj_trans_pin_uaddrs(map->record, value,
>> +                                               bpf_map_value_size(map));
>> +                if (err)
>> +                        return err;
>> +        }
>> +
>> +        err = bpf_map_update_value_inner(map, map_file, key, value, flags);
>> +
>> +        if (err && (flags & BPF_FROM_USER))
>> +                bpf_obj_unpin_uaddrs(map->record, value);
>> +
>> +        return err;
>> +}
>> +
>>  static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
>>                                __u64 flags)
>>  {
>> @@ -714,6 +863,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>>                                  field->kptr.dtor(xchgd_field);
>>                          }
>>                          break;
>> +                case BPF_KPTR_USER:
>> +                        if (virt_addr_valid(*(void **)field_ptr))
>> +                                bpf_obj_unpin_uaddr(field, *(void **)field_ptr);
>> +                        *(void **)field_ptr = NULL;
>> +                        break;
>>                  case BPF_LIST_HEAD:
>>                          if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>                                  continue;
>> @@ -1155,6 +1309,12 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>>                                  goto free_map_tab;
>>                          }
>>                          break;
>> +                case BPF_KPTR_USER:
>> +                        if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE) {
>> +                                ret = -EOPNOTSUPP;
>> +                                goto free_map_tab;
>> +                        }
>> +                        break;
>>                  case BPF_LIST_HEAD:
>>                  case BPF_RB_ROOT:
>>                          if (map->map_type != BPF_MAP_TYPE_HASH &&
>> @@ -1618,11 +1778,15 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>>          struct bpf_map *map;
>>          void *key, *value;
>>          u32 value_size;
>> +        u64 extra_flags = 0;
>>          struct fd f;
>>          int err;
>>
>>          if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
>>                  return -EINVAL;
>> +        /* Prevent userspace from setting any internal flags */
>> +        if (attr->flags & ~(BIT(BPF_MAP_UPDATE_FLAG_BITS) - 1))
>> +                return -EINVAL;
>>
>>          f = fdget(ufd);
>>          map = __bpf_map_get(f);
>> @@ -1653,7 +1817,9 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>>                  goto free_key;
>>          }
>>
>> -        err = bpf_map_update_value(map, f.file, key, value, attr->flags);
>> +        if (map->map_type == BPF_MAP_TYPE_TASK_STORAGE)
>> +                extra_flags |= BPF_FROM_USER;
>> +        err = bpf_map_update_value(map, f.file, key, value, attr->flags | extra_flags);
>>          if (!err)
>>                  maybe_wait_bpf_programs(map);
>>
>> @@ -1852,6 +2018,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>>          void __user *keys = u64_to_user_ptr(attr->batch.keys);
>>          u32 value_size, cp, max_count;
>>          void *key, *value;
>> +        u64 extra_flags = 0;
>>          int err = 0;
>>
>>          if (attr->batch.elem_flags & ~BPF_F_LOCK)
>> @@ -1881,6 +2048,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>>                  return -ENOMEM;
>>          }
>>
>> +        if (map->map_type == BPF_MAP_TYPE_TASK_STORAGE)
>> +                extra_flags |= BPF_FROM_USER;
>>          for (cp = 0; cp < max_count; cp++) {
>>                  err = -EFAULT;
>>                  if (copy_from_user(key, keys + cp * map->key_size,
>> @@ -1889,7 +2058,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>>                          break;
>>
>>                  err = bpf_map_update_value(map, map_file, key, value,
>> -                                           attr->batch.elem_flags);
>> +                                           attr->batch.elem_flags | extra_flags);
>>
>>                  if (err)
>>                          break;
>> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
>> index bc01b3aa6b0f..db5281384e6a 100644
>> --- a/net/core/bpf_sk_storage.c
>> +++ b/net/core/bpf_sk_storage.c
>> @@ -137,7 +137,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
>>  {
>>          struct bpf_local_storage_elem *copy_selem;
>>
>> -        copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC);
>> +        copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC, false);
>>          if (!copy_selem)
>>                  return NULL;
>>
>> --
>> 2.34.1
>>