From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5ECBE6FE49 for ; Fri, 6 Sep 2024 23:44:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2D15A6B0085; Fri, 6 Sep 2024 19:44:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 282126B0088; Fri, 6 Sep 2024 19:44:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 14A6A6B0089; Fri, 6 Sep 2024 19:44:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id EA7CE6B0085 for ; Fri, 6 Sep 2024 19:44:37 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 796001212C8 for ; Fri, 6 Sep 2024 23:44:37 +0000 (UTC) X-FDA: 82535945394.02.1220E0E Received: from mail-wr1-f43.google.com (mail-wr1-f43.google.com [209.85.221.43]) by imf20.hostedemail.com (Postfix) with ESMTP id A7E961C0002 for ; Fri, 6 Sep 2024 23:44:35 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="Du/zW+OP"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.221.43 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1725666165; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Y01pZ4xvz66B8X54xHNrPK7EonzIgm65LzyWpwJb+yU=; b=NeJaw8JmGb/qpyEaf008WhtpxiNUWWEVwKvbcabf4aefOYrsu2YRgHQrV+2iibnzGe/HYM Iv08U6gWksy4k/Sp7GJad7oidWsgIWXW1EjuJNZiSsn1dyPT45v5zvmPM5hM5VIw5gRzgl Ju1tWy655b3g4yTZWZLV0OBEs5hQiTg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1725666165; a=rsa-sha256; cv=none; b=sxgmdTuEUmTTGnHnY3yrvV8jz43jL0KVwA4zJ12BUskMktb3/fof83jfvj5ixaSpd7yjdc a3gH4SJ7p1ru/WyiH0H/Kn0a5gbeXcYAhC/nraYRjOhksHKFav9cHTEAo1UUutc4jAi7XZ peXlyFsljC31UVc1YNvsIY+uCmuul8M= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="Du/zW+OP"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.221.43 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com Received: by mail-wr1-f43.google.com with SMTP id ffacd0b85a97d-374b5f27cf2so1588515f8f.1 for ; Fri, 06 Sep 2024 16:44:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725666274; x=1726271074; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Y01pZ4xvz66B8X54xHNrPK7EonzIgm65LzyWpwJb+yU=; b=Du/zW+OPPQDyVWUZdXfyn+bGZXuH73D41Y5p5YZjwNe1Rr/+TD6WNQan31u1BTxiIl XNIGtBiBGDChCOsh49q3S53uAxq7VJ4FhuKAXy6tVuLxFDYdu/F0KJP9PQFgv0IhRgQp 24Tkh1m61HBdj0UU6P0GxG3GAU1x0I5J0C7uA73Geae4yORSECiOpkTw5Z0MTc/eqKmu dcvhx4k6gcz9gbSycCjA3lSnDXLNTDmvjN5zVUIzUY16/pZOxsYs9yvJoobBa/Mpwiy/ wTpwNPsfzanng16kAT17ErV3bThlJnhmymAXU781GqXxgY9KX4VmonK14UIoS9azeIdH 4yZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725666274; x=1726271074; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y01pZ4xvz66B8X54xHNrPK7EonzIgm65LzyWpwJb+yU=; b=bi9XZx7aVtRVmlJIFYSl5R2v4MaNVhUPzl+lbT+30lqVTfh0F1mfO2A9vJSeAYRt9a WEZTw69iQ1n7ZoU6jcvb/L/bnKQQOxRz3VnpkYwyYwkwmksB6DJ+CLGw94GGr1gp471Y JzChR6es4Wbr41tGn53Au3ctOTOyemsmdrAi3eSY8st1OKnaTNEX9XkLjQJa2tb7DIOx ir/NBeb3tL04dURHa++SbsgdzKDQtGCL1XVVWJcd7WL9eBLGq3kzyQvtmnWWsuWhBpFk zUaIXXmNSeTf7qz+g5yzInpbbLsdDH+Px81AGhG3Fonwhsu4lzhPSRWudyDON7Nnk/kI pZYQ== X-Forwarded-Encrypted: i=1; AJvYcCXVJ2p597i3Yln7t2qbmPdM25hVyLPs5IAmibPVCVXl8IKsRwjpIRrJnZTsAhggiOsZOJIPOykBQg==@kvack.org X-Gm-Message-State: AOJu0YwQYmTmBNs/yVP2Q+/UxUbI7p4SBNzw47RFTlqfrqdf4iPDEa28 SfWBKOkbY4tzqN64JJMAtTQkltOkcMr7cPsahCcSLtGxGQCGqR+BOv3sWN9AQSl2j5ewV6WU5tv oHgiKINiyioJz5KvdfJvOBCVwpAE= X-Google-Smtp-Source: AGHT+IHXV27ow3w/krD2rqORlsNLr2Vh4FMnJY/8mRxBYLlKmk6AebkUdX06hLd0JET17D1iS7eGK1zcBxchDmFHYtU= X-Received: by 2002:a5d:4991:0:b0:368:5bb4:169b with SMTP id ffacd0b85a97d-378926858b2mr407679f8f.4.1725666273800; Fri, 06 Sep 2024 16:44:33 -0700 (PDT) MIME-Version: 1.0 References: <20240816191213.35573-1-thinker.li@gmail.com> <20240816191213.35573-5-thinker.li@gmail.com> <70a1b24f-84cd-464c-8fb6-a2c52fd3d703@linux.dev> In-Reply-To: From: Alexei Starovoitov Date: Fri, 6 Sep 2024 16:44:22 -0700 Message-ID: Subject: Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls. To: Martin KaFai Lau Cc: Kui-Feng Lee , bpf , Alexei Starovoitov , Andrii Nakryiko , Kui-Feng Lee , linux-mm Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: A7E961C0002 X-Stat-Signature: 1aky5huuda1hme556hxbef5zchoggkw7 X-Rspam-User: X-HE-Tag: 1725666275-412416 X-HE-Meta: U2FsdGVkX1+JVOj4CPvo/kjgrm23Gzwgb5+Ngh/k4RMG69/iI3rNxRGQKYgPApPRa0+DJ6c/DsUX5DmmaA5y7Y+6LUo1kdHIPVBv2mg6ZG4SpKhUvPb4k3KLB+Em3KlXhYrlLGkN7/Afj3zLVXJFle6xCoXhCgFHEzr4YIroD3Wmv8MEwmaPuCoG58DKafzF+RrZEcH+T6TWYwlnL5XrZ2K/Z01VxbLoVL5zYLDOXj+DW1JlJQxSMl1gFAFZ82atjNMcIJGElS/sMN1a1uyir2cSTb8vbOwsXQ2+DY+S/213QuNK46ZP6cajC9WxSw74lU4jPOU1CIyBwU9Nd2LKJ/KaO2qijHYB2sK7tO8b9ghnDIP1EhLWZaCM1hElje79oLrxMS3X/z34IAfU2YefgmzLWb328st0rZUG0p3aBJPBtyXkdniNUtxFR3VLPXa2kedaSlqO3DfDuI4WVizgXrpoTsMYJr66L8253OTxIhQGhHgjAzrm3SgzCtF3WjcXwTIo8w8AG1eeaAl+yvewElikIjDHnFrs5g0acn/sN4Td6SPW4iKGITjz5iuxIpYhnMFXcVZUm/kuVCf4UanXdCFY0EBvZvu1RHBjyM/mde9mOy53SsVwGBZd0jnZsOcc7d8v33mAk+Qb1yJEvKLeFRobKDOfJcJXmI9C921xWVHDWgVSm8/rxRy5P1TnBtVZ4UWNHiv2ITqa87GRgpGTUTq2Nk+Ruiz1NULu6CHcscDXlJLoLRfX+IVvnc4uONzdbUoSR8ro6s3Akdh1h0DJQCotth+bK9L5ZFON5P0ssdnmcAVjIXkVKD6D1fa3MvIBC9gyswXEzZxJdRoSX7w43YXkZDQFY3D0Tdr5HCPu52cHD7q09rpKAJcSZs3hf7p+OCY8kRFe34dCSYjkMQ4KNJIR4OL7JKsr4z4tFNM37EXS68/hPQ6BfQdT0tKscRtey8E0sLZDFwp/b4DwqG0 p5Qpm4Bf LjJdS+1NPDzYLkX81HgPT2W1irFwpsRoIi/6A1NLY8E0hf/mYe7OOo5gJAw8nwQdpM6bA8ZcDeilbPcI2Y5mTya7NtB7AHklUDOuZS9QURJL0ODT4vmWwo1wufRLADQ0JinGryHU+ksns/XWrrS0kASLKg0zfU9kJEEn0oTEnIqmzyGwmhwV0LQ2i9xmK1WukRjHbByPCYq1UUy3BufDgoB9ySiz8vJffnSKFh/rZJu/yXE/eHOFq8nSUbxUoA6kKYxo+Gwdewxfbw45iCMc8d+dtkmaqA3i9zD24obg32Tffuzazh1PWvmgHUGO93XRciNrDilCjnk89mPjD2N+4NYB3/ztFXE/5SUSZNy8jfIz87m6uRS9yswm9rtdSVjn/w8+dTr5e3KC3WQbLLfTzG23ZhHuA1ZXABT+h1N0LXzL0i0BJI9tYddSGzokCW8S9o/hr X-Bogosity: Ham, tests=bogofilter, spamicity=0.000933, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Sep 6, 2024 at 1:11=E2=80=AFPM Martin KaFai Lau wrote: > > On 9/4/24 3:21 PM, Martin KaFai Lau wrote: > > On 8/28/24 4:24 PM, Alexei Starovoitov wrote: > >>> @@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record= *rec, > >>> void *obj) > >>> field->kptr.dtor(xchgd_field); > >>> } > >>> break; > >>> + case BPF_UPTR: > >>> + if (*(void **)field_ptr) > >>> + bpf_obj_unpin_uptr(field, *(void **)f= ield_ptr); > >>> + *(void **)field_ptr =3D NULL; > >> This one will be called from > >> task_storage_delete->bpf_selem_free->bpf_obj_free_fields > >> > >> and even if upin was safe to do from that context > >> we cannot just do: > >> *(void **)field_ptr =3D NULL; > >> > >> since bpf prog might be running in parallel, > >> it could have just read that addr and now is using it. > >> > >> The first thought of a way to fix this was to split > >> bpf_obj_free_fields() into the current one plus > >> bpf_obj_free_fields_after_gp() > >> that will do the above unpin bit. > >> and call the later one from bpf_selem_free_rcu() > >> while bpf_obj_free_fields() from bpf_selem_free() > >> will not touch uptr. > >> > >> But after digging further I realized that task_storage > >> already switched to use bpf_ma, so the above won't work. > >> > >> So we need something similar to BPF_KPTR_REF logic: > >> xchgd_field =3D (void *)xchg((unsigned long *)field_ptr, 0); > >> and then delay of uptr unpin for that address into call_rcu. > >> > >> Any better ideas? > > > > I think the existing reuse_now arg in the bpf_selem_free can be used. reu= se_now > (renamed from the earlier use_trace_rcu) was added to avoid call_rcu_task= s_trace > for the common case. > > selem (in type "struct bpf_local_storage_elem") is the one exposed to the= bpf prog. > > bpf_selem_free knows whether a selem can be reused immediately based on t= he > caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...= ., bool > reuse_now)". > > If a selem cannot be reuse_now (i.e. =3D=3D false), it is currently going= through > "call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do > unpin_user_page() in the rcu callback. > > A selem can be reuse_now (i.e. =3D=3D true) if the selem is no longer nee= ded because > either its owner (i.e. the task_struct here) is going away in free_task()= or the > bpf map is being destructed in bpf_local_storage_map_free(). No bpf prog = should > have a hold on the selem at this point. I think for these two cases, the > unpin_user_page() can be directly called in bpf_selem_free(). but there is also this path: bpf_task_storage_delete -> task_storage_delete -> bpf_selem_free -> bpf_obj_free_fields In this case bpf prog may still be looking at uptr address and we cannot do unpin right away in bpf_obj_free_fields. All other special fields in map value are ok, since they are either relying on bpf_mem_alloc and have rcu/rcu_tasks_trace gp or extra indirection like timer/wq. > One existing bug is, from looking at patch 6, I don't think the free_task= () case > can be "reuse_now =3D=3D true" anymore because of the bpf_task_release kf= unc did not > mark the previously obtained task_storage to be invalid: > > data_task =3D bpf_task_from_pid(parent_pid); > ptr =3D bpf_task_storage_get(&datamap, data_task, 0, ...); > bpf_task_release(data_task); > if (!ptr) > return 0; > /* The prog still holds a valid task storage ptr. */ > udata =3D ptr->udata; > > It can be fixed by marking the ref_obj_id of the "ptr". Although it is mo= re > correct to make the task storage "ptr" invalid after task_release, it may= break > the existing progs. Are you suggesting that bpf_task_release should invalidate all pointers fetched from map value? That will work, but it's not an issue for other special fields in there like kptr. So this invalidation would be need only for uptr which feels weird to special case it and probably will be confusing to users writing such programs. Above bpf prog example should be ok to use. We only need to delay unpin after rcu/rcu_task_trace gp. Hence my proposal in bpf_obj_free_fields() do: case UPTR: xchgd_field =3D (void *)xchg((unsigned long *)field_ptr, 0); call_rcu(...) to unpin.