From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E210C54E67 for ; Wed, 27 Mar 2024 21:41:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0CC146B00A0; Wed, 27 Mar 2024 17:41:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 07C866B00A1; Wed, 27 Mar 2024 17:41:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E86356B00A2; Wed, 27 Mar 2024 17:41:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CD73B6B00A0 for ; Wed, 27 Mar 2024 17:41:52 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 840A6801E1 for ; Wed, 27 Mar 2024 21:41:52 +0000 (UTC) X-FDA: 81944141664.24.BF8019C Received: from mail-wm1-f54.google.com (mail-wm1-f54.google.com [209.85.128.54]) by imf10.hostedemail.com (Postfix) with ESMTP id 9C638C0003 for ; Wed, 27 Mar 2024 21:41:50 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=QYU1CN8i; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf10.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.128.54 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711575710; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=QzkPeJjBZryBqMmL4486tp2idMhhYHpknALnqkob6rs=; b=kDjscn3iTEnpzwsIHnBnSqgkIbpDSMqWYdUbI40cw01SCYfUKAPlycv9Uy290buIa4tkH2 FccKWpc8wCz6JLx3Bul+cVSpMxIUAULBlInXnGv8PwqbIHY37WqzrcB04qiUA407SKW5Dz yF7O92N2r5QkQ+4Tjv7+hzpr0e2vnWk= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=QYU1CN8i; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf10.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.128.54 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711575710; a=rsa-sha256; cv=none; b=gCXtj2VSkKi4c76v3qD/KrobJtcFNvajCFKVEwJIiUzTe7atFCTuhUw9hqs/Tz8yUh1qjG RqkIROGlq3Zy4gnpONs0gKzx+LypgbyjzNLWRpZW754to98Mtg3Ws/tSUcT9O587c3w4gq 5Ex6Gg6SGtp2iOxF3OLDyuIXaYBSmp8= Received: by mail-wm1-f54.google.com with SMTP id 5b1f17b1804b1-414946b418dso2199165e9.0 for ; Wed, 27 Mar 2024 14:41:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1711575709; x=1712180509; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=QzkPeJjBZryBqMmL4486tp2idMhhYHpknALnqkob6rs=; b=QYU1CN8irrtGtmR58JPsDh23CCWs/2AwjGHZIzMR756yqR/0Xk5y2Bg7xWuV0BMzFJ 3VWdVm7H836thecXuGCUTjyZfksy80Hi3YJ7Zo1+MlvRJ9Q/6J9UyCR8pMWFCdF1Et6O ooE9FyS/D3E/wUGNXGNUZC+xtKoJJh3cyjyc5ERJ4D9wJtn6fzO8APsnUPQXCpZ9dRQz Oz/CQa1qDHtNiwoaFHFTJNj6ekc2I7lubb6U7WrsIi2wlc1ohE/bKRr4ssKZleZqJqHO uhkWfYOf3KUrxXXEGo3LS4vjDWCR2k9pw+VmXVeiLDYrFaOjeLOgE0FvnpC7NkpGku3U 2DjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711575709; x=1712180509; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QzkPeJjBZryBqMmL4486tp2idMhhYHpknALnqkob6rs=; b=bHtUGz3+xLSlDqctb/+I2JchGUHkWOf4c0ynL8XrElIvHEl3Q2xD8foBhHa/HxVSB8 bvBMTmT76KDhnIMqsbfLYHa7HHy5uf7eKEsF7EGW0EZoNidDSz7/tNy3OMDC3ddrCTMi xVo8uXjeHqzRFrfg49OQ4H5pBe/WW7hEIbF/3SyRZh1pAYoCQAq2374UlqPZfrX0gH/9 5sqFBW7nvDWiA22qz3nuN6q/2ghtEhws50YeLKerUu9FZeGJWREvLe7rFN4iTEyT3yOz 2hppXsob17C7fU8b98DYoxUdon9BrQVJ8rSVW5un8z7y7bqnDzs93hVyCV5s6q38Ozm8 RP+Q== X-Forwarded-Encrypted: i=1; AJvYcCWvTPzgJAuAcuKay/baPuqfwjRym0iCVFpFpcAwsQ48nlECyYQGX/EebUCyYbR9EjWQ5Oqw3v8L5GkWTrjkyHkrQeI= X-Gm-Message-State: AOJu0Yx3XuNqI+F5BdfNCWeBP/0hOKdJwtWlaTaSbF6TBjNjyp1Bez4y s1GeNQDmdnK2WiykElxsTIrVrVu+sXXTv+wpGdlwS1wudAbub11oLB9SPOTy1V5V3EG2IpWMTeV EMgM4GpZg9umsO/+A7IorgJQuARU= X-Google-Smtp-Source: AGHT+IH8bsjkbEPq5nFZa25RNEBMUp4NHKWgJkZvhsLEiO/AS+v/zIitCzEw8DN2gbwGn9LjyHANzGigswDNTFCf8qM= X-Received: by 2002:a05:6000:b04:b0:33e:40a3:22c8 with SMTP id dj4-20020a0560000b0400b0033e40a322c8mr425338wrb.33.1711575708527; Wed, 27 Mar 2024 14:41:48 -0700 (PDT) MIME-Version: 1.0 References: <20240306-flach-tragbar-b2b3c531bf0d@brauner> <20240306-sandgrube-flora-a61409c2f10c@brauner> <20240307-phosphor-entnahmen-8ef28b782abf@brauner> <20240308-kleben-eindecken-73c993fb3ebd@brauner> <20240311-geglaubt-kursverfall-500a27578cca@brauner> <20240318-individual-gekennzeichnet-1658fcb8bf27@brauner> In-Reply-To: <20240318-individual-gekennzeichnet-1658fcb8bf27@brauner> From: Alexei Starovoitov Date: Wed, 27 Mar 2024 14:41:36 -0700 Message-ID: Subject: Re: [PATCH v2 bpf-next 0/9] add new acquire/release BPF kfuncs To: Christian Brauner Cc: Matt Bobrowski , bpf , Alexei Starovoitov , Andrii Nakryiko , KP Singh , Jann Horn , Jiri Olsa , Daniel Borkmann , Linus Torvalds , Linux-Fsdevel , Andrew Morton , linux-mm , LSM List Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 9C638C0003 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: thcs6bw1p5tdr8d8dd5ob9hr7pwazghp X-HE-Tag: 1711575710-786687 X-HE-Meta: U2FsdGVkX1+zpYEVbUhL5TsHsnYoccYGkkhTIHqv2VnFeOz8zqZj+dNtTQina64WFBvgLiU5nBczi8kicRRPjwoTr+lkBgF0+Q6p/Aye1Xs+rv7WbpylQrMzDt70bpdOlVYAJ5WRbTLJSSQMKEWsuGN+t7V51rFeWpNa2M2OVhtaPGUsnJCtFqkxfdZoBwXIV42ifW9Oen14JbFJUP2wMA2940/owz33QXmVLXuZEtqlOqvVhA8WlR3Q+o6blZFvuBN0PocKPF17ae3cdlkWK3Oq3MTuusnJPpOHJGGQYvRRDlBWoVTIoZgx7wmaYaYsqT9ob0xajdLkhuoWoUerqTqNMoLwJb2tq2leEc4y6c+80mTULogEZLhtd8KPPbyzfo65JQwTK9y+vUz2M8MG+4Hbl3see3moLYJYRz7o7Vjz3J7cHyTq4z0llWZaF5znfi0KukKq/2hM6ZqtqcwGyJT1pdfLzLAOcBslR9IhvYgU5KdEcYVXKjMeVq0maX45s6Z3Zi9iziXgYx+sbw5ToWIyVl9iq7BUsVfduIPP6XhZgAQpM9h6Y2O+Mcm2ybyk+olhfyipXDTt7PxVYllwe0lqVcQhWG7MBAZcSIAxtWxGUVCQMJtCRXWn+Vqa8tsunZYU03l4mtIQim+lQ8L1gVpsQjHMPHeHvUD6bhkEcheQqUUsuwi+jl+S75IvEbuZdp7We93th/0xwtZM7xtFikwdZ29Bhb19MgdgqQFKANCMiDTw5+VQVuWLR1+e2XoBK414aTcI+nVqcySOFAXDGfdQrh+Sk5QWQeIBwiaiTg9JaOcqwRHV1zcN5K3wYspbtBniIaJ4mCRye6KPcNlHH06I3fy6Ytji6I/lZKidH6tLQfAt5XTGkJo5a0it9F5/G8mzaWA+Rj97jAVxPVBepqxEOpZm87J3+AOoUGjQqtC3LKNS1DL0KSZHT2e7Q+9opwR4q9Fn5G1/CWEMJ+P 8esYuq2T cwhBx4fVt5InQklkE8YQlLjLscfeB4fJnUIge7i2PNXOzfeA8rkmY1r7Nu36mMfocf2p2dh834R5i8uySx42lbKeOSA/13y0B9rXeLNhhnNCf1ba5P8OfeE9NHK4M27QqfQlhc2RbccGSFkp2eJrL2l0tcwI8Ke4+QJ0X5uEm0ZLLcEyhYQilF6pDrvCYCyf71fdK0G/IhtUkEeIWYX2yGqkoeaODgnhajoNP+9iDnQdgqOxwSnEM2I6aIJiXtScrQZe/v28xCksWuJoWOqTUoRTsNTz7ay1rb3/N6xZDTXU1sq+sLCNRvOOqGxqPnqSLbJuGN6zUNThZE592vL6GM7PB5Hsv+zNDpKPP+PrTEU2daPJKS8c/UcM7s3S9xymbaxyOAsfEfadHqA2yk/dQOhvPFaqu+HAGsP0G8OYhuR34cdb+ie/zr5mGxJFz8WGsMF1ttRnanWyFBMaeUP3Rk2d2jw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Mar 18, 2024 at 6:14=E2=80=AFAM Christian Brauner wrote: > > On Wed, Mar 13, 2024 at 02:05:13PM -0700, Alexei Starovoitov wrote: > > But this whole polemic just illustrates that you simply didn't bother to > understand how the code works. The way you talk about UAF together with > SLAB_TYPESAFE_BY_RCU is telling. Please read the code instead of > guessing. Ok. Fair enough. I've read the code and old threads from Sep-Nov. I think the concerns about typesafe_by_rcu made folks believe in races that don't exist. if (unlikely(!atomic_long_inc_not_zero(&file->f_count))) .. * (a) the file ref already went down to zero and the * file hasn't been reused yet or the file count * isn't zero but the file has already been reused. .. if (unlikely(file !=3D rcu_dereference_raw(*fdentry)) || The first part of the comment is partially incorrect. (it's in the wrong place). The file ref could have been down to zero and not reused yet, but it's before atomic_long_inc_not_zero. Once the code reaches 2nd check the file guaranteed to be reused and it went through init_file(), because that's the only code that brings it back from zero. This race is ok: cpu0 cpu1 file =3D rcu_dereference_raw(*fdentry); // file->f_count =3D=3D 1 rcu_assign_pointer(fdt->fd[fd], NUL= L); fput() // reaches zero atomic_long_inc_not_zero() // will not succeed. This race is ok too: cpu0 cpu1 file =3D rcu_dereference_raw(*fdentry); // file->f_count =3D=3D 1 rcu_assign_pointer(fdt->fd[fd], NUL= L); atomic_long_inc_not_zero() // succeeds. f_count =3D=3D 2 fput() // f_count =3D=3D 1 file !=3D rcu_dereference_raw(*fdentry) // will fail but it doesn't have to. This is a safe race. It's no different if cpu0 went through these steps including successful last check and then cpu1 did close(fd) The file held by cpu0 is not on the path to zero. Similarly, back then, there was a concern about two parallel __fget_files_r= cu() where one cpu incremented refcnt, failed some check and didn't do fput yet. In this case the file is not on the path to zero either. Both cpu-s saw non-zero f_count when they went through atomic_long_inc_not_= zero. The file is not on the path to be reused. Now the second part of the comment "the file count isn't zero but the file has already been reused" is misleading. The (file !=3D rcu_dereference_raw(*fdentry)) check is racy. Another cpu could have replaced that slot right after that check. Example: cpu0 doing lockless __fget_files_rcu() while cpu1 doing sys_close. __fget_files_rcu() will be returning a file that doesn't exist in fdt. And it's safe. This race is possible: cpu0 cpu1 file =3D rcu_dereference_raw(*fdentry); fput() reaches zero file_free alloc_empty_file // got same file init_file // f_count =3D 1 atomic_long_inc_not_zero() // succeeds. f_count =3D=3D 2 file !=3D rcu_dereference_raw(*fdentry)) // preventing a reuse of a file that was never in this fdt. The only value of this check is to make sure that this file either _is_ or _was_ at some point in this fdt. It's not preventing reuse per-se. This race is possible: cpu0 cpu1 file =3D rcu_dereference_raw(*fdentry); fput() reaches zero file_free alloc_empty_file // got same file init_file // f_count =3D 1 fd_install atomic_long_inc_not_zero() // succeeds. f_count =3D=3D 2 file =3D=3D rcu_dereference_raw(*fdentry)) // success, though the file _was reused_. I suggest to revise the comment. > > You've been provided: > > a) good reasons why the patchset in it's current form isn't acceptable > repeated multiple times We will improve commit logs in the next revision. > b) support for exporting a variant of bpf_d_path() that is safe to use good, but bpf_d_path kfunc alone is not useful. As Jann noted back in September: https://lore.kernel.org/all/CAG48ez2d5CW=3DCDi+fBOU1YqtwHfubN3q6w=3D1LfD+ss= +Q1PWHgQ@mail.gmail.com/ The conversion of files to typesafe_by_rcu broke that verifier assumption about mm->exe_file and we need kfuncs to safely acquire/release file reference to fix that breakage. > c) a request that all kfunc exports for the vfs will have to be located > under fs/, not in kernel/bpf/ we've added kfuncs to net/netfilter/, net/xfrm/, net/ipv4/, kernel/cgroup/, drivers/hid/ because maintainers of those subsystems demonstrated understanding of what bpf is doing and what these kfuncs are for. We can put them in fs/, but you need to demonstrate willingness to understand the problem we're solving instead of arguing about how hard file typesafe_by_rcu is to understand. > d) a path on how to move forward with additional kfunc requests: > Clear and documented rules when it's ok for someone to come along and > request access to bpf kfuncs when it's to be rejected and when it's > ok to be supported. There are ~36500 EXPORT_SYMBOL in the kernel. Are there "clear documented rules when it's ok for someone to" add or remove them? There is a gentleman's agreement that maintainers of subsystems need to be cc-ed when new EXPORT_SYMBOL-s are added. In this case no new EXPORT_SYMBOLs are requested. Compare that to 221 bpf helpers (which are uapi, and for the last 2 years we didn't add a single one) and 151 bpf kfuncs which are not uapi as clearly documented in Documentation/bpf/kfuncs.rst When developers want to add them they cc bpf@vger and relevant subsystems just like we did with netfilter, xfrm, cgroup, hid. kfunc deprecation rules are also documented in kfunc.rst > You repeatedly threatening to go over the heads of people will not make > them more amenable to happily integrate with your subsystem. This is not it. We made our own mistakes with bpf_d_path safety, and now file typesafe_by_rcu broke bpf safety assumptions. We have to fix it one way or the other. It's bpf that got affected by your changes. But we don't demand that you fix bpf bits. We're fixing them. But you have to provide technical reasons why file acquire/release kfuncs are not suitable. "Only 3 people understand typesafe_by_rcu" is not it.