From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 573ABC531DC for ; Tue, 20 Aug 2024 18:01:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E51876B0092; Tue, 20 Aug 2024 14:01:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DB3AF6B0096; Tue, 20 Aug 2024 14:01:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C2C056B0098; Tue, 20 Aug 2024 14:01:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id A6F716B0092 for ; Tue, 20 Aug 2024 14:01:29 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 25CCE1A0357 for ; Tue, 20 Aug 2024 18:01:29 +0000 (UTC) X-FDA: 82473391098.13.D56EC79 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) by imf30.hostedemail.com (Postfix) with ESMTP id 3215880021 for ; Tue, 20 Aug 2024 18:01:26 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=lcvho604; spf=pass (imf30.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.214.179 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724176871; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XW2EDQsveDHftwUoUVuswpkNq+gO1+8ZL6TivHjfcJ0=; b=mwU+tRmtytA2MFCnJvhL5uzpQInzrUzk+X/mEV+jMfYEDgnWt/0jMO2KmStpcBf+MMx5s2 XCCjnFlMiP4fSWZCVhpbFgXj4YvFYYWzsdwuqfOYIQkgNXICnExn5Myw1a0rHinor/e5Hj sBkxOhXSokFkPVUmG1UHrKubnN7pGkE= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=lcvho604; spf=pass (imf30.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.214.179 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724176871; a=rsa-sha256; cv=none; b=E845cww6vW5ssh1db3dHeBzJ0DZjUTf1Oq5lNPJxB4dazV1FTLa7VywG7LaWxufnbzYh9u NZqLkiCv/Rw5FpweBMhvEL7OBRYBZO+iakRwbhN16rXivI0W87Vw6o+myVcv1UELbo8LXD 6GPr733Q+7ibcTI7hZFdjsdJNGQFJd0= Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-2023dd9b86aso20847065ad.1 for ; Tue, 20 Aug 2024 11:01:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724176886; x=1724781686; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=XW2EDQsveDHftwUoUVuswpkNq+gO1+8ZL6TivHjfcJ0=; b=lcvho604BchYa1DesIRgYHsPglc8YdEqTY/GGiWCh7tAHxRwnHbL169R+ncceLjFXV q+JdpRWwtDuBTbpIrBEt/TJtMNV6RSBFyS4AxlgAKwYhMYK1vW1XYg8PJlRgxJi5sCC+ gbV8ot4lXOm7h6+8Mv3dzpOALM0UUm375G9ce/ZQrR4rBUl1PJ81C4XgsHuCcBwJ7xXy d3t+pbXd3a+P+Ajlfdw5VbQ+Rmiq0IvcZx71tv+XeqQNb4Vh4sAubST+hKKnLddCWWre 93NrvVP421wOGDgJnEOJm+2tazEkeXzwb3tJrjEZ98F1GxWvvXrg8v8U8fIgJips+csR kpEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724176886; x=1724781686; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XW2EDQsveDHftwUoUVuswpkNq+gO1+8ZL6TivHjfcJ0=; b=LG1TruSatmWnmMd8APVQgi4uHs/hoKbr8LQdJzo0HEDhOBPev0qe2PagMMHG6phTFP LPJHZ6ITnN+QMGT+gOcjps3QLchemEIY5s8pq+8v1yZfKmlkFggkEVkkIbg1StFn9EWQ 2H4hLJWAZKw4idSLw8G79BA+1WpjkiZLxQXGpsmHozBZGiRktLu7sHEg3pVcyyYgxsGX dUmHfLhshT3KGmYaT+LHD4GbFtvXECp2r/yQGRb9KnDBD9iuenabaKwnaZHSMcV8c7Ek EgwNabxNcOa2BRLDkrHqtKGGiZV0zdUiE3iIeGnBdCGPbKZFKAchGlsrlpwW5u50lYLK itNg== X-Forwarded-Encrypted: i=1; AJvYcCXPkbEI14eCEPcHeVUhFfgfjOQgQyWUDyUIH2G+qh4HujT4oIlv1imBJlm20kU2yBxercxWBAXz7Q==@kvack.org X-Gm-Message-State: AOJu0YwyMUy/7UqL4bcN/xq0/XG3o9zUmX2ZqqyYww1GvDtX55oEvjtj 0ZHZyBk4xopgbJWs7Ib0CfC1hxtrp0F70hUzlNEkYVG+2TsYwowOmRWgXm3pD4N8MrJazt5sE4d NiiJpl3ou4pe8uX+r6onUukdoWDc= X-Google-Smtp-Source: AGHT+IH+ytdwRHSzvQk3nNnD41x0T2rOFsEZheRcnQGQLslFa+6HExdYnCpFoslduDeAxl5b3tb0cFOh9PB4zP2KLbg= X-Received: by 2002:a17:90a:744a:b0:2c9:61ad:dcd9 with SMTP id 98e67ed59e1d1-2d3e00ec17dmr14073007a91.27.1724176885685; Tue, 20 Aug 2024 11:01:25 -0700 (PDT) MIME-Version: 1.0 References: <20240813042917.506057-1-andrii@kernel.org> <20240813042917.506057-10-andrii@kernel.org> <20240819134107.GB3515@redhat.com> <20240820150534.GD12400@redhat.com> In-Reply-To: <20240820150534.GD12400@redhat.com> From: Andrii Nakryiko Date: Tue, 20 Aug 2024 11:01:13 -0700 Message-ID: Subject: Re: [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) To: Oleg Nesterov Cc: Andrii Nakryiko , linux-trace-kernel@vger.kernel.org, peterz@infradead.org, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Stat-Signature: h7i4brgyftw3a83ei51hnm3fuaansnbt X-Rspamd-Queue-Id: 3215880021 X-Rspamd-Server: rspam11 X-HE-Tag: 1724176886-949594 X-HE-Meta: U2FsdGVkX18VkdrcKZXwJHwdZgJJ2quU5oP4Q3i0vgzX81Ctcs1Pg8c4AVordXCckuoglO0Jkmxdt9l1ygjAN7OzL5EZqcX4Kr/Zsxe6iw44q0mkzuAL/RxzsUOxVRygzahvjesxKO5XkclyBLcqHR2rjvLGLB8DEWcaq8MS7CQ7tEyC632cF75TB3LEpG4nwHotJrvxTXz1yYgmjuSeoS9WlfWQ9EnhG0WkeilUjrQX8163vWeWigikNNoWkjuLYMp6pqBBV6u7R+CVG395AHqy3rWIctT6B3FMHp+b5mZozm/fNvcqjcsdJ928TTh/KgAePuy10kOcoCbAnRSbWmpFy8iBN2tLyTgOQuS0pHiqo4k0NPk1g3qsp+4PGIZZgaqAtEx1dSGZ2rDFXO0NxhhIrptuW9uIl1bTbWvh5J+9DjoFB8wnOQq4fvLmTnsokVk/UOZY32D31SLs5tvAZYU7gMt3NNNaseO7BfX9U3UUAt6o3VEwInI6SlDeGwLCr/LdZD3w9QdJ69k2hmc9cT+f29fn8xzeD9gPlDNCcXaaJfJos37wrn7Ytq0VISS0X5ycduuyapBOma/fZpW0KiOBfOhLTcYtxJb8Hv15X2mz0nyhpTdKFXKuuRZGWUKTl6Xu8Rl45femdZOwMU8g1wMc9vCfw7Tp4C32tSV44YpCsKmumAgI95jeFzUCcfuulddfKcTaribAphHXUXFVDn8cMOt8WhfvTsHnhG4boWGBCG6sWHmazb9EtE9YHVpiPFk86O/MQMn4SW0dWQQNlta9CetG3Br2kqbaNzBVVg7vRtZ7++KO4me63RUjbXnhUX5nRl990DrA1M2hlTkX4MDNUtbXFGpkHp45Y0f/X/gsk217SLuyLY7DTzliisWGIus+1o/UI0+OX62KMIN8zcGvN6z3jgsK0rYkTOwEhOZ9iEB9NdqUuVbWTQPiOEici1Kt0nkNdye6NaJQ+T4 DJVyY5zk zU7vY//wGEvbNpeZ1k9gmOkDLbsJSQsipx8GNl0kjJEo7yYKQ2gxKbtZRUn9CvCPnnSjFGTsJtcmvd85ciipPUkLLVla/sXFkaT849dTyMwBfzd1as963NZoxiAnVd5O79jVHOH0RSybLHa6L4eYE6WpKvi3hTuJNxAk1KANA3k/nMcIW1Ao/in9k8HYffZyQoBDkKXJLIF3DsASo1+3cAniaqo7AYUVxL0YRRohqjj40nmT1mXrUqdIkwJ03XEaOsm0VSaOjUnDnAgANdlarNOBDt04rmkBt/JLQgIIjs8AZEIbXpUnYF4u7l5/+hzOXF4T7PlYM71Heqee5lS49TwYvEc0Y8IrWM2k2 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Aug 20, 2024 at 8:06=E2=80=AFAM Oleg Nesterov wro= te: > > On 08/19, Andrii Nakryiko wrote: > > > > On Mon, Aug 19, 2024 at 6:41=E2=80=AFAM Oleg Nesterov = wrote: > > > > > > On 08/12, Andrii Nakryiko wrote: > > > > > > > > Avoid taking refcount on uprobe in prepare_uretprobe(), instead tak= e > > > > uretprobe-specific SRCU lock and keep it active as kernel transfers > > > > control back to user space. > > > ... > > > > include/linux/uprobes.h | 49 ++++++- > > > > kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++--= ---- > > > > 2 files changed, 301 insertions(+), 42 deletions(-) > > > > > > Oh. To be honest I don't like this patch. > > > > > > I would like to know what other reviewers think, but to me it adds to= o many > > > complications that I can't even fully understand... > > > > Which parts? The atomic xchg() and cmpxchg() parts? What exactly do > > you feel like you don't fully understand? > > Heh, everything looks too complex for me ;) Well, the best code is no code. But I'm not doing this just for fun, so I'm happy with the simplest solution *that works*. > > Say, hprobe_expire(). It is also called by ri_timer() in softirq context, > right? And it does > > /* We lost the race, undo our refcount bump. It can drop to zero.= */ > put_uprobe(uprobe); > > How so? If the refcount goes to zero, put_uprobe() does mutex_lock(), > but it must not sleep in softirq context. > Now we are talking about specific issues, thank you! It's hard to discuss "I don't like". Yes, I think you are right and it is certainly a problem with put_uprobe() that it can't be called from softirq (just having to remember that is error-prone, as is evidenced by me forgetting about this issue). It's easy enough to solve. We can either schedule work from timer thread (and that will solve this particular issue only), or we can teach put_uprobe() to schedule work item if it drops refcount to zero from softirq and other restricted contexts. I vote for making put_uprobe() flexible in this regard, add work_struct to uprobe, and schedule all this to be done in the work queue callback. WDYT? > > Or, prepare_uretprobe() which does > > rcu_assign_pointer(utask->return_instances, ri); > > if (!timer_pending(&utask->ri_timer)) > mod_timer(&utask->ri_timer, ...); > > Suppose that the timer was pending and it was fired right before > rcu_assign_pointer(). What guarantees that prepare_uretprobe() will see > timer_pending() =3D=3D false? > > rcu_assign_pointer()->smp_store_release() is a one-way barrier. This > timer_pending() check may appear to happen before rcu_assign_pointer() > completes. > > In this (yes, theoretical) case ri_timer() can miss the new return_instan= ce, > while prepare_uretprobe() can miss the necessary mod_timer(). I think thi= s > needs another mb() in between. > Ok, that's fair. I felt like this pattern might be a bit problematic, but I also felt like it's good to have to ensure that we do occasionally have a race between timer callback and uretprobe, even if uretprobe returns very quickly. (and I did confirm we get those races and they seem to be handled fine, i.e., I saw uprobes being "expired" into refcounted ones from ri_timer) But the really simple way to solve this is to do unconditional mod_timer(), so I can do just that to keep this less tricky. Would you be ok with that? > > And I can't convince myself hprobe_expire() is correct... OK, I don't > fully understand the logic, but why data_race(READ_ONCE(hprobe->leased)) = ? > READ_ONCE() should be enough in this case? you mean why data_race() annotation? To appease KCSAN, given that we modify hprobe->leased with xchg/cmpxchg, but read here with READ_ONCE(). Maybe I'm overthinking it, not sure. There is a reason why this is an RFC ;) > > > > > As I have already mentioned in the previous discussions, we can simpl= y kill > > > utask->active_uprobe. And utask->auprobe. > > > > I don't have anything against that, in principle, but let's benchmark > > and test that thoroughly. I'm a bit uneasy about the possibility that > > some arch-specific code will do container_of() on this arch_uprobe in > > order to get to uprobe, > > Well, struct uprobe is not "exported", the arch-specific code can't do th= is. > Ah, good point, that's great. But as I said, uretprobe is actually *the common* use case, not single-stepped uprobe. Still not very happy about that memcpy() and assumption that it's cheap, but that's minor. But no matter what we do for single-stepped one, uretprobe needs some solution. (and if that solution works for uretprobe, why wouldn't it work for single-step?...) > Oleg. >