From: Andrii Nakryiko
Date: Thu, 29 Aug 2024 16:31:18 -0700
Subject: Re: [PATCH v4 4/8] uprobes: travers uprobe's consumer list locklessly under SRCU protection
To: Jiri Olsa
Cc: Andrii Nakryiko, linux-trace-kernel@vger.kernel.org,
    peterz@infradead.org, oleg@redhat.com, rostedt@goodmis.org,
    mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
    paulmck@kernel.org, willy@infradead.org, surenb@google.com,
    akpm@linux-foundation.org, linux-mm@kvack.org
References: <20240829183741.3331213-1-andrii@kernel.org> <20240829183741.3331213-5-andrii@kernel.org>

On Thu, Aug 29, 2024 at 4:10 PM Jiri Olsa wrote:
>
> On Thu, Aug 29, 2024 at 11:37:37AM -0700, Andrii Nakryiko wrote:
> > uprobe->register_rwsem is one of a few big bottlenecks to scalability of
> > uprobes, so we need to get rid of it to improve uprobe performance and
> > multi-CPU scalability.
> >
> > First, we turn uprobe's consumer list to a typical doubly-linked list
> > and utilize existing RCU-aware helpers for traversing such lists, as
> > well as adding and removing elements from it.
> >
> > For entry uprobes we already have SRCU protection active since before
> > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe
> > won't go away from under us, but we add SRCU protection around consumer
> > list traversal.
> >
> > Lastly, to keep handler_chain()'s UPROBE_HANDLER_REMOVE handling simple,
> > we remember whether any removal was requested during handler calls, but
> > then we double-check the decision under a proper register_rwsem using
> > consumers' filter callbacks. Handler removal is very rare, so this extra
> > lock won't hurt performance, overall, but we also avoid the need for any
> > extra protection (e.g., seqcount locks).
> >
> > Signed-off-by: Andrii Nakryiko
> > ---
> >  include/linux/uprobes.h |   2 +-
> >  kernel/events/uprobes.c | 104 +++++++++++++++++++++++-----------------
> >  2 files changed, 62 insertions(+), 44 deletions(-)
> >
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index 9cf0dce62e4c..29c935b0d504 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
> > @@ -35,7 +35,7 @@ struct uprobe_consumer {
> >                                 struct pt_regs *regs);
> >         bool (*filter)(struct uprobe_consumer *self, struct mm_struct *mm);
> >
> > -       struct uprobe_consumer *next;
> > +       struct list_head cons_node;
> >  };
> >
> >  #ifdef CONFIG_UPROBES
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 8bdcdc6901b2..97e58d160647 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -59,7 +59,7 @@ struct uprobe {
> >         struct rw_semaphore     register_rwsem;
> >         struct rw_semaphore     consumer_rwsem;
> >         struct list_head        pending_list;
> > -       struct uprobe_consumer  *consumers;
> > +       struct list_head        consumers;
> >         struct inode            *inode;         /* Also hold a ref to inode */
> >         struct rcu_head         rcu;
> >         loff_t                  offset;
> > @@ -783,6 +783,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
> >         uprobe->inode = inode;
> >         uprobe->offset = offset;
> >         uprobe->ref_ctr_offset = ref_ctr_offset;
> > +       INIT_LIST_HEAD(&uprobe->consumers);
> >         init_rwsem(&uprobe->register_rwsem);
> >         init_rwsem(&uprobe->consumer_rwsem);
> >         RB_CLEAR_NODE(&uprobe->rb_node);
> > @@ -808,32 +809,19 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
> >  static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
> >  {
> >         down_write(&uprobe->consumer_rwsem);
> > -       uc->next = uprobe->consumers;
> > -       uprobe->consumers = uc;
> > +       list_add_rcu(&uc->cons_node, &uprobe->consumers);
> >         up_write(&uprobe->consumer_rwsem);
> >  }
> >
> >  /*
> >   * For uprobe @uprobe, delete the consumer @uc.
> > - * Return true if the @uc is deleted successfully
> > - * or return false.
> > + * Should never be called with consumer that's not part of @uprobe->consumers.
> >   */
> > -static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
> > +static void consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
> >  {
> > -       struct uprobe_consumer **con;
> > -       bool ret = false;
> > -
> >         down_write(&uprobe->consumer_rwsem);
> > -       for (con = &uprobe->consumers; *con; con = &(*con)->next) {
> > -               if (*con == uc) {
> > -                       *con = uc->next;
> > -                       ret = true;
> > -                       break;
> > -               }
> > -       }
> > +       list_del_rcu(&uc->cons_node);
> >         up_write(&uprobe->consumer_rwsem);
> > -
> > -       return ret;
> >  }
> >
> >  static int __copy_insn(struct address_space *mapping, struct file *filp,
> > @@ -929,7 +917,8 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm)
> >         bool ret = false;
> >
> >         down_read(&uprobe->consumer_rwsem);
> > -       for (uc = uprobe->consumers; uc; uc = uc->next) {
> > +       list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
> > +                                srcu_read_lock_held(&uprobes_srcu)) {
> >                 ret = consumer_filter(uc, mm);
> >                 if (ret)
> >                         break;
> > @@ -1125,18 +1114,29 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
> >         int err;
> >
> >         down_write(&uprobe->register_rwsem);
> > -       if (WARN_ON(!consumer_del(uprobe, uc))) {
> > -               err = -ENOENT;
> > -       } else {
> > -               err = register_for_each_vma(uprobe, NULL);
> > -               /* TODO : cant unregister? schedule a worker thread */
> > -               if (unlikely(err))
> > -                       uprobe_warn(current, "unregister, leaking uprobe");
> > -       }
> > +       consumer_del(uprobe, uc);
> > +       err = register_for_each_vma(uprobe, NULL);
> >         up_write(&uprobe->register_rwsem);
> >
> > -       if (!err)
> > -               put_uprobe(uprobe);
> > +       /* TODO : cant unregister? schedule a worker thread */
> > +       if (unlikely(err)) {
> > +               uprobe_warn(current, "unregister, leaking uprobe");
> > +               goto out_sync;
> > +       }
> > +
> > +       put_uprobe(uprobe);
> > +
> > +out_sync:
> > +       /*
> > +        * Now that handler_chain() and handle_uretprobe_chain() iterate over
> > +        * uprobe->consumers list under RCU protection without holding
> > +        * uprobe->register_rwsem, we need to wait for RCU grace period to
> > +        * make sure that we can't call into just unregistered
> > +        * uprobe_consumer's callbacks anymore. If we don't do that, fast and
> > +        * unlucky enough caller can free consumer's memory and cause
> > +        * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
> > +        */
> > +       synchronize_srcu(&uprobes_srcu);
> >  }
> >  EXPORT_SYMBOL_GPL(uprobe_unregister);
> >
> > @@ -1214,13 +1214,20 @@ EXPORT_SYMBOL_GPL(uprobe_register);
> >  int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
> >  {
> >         struct uprobe_consumer *con;
> > -       int ret = -ENOENT;
> > +       int ret = -ENOENT, srcu_idx;
> >
> >         down_write(&uprobe->register_rwsem);
> > -       for (con = uprobe->consumers; con && con != uc ; con = con->next)
> > -               ;
> > -       if (con)
> > -               ret = register_for_each_vma(uprobe, add ? uc : NULL);
> > +
> > +       srcu_idx = srcu_read_lock(&uprobes_srcu);
> > +       list_for_each_entry_srcu(con, &uprobe->consumers, cons_node,
> > +                                srcu_read_lock_held(&uprobes_srcu)) {
> > +               if (con == uc) {
> > +                       ret = register_for_each_vma(uprobe, add ? uc : NULL);
> > +                       break;
> > +               }
> > +       }
> > +       srcu_read_unlock(&uprobes_srcu, srcu_idx);
> > +
> >         up_write(&uprobe->register_rwsem);
> >
> >         return ret;
> > @@ -2085,10 +2092,12 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
> >         struct uprobe_consumer *uc;
> >         int remove = UPROBE_HANDLER_REMOVE;
> >         bool need_prep = false; /* prepare return uprobe, when needed */
> > +       bool has_consumers = false;
> >
> > -       down_read(&uprobe->register_rwsem);
> >         current->utask->auprobe = &uprobe->arch;
> > -       for (uc = uprobe->consumers; uc; uc = uc->next) {
> > +
> > +       list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
> > +                                srcu_read_lock_held(&uprobes_srcu)) {
> >                 int rc = 0;
> >
> >                 if (uc->handler) {
> > @@ -2101,17 +2110,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
> >                         need_prep = true;
> >
> >                 remove &= rc;
> > +               has_consumers = true;
> >         }
> >         current->utask->auprobe = NULL;
> >
> >         if (need_prep && !remove)
> >                 prepare_uretprobe(uprobe, regs); /* put bp at return */
> >
> > -       if (remove && uprobe->consumers) {
> > -               WARN_ON(!uprobe_is_active(uprobe));
> > -               unapply_uprobe(uprobe, current->mm);
> > +       if (remove && has_consumers) {
> > +               down_read(&uprobe->register_rwsem);
> > +
> > +               /* re-check that removal is still required, this time under lock */
> > +               if (!filter_chain(uprobe, current->mm)) {
>
> sorry for late question, but I do not follow this change..
>
> at this point we got 1 as handler's return value from all the uprobe's consumers,
> why do we need to call filter_chain in here.. IIUC this will likely skip over
> the removal?
>

Because we don't hold register_rwsem, we are now racing with
registration. So while every consumer we saw when iterating over the
consumer list may have requested removal, a parallel CPU can add
another consumer that needs this uprobe+PID combination.
So if we don't double-check, we risk ending up with a consumer that
will not be triggered for the desired process.

Does it make sense? Given removal is rare, it's OK to take the lock if
we *suspect* removal, and then check authoritatively again under the
lock.

> with single uprobe_multi consumer:
>
>   handler_chain
>     uprobe_multi_link_handler
>       uprobe_prog_run
>         bpf_prog returns 1
>
>     remove = 1
>
>     if (remove && has_consumers) {
>
>       filter_chain - uprobe_multi_link_filter returns true.. so the uprobe stays?
>
> maybe I just need to write test for it ;-)
>
> thanks,
> jirka
>
>
> > +                       WARN_ON(!uprobe_is_active(uprobe));
> > +                       unapply_uprobe(uprobe, current->mm);
> > +               }
> > +
> > +               up_read(&uprobe->register_rwsem);
> >         }
> > -       up_read(&uprobe->register_rwsem);
> >  }
> >
> >  static void
> > @@ -2119,13 +2135,15 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
> >  {
> >         struct uprobe *uprobe = ri->uprobe;
> >         struct uprobe_consumer *uc;
> > +       int srcu_idx;
> >
> > -       down_read(&uprobe->register_rwsem);
> > -       for (uc = uprobe->consumers; uc; uc = uc->next) {
> > +       srcu_idx = srcu_read_lock(&uprobes_srcu);
> > +       list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
> > +                                srcu_read_lock_held(&uprobes_srcu)) {
> >                 if (uc->ret_handler)
> >                         uc->ret_handler(uc, ri->func, regs);
> >         }
> > -       up_read(&uprobe->register_rwsem);
> > +       srcu_read_unlock(&uprobes_srcu, srcu_idx);
> >  }
> >
> >  static struct return_instance *find_next_ret_chain(struct return_instance *ri)
> > --
> > 2.43.5
> >
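
For reference, the "suspect removal without the lock, then confirm it
under the lock" idea from the reply above can be modelled in plain
userspace C. This is only a rough sketch, not kernel code: struct probe,
struct consumer, run_handlers() and confirm_removal() are hypothetical
names, a pthread mutex stands in for register_rwsem, and the racing
registration is simulated by calling probe_register() between the two
phases instead of running on another CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
#define MAX_CONSUMERS	8
#define HANDLER_REMOVE	1

struct consumer {
	int wanted_pid;			/* the only process this consumer traces */
};

struct probe {
	pthread_mutex_t reg_lock;	/* plays the role of register_rwsem */
	struct consumer *cons[MAX_CONSUMERS];
	size_t nr_cons;
	bool applied;			/* is the breakpoint still installed? */
};

static void probe_init(struct probe *p)
{
	pthread_mutex_init(&p->reg_lock, NULL);
	p->nr_cons = 0;
	p->applied = true;
}

/* Registration always happens under the lock. */
static void probe_register(struct probe *p, struct consumer *c)
{
	pthread_mutex_lock(&p->reg_lock);
	assert(p->nr_cons < MAX_CONSUMERS);
	p->cons[p->nr_cons++] = c;
	pthread_mutex_unlock(&p->reg_lock);
}

/* Handler asks for removal when hit in a process it does not trace. */
static int consumer_handler(const struct consumer *c, int pid)
{
	return c->wanted_pid == pid ? 0 : HANDLER_REMOVE;
}

/* Filter says whether the consumer still wants the probe in this process. */
static bool consumer_filter(const struct consumer *c, int pid)
{
	return c->wanted_pid == pid;
}

/* Caller must hold reg_lock. True if any consumer wants this pid. */
static bool filter_chain(const struct probe *p, int pid)
{
	for (size_t i = 0; i < p->nr_cons; i++)
		if (consumer_filter(p->cons[i], pid))
			return true;
	return false;
}

/*
 * Phase 1: run handlers without reg_lock (the kernel relies on SRCU here;
 * this single-threaded model has no real concurrency). The result is only
 * a *suspicion* that the probe can be removed for this pid.
 */
static bool run_handlers(const struct probe *p, int pid)
{
	int remove = HANDLER_REMOVE;
	bool has_consumers = false;

	for (size_t i = 0; i < p->nr_cons; i++) {
		remove &= consumer_handler(p->cons[i], pid);
		has_consumers = true;
	}
	return remove && has_consumers;
}

/* Phase 2: take the lock and re-check authoritatively before unapplying. */
static void confirm_removal(struct probe *p, int pid)
{
	pthread_mutex_lock(&p->reg_lock);
	if (!filter_chain(p, pid))
		p->applied = false;
	pthread_mutex_unlock(&p->reg_lock);
}
```

If a consumer for the current pid is registered after run_handlers()
returned true but before confirm_removal() runs, the filter pass under
the lock sees it and the probe stays applied. Without such a late
registration the probe is removed as before.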