From: Jiri Olsa
Date: Fri, 30 Aug 2024 01:09:59 +0200
To: Andrii Nakryiko
Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org,
	oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	paulmck@kernel.org, willy@infradead.org, surenb@google.com,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH v4 4/8] uprobes: travers uprobe's consumer list locklessly under SRCU protection
References: <20240829183741.3331213-1-andrii@kernel.org> <20240829183741.3331213-5-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-5-andrii@kernel.org>

On Thu, Aug 29, 2024 at 11:37:37AM -0700, Andrii Nakryiko wrote:
> uprobe->register_rwsem is one of a few big bottlenecks to scalability of
> uprobes, so we need to get rid of it to improve uprobe performance and
> multi-CPU scalability.
>
> First, we turn uprobe's consumer list to a typical doubly-linked list
> and utilize existing RCU-aware helpers for traversing such lists, as
> well as adding and removing elements from it.
>
> For entry uprobes we already have SRCU protection active since before
> uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe
> won't go away from under us, but we add SRCU protection around consumer
> list traversal.
>
> Lastly, to keep handler_chain()'s UPROBE_HANDLER_REMOVE handling simple,
> we remember whether any removal was requested during handler calls, but
> then we double-check the decision under a proper register_rwsem using
> consumers' filter callbacks. Handler removal is very rare, so this extra
> lock won't hurt performance, overall, but we also avoid the need for any
> extra protection (e.g., seqcount locks).
>
> Signed-off-by: Andrii Nakryiko
> ---
>  include/linux/uprobes.h |   2 +-
>  kernel/events/uprobes.c | 104 +++++++++++++++++++++++----------------
>  2 files changed, 62 insertions(+), 44 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 9cf0dce62e4c..29c935b0d504 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -35,7 +35,7 @@ struct uprobe_consumer {
>  				struct pt_regs *regs);
>  	bool (*filter)(struct uprobe_consumer *self, struct mm_struct *mm);
>
> -	struct uprobe_consumer *next;
> +	struct list_head cons_node;
>  };
>
>  #ifdef CONFIG_UPROBES
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 8bdcdc6901b2..97e58d160647 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -59,7 +59,7 @@ struct uprobe {
>  	struct rw_semaphore register_rwsem;
>  	struct rw_semaphore consumer_rwsem;
>  	struct list_head pending_list;
> -	struct uprobe_consumer *consumers;
> +	struct list_head consumers;
>  	struct inode *inode; /* Also hold a ref to inode */
>  	struct rcu_head rcu;
>  	loff_t offset;
> @@ -783,6 +783,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>  	uprobe->inode = inode;
>  	uprobe->offset = offset;
>  	uprobe->ref_ctr_offset = ref_ctr_offset;
> +	INIT_LIST_HEAD(&uprobe->consumers);
>  	init_rwsem(&uprobe->register_rwsem);
>  	init_rwsem(&uprobe->consumer_rwsem);
>  	RB_CLEAR_NODE(&uprobe->rb_node);
> @@ -808,32 +809,19 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>  static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
>  	down_write(&uprobe->consumer_rwsem);
> -	uc->next = uprobe->consumers;
> -	uprobe->consumers = uc;
> +	list_add_rcu(&uc->cons_node, &uprobe->consumers);
>  	up_write(&uprobe->consumer_rwsem);
>  }
>
>  /*
>   * For uprobe @uprobe, delete the consumer @uc.
> - * Return true if the @uc is deleted successfully
> - * or return false.
> + * Should never be called with consumer that's not part of @uprobe->consumers.
>   */
> -static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
> +static void consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
> -	struct uprobe_consumer **con;
> -	bool ret = false;
> -
>  	down_write(&uprobe->consumer_rwsem);
> -	for (con = &uprobe->consumers; *con; con = &(*con)->next) {
> -		if (*con == uc) {
> -			*con = uc->next;
> -			ret = true;
> -			break;
> -		}
> -	}
> +	list_del_rcu(&uc->cons_node);
>  	up_write(&uprobe->consumer_rwsem);
> -
> -	return ret;
>  }
>
>  static int __copy_insn(struct address_space *mapping, struct file *filp,
> @@ -929,7 +917,8 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm)
>  	bool ret = false;
>
>  	down_read(&uprobe->consumer_rwsem);
> -	for (uc = uprobe->consumers; uc; uc = uc->next) {
> +	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
> +				 srcu_read_lock_held(&uprobes_srcu)) {
>  		ret = consumer_filter(uc, mm);
>  		if (ret)
>  			break;
> @@ -1125,18 +1114,29 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  	int err;
>
>  	down_write(&uprobe->register_rwsem);
> -	if (WARN_ON(!consumer_del(uprobe, uc))) {
> -		err = -ENOENT;
> -	} else {
> -		err = register_for_each_vma(uprobe, NULL);
> -		/* TODO : cant unregister? schedule a worker thread */
> -		if (unlikely(err))
> -			uprobe_warn(current, "unregister, leaking uprobe");
> -	}
> +	consumer_del(uprobe, uc);
> +	err = register_for_each_vma(uprobe, NULL);
>  	up_write(&uprobe->register_rwsem);
>
> -	if (!err)
> -		put_uprobe(uprobe);
> +	/* TODO : cant unregister? schedule a worker thread */
> +	if (unlikely(err)) {
> +		uprobe_warn(current, "unregister, leaking uprobe");
> +		goto out_sync;
> +	}
> +
> +	put_uprobe(uprobe);
> +
> +out_sync:
> +	/*
> +	 * Now that handler_chain() and handle_uretprobe_chain() iterate over
> +	 * uprobe->consumers list under RCU protection without holding
> +	 * uprobe->register_rwsem, we need to wait for RCU grace period to
> +	 * make sure that we can't call into just unregistered
> +	 * uprobe_consumer's callbacks anymore. If we don't do that, fast and
> +	 * unlucky enough caller can free consumer's memory and cause
> +	 * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
> +	 */
> +	synchronize_srcu(&uprobes_srcu);
>  }
>  EXPORT_SYMBOL_GPL(uprobe_unregister);
>
> @@ -1214,13 +1214,20 @@ EXPORT_SYMBOL_GPL(uprobe_register);
>  int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
>  {
>  	struct uprobe_consumer *con;
> -	int ret = -ENOENT;
> +	int ret = -ENOENT, srcu_idx;
>
>  	down_write(&uprobe->register_rwsem);
> -	for (con = uprobe->consumers; con && con != uc ; con = con->next)
> -		;
> -	if (con)
> -		ret = register_for_each_vma(uprobe, add ? uc : NULL);
> +
> +	srcu_idx = srcu_read_lock(&uprobes_srcu);
> +	list_for_each_entry_srcu(con, &uprobe->consumers, cons_node,
> +				 srcu_read_lock_held(&uprobes_srcu)) {
> +		if (con == uc) {
> +			ret = register_for_each_vma(uprobe, add ? uc : NULL);
> +			break;
> +		}
> +	}
> +	srcu_read_unlock(&uprobes_srcu, srcu_idx);
> +
>  	up_write(&uprobe->register_rwsem);
>
>  	return ret;
> @@ -2085,10 +2092,12 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>  	struct uprobe_consumer *uc;
>  	int remove = UPROBE_HANDLER_REMOVE;
>  	bool need_prep = false; /* prepare return uprobe, when needed */
> +	bool has_consumers = false;
>
> -	down_read(&uprobe->register_rwsem);
>  	current->utask->auprobe = &uprobe->arch;
> -	for (uc = uprobe->consumers; uc; uc = uc->next) {
> +
> +	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
> +				 srcu_read_lock_held(&uprobes_srcu)) {
>  		int rc = 0;
>
>  		if (uc->handler) {
> @@ -2101,17 +2110,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>  			need_prep = true;
>
>  		remove &= rc;
> +		has_consumers = true;
>  	}
>  	current->utask->auprobe = NULL;
>
>  	if (need_prep && !remove)
>  		prepare_uretprobe(uprobe, regs); /* put bp at return */
>
> -	if (remove && uprobe->consumers) {
> -		WARN_ON(!uprobe_is_active(uprobe));
> -		unapply_uprobe(uprobe, current->mm);
> +	if (remove && has_consumers) {
> +		down_read(&uprobe->register_rwsem);
> +
> +		/* re-check that removal is still required, this time under lock */
> +		if (!filter_chain(uprobe, current->mm)) {

sorry for the late question, but I do not follow this change..

at this point we got 1 as the handler's return value from all the uprobe's
consumers, so why do we need to call filter_chain() in here.. IIUC this will
likely skip over the removal?

with a single uprobe_multi consumer:

  handler_chain
    uprobe_multi_link_handler
      uprobe_prog_run
        bpf_prog returns 1

    remove = 1

    if (remove && has_consumers) {

      filter_chain - uprobe_multi_link_filter returns true.. so the uprobe stays?
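
To make the case above concrete, here is a rough user-space sketch of the
removal decision as I read it from the quoted hunk. It is only a model, not
kernel code: the model_* names and struct model_consumer are made up for
illustration, the two defines are assumed to match UPROBE_HANDLER_REMOVE and
UPROBE_HANDLER_MASK, and locking/SRCU is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror UPROBE_HANDLER_REMOVE / UPROBE_HANDLER_MASK. */
#define MODEL_HANDLER_REMOVE 1
#define MODEL_HANDLER_MASK   1

/* Hypothetical stand-in for struct uprobe_consumer: fixed results, no callbacks. */
struct model_consumer {
	int handler_rc;      /* what ->handler() would return */
	bool filter_result;  /* what ->filter() would return for current->mm */
};

/* Like filter_chain(): true as soon as one consumer still wants this mm. */
static bool model_filter_chain(const struct model_consumer *cons, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cons[i].filter_result)
			return true;
	return false;
}

/* Returns true if the patched handler_chain() would reach unapply_uprobe(). */
static bool model_would_unapply(const struct model_consumer *cons, size_t n)
{
	int remove = MODEL_HANDLER_REMOVE;
	bool has_consumers = false;

	for (size_t i = 0; i < n; i++) {
		int rc = cons[i].handler_rc;

		/* the kernel WARNs on bits outside the mask; the model asserts */
		assert((rc & ~MODEL_HANDLER_MASK) == 0);
		remove &= rc;
		has_consumers = true;
	}

	/* the re-check added by the patch: any filter hit keeps the uprobe */
	if (remove && has_consumers)
		return !model_filter_chain(cons, n);

	return false;
}
```

In this model a single consumer with handler_rc = 1 and filter_result = true
makes model_would_unapply() return false, so the breakpoint stays even though
the handler asked for removal. That is the behaviour I am asking about.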

maybe I just need to write a test for it ;-)

thanks,
jirka


> +			WARN_ON(!uprobe_is_active(uprobe));
> +			unapply_uprobe(uprobe, current->mm);
> +		}
> +
> +		up_read(&uprobe->register_rwsem);
>  	}
> -	up_read(&uprobe->register_rwsem);
>  }
>
>  static void
> @@ -2119,13 +2135,15 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
>  {
>  	struct uprobe *uprobe = ri->uprobe;
>  	struct uprobe_consumer *uc;
> +	int srcu_idx;
>
> -	down_read(&uprobe->register_rwsem);
> -	for (uc = uprobe->consumers; uc; uc = uc->next) {
> +	srcu_idx = srcu_read_lock(&uprobes_srcu);
> +	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
> +				 srcu_read_lock_held(&uprobes_srcu)) {
>  		if (uc->ret_handler)
>  			uc->ret_handler(uc, ri->func, regs);
>  	}
> -	up_read(&uprobe->register_rwsem);
> +	srcu_read_unlock(&uprobes_srcu, srcu_idx);
>  }
>
>  static struct return_instance *find_next_ret_chain(struct return_instance *ri)
> --
> 2.43.5
>