From: Andrey Konovalov
Date: Fri, 12 Jan 2024 03:38:42 +0100
Subject: Re: [PATCH v4 12/22] lib/stackdepot: use read/write lock
To: Marco Elver
Cc: Andi Kleen, Oscar Salvador, andrey.konovalov@linux.dev, Andrew Morton,
 Alexander Potapenko, Dmitry Vyukov, Vlastimil Babka, kasan-dev@googlegroups.com,
 Evgenii Stepanov, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Andrey Konovalov
References: <9f81ffcc4bb422ebb6326a65a770bf1918634cbb.1700502145.git.andreyknvl@google.com>
 <87sf34lrn3.fsf@linux.intel.com>

On Thu, Jan 11, 2024 at 8:08 PM Marco Elver wrote:
>
> On Thu, Jan 11, 2024 at 04:36AM -0800, Andi Kleen wrote:
> > > stackdepot is severely limited in what kernel facilities it may use
> > > due to being used by such low level facilities as the allocator
> > > itself.
> >
> > RCU can be done quite low level too (e.g. there is NMI safe RCU)
>
> How about the below? This should get us back the performance of the old
> lock-less version. Although it's using rculist, we don't actually need
> to synchronize via RCU.
>
> Thanks,
> -- Marco
>
> ------ >8 ------
>
> From: Marco Elver
> Date: Tue, 9 Jan 2024 10:21:56 +0100
> Subject: [PATCH] stackdepot: make fast paths lock-less again
>
> stack_depot_put() unconditionally takes the pool_rwlock as a writer.
> This is unnecessary if the stack record is not going to be freed.
> Furthermore, reader-writer locks have inherent cache contention, which
> does not scale well on machines with large CPU counts.
>
> Instead, rework the synchronization story of stack depot to again avoid
> taking any locks in the fast paths. This is done by relying on RCU
> primitives to give us lock-less list traversal. See code comments for
> more details.
>
> Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces")
> Signed-off-by: Marco Elver
> ---
>  lib/stackdepot.c | 222 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 133 insertions(+), 89 deletions(-)
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index a0be5d05c7f0..9eaf46f8abc4 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -19,10 +19,13 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include
>  #include
>  #include
> @@ -67,7 +70,8 @@ union handle_parts {
>  };
>
>  struct stack_record {
> -       struct list_head list;          /* Links in hash table or freelist */
> +       struct list_head hash_list;     /* Links in the hash table */
> +       struct llist_node free_list;    /* Links in the freelist */
>         u32 hash;                       /* Hash in hash table */
>         u32 size;                       /* Number of stored frames */
>         union handle_parts handle;
> @@ -104,7 +108,7 @@ static void *new_pool;
>  /* Number of pools in stack_pools. */
>  static int pools_num;
>  /* Freelist of stack records within stack_pools. */
> -static LIST_HEAD(free_stacks);
> +static LLIST_HEAD(free_stacks);
>  /*
>   * Stack depot tries to keep an extra pool allocated even before it runs out
>   * of space in the currently used pool. This flag marks whether this extra pool
> @@ -112,8 +116,8 @@ static LIST_HEAD(free_stacks);
>   * yet allocated or if the limit on the number of pools is reached.
>   */
>  static bool new_pool_required = true;
> -/* Lock that protects the variables above. */
> -static DEFINE_RWLOCK(pool_rwlock);
> +/* The lock must be held when performing pool or free list modifications. */
> +static DEFINE_RAW_SPINLOCK(pool_lock);
>
>  static int __init disable_stack_depot(char *str)
>  {
> @@ -263,9 +267,7 @@ static void depot_init_pool(void *pool)
>  {
>         int offset;
>
> -       lockdep_assert_held_write(&pool_rwlock);
> -
> -       WARN_ON(!list_empty(&free_stacks));
> +       lockdep_assert_held(&pool_lock);
>
>         /* Initialize handles and link stack records into the freelist. */
>         for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
> @@ -276,18 +278,25 @@ static void depot_init_pool(void *pool)
>                 stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
>                 stack->handle.extra = 0;
>
> -               list_add(&stack->list, &free_stacks);
> +               llist_add(&stack->free_list, &free_stacks);
> +               INIT_LIST_HEAD(&stack->hash_list);
>         }
>
>         /* Save reference to the pool to be used by depot_fetch_stack(). */
>         stack_pools[pools_num] = pool;
> -       pools_num++;
> +
> +       /*
> +        * Release of pool pointer assignment above. Pairs with the
> +        * smp_load_acquire() in depot_fetch_stack().
> +        */
> +       smp_store_release(&pools_num, pools_num + 1);
> +       ASSERT_EXCLUSIVE_WRITER(pools_num);
>  }
>
>  /* Keeps the preallocated memory to be used for a new stack depot pool. */
>  static void depot_keep_new_pool(void **prealloc)
>  {
> -       lockdep_assert_held_write(&pool_rwlock);
> +       lockdep_assert_held(&pool_lock);
>
>         /*
>          * If a new pool is already saved or the maximum number of
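To double-check that I read the pools_num ordering right: reduced to a
minimal standalone sketch, the pairing seems to be the one below. The
identifiers are made up for illustration and are not part of the patch.

#include <linux/atomic.h>

#define NR_SLOTS 128

static void *slots[NR_SLOTS];
static int nr_slots;    /* published count, read locklessly */

/* Writer side: serialized externally (by pool_lock in the patch). */
static void publish_slot(void *ptr)
{
        slots[nr_slots] = ptr;
        /* Release the pointer store above; pairs with the acquire below. */
        smp_store_release(&nr_slots, nr_slots + 1);
}

/* Lock-less reader side. */
static void *get_slot(int idx)
{
        int cached = smp_load_acquire(&nr_slots);

        if (idx >= cached)
                return NULL;
        /* The acquire above guarantees slots[idx] is visible here. */
        return slots[idx];
}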
> @@ -310,16 +319,16 @@ static void depot_keep_new_pool(void **prealloc)
>          * number of pools is reached. In either case, take note that
>          * keeping another pool is not required.
>          */
> -       new_pool_required = false;
> +       WRITE_ONCE(new_pool_required, false);
>  }
>
>  /* Updates references to the current and the next stack depot pools. */
>  static bool depot_update_pools(void **prealloc)
>  {
> -       lockdep_assert_held_write(&pool_rwlock);
> +       lockdep_assert_held(&pool_lock);
>
>         /* Check if we still have objects in the freelist. */
> -       if (!list_empty(&free_stacks))
> +       if (!llist_empty(&free_stacks))
>                 goto out_keep_prealloc;
>
>         /* Check if we have a new pool saved and use it. */
> @@ -329,7 +338,7 @@ static bool depot_update_pools(void **prealloc)
>
>         /* Take note that we might need a new new_pool. */
>         if (pools_num < DEPOT_MAX_POOLS)
> -               new_pool_required = true;
> +               WRITE_ONCE(new_pool_required, true);
>
>         /* Try keeping the preallocated memory for new_pool. */
>         goto out_keep_prealloc;
> @@ -362,20 +371,19 @@ static struct stack_record *
>  depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>  {
>         struct stack_record *stack;
> +       struct llist_node *free;
>
> -       lockdep_assert_held_write(&pool_rwlock);
> +       lockdep_assert_held(&pool_lock);
>
>         /* Update current and new pools if required and possible. */
>         if (!depot_update_pools(prealloc))
>                 return NULL;
>
>         /* Check if we have a stack record to save the stack trace. */
> -       if (list_empty(&free_stacks))
> +       free = llist_del_first(&free_stacks);
> +       if (!free)
>                 return NULL;
> -
> -       /* Get and unlink the first entry from the freelist. */
> -       stack = list_first_entry(&free_stacks, struct stack_record, list);
> -       list_del(&stack->list);
> +       stack = llist_entry(free, struct stack_record, free_list);
>
>         /* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
>         if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
> @@ -385,7 +393,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         stack->hash = hash;
>         stack->size = size;
>         /* stack->handle is already filled in by depot_init_pool(). */
> -       refcount_set(&stack->count, 1);
>         memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
>
>         /*
> @@ -394,21 +401,30 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>          */
>         kmsan_unpoison_memory(stack, DEPOT_STACK_RECORD_SIZE);
>
> +       /*
> +        * Release saving of the stack trace. Pairs with smp_mb() in
> +        * depot_fetch_stack().
> +        */
> +       smp_mb__before_atomic();
> +       refcount_set(&stack->count, 1);
> +
>         return stack;
>  }
>
>  static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
>  {
> +       /* Acquire the pool pointer written in depot_init_pool(). */
> +       const int pools_num_cached = smp_load_acquire(&pools_num);
>         union handle_parts parts = { .handle = handle };
>         void *pool;
>         size_t offset = parts.offset << DEPOT_STACK_ALIGN;
>         struct stack_record *stack;
>
> -       lockdep_assert_held(&pool_rwlock);
> +       lockdep_assert_not_held(&pool_lock);
>
> -       if (parts.pool_index > pools_num) {
> +       if (parts.pool_index > pools_num_cached) {
>                 WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
> -                    parts.pool_index, pools_num, handle);
> +                    parts.pool_index, pools_num_cached, handle);
>                 return NULL;
>         }
>
> @@ -417,15 +433,35 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
>                 return NULL;
>
>         stack = pool + offset;
> +
> +       /*
> +        * Acquire the stack trace. Pairs with smp_mb() in depot_alloc_stack().
> +        *
> +        * This does not protect against a stack_depot_put() freeing the record
> +        * and having it subsequently being reused. Callers are responsible to
> +        * avoid using stack depot handles after passing to stack_depot_put().
> +        */
> +       if (!refcount_read(&stack->count))
> +               return NULL;

Can this happen? It seems that depot_fetch_stack should only be called
for handles that were returned from stack_depot_save_flags before all
puts, and thus the refcount should be > 0. Or is this a safeguard against
improper API usage?
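For reference, this is how I read the intended pairing between the
barriers in depot_alloc_stack() and depot_fetch_stack(), written as a
standalone sketch. It uses atomic_set_release()/atomic_read_acquire()
instead of the explicit smp_mb__*_atomic() calls purely to make the
release/acquire relationship explicit; the struct and function names are
made up and not from the patch.

#include <linux/atomic.h>
#include <linux/string.h>

struct record {
        unsigned long entries[8];
        int nr_entries;
        atomic_t count;         /* 0: free or not yet published */
};

/* Writer: fill in the payload, then publish it by setting the count. */
static void record_publish(struct record *r, const unsigned long *entries, int n)
{
        memcpy(r->entries, entries, n * sizeof(*entries));
        r->nr_entries = n;
        /* Pairs with atomic_read_acquire() in record_valid(). */
        atomic_set_release(&r->count, 1);
}

/* Reader: a non-zero count means the payload stores above are visible. */
static bool record_valid(const struct record *r)
{
        return atomic_read_acquire(&r->count) != 0;
}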
> +       smp_mb__after_atomic();
> +
>         return stack;
>  }
>
>  /* Links stack into the freelist. */
>  static void depot_free_stack(struct stack_record *stack)
>  {
> -       lockdep_assert_held_write(&pool_rwlock);
> +       unsigned long flags;
> +
> +       lockdep_assert_not_held(&pool_lock);
> +
> +       raw_spin_lock_irqsave(&pool_lock, flags);
> +       printk_deferred_enter();
> +       list_del_rcu(&stack->hash_list);
> +       printk_deferred_exit();
> +       raw_spin_unlock_irqrestore(&pool_lock, flags);
>
> -       list_add(&stack->list, &free_stacks);
> +       llist_add(&stack->free_list, &free_stacks);

This llist_add is outside of the lock just because we can (i.e. llist_add
can run concurrently with the other free_stacks operations, which are all
under the lock), right? This slightly contradicts the comment above the
free_stacks definition. If we put this under the lock and use a normal
list instead of llist, I think we can then combine the hash_list with the
free_list like before to save some space in stack_record. Would that make
sense?
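To spell out the first part of my question: as I understand the llist
rules, llist_add() is safe without any lock as long as the
llist_del_first() callers are serialized among themselves, which the pool
lock already does. Roughly like this (the names below are illustrative,
not from the patch):

#include <linux/llist.h>
#include <linux/spinlock.h>

struct free_obj {
        struct llist_node node;
};

static LLIST_HEAD(free_objs);
static DEFINE_RAW_SPINLOCK(obj_lock);

/* Producer: no lock needed; may run concurrently with adds and the consumer. */
static void put_obj(struct free_obj *obj)
{
        llist_add(&obj->node, &free_objs);
}

/* Consumer: llist_del_first() callers must be serialized, hence the lock. */
static struct free_obj *get_obj(void)
{
        struct llist_node *node;
        unsigned long flags;

        raw_spin_lock_irqsave(&obj_lock, flags);
        node = llist_del_first(&free_objs);
        raw_spin_unlock_irqrestore(&obj_lock, flags);

        return node ? llist_entry(node, struct free_obj, node) : NULL;
}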
>  }
>
>  /* Calculates the hash for a stack. */
> @@ -453,22 +489,55 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
>
>  /* Finds a stack in a bucket of the hash table. */
>  static inline struct stack_record *find_stack(struct list_head *bucket,
> -                                             unsigned long *entries, int size,
> -                                             u32 hash)
> +                                             unsigned long *entries, int size,
> +                                             u32 hash, depot_flags_t flags)
>  {
> -       struct list_head *pos;
> -       struct stack_record *found;
> +       struct stack_record *stack, *ret = NULL;
>
> -       lockdep_assert_held(&pool_rwlock);
> +       /*
> +        * Due to being used from low-level code paths such as the allocators,
> +        * NMI, or even RCU itself, stackdepot cannot rely on primitives that
> +        * would sleep (such as synchronize_rcu()) or end up recursively call
> +        * into stack depot again (such as call_rcu()).
> +        *
> +        * Instead, lock-less readers only rely on RCU primitives for correct
> +        * memory ordering, but do not use RCU-based synchronization otherwise.
> +        * Instead, we perform 3-pass validation below to ensure that the stack
> +        * record we accessed is actually valid. If we fail to obtain a valid
> +        * stack record here, the slow-path in stack_depot_save_flags() will
> +        * retry to avoid inserting duplicates.
> +        *
> +        * If STACK_DEPOT_FLAG_GET is not used, it is undefined behaviour to
> +        * call stack_depot_put() later - i.e. in the non-refcounted case, we do
> +        * not have to worry that the entry will be recycled.
> +        */
> +
> +       list_for_each_entry_rcu(stack, bucket, hash_list) {

So we don't need rcu_read_lock here, because we don't rely on call_rcu
etc., right?

> +               /* 1. Check if this entry could potentially match. */
> +               if (data_race(stack->hash != hash || stack->size != size))
> +                       continue;
> +
> +               /*
> +                * 2. Increase refcount if not zero. If this is successful, we
> +                *    know that this stack record is valid and will not be freed by
> +                *    stack_depot_put().
> +                */
> +               if ((flags & STACK_DEPOT_FLAG_GET) && unlikely(!refcount_inc_not_zero(&stack->count)))
> +                       continue;
> +
> +               /* 3. Do full validation of the record. */
> +               if (likely(stack->hash == hash && stack->size == size &&
> +                          !stackdepot_memcmp(entries, stack->entries, size))) {
> +                       ret = stack;
> +                       break;
> +               }
>
> -       list_for_each(pos, bucket) {
> -               found = list_entry(pos, struct stack_record, list);
> -               if (found->hash == hash &&
> -                   found->size == size &&
> -                   !stackdepot_memcmp(entries, found->entries, size))
> -                       return found;
> +               /* Undo refcount - could have raced with stack_depot_put(). */
> +               if ((flags & STACK_DEPOT_FLAG_GET) && unlikely(refcount_dec_and_test(&stack->count)))
> +                       depot_free_stack(stack);
>         }
> -       return NULL;
> +
> +       return ret;
>  }
>
>  depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
> @@ -482,7 +551,6 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
>         struct page *page = NULL;
>         void *prealloc = NULL;
>         bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
> -       bool need_alloc = false;
>         unsigned long flags;
>         u32 hash;
>
> @@ -505,31 +573,16 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
>         hash = hash_stack(entries, nr_entries);
>         bucket = &stack_table[hash & stack_hash_mask];
>
> -       read_lock_irqsave(&pool_rwlock, flags);
> -       printk_deferred_enter();
> -
> -       /* Fast path: look the stack trace up without full locking. */
> -       found = find_stack(bucket, entries, nr_entries, hash);
> -       if (found) {
> -               if (depot_flags & STACK_DEPOT_FLAG_GET)
> -                       refcount_inc(&found->count);
> -               printk_deferred_exit();
> -               read_unlock_irqrestore(&pool_rwlock, flags);
> +       /* Fast path: look the stack trace up without locking. */
> +       found = find_stack(bucket, entries, nr_entries, hash, depot_flags);
> +       if (found)
>                 goto exit;
> -       }
> -
> -       /* Take note if another stack pool needs to be allocated. */
> -       if (new_pool_required)
> -               need_alloc = true;
> -
> -       printk_deferred_exit();
> -       read_unlock_irqrestore(&pool_rwlock, flags);
>
>         /*
>          * Allocate memory for a new pool if required now:
>          * we won't be able to do that under the lock.
>          */
> -       if (unlikely(can_alloc && need_alloc)) {
> +       if (unlikely(can_alloc && READ_ONCE(new_pool_required))) {
>                 /*
>                  * Zero out zone modifiers, as we don't have specific zone
>                  * requirements. Keep the flags related to allocation in atomic
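As an aside, the allocate-outside-the-lock scheme that
READ_ONCE(new_pool_required) feeds into, distilled into a standalone
sketch of the general pattern (the flag and helper names are made up and
this is not the patch's exact logic):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(state_lock);
static bool need_reserve;       /* advisory flag, written under state_lock */
static void *reserve;

static void maybe_refill_reserve(gfp_t gfp)
{
        struct page *page = NULL;
        unsigned long flags;

        /* Racy, advisory check outside the lock; the recheck below decides. */
        if (READ_ONCE(need_reserve))
                page = alloc_pages(gfp, 0);

        raw_spin_lock_irqsave(&state_lock, flags);
        if (page && need_reserve) {
                reserve = page_address(page);
                WRITE_ONCE(need_reserve, false);
                page = NULL;    /* consumed */
        }
        raw_spin_unlock_irqrestore(&state_lock, flags);

        /* The flag was cleared in the meantime: give the unused page back. */
        if (page)
                __free_pages(page, 0);
}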
Keep the flags related to allocation in = atomic > @@ -543,31 +596,33 @@ depot_stack_handle_t stack_depot_save_flags(unsigne= d long *entries, > prealloc =3D page_address(page); > } > > - write_lock_irqsave(&pool_rwlock, flags); > + raw_spin_lock_irqsave(&pool_lock, flags); > printk_deferred_enter(); > > - found =3D find_stack(bucket, entries, nr_entries, hash); > + /* Try to find again, to avoid concurrently inserting duplicates.= */ > + found =3D find_stack(bucket, entries, nr_entries, hash, depot_fla= gs); > if (!found) { > struct stack_record *new =3D > depot_alloc_stack(entries, nr_entries, hash, &pre= alloc); > > if (new) { > - list_add(&new->list, bucket); > + /* > + * This releases the stack record into the bucket= and > + * makes it visible to readers in find_stack(). > + */ > + list_add_rcu(&new->hash_list, bucket); > found =3D new; > } > - } else { > - if (depot_flags & STACK_DEPOT_FLAG_GET) > - refcount_inc(&found->count); > + } else if (prealloc) { > /* > * Stack depot already contains this stack trace, but let= 's > * keep the preallocated memory for future. > */ > - if (prealloc) > - depot_keep_new_pool(&prealloc); > + depot_keep_new_pool(&prealloc); > } > > printk_deferred_exit(); > - write_unlock_irqrestore(&pool_rwlock, flags); > + raw_spin_unlock_irqrestore(&pool_lock, flags); > exit: > if (prealloc) { > /* Stack depot didn't use this memory, free it. */ > @@ -592,7 +647,6 @@ unsigned int stack_depot_fetch(depot_stack_handle_t h= andle, > unsigned long **entries) > { > struct stack_record *stack; > - unsigned long flags; > > *entries =3D NULL; > /* > @@ -604,13 +658,12 @@ unsigned int stack_depot_fetch(depot_stack_handle_t= handle, > if (!handle || stack_depot_disabled) > return 0; > > - read_lock_irqsave(&pool_rwlock, flags); > - printk_deferred_enter(); > - > stack =3D depot_fetch_stack(handle); > - > - printk_deferred_exit(); > - read_unlock_irqrestore(&pool_rwlock, flags); > + /* > + * Should never be NULL, otherwise this is a use-after-put. > + */ > + if (WARN_ON(!stack)) > + return 0; > > *entries =3D stack->entries; > return stack->size; > @@ -620,29 +673,20 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch); > void stack_depot_put(depot_stack_handle_t handle) > { > struct stack_record *stack; > - unsigned long flags; > > if (!handle || stack_depot_disabled) > return; > > - write_lock_irqsave(&pool_rwlock, flags); > - printk_deferred_enter(); > - > stack =3D depot_fetch_stack(handle); > + /* > + * Should always be able to find the stack record, otherwise this= is an > + * unbalanced put attempt. > + */ > if (WARN_ON(!stack)) > - goto out; > - > - if (refcount_dec_and_test(&stack->count)) { > - /* Unlink stack from the hash table. */ > - list_del(&stack->list); > + return; > > - /* Free stack. */ > + if (refcount_dec_and_test(&stack->count)) > depot_free_stack(stack); > - } > - > -out: > - printk_deferred_exit(); > - write_unlock_irqrestore(&pool_rwlock, flags); > } > EXPORT_SYMBOL_GPL(stack_depot_put); > > -- > 2.43.0.275.g3460e3d667-goog > Looks good to me from the functional perspective (modulo the clarification comments I left above), but it would be great to get a review from someone with a better understanding of the low-level synchronization primitives. Thank you!