Date: Tue, 21 May 2024 18:17:36 -0700
From: Dennis Zhou
To: Mateusz Guzik
Cc: tj@kernel.org, hughd@google.com, akpm@linux-foundation.org, vbabka@suse.cz, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3] percpu_counter: add a cmpxchg-based _add_batch variant
References: <20240521233100.358002-1-mjguzik@gmail.com>
In-Reply-To: <20240521233100.358002-1-mjguzik@gmail.com>

Hi Mateusz,

On Wed, May 22, 2024 at 01:31:00AM +0200, Mateusz Guzik wrote:
> Interrupt disable/enable trips are quite expensive on x86-64 compared to
> a mere cmpxchg (note: no lock prefix!) and percpu counters are used
> quite often.
>
> With this change I get a bump of 1% ops/s for negative path lookups,
> plugged into will-it-scale:
>
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
> 	while (1) {
> 		int fd = open("/tmp/nonexistent", O_RDONLY);
> 		assert(fd == -1);
>
> 		(*iterations)++;
> 	}
> }
>
> The win would be higher if it was not for other slowdowns, but one has
> to start somewhere.

This is cool!

>
> Signed-off-by: Mateusz Guzik
> Acked-by: Vlastimil Babka
> ---
>
> v3:
> - add a missing word to the new comment
>
> v2:
> - dodge preemption
> - use this_cpu_try_cmpxchg
> - keep the old variant depending on CONFIG_HAVE_CMPXCHG_LOCAL
>
>  lib/percpu_counter.c | 44 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 44dd133594d4..c3140276bb36 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -73,17 +73,50 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
>  EXPORT_SYMBOL(percpu_counter_set);
>
>  /*
> - * local_irq_save() is needed to make the function irq safe:
> - * - The slow path would be ok as protected by an irq-safe spinlock.
> - * - this_cpu_add would be ok as it is irq-safe by definition.
> - * But:
> - * The decision slow path/fast path and the actual update must be atomic, too.
> + * Add to a counter while respecting batch size.
> + *
> + * There are 2 implementations, both dealing with the following problem:
> + *
> + * The decision slow path/fast path and the actual update must be atomic.
>   * Otherwise a call in process context could check the current values and
>   * decide that the fast path can be used. If now an interrupt occurs before
>   * the this_cpu_add(), and the interrupt updates this_cpu(*fbc->counters),
>   * then the this_cpu_add() that is executed after the interrupt has completed
>   * can produce values larger than "batch" or even overflows.
>   */
> +#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
> +/*
> + * Safety against interrupts is achieved in 2 ways:
> + * 1. the fast path uses local cmpxchg (note: no lock prefix)
> + * 2. the slow path operates with interrupts disabled
> + */
> +void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> +{
> +	s64 count;
> +	unsigned long flags;
> +
> +	count = this_cpu_read(*fbc->counters);

Should this_cpu_read() be inside the do {} while loop, to cover the
extreme case where we get preempted after the read and before the
cmpxchg AND count + amount < batch on both the previous and next cpu?

> +	do {
> +		if (unlikely(abs(count + amount)) >= batch) {
> +			raw_spin_lock_irqsave(&fbc->lock, flags);
> +			/*
> +			 * Note: by now we might have migrated to another CPU
> +			 * or the value might have changed.
> +			 */
> +			count = __this_cpu_read(*fbc->counters);
> +			fbc->count += count + amount;
> +			__this_cpu_sub(*fbc->counters, count);
> +			raw_spin_unlock_irqrestore(&fbc->lock, flags);
> +			return;
> +		}
> +	} while (!this_cpu_try_cmpxchg(*fbc->counters, &count, count + amount));
> +}
> +#else
> +/*
> + * local_irq_save() is used to make the function irq safe:
> + * - The slow path would be ok as protected by an irq-safe spinlock.
> + * - this_cpu_add would be ok as it is irq-safe by definition.
> + */
>  void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
>  {
>  	s64 count;
> @@ -101,6 +134,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
>  	}
>  	local_irq_restore(flags);
>  }
> +#endif
>  EXPORT_SYMBOL(percpu_counter_add_batch);
>
>  /*
> --
> 2.39.2
>

Thanks,
Dennis
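
For reference, a minimal userspace sketch of the retry loop discussed
above. It is not the kernel code: C11 atomic_compare_exchange_strong()
stands in for this_cpu_try_cmpxchg(), a single atomic slot stands in for
the per-CPU counter, and the spinlock is omitted. The names
struct sketch_counter, sketch_init(), sketch_add_batch() and
expected_after_failed_cas() are hypothetical. What it illustrates is that
a failed compare-exchange writes the value it observed back into count,
so the batch check on the next iteration runs against a fresh value even
though the initial read sits outside the loop. Whether that covers the
migration case depends on this_cpu_try_cmpxchg() refreshing count from
the CPU it actually ran on, which is assumed here rather than shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct percpu_counter (assumption, not kernel API). */
struct sketch_counter {
	_Atomic int64_t local;	/* plays the role of *fbc->counters */
	int64_t global;		/* plays the role of fbc->count */
};

void sketch_init(struct sketch_counter *c)
{
	atomic_init(&c->local, 0);
	c->global = 0;
}

void sketch_add_batch(struct sketch_counter *c, int64_t amount, int32_t batch)
{
	/* Read once, outside the loop, as in the quoted patch. */
	int64_t count = atomic_load(&c->local);

	assert(batch > 0);
	do {
		if (llabs(count + amount) >= batch) {
			/*
			 * Slow path: fold the local value into the global one.
			 * The kernel does this under fbc->lock with interrupts
			 * disabled; locking is left out of this sketch.
			 */
			count = atomic_exchange(&c->local, 0);
			c->global += count + amount;
			return;
		}
		/*
		 * On failure the compare-exchange stores the observed value
		 * in count, so the check above is redone with fresh data.
		 */
	} while (!atomic_compare_exchange_strong(&c->local, &count,
						 count + amount));
}

/* Forces one compare-exchange failure and reports what "expected" holds afterwards. */
int64_t expected_after_failed_cas(void)
{
	_Atomic int64_t slot;
	int64_t expected = 3;	/* deliberately stale snapshot */
	bool ok;

	atomic_init(&slot, 5);
	ok = atomic_compare_exchange_strong(&slot, &expected, expected + 1);
	return ok ? -1 : expected;
}
```

Calling sketch_add_batch() with an amount that keeps the local value
under the batch size takes the compare-exchange path; an amount that
reaches the batch size takes the fold-into-global path instead.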