Date: Wed, 22 May 2024 16:52:00 -0700
From: Dennis Zhou
To: Mateusz Guzik
Cc: tj@kernel.org, hughd@google.com, akpm@linux-foundation.org, vbabka@suse.cz, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3] percpu_counter: add a cmpxchg-based _add_batch variant
References: <20240521233100.358002-1-mjguzik@gmail.com>

On Wed, May 22, 2024 at 06:59:02AM +0200, Mateusz Guzik wrote:
> On Wed, May 22, 2024 at 3:17 AM Dennis Zhou wrote:
> >
> > Hi Mateusz,
> >
> > On Wed, May 22, 2024 at 01:31:00AM +0200, Mateusz Guzik wrote:
> > > Interrupt disable/enable trips are quite expensive on x86-64 compared to
> > > a mere cmpxchg (note: no lock prefix!) and percpu counters are used
> > > quite often.
> > >
> > > With this change I get a bump of 1% ops/s for negative path lookups,
> > > plugged into will-it-scale:
> > >
> > > void testcase(unsigned long long *iterations, unsigned long nr)
> > > {
> > > 	while (1) {
> > > 		int fd = open("/tmp/nonexistent", O_RDONLY);
> > > 		assert(fd == -1);
> > >
> > > 		(*iterations)++;
> > > 	}
> > > }
> > >
> > > The win would be higher if it was not for other slowdowns, but one has
> > > to start somewhere.
> >
> > This is cool!
> >
> > >
> > > Signed-off-by: Mateusz Guzik
> > > Acked-by: Vlastimil Babka
> > > ---
> > >
> > > v3:
> > > - add a missing word to the new comment
> > >
> > > v2:
> > > - dodge preemption
> > > - use this_cpu_try_cmpxchg
> > > - keep the old variant depending on CONFIG_HAVE_CMPXCHG_LOCAL
> > >
> > >  lib/percpu_counter.c | 44 +++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 39 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> > > index 44dd133594d4..c3140276bb36 100644
> > > --- a/lib/percpu_counter.c
> > > +++ b/lib/percpu_counter.c
> > > @@ -73,17 +73,50 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
> > >  EXPORT_SYMBOL(percpu_counter_set);
> > >
> > >  /*
> > > - * local_irq_save() is needed to make the function irq safe:
> > > - * - The slow path would be ok as protected by an irq-safe spinlock.
> > > - * - this_cpu_add would be ok as it is irq-safe by definition.
> > > - * But:
> > > - * The decision slow path/fast path and the actual update must be atomic, too.
> > > + * Add to a counter while respecting batch size.
> > > + *
> > > + * There are 2 implementations, both dealing with the following problem:
> > > + *
> > > + * The decision slow path/fast path and the actual update must be atomic.
> > >   * Otherwise a call in process context could check the current values and
> > >   * decide that the fast path can be used. If now an interrupt occurs before
> > >   * the this_cpu_add(), and the interrupt updates this_cpu(*fbc->counters),
> > >   * then the this_cpu_add() that is executed after the interrupt has completed
> > >   * can produce values larger than "batch" or even overflows.
> > >   */
> > > +#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
> > > +/*
> > > + * Safety against interrupts is achieved in 2 ways:
> > > + * 1. the fast path uses local cmpxchg (note: no lock prefix)
> > > + * 2. the slow path operates with interrupts disabled
> > > + */
> > > +void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> > > +{
> > > +	s64 count;
> > > +	unsigned long flags;
> > > +
> > > +	count = this_cpu_read(*fbc->counters);
> >
> > Should this_cpu_read() be inside the do {} while in case the extreme
> > case that we get preempted after the read and before the cmpxchg AND
> > count + amount < batch on both the previous and next cpu?
> >
>
> this_cpu_try_cmpxchg updates the local value on failure (hence &), so
> from semantic pov this is equivalent to having this_cpu_read in the
> loop. I'm using it the same way as mod_zone_state.
>

Ah I didn't catch that last night. Thanks.

I've applied this to percpu#for-6.11.

Thanks,
Dennis

> > > +	do {
> > > +		if (unlikely(abs(count + amount) >= batch)) {
> > > +			raw_spin_lock_irqsave(&fbc->lock, flags);
> > > +			/*
> > > +			 * Note: by now we might have migrated to another CPU
> > > +			 * or the value might have changed.
> > > +			 */
> > > +			count = __this_cpu_read(*fbc->counters);
> > > +			fbc->count += count + amount;
> > > +			__this_cpu_sub(*fbc->counters, count);
> > > +			raw_spin_unlock_irqrestore(&fbc->lock, flags);
> > > +			return;
> > > +		}
> > > +	} while (!this_cpu_try_cmpxchg(*fbc->counters, &count, count + amount));
> > > +}
> > > +#else
> > > +/*
> > > + * local_irq_save() is used to make the function irq safe:
> > > + * - The slow path would be ok as protected by an irq-safe spinlock.
> > > + * - this_cpu_add would be ok as it is irq-safe by definition.
> > > + */
> > >  void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> > >  {
> > >  	s64 count;
> > > @@ -101,6 +134,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> > >  	}
> > >  	local_irq_restore(flags);
> > >  }
> > > +#endif
> > >  EXPORT_SYMBOL(percpu_counter_add_batch);
> > >
> > >  /*
> > > --
> > > 2.39.2
> > >
> >
> > Thanks,
> > Dennis
> >
>
> --
> Mateusz Guzik
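
For anyone following the this_cpu_read() question in this thread, here is a
minimal user-space sketch of the same loop shape. It assumes C11 atomics as a
stand-in for the kernel's per-CPU operations, and the names sketch_counter,
sketch_add_batch and sketch_failed_cmpxchg_reload are hypothetical, not part
of the patch. Like this_cpu_try_cmpxchg, atomic_compare_exchange_*() writes
the current value back into the "expected" argument on failure, which is why
a single read before the loop is enough. The slow path below takes no lock
and is only valid single-threaded; the kernel version holds fbc->lock with
interrupts disabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct percpu_counter with a single local slot. */
struct sketch_counter {
	int64_t count;          /* shared total (kernel: guarded by fbc->lock) */
	_Atomic int64_t local;  /* stands in for this CPU's *fbc->counters */
};

/* Same control flow as the cmpxchg-based variant, minus locking. */
void sketch_add_batch(struct sketch_counter *c, int64_t amount, int32_t batch)
{
	/* Read once, outside the loop. */
	int64_t count = atomic_load(&c->local);

	assert(batch > 0);
	do {
		if (llabs(count + amount) >= batch) {
			/*
			 * Slow path: fold the local value into the total.
			 * Assumption: no concurrency here, so no lock is taken.
			 */
			count = atomic_exchange(&c->local, 0);
			c->count += count + amount;
			return;
		}
		/*
		 * On failure the compare-exchange stores the current value of
		 * c->local into count, so the next iteration re-evaluates the
		 * batch check against fresh data without an explicit re-read.
		 */
	} while (!atomic_compare_exchange_weak(&c->local, &count,
					       count + amount));
}

/* Shows the reload-on-failure behaviour in isolation. */
int64_t sketch_failed_cmpxchg_reload(void)
{
	_Atomic int64_t slot = 7;
	int64_t expected = 3;   /* deliberately stale */
	int swapped;

	/* Strong variant: it can only fail because of a value mismatch. */
	swapped = atomic_compare_exchange_strong(&slot, &expected,
						 expected + 1);
	assert(!swapped);
	return expected;        /* overwritten with the value found in slot */
}
```

Calling sketch_add_batch() with a small amount only touches the local slot;
once the accumulated value reaches the batch size, the local slot is drained
into the shared total, which mirrors the fast path/slow path split discussed
above.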