From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6614DEE49A3 for ; Tue, 22 Aug 2023 17:02:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E9B2228004E; Tue, 22 Aug 2023 13:02:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E23B7280046; Tue, 22 Aug 2023 13:02:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C9D9828004E; Tue, 22 Aug 2023 13:02:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id B3936280046 for ; Tue, 22 Aug 2023 13:02:26 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7FCC1C03FF for ; Tue, 22 Aug 2023 17:02:26 +0000 (UTC) X-FDA: 81152359092.02.69E206E Received: from mail-pf1-f182.google.com (mail-pf1-f182.google.com [209.85.210.182]) by imf23.hostedemail.com (Postfix) with ESMTP id 91090140032 for ; Tue, 22 Aug 2023 17:02:23 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none); spf=pass (imf23.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.210.182 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692723743; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=q4sVlQQACP9KfO4dkY6wxP9XqMwO3mckdV+5TyWvLmA=; b=klK71A2sckkxVXUpvsTGGqcHohG7qUMUfOeG9Fc2Lks5ttYpTYUQdPhYK56oK6LDIK6aa0 nMNQz4sqb7tOpo3jAjsOptidjrPcuL5Ezqhul+mKi/iAfkUW5M2u5Ho99D3Eq4xvTv92MR Pv8lhHIhiYqOihhVCPNWFf+E1UGlVmk= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none); spf=pass (imf23.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.210.182 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692723743; a=rsa-sha256; cv=none; b=bs8CNMVu+23HXccFMAtre9G8KdqP41cjZxsfSIO+CabXE6VwqLiueDh70MMab85umkNauX tLYpBv2jYALqe8RDFgrE7hShx+Diptpeb0GC3dh7fkYdBV6XIUOjttldHyeD3huaaOMcFd mec01MQisrI1COP6fGCIEmD34t+vauY= Received: by mail-pf1-f182.google.com with SMTP id d2e1a72fcca58-68a5457b930so1517413b3a.2 for ; Tue, 22 Aug 2023 10:02:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692723742; x=1693328542; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=q4sVlQQACP9KfO4dkY6wxP9XqMwO3mckdV+5TyWvLmA=; b=aSJ3pPEnTc9Efs/bRlzR5WoxiSXqw8MketP9nceSFFw/KsAkviDFU/3IZ+d7SXICWt qjYjp6TdLdKoiX4/FwsZeVI9zcpvFmuCFwjosuCBHspxUB2mCmySi0t/q1/begO/5Im4 neWxZ24BSE8IUIgMKw2cP+ZnMCFR1wB9Ojh/GaoTna4YhqfM8g7lNfRcvzpuvfFoH0rP qtuS2agaQlXuzqFYWdEsZNnDaRC1bFEMXw3mLWRL9p0sDCL13PSRo7cU2CJZrz1Mxk09 UcPUv6hoEra5YT4IJiZR/QeANfR4q57iyhO3STvFPpkOoSiXMFin6ZpNJtPcLqQFoOPC 80kw== X-Gm-Message-State: AOJu0YwoDsp2xjpuo1Ydxf4ro0Y4IUbmG7Jnj6fyCEd/UHgNAGk7GTbF 1bTXs02mt4hC43d9VskRkDI= X-Google-Smtp-Source: AGHT+IF34UVPbfUAWwyA5LVhkb7wEW66G0fsDOo/mn220q8082TYndkx8753uBbmEvMqwqiEY2tOSg== X-Received: by 2002:a05:6a20:4310:b0:149:2149:9491 with SMTP id h16-20020a056a20431000b0014921499491mr6131522pzk.43.1692723742138; Tue, 22 Aug 2023 10:02:22 -0700 (PDT) Received: from snowbird ([136.25.84.107]) by smtp.gmail.com with ESMTPSA id y3-20020aa78543000000b006875a366acfsm4199880pfn.8.2023.08.22.10.02.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Aug 2023 10:02:21 -0700 (PDT) Date: Tue, 22 Aug 2023 10:02:19 -0700 From: Dennis Zhou To: Mateusz Guzik Cc: linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org Subject: Re: [PATCH 1/2] pcpcntr: add group allocation/free Message-ID: References: <20230821202829.2163744-1-mjguzik@gmail.com> <20230821202829.2163744-2-mjguzik@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230821202829.2163744-2-mjguzik@gmail.com> X-Rspamd-Queue-Id: 91090140032 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: yhhmf3i8wuu8dj3xuidxeckatx7ki7bu X-HE-Tag: 1692723743-442926 X-HE-Meta: U2FsdGVkX19oqQHYIdemn0OhBAOB3O9bFCu756DFIh6EhEYtHj/E4rCe7/KZTtDC3eJg40NNaMfuGWsFIBUkE7+ehzZXn3NsK1anzpgS6PZYF/liL48M+8o2pAa7SO8xexip357USSfQMn5k3ASUydovCxtl2cO8gnT0tbnn1oB6zGhAtbPZCvc4wqBlkzyNapR4K6AwtzvxU9WrJUXLnyS/SC+4LzV9kTAUz4CxQj0rbjdXbOutw/QByMTgw7LXp6hTFHmqYGG/NTpsZZST1BuvjLGqHO6P4nyHzGM2sLMXF2bOWKlO+yCgRTjgcZrXeAY4JNUXhiNzhpScKu8niiq5qyeSC6l9EZO2hBm1gT8CBpIvcDPzgB9CO3d0Izg15SjQgylz18nlagPU0qEel3v4eIfrAz9h2J5pmV8nDJAiA76osaPfS0W59ZkZFJMcIleOIkeM3q3fXcI3V5LO3lh/8YKkEpt9LyOGc8YyleiBgbu30QQQGTflrkaKH0OpU9bOYChb+RRe49YdVVThNlX1YNZKDBnFo5JX1bN6EEQZQYwZtCs46qXGRCGto0JdGu61IRGf9+0bd71A0nI/vLGeLzRKrf+AmtCDN/CnLM8E0dqwk3w7ImYImyJE8guMTk9Jn+mHpuexMkmSbLsMvhD5Ua7KIF3T9owuqW1QEwAKl1dxP8Fun9lL6dHEUGSaCk2D9J+ULuZnYBQ8JZdDJFKBpM16WXdN9r5UVY0Y4DLKm59DERckvbGypzsBEUewJ4NVXe/omI+hv6/Jb8myzCwU6A4NYji28BiP5DGNITXtVczVN+Gwbpzaw1ELSCTJ3rep1C4ior3i3PazyxpHPuYZrTeun6ZfYEU76ZTctf0MMPqz6pcwSdK1RzYNbfBQXn8fj8FIwJAfrg7hXdlhbn5/E89HjqH8rvpjoAyS4U1fOqoMY1AkJNGXcqXSSaHRPX8tFuqa45ipwaG/6XF JVn1fu5q mkYTX29KQ0oATzzL3ehOXJ4vYHqGxPFRhgyC73JY6CuVzZ5sudEqZSxTwNiNbEjTlPJvQOCJS+YSVAzgKovZg3j9yN/rwtk7Rca+eiCZxCfW2M+qxOompkH7D0UTZJpPlk3OoUlffXig453cnR8kzHxbCWIfSiMFp5HpMeOcql5CnCkC49EVKmLSbetLHt2YmsmR+DrqOKcRuwID69uMiw1UxoLqsOPuQ5em5PRaFjrYX+kioydnYeVH/kJPOI2WT6VuB56N95quACNyVDotLQ9uqiG20SHizqhnpDSEAzFKjAxyz9wGos0wM7y7+FJmS+ynUN+nR51XWGA0vRtud7eESNndYTc3yEa5VjD0ESCyRwxAEjOW1nGxEEQJI77lPI28JyK2/xvogPTf/nJptYFC1dEItV3ZlIyMmEoHJP4KUDx0XsALxvrrlSx6D5IDf0ws8iyH9QbBBDk2Bv3pZAm8RkJEPQDEtwTgf2dnOKxRPpA6UEixjbQMQ8Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Mateusz, On Mon, Aug 21, 2023 at 10:28:28PM +0200, Mateusz Guzik wrote: > Allocations and frees are globally serialized on the pcpu lock (and the > CPU hotplug lock if enabled, which is the case on Debian). > > At least one frequent consumer allocates 4 back-to-back counters (and > frees them in the same manner), exacerbating the problem. > > While this does not fully remedy scalability issues, it is a step > towards that goal and provides immediate relief. > > Signed-off-by: Mateusz Guzik As I mentioned yesterday, I like this approach because instead of making percpu smarter, we're batching against the higher level inits which I'm assuming will have additional synchronization requirements. Below is mostly style changes to conform and a few nits wrt naming. > --- > include/linux/percpu_counter.h | 19 ++++++++--- > lib/percpu_counter.c | 61 ++++++++++++++++++++++++---------- > 2 files changed, 57 insertions(+), 23 deletions(-) > > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h > index 75b73c83bc9d..ff5850b07124 100644 > --- a/include/linux/percpu_counter.h > +++ b/include/linux/percpu_counter.h > @@ -30,17 +30,26 @@ struct percpu_counter { > > extern int percpu_counter_batch; > > -int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp, > - struct lock_class_key *key); > +int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp, > + struct lock_class_key *key, u32 count); 1. Can we move count to before the lock_class_key? 2. count is an overloaded term between percpu_counters and percpu_refcount. Maybe `nr` or `nr_counters` is better? > > -#define percpu_counter_init(fbc, value, gfp) \ > +#define percpu_counter_init_many(fbc, value, gfp, count) \ > ({ \ > static struct lock_class_key __key; \ > \ > - __percpu_counter_init(fbc, value, gfp, &__key); \ > + __percpu_counter_init_many(fbc, value, gfp, &__key, count);\ > }) > > -void percpu_counter_destroy(struct percpu_counter *fbc); > + > +#define percpu_counter_init(fbc, value, gfp) \ > + percpu_counter_init_many(fbc, value, gfp, 1) > + > +void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 count); > +static inline void percpu_counter_destroy(struct percpu_counter *fbc) > +{ > + percpu_counter_destroy_many(fbc, 1); > +} > + > void percpu_counter_set(struct percpu_counter *fbc, s64 amount); > void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, > s32 batch); > diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c > index 5004463c4f9f..2a33cf23df55 100644 > --- a/lib/percpu_counter.c > +++ b/lib/percpu_counter.c > @@ -151,48 +151,73 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc) > } > EXPORT_SYMBOL(__percpu_counter_sum); > > -int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp, > - struct lock_class_key *key) > +int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp, > + struct lock_class_key *key, u32 count) > { > unsigned long flags __maybe_unused; > + s32 __percpu *counters; > + u32 i; > > - raw_spin_lock_init(&fbc->lock); > - lockdep_set_class(&fbc->lock, key); > - fbc->count = amount; > - fbc->counters = alloc_percpu_gfp(s32, gfp); > - if (!fbc->counters) > + counters = __alloc_percpu_gfp(sizeof(*counters) * count, > + sizeof(*counters), gfp); So while a bit moot, I think it'd be nice to clarify what we should do here wrt alignment and batch allocation. There has been a case in the past where alignment > size. This is from my batch api implementation: + /* + * When allocating a batch we need to align the size so that each + * element in the batch will have the appropriate alignment. + */ + size = ALIGN(size, align); So I think some variation of: element_size = ALIGN(sizeof(*counters, __alignof__(*counters))); counters = __alloc_percpu_gfp(nr * element_size, __alignof__(*counters), gfp); While this isn't necessary here for s32's, I think it would be nice to just set the precedent so we preserve alignment asks for future users if they use this as a template. > + if (!counters) { > + fbc[0].counters = NULL; > return -ENOMEM; > + } > > - debug_percpu_counter_activate(fbc); > + for (i = 0; i < count; i++) { > + raw_spin_lock_init(&fbc[i].lock); > + lockdep_set_class(&fbc[i].lock, key); > +#ifdef CONFIG_HOTPLUG_CPU > + INIT_LIST_HEAD(&fbc[i].list); > +#endif > + fbc[i].count = amount; > + fbc[i].counters = &counters[i]; This would have to become some (void *) math tho with element_size. > + > + debug_percpu_counter_activate(&fbc[i]); > + } > > #ifdef CONFIG_HOTPLUG_CPU > - INIT_LIST_HEAD(&fbc->list); > spin_lock_irqsave(&percpu_counters_lock, flags); > - list_add(&fbc->list, &percpu_counters); > + for (i = 0; i < count; i++) { > + list_add(&fbc[i].list, &percpu_counters); > + } nit: we don't add {} for single line loops. > spin_unlock_irqrestore(&percpu_counters_lock, flags); > #endif > return 0; > } > -EXPORT_SYMBOL(__percpu_counter_init); > +EXPORT_SYMBOL(__percpu_counter_init_many); > > -void percpu_counter_destroy(struct percpu_counter *fbc) > +void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 count) > { > unsigned long flags __maybe_unused; > + u32 i; > > - if (!fbc->counters) > + if (WARN_ON_ONCE(!fbc)) > return; > > - debug_percpu_counter_deactivate(fbc); > + if (!fbc[0].counters) > + return; > + > + for (i = 0; i < count; i++) { > + debug_percpu_counter_deactivate(&fbc[i]); > + } > nit: no {}. > #ifdef CONFIG_HOTPLUG_CPU > spin_lock_irqsave(&percpu_counters_lock, flags); > - list_del(&fbc->list); > + for (i = 0; i < count; i++) { > + list_del(&fbc[i].list); > + } nit: no {}. > spin_unlock_irqrestore(&percpu_counters_lock, flags); > #endif > - free_percpu(fbc->counters); > - fbc->counters = NULL; > + > + free_percpu(fbc[0].counters); > + > + for (i = 0; i < count; i++) { > + fbc[i].counters = NULL; > + } nit: no {}. > } > -EXPORT_SYMBOL(percpu_counter_destroy); > +EXPORT_SYMBOL(percpu_counter_destroy_many); > > int percpu_counter_batch __read_mostly = 32; > EXPORT_SYMBOL(percpu_counter_batch); > -- > 2.39.2 > Thanks, Dennis