Date: Fri, 6 Nov 2020 23:33:41 +0000
From: Dennis Zhou
To: Wonhyuk Yang
Cc: Tejun Heo, Christoph Lameter, linux-mm@kvack.org
Subject: Re: [PATCH] percpu: reduce the number of searches calculating best upa
Message-ID: <20201106233341.GA1681505@google.com>
References: <20201102052647.8211-1-vvghjk1234@gmail.com>
In-Reply-To: <20201102052647.8211-1-vvghjk1234@gmail.com>

Hi,

On Mon, Nov 02, 2020 at 02:26:47PM +0900, Wonhyuk Yang wrote:
> From: Wonhyuk Yang
>
> The best upa is determined by iterating from max_upa down to 1. If
> alloc_size is a power of 2, the number of iterations drops to a
> logarithmic level.
>
> Prime factorization of alloc_size makes it easy to get the possible
> upas. When alloc_size is a power of 2, we can avoid the cost of the
> prime factorization, and the possible upas are 1, 2, 4, ... max_upa.
>
> Signed-off-by: Wonhyuk Yang
> ---
>  mm/percpu.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 66a93f096394..a24f3973744f 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2689,18 +2689,17 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>
>  	/*
>  	 * Determine min_unit_size, alloc_size and max_upa such that
> -	 * alloc_size is multiple of atom_size and is the smallest
> -	 * which can accommodate 4k aligned segments which are equal to
> -	 * or larger than min_unit_size.
> +	 * alloc_size is the maximum value of min_unit_size and atom_size.
> +	 * Also, alloc_size is a power of 2 because both min_unit_size
> +	 * and atom_size are powers of 2.
>  	 */
>  	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
> +	min_unit_size = roundup_pow_of_two(min_unit_size);

While this may make sense for the vast majority of users, there remain
users, such as embedded devices, that page in the first chunk and have
fairly limited use of percpu memory. In those cases we wouldn't want to
round up min_unit_size, because that would be wasteful; the waste would
be small, but it still doesn't seem worth changing the behavior here.

>
>  	/* determine the maximum # of units that can fit in an allocation */
> -	alloc_size = roundup(min_unit_size, atom_size);
> -	upa = alloc_size / min_unit_size;
> -	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
> -		upa--;
> -	max_upa = upa;
> +	alloc_size = max_t(size_t, min_unit_size, atom_size);
> +	max_upa = alloc_size / min_unit_size;
> +
>
>  	/* group cpus according to their proximity */
>  	for_each_possible_cpu(cpu) {
> @@ -2727,12 +2726,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>  	 * Related to atom_size, which could be much larger than the unit_size.
>  	 */
>  	last_allocs = INT_MAX;
> -	for (upa = max_upa; upa; upa--) {
> +	for (upa = max_upa; upa; upa >>= 1) {
>  		int allocs = 0, wasted = 0;
>
> -		if (alloc_size % upa || (offset_in_page(alloc_size / upa)))
> -			continue;
> -
>  		for (group = 0; group < nr_groups; group++) {
>  			int this_allocs = DIV_ROUND_UP(group_cnt[group], upa);
>  			allocs += this_allocs;
> --
> 2.17.1
>

Overall, I'm not inclined to take this because it is trying to optimize
boot-time code, which runs at most a few times, by introducing a new
assumption. I personally find this code a little complex to parse, so
I'd rather not make the change unless it aided maintainability or was
side-effect free.

Thanks,
Dennis