Date: Fri, 6 Nov 2020 23:33:41 +0000
From: Dennis Zhou
To: Wonhyuk Yang
Cc: Tejun Heo, Christoph Lameter, linux-mm@kvack.org
Subject: Re: [PATCH] percpu: reduce the number of searches calculating best upa
Message-ID: <20201106233341.GA1681505@google.com>
References: <20201102052647.8211-1-vvghjk1234@gmail.com>
In-Reply-To: <20201102052647.8211-1-vvghjk1234@gmail.com>

Hi,

On Mon, Nov 02, 2020 at 02:26:47PM +0900, Wonhyuk Yang wrote:
> From: Wonhyuk Yang
>
> The best upa is determined by iterating from max_upa down to 1. If
> alloc_size is a power of 2, the number of iterations drops to a
> logarithmic level.
>
> Prime factorization of alloc_size makes it easy to get the possible
> upas. When alloc_size is a power of 2, we can avoid the cost of the
> prime factorization, and the possible upas are 1, 2, 4, ... max_upa.
>
> Signed-off-by: Wonhyuk Yang
> ---
>  mm/percpu.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 66a93f096394..a24f3973744f 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2689,18 +2689,17 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>
>  	/*
>  	 * Determine min_unit_size, alloc_size and max_upa such that
> -	 * alloc_size is multiple of atom_size and is the smallest
> -	 * which can accommodate 4k aligned segments which are equal to
> -	 * or larger than min_unit_size.
> +	 * alloc_size is the maximum value of min_unit_size and atom_size.
> +	 * Also, alloc_size is a power of 2 because both min_unit_size
> +	 * and atom_size are powers of 2.
>  	 */
>  	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
> +	min_unit_size = roundup_pow_of_two(min_unit_size);

While this may make sense for the vast majority of users, there remain
users, such as embedded devices, that page in the first chunk and have
fairly limited use of percpu memory. In those cases we wouldn't want to
round up min_unit_size, because that would be wasteful; the waste would
be small, but it still doesn't seem worth changing the behavior here.

>
>  	/* determine the maximum # of units that can fit in an allocation */
> -	alloc_size = roundup(min_unit_size, atom_size);
> -	upa = alloc_size / min_unit_size;
> -	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
> -		upa--;
> -	max_upa = upa;
> +	alloc_size = max_t(size_t, min_unit_size, atom_size);
> +	max_upa = alloc_size / min_unit_size;
> +
>
>  	/* group cpus according to their proximity */
>  	for_each_possible_cpu(cpu) {
> @@ -2727,12 +2726,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>  	 * Related to atom_size, which could be much larger than the unit_size.
>  	 */
>  	last_allocs = INT_MAX;
> -	for (upa = max_upa; upa; upa--) {
> +	for (upa = max_upa; upa; upa >>= 1) {
>  		int allocs = 0, wasted = 0;
>
> -		if (alloc_size % upa || (offset_in_page(alloc_size / upa)))
> -			continue;
> -
>  		for (group = 0; group < nr_groups; group++) {
>  			int this_allocs = DIV_ROUND_UP(group_cnt[group], upa);
>  			allocs += this_allocs;
> --
> 2.17.1
>

Overall, I'm not inclined to take this because it is trying to optimize
boot-time code, which runs at most a few times, by introducing a new
assumption. I personally find this code a little complex to parse, so
I'd rather not make the change unless it aided maintainability or was
side-effect free.

Thanks,
Dennis