From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7A07C43334 for ; Thu, 23 Jun 2022 05:25:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F03C88E0127; Thu, 23 Jun 2022 01:25:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EB3AC8E0115; Thu, 23 Jun 2022 01:25:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D7BBA8E0127; Thu, 23 Jun 2022 01:25:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id C97BF8E0115 for ; Thu, 23 Jun 2022 01:25:10 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 9F51E3335A for ; Thu, 23 Jun 2022 05:25:10 +0000 (UTC) X-FDA: 79608361980.09.4AD130F Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) by imf09.hostedemail.com (Postfix) with ESMTP id 2D1A51400BC for ; Thu, 23 Jun 2022 05:25:10 +0000 (UTC) Received: by mail-pj1-f47.google.com with SMTP id f16so16431369pjj.1 for ; Wed, 22 Jun 2022 22:25:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=BkA3goyCUVCAoZrWCBXBcphDU/vQ/nDzMYEvUN39kDs=; b=CxGelxIsH+OWmJwJRJmFJH721XC8kg8fgx8D/848lP1zmr+4NOCpXgJrviyg2rzbNT oWcb6JG1BvWTPt5xC9Z9ZWzrPAZDSmjG0QR2F5QoLzjkbCg30GWnMC3m4T28e6KLcZc2 pVOtGNxBdP0N2n3+PQUWpYEjOt5IyyAH6ZKEab6yL0Jf4cWOQMxyDeiXQ4WghLxsjuhV Hxz6XHPHJhAtK8E1dkZ3N8sc7L8XNLogUFQaawb9hwE91iRPXOAM1/uyz3OGfxAojUF4 RNt71L/6hLIR8FwYs801zYlvjqoi9aqI6/BoSh0PUgt5fqpabHp8BU4PxSsknbbhyKSn 0O5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=BkA3goyCUVCAoZrWCBXBcphDU/vQ/nDzMYEvUN39kDs=; b=xjMpTKinkzIzv3Xbg7FCiwqBu4PUAy8PAMuV0pQ/vY05JFhqXOMphyUroe9ZrnEcXH YYBWhjfQOCgW8kvspEUWBnP1HE1lyGbQz2/LCS8uj7Q9izXXNwup5aGOsoUT/jtEqixQ Ys+UkBwoondLlbjm4iyfnv9HUeUkXt7yQSsfUH4DEWZnp3Ybb4LdKGFlq88voe6NNgTx ENGjWocqrM0zjlgQ2iMjt1NarKS0cOJrUtmdhnCxG0LBrnyXd/s4lW4ZxXyi8vJUhAtQ lyMmfsX+dtPEFpNzl8StZDz0cUPBNYWbz5hNyhJfF9NAmMjPcazQn/I2ZLtGl7mFymDZ kIsA== X-Gm-Message-State: AJIora88AUd9moP8BfivDvusa144CMc+T5k3pceG28nGspbfCIF/sT8G NS4JjW9Dx4fOhY+L4i8DMDQ= X-Google-Smtp-Source: AGRyM1tKv7/KhHLB6cxCXOyCj7zNBU33siZHD0cvgCEoAz2jFvmnH5BH3WZliSs/CTpvDrtu9o7LNw== X-Received: by 2002:a17:902:d50b:b0:16a:2cb3:750d with SMTP id b11-20020a170902d50b00b0016a2cb3750dmr16163171plg.17.1655961909011; Wed, 22 Jun 2022 22:25:09 -0700 (PDT) Received: from fedora (136-24-99-118.cab.webpass.net. [136.24.99.118]) by smtp.gmail.com with ESMTPSA id p5-20020a17090b010500b001ec7ba41fe7sm772999pjz.48.2022.06.22.22.25.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Jun 2022 22:25:08 -0700 (PDT) Date: Wed, 22 Jun 2022 22:25:31 -0700 From: Dennis Zhou To: Yafang Shao Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, quentin@isovalent.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, songmuchun@bytedance.com, akpm@linux-foundation.org, cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, vbabka@suse.cz, linux-mm@kvack.org, bpf@vger.kernel.org Subject: Re: [RFC PATCH bpf-next 07/10] mm: Add helper to recharge percpu address Message-ID: References: <20220619155032.32515-1-laoar.shao@gmail.com> <20220619155032.32515-8-laoar.shao@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220619155032.32515-8-laoar.shao@gmail.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655961910; a=rsa-sha256; cv=none; b=VWzjvszxuOtOpQg/7pAJYFCsT0gVIyvE0cBWQx8wVr56K89tVxlAkiycwKxemqMjxdSL+o miEhqKNvlazwDDyt9xW9VcLGkAwHQK36Zjpyc9rQKA0nXYxx/XofMyweSel8PR/QnafaTX 8AbNfgVn9AVlS0EHRFIoetePlBSYrYo= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=CxGelxIs; spf=pass (imf09.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655961910; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BkA3goyCUVCAoZrWCBXBcphDU/vQ/nDzMYEvUN39kDs=; b=Wo7EXP7IeZfss51wLtqbiIzdH4lBQ42Etbsw7WItRkK+XgaKKfTyJFwmn8rqPDBY6ydd5i 60r5wJkwSqEKNl0Jl2Qzpq0n5pzEYf63PYrvQ7WGBKjEcGAm7ZKBpcd8ub6VS0MXaPJzFX JhmdAlXSRV35Ms+8BoVtSdhGRS3gsRc= X-Rspamd-Queue-Id: 2D1A51400BC X-Rspam-User: Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=CxGelxIs; spf=pass (imf09.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam08 X-Stat-Signature: qsr34dbsssbhkcihmgbs3nhorjtcwcrd X-HE-Tag: 1655961910-162346 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hello, On Sun, Jun 19, 2022 at 03:50:29PM +0000, Yafang Shao wrote: > This patch introduces a helper to recharge the corresponding pages of > a given percpu address. It is similar with how to recharge a kmalloc'ed > address. > > Signed-off-by: Yafang Shao > --- > include/linux/percpu.h | 1 + > mm/percpu.c | 98 ++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 99 insertions(+) > > diff --git a/include/linux/percpu.h b/include/linux/percpu.h > index f1ec5ad1351c..e88429410179 100644 > --- a/include/linux/percpu.h > +++ b/include/linux/percpu.h > @@ -128,6 +128,7 @@ extern void __init setup_per_cpu_areas(void); > extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1); > extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1); > extern void free_percpu(void __percpu *__pdata); > +bool recharge_percpu(void __percpu *__pdata, int step); Nit: can you add extern to keep the file consistent. > extern phys_addr_t per_cpu_ptr_to_phys(void *addr); > > #define alloc_percpu_gfp(type, gfp) \ > diff --git a/mm/percpu.c b/mm/percpu.c > index 3633eeefaa0d..fd81f4d79f2f 100644 > --- a/mm/percpu.c > +++ b/mm/percpu.c > @@ -2310,6 +2310,104 @@ void free_percpu(void __percpu *ptr) > } > EXPORT_SYMBOL_GPL(free_percpu); > > +#ifdef CONFIG_MEMCG_KMEM > +bool recharge_percpu(void __percpu *ptr, int step) > +{ > + int bit_off, off, bits, size, end; > + struct obj_cgroup *objcg_old; > + struct obj_cgroup *objcg_new; > + struct pcpu_chunk *chunk; > + unsigned long flags; > + void *addr; > + > + WARN_ON(!in_task()); > + > + if (!ptr) > + return true; > + > + addr = __pcpu_ptr_to_addr(ptr); > + spin_lock_irqsave(&pcpu_lock, flags); > + chunk = pcpu_chunk_addr_search(addr); > + off = addr - chunk->base_addr; > + objcg_old = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT]; > + if (!objcg_old && step != MEMCG_KMEM_POST_CHARGE) { > + spin_unlock_irqrestore(&pcpu_lock, flags); > + return true; > + } > + > + bit_off = off / PCPU_MIN_ALLOC_SIZE; > + /* find end index */ > + end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk), > + bit_off + 1); > + bits = end - bit_off; > + size = bits * PCPU_MIN_ALLOC_SIZE; > + > + switch (step) { > + case MEMCG_KMEM_PRE_CHARGE: > + objcg_new = get_obj_cgroup_from_current(); > + WARN_ON(!objcg_new); > + if (obj_cgroup_charge(objcg_new, GFP_KERNEL, > + size * num_possible_cpus())) { > + obj_cgroup_put(objcg_new); > + spin_unlock_irqrestore(&pcpu_lock, flags); > + return false; > + } > + break; > + case MEMCG_KMEM_UNCHARGE: > + obj_cgroup_uncharge(objcg_old, size * num_possible_cpus()); > + rcu_read_lock(); > + mod_memcg_state(obj_cgroup_memcg(objcg_old), MEMCG_PERCPU_B, > + -(size * num_possible_cpus())); > + rcu_read_unlock(); > + chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL; > + obj_cgroup_put(objcg_old); > + break; > + case MEMCG_KMEM_POST_CHARGE: > + rcu_read_lock(); > + chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = obj_cgroup_from_current(); > + mod_memcg_state(mem_cgroup_from_task(current), MEMCG_PERCPU_B, > + (size * num_possible_cpus())); > + rcu_read_unlock(); > + break; > + case MEMCG_KMEM_CHARGE_ERR: > + /* > + * In case fail to charge to the new one in the pre charge state, > + * for example, we have pre-charged one memcg successfully but fail > + * to pre-charge the second memcg, then we should uncharge the first > + * memcg. > + */ > + objcg_new = obj_cgroup_from_current(); > + obj_cgroup_uncharge(objcg_new, size * num_possible_cpus()); > + obj_cgroup_put(objcg_new); > + rcu_read_lock(); > + mod_memcg_state(obj_cgroup_memcg(objcg_new), MEMCG_PERCPU_B, > + -(size * num_possible_cpus())); > + rcu_read_unlock(); > + > + break; > + } I'm not really the biggest fan of this step stuff. I see why you're doing it because you want to do all or nothing recharging the percpu bpf maps. Is there a way to have percpu own this logic and attempt to do all or nothing instead? I realize bpf is likely the largest percpu user, but the recharge_percpu() api seems to be more generic than forcing potential other users in the future to open code it. > + > + spin_unlock_irqrestore(&pcpu_lock, flags); > + > + return true; > +} > +EXPORT_SYMBOL(recharge_percpu); > + > +#else /* CONFIG_MEMCG_KMEM */ > + > +bool charge_percpu(void __percpu *ptr, bool charge) > +{ > + return true; > +} > +EXPORT_SYMBOL(charge_percpu); > + > +void uncharge_percpu(void __percpu *ptr) > +{ > +} I'm guessing this is supposed to be recharge_percpu() not (un)charge_percpu(). > +EXPORT_SYMBOL(uncharge_percpu); > + > +#endif /* CONFIG_MEMCG_KMEM */ > + > bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr) > { > #ifdef CONFIG_SMP > -- > 2.17.1 > > Thanks, Dennis