From: Yafang Shao
Date: Sat, 25 Jun 2022 22:18:03 +0800
Subject: Re: [RFC PATCH bpf-next 07/10] mm: Add helper to recharge percpu address
To: Dennis Zhou, kbuild test robot, kbuild-all@lists.01.org
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
    Song Liu, Yonghong Song, john fastabend, KP Singh, Quentin Monnet,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    songmuchun@bytedance.com, Andrew Morton, Christoph Lameter,
    penberg@kernel.org, David Rientjes, iamjoonsoo.kim@lge.com,
    Vlastimil Babka, Linux MM, bpf
References: <20220619155032.32515-1-laoar.shao@gmail.com> <20220619155032.32515-8-laoar.shao@gmail.com>

On Thu, Jun 23, 2022 at 1:25 PM Dennis Zhou wrote:
>
> Hello,
>
> On Sun, Jun 19, 2022 at 03:50:29PM +0000, Yafang Shao wrote:
> > This patch introduces a helper to recharge the corresponding pages of
> > a given percpu address. It is similar with how to recharge a kmalloc'ed
> > address.
> >
> > Signed-off-by: Yafang Shao
> > ---
> >  include/linux/percpu.h |  1 +
> >  mm/percpu.c            | 98 ++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 99 insertions(+)
> >
> > diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> > index f1ec5ad1351c..e88429410179 100644
> > --- a/include/linux/percpu.h
> > +++ b/include/linux/percpu.h
> > @@ -128,6 +128,7 @@ extern void __init setup_per_cpu_areas(void);
> >  extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
> >  extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
> >  extern void free_percpu(void __percpu *__pdata);
> > +bool recharge_percpu(void __percpu *__pdata, int step);
>
> Nit: can you add extern to keep the file consistent.
>

Sure, I will do it.
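To be explicit, that would just be the following one-liner on top of this
patch (not compiled yet), so the declaration matches the surrounding ones:

-bool recharge_percpu(void __percpu *__pdata, int step);
+extern bool recharge_percpu(void __percpu *__pdata, int step);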
> >  extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
> >
> >  #define alloc_percpu_gfp(type, gfp)                                    \
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 3633eeefaa0d..fd81f4d79f2f 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -2310,6 +2310,104 @@ void free_percpu(void __percpu *ptr)
> >  }
> >  EXPORT_SYMBOL_GPL(free_percpu);
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> > +bool recharge_percpu(void __percpu *ptr, int step)
> > +{
> > +        int bit_off, off, bits, size, end;
> > +        struct obj_cgroup *objcg_old;
> > +        struct obj_cgroup *objcg_new;
> > +        struct pcpu_chunk *chunk;
> > +        unsigned long flags;
> > +        void *addr;
> > +
> > +        WARN_ON(!in_task());
> > +
> > +        if (!ptr)
> > +                return true;
> > +
> > +        addr = __pcpu_ptr_to_addr(ptr);
> > +        spin_lock_irqsave(&pcpu_lock, flags);
> > +        chunk = pcpu_chunk_addr_search(addr);
> > +        off = addr - chunk->base_addr;
> > +        objcg_old = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT];
> > +        if (!objcg_old && step != MEMCG_KMEM_POST_CHARGE) {
> > +                spin_unlock_irqrestore(&pcpu_lock, flags);
> > +                return true;
> > +        }
> > +
> > +        bit_off = off / PCPU_MIN_ALLOC_SIZE;
> > +        /* find end index */
> > +        end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk),
> > +                            bit_off + 1);
> > +        bits = end - bit_off;
> > +        size = bits * PCPU_MIN_ALLOC_SIZE;
> > +
> > +        switch (step) {
> > +        case MEMCG_KMEM_PRE_CHARGE:
> > +                objcg_new = get_obj_cgroup_from_current();
> > +                WARN_ON(!objcg_new);
> > +                if (obj_cgroup_charge(objcg_new, GFP_KERNEL,
> > +                                      size * num_possible_cpus())) {
> > +                        obj_cgroup_put(objcg_new);
> > +                        spin_unlock_irqrestore(&pcpu_lock, flags);
> > +                        return false;
> > +                }
> > +                break;
> > +        case MEMCG_KMEM_UNCHARGE:
> > +                obj_cgroup_uncharge(objcg_old, size * num_possible_cpus());
> > +                rcu_read_lock();
> > +                mod_memcg_state(obj_cgroup_memcg(objcg_old), MEMCG_PERCPU_B,
> > +                                -(size * num_possible_cpus()));
> > +                rcu_read_unlock();
> > +                chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL;
> > +                obj_cgroup_put(objcg_old);
> > +                break;
> > +        case MEMCG_KMEM_POST_CHARGE:
> > +                rcu_read_lock();
> > +                chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = obj_cgroup_from_current();
> > +                mod_memcg_state(mem_cgroup_from_task(current), MEMCG_PERCPU_B,
> > +                                (size * num_possible_cpus()));
> > +                rcu_read_unlock();
> > +                break;
> > +        case MEMCG_KMEM_CHARGE_ERR:
> > +                /*
> > +                 * In case fail to charge to the new one in the pre charge state,
> > +                 * for example, we have pre-charged one memcg successfully but fail
> > +                 * to pre-charge the second memcg, then we should uncharge the first
> > +                 * memcg.
> > +                 */
> > +                objcg_new = obj_cgroup_from_current();
> > +                obj_cgroup_uncharge(objcg_new, size * num_possible_cpus());
> > +                obj_cgroup_put(objcg_new);
> > +                rcu_read_lock();
> > +                mod_memcg_state(obj_cgroup_memcg(objcg_new), MEMCG_PERCPU_B,
> > +                                -(size * num_possible_cpus()));
> > +                rcu_read_unlock();
> > +
> > +                break;
> > +        }
>
> I'm not really the biggest fan of this step stuff. I see why you're
> doing it because you want to do all or nothing recharging the percpu bpf
> maps. Is there a way to have percpu own this logic and attempt to do all
> or nothing instead? I realize bpf is likely the largest percpu user, but
> the recharge_percpu() api seems to be more generic than forcing
> potential other users in the future to open code it.
>

Agreed that the recharge API may be used by other users, so it should be
a more generic helper. Maybe we can make percpu own this logic by
introducing a new value for the 'step' parameter, for example
recharge_percpu(ptr, -1), where -1 means the caller doesn't need to care
about the individual steps.
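Something like the completely untested sketch below is what I have in
mind. __recharge_percpu_one_step() is just a made-up name here for the
existing switch-based body above, and -1 could of course become a proper
MEMCG_KMEM_* value instead:

bool recharge_percpu(void __percpu *ptr, int step)
{
        /* Callers like bpf can still drive the individual steps. */
        if (step != -1)
                return __recharge_percpu_one_step(ptr, step);

        /* -1: percpu owns the whole transaction for this allocation. */
        if (!__recharge_percpu_one_step(ptr, MEMCG_KMEM_PRE_CHARGE))
                return false;

        /* Move the accounting from the old objcg to the current one. */
        __recharge_percpu_one_step(ptr, MEMCG_KMEM_UNCHARGE);
        __recharge_percpu_one_step(ptr, MEMCG_KMEM_POST_CHARGE);

        return true;
}

That keeps the explicit steps available for users like bpf that need
all-or-nothing behavior across many percpu allocations, while other users
can simply call recharge_percpu(ptr, -1).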
> > +
> > +        spin_unlock_irqrestore(&pcpu_lock, flags);
> > +
> > +        return true;
> > +}
> > +EXPORT_SYMBOL(recharge_percpu);
> > +
> > +#else /* CONFIG_MEMCG_KMEM */
> > +
> > +bool charge_percpu(void __percpu *ptr, bool charge)
> > +{
> > +        return true;
> > +}
> > +EXPORT_SYMBOL(charge_percpu);
> > +
> > +void uncharge_percpu(void __percpu *ptr)
> > +{
> > +}
>
> I'm guessing this is supposed to be recharge_percpu() not
> (un)charge_percpu().

Thanks for pointing out this bug. The lkp robot also reported this bug to
me. I will change it.

--
Regards
Yafang