Message-ID: <6bd606f4-cfe9-4afb-b538-26feb56f9268@suse.cz>
Date: Fri, 14 Mar 2025 14:33:28 +0100
Subject: Re: [RFC PATCH 00/10] memcg: stock code cleanups
From: Vlastimil Babka <vbabka@suse.cz>
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 Sebastian Andrzej Siewior, linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, Meta kernel team
References: <20250314061511.1308152-1-shakeel.butt@linux.dev>
In-Reply-To: <20250314061511.1308152-1-shakeel.butt@linux.dev>

On 3/14/25 07:15, Shakeel Butt wrote:
> This is a cleanup series (first 7 patches) is trying to simplify the
> memcg stock code, particularly it tries to remove unnecessary
> dependencies. The last 3 patches are a prototype to make per-cpu memcg
> stock work without disabling irqs.
>
> My plan is to send out the first 7 cleanup patches separately for the
> next release window and iterate more on the last 3 patches plus add
> functionality for multiple memcgs.
>
> This series is based on next-20250313 plus two following patches:
>
> Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
>
> to simply the memcg stock code

Hi,

I've been looking at this area too and noticed a different opportunity
for a cleanup plus performance improvement yesterday. I rebased it on
top of patch 7, as it would make sense to do it before changing the
locking - it reduces the number of places where the local_trylock is
taken. If it's OK with you, please incorporate it into your series.

Thanks,
Vlastimil

----8<----
From 8dda8e4225ee91b48a73759f727a28d448c634b5 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 13 Mar 2025 20:08:32 +0100
Subject: [PATCH] memcg: combine slab obj stock charging and accounting

When handling slab objects, we use obj_cgroup_[un]charge() for
(un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
All of these operations use the percpu stock for performance. However,
with the calls being separate, the stock_lock is taken twice in each
case.

By refactoring the code, we can turn mod_objcg_state() into
__account_obj_stock(), which is called on a stock that's already locked
and validated. On the charging side we can call this function from
consume_obj_stock() when it succeeds, and from refill_obj_stock() in the
fallback. We just expand the parameters of these functions as necessary.
The uncharge side from __memcg_slab_free_hook() is just the call to
refill_obj_stock().

Other callers of obj_cgroup_[un]charge() (i.e.
not slab) simply pass the extra parameters as NULL/zeroes to skip the
__account_obj_stock() operation.

In __memcg_slab_post_alloc_hook() we now charge each object separately,
but that's not a problem as we did call mod_objcg_state() for each
object separately, and most allocations are non-bulk anyway. This could
be improved by batching all operations until slab_pgdat(slab) changes.

Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
iterations with/without __GFP_ACCOUNT:

Before the patch:
  kmalloc/kfree !memcg:  581390144 cycles
  kmalloc/kfree  memcg:  783689984 cycles

After the patch:
  kmalloc/kfree  memcg:  658723808 cycles

More than half of the overhead of __GFP_ACCOUNT relative to the
non-accounted case seems to be eliminated.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/memcontrol.c | 77 +++++++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 31 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dfe9c2eb7816..553eb1d7250a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2753,25 +2753,17 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
         WRITE_ONCE(stock->cached_objcg, objcg);
 }
 
-static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
-                            enum node_stat_item idx, int nr)
+static void __account_obj_stock(struct obj_cgroup *objcg,
+                                struct memcg_stock_pcp *stock, int nr,
+                                struct pglist_data *pgdat, enum node_stat_item idx)
 {
-        struct memcg_stock_pcp *stock;
-        unsigned long flags;
         int *bytes;
 
-        localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
-        stock = this_cpu_ptr(&memcg_stock);
-
         /*
          * Save vmstat data in stock and skip vmstat array update unless
-         * accumulating over a page of vmstat data or when pgdat or idx
-         * changes.
+         * accumulating over a page of vmstat data or when pgdat changes.
          */
-        if (READ_ONCE(stock->cached_objcg) != objcg) {
-                replace_stock_objcg(stock, objcg);
-                stock->cached_pgdat = pgdat;
-        } else if (stock->cached_pgdat != pgdat) {
+        if (stock->cached_pgdat != pgdat) {
                 /* Flush the existing cached vmstat data */
                 struct pglist_data *oldpg = stock->cached_pgdat;
 
@@ -2808,11 +2800,10 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
         }
         if (nr)
                 __mod_objcg_mlstate(objcg, pgdat, idx, nr);
-
-        localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
-static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
+static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
+                              struct pglist_data *pgdat, enum node_stat_item idx)
 {
         struct memcg_stock_pcp *stock;
         unsigned long flags;
@@ -2824,6 +2815,9 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
         if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
                 stock->nr_bytes -= nr_bytes;
                 ret = true;
+
+                if (pgdat)
+                        __account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
         }
 
         localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2908,7 +2902,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 }
 
 static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
-                             bool allow_uncharge)
+                             bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
+                             enum node_stat_item idx)
 {
         struct memcg_stock_pcp *stock;
         unsigned long flags;
@@ -2923,6 +2918,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
         }
         stock->nr_bytes += nr_bytes;
 
+        if (pgdat)
+                __account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
+
         if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
                 nr_pages = stock->nr_bytes >> PAGE_SHIFT;
                 stock->nr_bytes &= (PAGE_SIZE - 1);
@@ -2934,12 +2932,13 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
         obj_cgroup_uncharge_pages(objcg, nr_pages);
 }
 
-int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+static int obj_cgroup_charge_account(struct obj_cgroup *objcg, gfp_t gfp, size_t size,
+                                     struct pglist_data *pgdat, enum node_stat_item idx)
 {
         unsigned int nr_pages, nr_bytes;
         int ret;
 
-        if (consume_obj_stock(objcg, size))
+        if (likely(consume_obj_stock(objcg, size, pgdat, idx)))
                 return 0;
 
         /*
@@ -2972,15 +2971,21 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
                 nr_pages += 1;
 
         ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
-        if (!ret && nr_bytes)
-                refill_obj_stock(objcg, PAGE_SIZE - nr_bytes, false);
+        if (!ret && (nr_bytes || pgdat))
+                refill_obj_stock(objcg, nr_bytes ? PAGE_SIZE - nr_bytes : 0,
+                                 false, size, pgdat, idx);
 
         return ret;
 }
 
+int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+{
+        return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
+}
+
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 {
-        refill_obj_stock(objcg, size, true);
+        refill_obj_stock(objcg, size, true, 0, NULL, 0);
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
@@ -3032,23 +3037,32 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
                         return false;
         }
 
-        if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s)))
-                return false;
-
         for (i = 0; i < size; i++) {
                 slab = virt_to_slab(p[i]);
 
                 if (!slab_obj_exts(slab) &&
                     alloc_slab_obj_exts(slab, s, flags, false)) {
-                        obj_cgroup_uncharge(objcg, obj_full_size(s));
                         continue;
                 }
 
+                /*
+                 * if we fail and size is 1, memcg_alloc_abort_single() will
+                 * just free the object, which is ok as we have not assigned
+                 * objcg to its obj_ext yet
+                 *
+                 * for larger sizes, kmem_cache_free_bulk() will uncharge
+                 * any objects that were already charged and obj_ext assigned
+                 *
+                 * TODO: we could batch this until slab_pgdat(slab) changes
+                 * between iterations, with a more complicated undo
+                 */
+                if (obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
+                                              slab_pgdat(slab), cache_vmstat_idx(s)))
+                        return false;
+
                 off = obj_to_index(s, slab, p[i]);
                 obj_cgroup_get(objcg);
                 slab_obj_exts(slab)[off].objcg = objcg;
-                mod_objcg_state(objcg, slab_pgdat(slab),
-                                cache_vmstat_idx(s), obj_full_size(s));
         }
 
         return true;
@@ -3057,6 +3071,8 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
                             void **p, int objects, struct slabobj_ext *obj_exts)
 {
+        size_t obj_size = obj_full_size(s);
+
         for (int i = 0; i < objects; i++) {
                 struct obj_cgroup *objcg;
                 unsigned int off;
@@ -3067,9 +3083,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
                         continue;
 
                 obj_exts[off].objcg = NULL;
-                obj_cgroup_uncharge(objcg, obj_full_size(s));
-                mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
-                                -obj_full_size(s));
+                refill_obj_stock(objcg, obj_size, true, -obj_size,
+                                 slab_pgdat(slab), cache_vmstat_idx(s));
                 obj_cgroup_put(objcg);
         }
 }
-- 
2.48.1
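
[Editorial illustration, not part of the patch above.] To see why folding the
vmstat accounting into the already-locked stock operation helps, a toy
userspace analogue (an assumption-laden sketch only - plain pthreads stand in
for the percpu stock_lock, and the variables only mimic the kernel structures):
the old path pays for two lock round-trips per object, the new path for one.

/* toy_stock.c - compile with: cc -pthread toy_stock.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t stock_lock = PTHREAD_MUTEX_INITIALIZER;
static long stock_bytes = 4096;   /* stands in for stock->nr_bytes   */
static long vmstat_bytes;         /* stands in for the cached vmstat */

/* old pattern: charging and accounting each take the lock separately */
static void charge_then_account(long nr)
{
        pthread_mutex_lock(&stock_lock);
        stock_bytes -= nr;              /* ~ obj_cgroup_charge()      */
        pthread_mutex_unlock(&stock_lock);

        pthread_mutex_lock(&stock_lock);
        vmstat_bytes += nr;             /* ~ mod_objcg_state()        */
        pthread_mutex_unlock(&stock_lock);
}

/* new pattern: accounting is done inside the already-locked charge path */
static void charge_and_account(long nr)
{
        pthread_mutex_lock(&stock_lock);
        stock_bytes -= nr;              /* ~ consume_obj_stock()      */
        vmstat_bytes += nr;             /* ~ __account_obj_stock()    */
        pthread_mutex_unlock(&stock_lock);
}

int main(void)
{
        charge_then_account(64);
        charge_and_account(64);
        printf("stock=%ld vmstat=%ld\n", stock_bytes, vmstat_bytes);
        return 0;
}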
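
[Editorial illustration, not part of the patch above.] The kind of
kfree(kmalloc()) measurement loop described in the commit message could look
roughly like the module below. This is a sketch under assumptions: the module
name, the 64-byte object size and the use of get_cycles() are guesses, not the
author's actual test harness.

/* stock_bench.c - hypothetical micro-benchmark module, sketch only */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <asm/timex.h>

static int __init stock_bench_init(void)
{
        const unsigned long iters = 10 * 1000 * 1000;
        cycles_t start, end;
        unsigned long i;
        void *p;

        start = get_cycles();
        for (i = 0; i < iters; i++) {
                /* drop __GFP_ACCOUNT to measure the !memcg case */
                p = kmalloc(64, GFP_KERNEL | __GFP_ACCOUNT);
                if (!p)
                        return -ENOMEM;
                kfree(p);
                if (!(i % 100000))
                        cond_resched();
        }
        end = get_cycles();

        pr_info("kmalloc/kfree: %llu cycles\n",
                (unsigned long long)(end - start));
        return 0;
}

static void __exit stock_bench_exit(void)
{
}

module_init(stock_bench_init);
module_exit(stock_bench_exit);
MODULE_LICENSE("GPL");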