From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 5 May 2025 11:46:28 -0700
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Alexei Starovoitov
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, vbabka@suse.cz, harry.yoo@oracle.com, mhocko@suse.com, bigeasy@linutronix.de, andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, rostedt@goodmis.org, hannes@cmpxchg.org, willy@infradead.org
Subject: Re: [PATCH 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
References: <20250501032718.65476-1-alexei.starovoitov@gmail.com> <20250501032718.65476-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20250501032718.65476-7-alexei.starovoitov@gmail.com>
On Wed, Apr 30, 2025 at 08:27:18PM -0700, Alexei Starovoitov wrote:
> From: Alexei Starovoitov
>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -595,7 +595,13 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
> 	if (!val)
> 		return;
>
> -	cgroup_rstat_updated(memcg->css.cgroup, cpu);
> +	/*
> +	 * If called from NMI via kmalloc_nolock -> memcg_slab_post_alloc_hook
> +	 * -> obj_cgroup_charge -> mod_memcg_state,
> +	 * then delay the update.
> +	 */
> +	if (!in_nmi())
> +		cgroup_rstat_updated(memcg->css.cgroup, cpu);

I don't think we can just ignore cgroup_rstat_updated() for nmi as
there is a chance (though very small) that we will lose these stats
updates. In addition, memcg_rstat_updated() itself is not
reentrant-safe, along with a couple of the functions leading to it,
like __mod_memcg_lruvec_state().

> 	statc = this_cpu_ptr(memcg->vmstats_percpu);
> 	for (; statc; statc = statc->parent) {
> 		/*
> @@ -2895,7 +2901,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> 	unsigned long flags;
> 	bool ret = false;
>
> -	local_lock_irqsave(&memcg_stock.stock_lock, flags);
> +	local_lock_irqsave_check(&memcg_stock.stock_lock, flags);
>
> 	stock = this_cpu_ptr(&memcg_stock);
> 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
> @@ -2995,7 +3001,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> 	unsigned long flags;
> 	unsigned int nr_pages = 0;
>
> -	local_lock_irqsave(&memcg_stock.stock_lock, flags);
> +	local_lock_irqsave_check(&memcg_stock.stock_lock, flags);
>
> 	stock = this_cpu_ptr(&memcg_stock);
> 	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> @@ -3088,6 +3094,27 @@ static inline size_t obj_full_size(struct kmem_cache *s)
> 	return s->size + sizeof(struct obj_cgroup *);
> }
>
> +/*
> + * Try subtract from nr_charged_bytes without making it negative
> + */
> +static bool
> +obj_cgroup_charge_atomic(struct obj_cgroup *objcg, gfp_t flags, size_t sz)
> +{
> +	size_t old = atomic_read(&objcg->nr_charged_bytes);
> +	u32 nr_pages = sz >> PAGE_SHIFT;
> +	u32 nr_bytes = sz & (PAGE_SIZE - 1);
> +
> +	if ((ssize_t)(old - sz) >= 0 &&
> +	    atomic_cmpxchg(&objcg->nr_charged_bytes, old, old - sz) == old)
> +		return true;
> +
> +	nr_pages++;
> +	if (obj_cgroup_charge_pages(objcg, flags, nr_pages))
> +		return false;
> +
> +	atomic_add(PAGE_SIZE - nr_bytes, &objcg->nr_charged_bytes);
> +	return true;
> +}
> +
> bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> 				  gfp_t flags, size_t size, void **p)
> {
> @@ -3128,6 +3155,21 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> 		return false;
> 	}
>
> +	if (!gfpflags_allow_spinning(flags)) {
> +		if (local_lock_is_locked(&memcg_stock.stock_lock)) {
> +			/*
> +			 * Cannot use
> +			 * lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));
> +			 * since lockdep might not have been informed yet
> +			 * of lock acquisition.
> +			 */
> +			return obj_cgroup_charge_atomic(objcg, flags,
> +							size * obj_full_size(s));

We cannot just ignore the stat updates here.

> +		} else {
> +			lockdep_assert_not_held(this_cpu_ptr(&memcg_stock.stock_lock));
> +		}
> +	}
> +
> 	for (i = 0; i < size; i++) {
> 		slab = virt_to_slab(p[i]);
>
> @@ -3162,8 +3204,12 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> 			    void **p, int objects, struct slabobj_ext *obj_exts)
> {
> +	bool lock_held = local_lock_is_locked(&memcg_stock.stock_lock);
> 	size_t obj_size = obj_full_size(s);
>
> +	if (likely(!lock_held))
> +		lockdep_assert_not_held(this_cpu_ptr(&memcg_stock.stock_lock));
> +
> 	for (int i = 0; i < objects; i++) {
> 		struct obj_cgroup *objcg;
> 		unsigned int off;
> @@ -3174,8 +3220,12 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> 			continue;
>
> 		obj_exts[off].objcg = NULL;
> -		refill_obj_stock(objcg, obj_size, true, -obj_size,
> -				 slab_pgdat(slab), cache_vmstat_idx(s));
> +		if (unlikely(lock_held)) {
> +			atomic_add(obj_size, &objcg->nr_charged_bytes);

objcg->nr_charged_bytes is stats-ignorant, and the relevant stats need
to be updated before putting stuff into it.

> +		} else {
> +			refill_obj_stock(objcg, obj_size, true, -obj_size,
> +					 slab_pgdat(slab), cache_vmstat_idx(s));
> +		}
> 		obj_cgroup_put(objcg);
> 	}
> }

I am actually working on making this whole call chain (i.e.
kmalloc/kmem_cache_alloc to memcg [un]charging) reentrant/NMI-safe.