From: Alexei Starovoitov
Date: Mon, 5 May 2025 17:49:47 -0700
Subject: Re: [PATCH 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
To: Shakeel Butt
Cc: bpf, linux-mm, Vlastimil Babka, Harry Yoo, Michal Hocko,
 Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
 Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner,
 Matthew Wilcox
References: <20250501032718.65476-1-alexei.starovoitov@gmail.com>
 <20250501032718.65476-7-alexei.starovoitov@gmail.com>

On Mon, May 5, 2025 at 11:46 AM Shakeel Butt wrote:
>
> On Wed, Apr 30, 2025 at 08:27:18PM -0700, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -595,7 +595,13 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
> >  	if (!val)
> >  		return;
> >
> > -	cgroup_rstat_updated(memcg->css.cgroup, cpu);
> > +	/*
> > +	 * If called from NMI via kmalloc_nolock -> memcg_slab_post_alloc_hook
> > +	 * -> obj_cgroup_charge -> mod_memcg_state,
> > +	 * then delay the update.
> > +	 */
> > +	if (!in_nmi())
> > +		cgroup_rstat_updated(memcg->css.cgroup, cpu);
>
> I don't think we can just ignore cgroup_rstat_updated() for nmi as there
> is a chance (though very small) that we will lose these stats updates.

I'm failing to understand why it's an issue.
Not doing cgroup_rstat_updated() can only cause the updated_next link
to stay NULL when it should be set, but that should be harmless, and
no different from the racy check the code is already doing:

	if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next))
		return;

Imagine it was !NULL: the code would return, but then preemption kicks
in, something clears it to NULL, and we've skipped a set of
updated_next anyway.
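To spell out the window I mean, here is a sketch of one possible
interleaving (my illustration, not code from the tree):

	CPU0 (updater)                      CPU1 (flusher)
	--------------                      --------------
	data_race(...->updated_next)
	  sees non-NULL, returns early
	                                    flush walks the updated list
	                                    and resets updated_next to NULL

After that window CPU0's counter bump sits in the per-cpu vmstats with
no updated_next link set, which is exactly the state the in_nmi() skip
above produces. The next update from a normal context re-links the
cgroup and the flusher catches up.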
> In addition, memcg_rstat_updated() itself is not reentrant safe along
> with a couple of functions leading to it like __mod_memcg_lruvec_state().

Sure.
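As a sanity check on the arithmetic in obj_cgroup_charge_atomic(), here
is a worked example with PAGE_SIZE = 4096 (the numbers are mine, not
from the patch):

	first charge of sz = 96 with nr_charged_bytes = 0:
	  nr_pages = 96 >> PAGE_SHIFT     = 0
	  nr_bytes = 96 & (PAGE_SIZE - 1) = 96
	  (ssize_t)(0 - 96) < 0, so the cmpxchg fast path is skipped
	  obj_cgroup_charge_pages(objcg, flags, 1) charges one full page
	  atomic_add(4096 - 96, ...) leaves nr_charged_bytes = 4000

	next charge of sz = 96:
	  old = 4000, (ssize_t)(4000 - 96) >= 0,
	  a single atomic_cmpxchg() consumes 96 bytes, no page charge

Note also that the cmpxchg is not retried on failure: a lost race just
falls through to the page-granular path, so the worst case is an extra
page charge whose remainder lands back in nr_charged_bytes.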
__mod_memcg_lruvec_state() is not reentrant, but it's not an issue for
kmalloc_nolock(), since objcg/memcg charge/uncharge from slub is not
calling it (as far as I can tell).

> >  	statc = this_cpu_ptr(memcg->vmstats_percpu);
> >  	for (; statc; statc = statc->parent) {
> >  		/*
> > @@ -2895,7 +2901,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> >  	unsigned long flags;
> >  	bool ret = false;
> >
> > -	local_lock_irqsave(&memcg_stock.stock_lock, flags);
> > +	local_lock_irqsave_check(&memcg_stock.stock_lock, flags);
> >
> >  	stock = this_cpu_ptr(&memcg_stock);
> >  	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
> > @@ -2995,7 +3001,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> >  	unsigned long flags;
> >  	unsigned int nr_pages = 0;
> >
> > -	local_lock_irqsave(&memcg_stock.stock_lock, flags);
> > +	local_lock_irqsave_check(&memcg_stock.stock_lock, flags);
> >
> >  	stock = this_cpu_ptr(&memcg_stock);
> >  	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> > @@ -3088,6 +3094,27 @@ static inline size_t obj_full_size(struct kmem_cache *s)
> >  	return s->size + sizeof(struct obj_cgroup *);
> >  }
> >
> > +/*
> > + * Try subtract from nr_charged_bytes without making it negative
> > + */
> > +static bool obj_cgroup_charge_atomic(struct obj_cgroup *objcg, gfp_t flags, size_t sz)
> > +{
> > +	size_t old = atomic_read(&objcg->nr_charged_bytes);
> > +	u32 nr_pages = sz >> PAGE_SHIFT;
> > +	u32 nr_bytes = sz & (PAGE_SIZE - 1);
> > +
> > +	if ((ssize_t)(old - sz) >= 0 &&
> > +	    atomic_cmpxchg(&objcg->nr_charged_bytes, old, old - sz) == old)
> > +		return true;
> > +
> > +	nr_pages++;
> > +	if (obj_cgroup_charge_pages(objcg, flags, nr_pages))
> > +		return false;
> > +
> > +	atomic_add(PAGE_SIZE - nr_bytes, &objcg->nr_charged_bytes);
> > +	return true;
> > +}
> > +
> >  bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> >  				  gfp_t flags, size_t size, void **p)
> >  {
> > @@ -3128,6 +3155,21 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> >  		return false;
> >  	}
> >
> > +	if (!gfpflags_allow_spinning(flags)) {
> > +		if (local_lock_is_locked(&memcg_stock.stock_lock)) {
> > +			/*
> > +			 * Cannot use
> > +			 * lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));
> > +			 * since lockdep might not have been informed yet
> > +			 * of lock acquisition.
> > +			 */
> > +			return obj_cgroup_charge_atomic(objcg, flags,
> > +							size * obj_full_size(s));
>
> We cannot just ignore the stat updates here.

> > +		} else {
> > +			lockdep_assert_not_held(this_cpu_ptr(&memcg_stock.stock_lock));
> > +		}
> > +	}
> > +
> >  	for (i = 0; i < size; i++) {
> >  		slab = virt_to_slab(p[i]);
> >
> > @@ -3162,8 +3204,12 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> >  void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> >  			    void **p, int objects, struct slabobj_ext *obj_exts)
> >  {
> > +	bool lock_held = local_lock_is_locked(&memcg_stock.stock_lock);
> >  	size_t obj_size = obj_full_size(s);
> >
> > +	if (likely(!lock_held))
> > +		lockdep_assert_not_held(this_cpu_ptr(&memcg_stock.stock_lock));
> > +
> >  	for (int i = 0; i < objects; i++) {
> >  		struct obj_cgroup *objcg;
> >  		unsigned int off;
> > @@ -3174,8 +3220,12 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> >  		continue;
> >
> >  		obj_exts[off].objcg = NULL;
> > -		refill_obj_stock(objcg, obj_size, true, -obj_size,
> > -				 slab_pgdat(slab), cache_vmstat_idx(s));
> > +		if (unlikely(lock_held)) {
> > +			atomic_add(obj_size, &objcg->nr_charged_bytes);
>
> objcg->nr_charged_bytes is stats ignorant and the relevant stats need to
> be updated before putting stuff into it.

I'm not following.
It's functionally equivalent to refill_obj_stock() without
__account_obj_stock(). And the stats are not ignored: the next
__memcg_slab_free_hook() from a good context will update them.
It's only a tiny delay in the update. I don't see why it's an issue.
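To make the equivalence concrete (my summary, from my reading of
mm/memcontrol.c): on the normal path,

	refill_obj_stock(objcg, obj_size, true, -obj_size,
			 slab_pgdat(slab), cache_vmstat_idx(s));

caches the freed bytes in the per-cpu stock and goes through
__account_obj_stock() to adjust the NR_SLAB_*_B vmstat, while on the
locked/NMI path,

	atomic_add(obj_size, &objcg->nr_charged_bytes);

parks the same byte credit directly on the objcg. No bytes are lost
either way; only the vmstat adjustment is deferred to the next free
from a good context.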
> > +		} else {
> > +			refill_obj_stock(objcg, obj_size, true, -obj_size,
> > +					 slab_pgdat(slab), cache_vmstat_idx(s));
> > +		}
> >  		obj_cgroup_put(objcg);
> >  	}
> >  }
>
> I am actually working on making this whole call chain (i.e.
> kmalloc/kmem_cache_alloc to memcg [un]charging) reentrant/nmi safe.

Thank you for working on it!
You mean this set:
https://lore.kernel.org/all/20250429061211.1295443-1-shakeel.butt@linux.dev/
?
It makes css_rstat_updated() re-entrant, which is a renamed/reworked
version of memcg_rstat_updated(). That's good, but not enough from the
slub pov. It removes the need for the first hunk of this patch in
mm/memcontrol.c:

+	if (!in_nmi())
+		cgroup_rstat_updated(...);

but the hunks in __memcg_slab_post_alloc_hook() and
__memcg_slab_free_hook() are still needed.
And I think the obj_cgroup_charge_atomic() approach in this patch is
correct. The delay in the rstat update seems fine.
Please help me understand what I'm missing.