From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 13 May 2025 14:55:34 -0700
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Vlastimil Babka
Cc: Sebastian Andrzej Siewior, bpf, linux-mm, Harry Yoo, Shakeel Butt,
 Michal Hocko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton,
 Peter Zijlstra, Steven Rostedt, Johannes Weiner, Matthew Wilcox
Subject: Re: [PATCH 4/6] locking/local_lock: Introduce local_lock_irqsave_check()
In-Reply-To: <737d8993-b3c7-4ed5-8872-20c62ab81572@suse.cz>
References: <20250501032718.65476-1-alexei.starovoitov@gmail.com>
 <20250501032718.65476-5-alexei.starovoitov@gmail.com>
 <20250512140359.CDEasCj3@linutronix.de>
 <737d8993-b3c7-4ed5-8872-20c62ab81572@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

On Tue, May 13, 2025 at 08:58:43AM +0200, Vlastimil Babka wrote:
> On 5/12/25 19:16, Alexei Starovoitov wrote:
> > On Mon, May 12, 2025 at 7:04 AM Sebastian Andrzej Siewior
> > wrote:
> >>
> >> On 2025-04-30 20:27:16 [-0700], Alexei Starovoitov wrote:
> >> > --- a/include/linux/local_lock_internal.h
> >> > +++ b/include/linux/local_lock_internal.h
> >> > @@ -168,6 +168,15 @@ do { \
> >> >  /* preemption or migration must be disabled before calling __local_lock_is_locked */
> >> >  #define __local_lock_is_locked(lock)	READ_ONCE(this_cpu_ptr(lock)->acquired)
> >> >
> >> > +#define __local_lock_irqsave_check(lock, flags)				\
> >> > +	do {									\
> >> > +		if (IS_ENABLED(CONFIG_DEBUG_LOCK_ALLOC) &&			\
> >> > +		    (!__local_lock_is_locked(lock) || in_nmi()))		\
> >> > +			WARN_ON_ONCE(!__local_trylock_irqsave(lock, flags));	\
> >> > +		else								\
> >> > +			__local_lock_irqsave(lock, flags);			\
> >> > +	} while (0)
> >> > +
> >>
> >> Hmm. If I see this right in SLUB then this is called from preemptible
> >> context. Therefore the this_cpu_ptr() from __local_lock_is_locked()
> >> should trigger a warning here.
> >
> > When preemptible the migration is disabled. So no warning.
> >
> >> This check variant provides only additional debugging and otherwise
> >> behaves as local_lock_irqsave(). Therefore the in_nmi() should return
> >> immediately with a WARN_ON() regardless if the lock is available or not
> >> because the non-try variant should never be invoked from an NMI.
> >
> > The non-try variant can be invoked from NMI, because the earlier
> > __local_lock_is_locked() check tells us that the lock is not locked,
> > so acquiring it is safe.
> > And that's the main challenge here.
> > The local_lock_irqsave_check() macro fights lockdep here.
> >
> >> This looks like additional debug infrastructure that should be part of
> >> local_lock_irqsave() itself,
> >
> > The pattern of
> >
> >	if (!__local_lock_is_locked(lock)) {
> >		.. lots of code ..
> >		local_lock_irqsave(lock);
> >
> > is foreign to lockdep.
> >
> > Since it can be called from NMI, lockdep just hates it:
> >
> > [ 1021.956825] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > ...
> > [ 1021.956888] lock(per_cpu_ptr(&lock));
> > [ 1021.956890]
> > [ 1021.956891] lock(per_cpu_ptr(&lock));
> > ..
> >
> > and technically lockdep is correct.
> > For any normal lock it's a deadlock waiting to happen,
> > but not here.
> >
> > Even without NMI, lockdep doesn't like it:
> >
> > [   14.627331] page_alloc_kthr/1965 is trying to acquire lock:
> > [   14.627331] ffff8881f6ebe0f0 ((local_lock_t *)&c->lock){-.-.}-{3:3}, at: ___slab_alloc+0x9a9/0x1ab0
> > [   14.627331]
> > [   14.627331] but task is already holding lock:
> > [   14.627331] ffff8881f6ebd490 ((local_lock_t *)&c->lock){-.-.}-{3:3}, at: ___slab_alloc+0xc7/0x1ab0
> > [   14.627331]
> > [   14.627331] other info that might help us debug this:
> > [   14.627331]  Possible unsafe locking scenario:
> > [   14.627331]
> > [   14.627331]        CPU0
> > [   14.627331]        ----
> > [   14.627331]   lock((local_lock_t *)&c->lock);
> > [   14.627331]   lock((local_lock_t *)&c->lock);
> >
> > When slub is holding lock ...bd490 we detect it with
> > __local_lock_is_locked(),
> > then we check that lock ..be0f0 is not locked,
> > and proceed to acquire it, but
> > lockdep will show the above splat.
> >
> > So local_lock_irqsave_check() is a workaround to avoid
> > these two false positives from lockdep.
> >
> > Yours and Vlastimil's observation is correct, that ideally
> > local_lock_irqsave() should just handle it,
> > but I don't see how to do it.
> > How can lockdep understand the if (!locked()) lock() pattern?
> > Such usage is correct only for a per-cpu local lock when migration
> > is disabled from check to acquire.
>
> Thanks, I think I finally understand the issue and why a _check variant is
> necessary. As a general note, since this is so tricky, having more details in
> comments and commit messages can't hurt so we can understand it sooner :)
>
> Again, this would all be simpler if we could just use trylock instead of
> _check(), but then we need to handle the fallbacks. And AFAIU on RT trylock
> can fail "spuriously", i.e. when we don't really preempt ourselves, as we
> discussed in that memcg thread.
>
> > Hence the macro is doing:
> >
> >	if (IS_ENABLED(CONFIG_DEBUG_LOCK_ALLOC) &&
> >	    (!__local_lock_is_locked(lock) || in_nmi()))
> >		WARN_ON_ONCE(!__local_trylock_irqsave(lock, flags));
> >
> > The in_nmi() part is a workaround for the first lockdep splat
> > and __local_lock_is_locked() is a workaround for the 2nd lockdep splat,
> > though the code did the __local_lock_is_locked() check already.
>
> So here's where it would be useful to have that info in a comment.
> However, I wonder about it, as the code uses __local_trylock_irqsave(), so
> lockdep should see it as an opportunistic attempt and not splat, as that
> trylock alone should be avoiding deadlock - if not, we might have a bug in
> the lockdep bits of trylock.

Point taken. The comments need to be more detailed.

I've been thinking of a way to avoid local_lock_irqsave_check()
and came up with the following:

diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 94be15d574ad..58ac29f4ba9b 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -79,7 +79,7 @@ do { \
 									\
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));\
 	lockdep_init_map_type(&(lock)->dep_map, #lock, &__key,	\
-			      0, LD_WAIT_CONFIG, LD_WAIT_INV,	\
+			      1, LD_WAIT_CONFIG, LD_WAIT_INV,	\
 			      LD_LOCK_PERCPU);			\
 	local_lock_debug_init(lock);				\
 } while (0)
@@ -166,11 +166,21 @@ do { \
 })
 
 /* preemption or migration must be disabled before calling __local_lock_is_locked */
-#define __local_lock_is_locked(lock)	READ_ONCE(this_cpu_ptr(lock)->acquired)
+#define __local_lock_is_locked(lock)					\
+	({								\
+		bool ret = READ_ONCE(this_cpu_ptr(lock)->acquired);	\
+									\
+		if (!ret)						\
+			this_cpu_ptr(lock)->dep_map.flags = LOCAL_LOCK_UNLOCKED;\
+		ret;							\
+	})
+
+#define __local_lock_flags_clear(lock)					\
+	do { this_cpu_ptr(lock)->dep_map.flags = 0; } while (0)

It would need to be wrapped into macros for !LOCKDEP, of course.
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index 9f361d3ab9d9..6c580081ace3 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -190,13 +190,15 @@ struct lockdep_map {
 	u8				wait_type_outer; /* can be taken in this context */
 	u8				wait_type_inner; /* presents this context */
 	u8				lock_type;
-	/* u8				hole; */
+	u8				flags;
 #ifdef CONFIG_LOCK_STAT
 	int				cpu;
 	unsigned long			ip;
 #endif
 };
 
+#define LOCAL_LOCK_UNLOCKED 1
+
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 58d78a33ac65..0eadee339e1f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4961,6 +4961,7 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 	lock->wait_type_outer = outer;
 	lock->wait_type_inner = inner;
 	lock->lock_type = lock_type;
+	lock->flags = 0;
 
 	/*
 	 * No key, no joy, we need to hash something.
@@ -5101,6 +5102,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		lockevent_inc(lockdep_nocheck);
 	}
 
+	if (unlikely(lock->flags == LOCAL_LOCK_UNLOCKED))
+		subclass++;
+
 	if (subclass < NR_LOCKDEP_CACHING_CLASSES)
 		class = lock->class_cache[subclass];
 	/*

and the usage from slub/memcg looks like this:

	if (!!local_lock_is_locked(&s->cpu_slab->lock)) {
		ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, size);
		__local_lock_flags_clear(&s->cpu_slab->lock);
	}

With that, all normal local_lock_irqsave() calls automagically work.

High level, the idea is to tell lockdep: "trust me, I know what I'm doing".
Since it's a per-cpu local lock, the workaround tells lockdep to treat such
a local_lock as nested, so lockdep allows a second local_lock while the
same cpu (in !RT) or task (in RT) is holding another local_lock.

It addresses the 2nd false positive above:

[   14.627331]   lock((local_lock_t *)&c->lock);
[   14.627331]   lock((local_lock_t *)&c->lock);

but doesn't address the first false positive of:

[ 1021.956825] inconsistent {INITIAL USE} -> {IN-NMI} usage.
We can silence lockdep for this lock with:

@@ -5839,6 +5840,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	 */
 	kasan_check_byte(lock);
 
+	if (unlikely(lock->flags == LOCAL_LOCK_UNLOCKED))
+		trylock = 1;
+
 	if (unlikely(!lockdep_enabled())) {

Then all lockdep false positives are gone.

In other words, the pair:

	local_lock_is_locked(&local_lock);
	__local_lock_flags_clear(&local_lock);

guards the region where the local_lock can be taken multiple times on that
cpu/task from any context, including nmi. We know that the task won't
migrate, so multiple lock/unlock of an unlocked lock is safe.

I think this is a lesser evil hack/workaround than
local_lock_irqsave_check(). It gives a clean start/end scope for such usage
of local_lock.

Thoughts?