From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Tue, 21 Jan 2025 17:35:49 -0800
Subject: Re: [PATCH bpf-next v5 3/7] locking/local_lock: Introduce local_trylock_irqsave()
To: Sebastian Andrzej Siewior
Cc: Vlastimil Babka, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
 Andrew Morton, Peter Zijlstra, Steven Rostedt, Hou Tao, Johannes Weiner,
 Shakeel Butt, Michal Hocko, Matthew Wilcox, Thomas Gleixner, Jann Horn,
 Tejun Heo, linux-mm, Kernel Team
In-Reply-To: <20250121164300.NOAtoArV@linutronix.de>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
 <20250115021746.34691-4-alexei.starovoitov@gmail.com>
 <20250117203315.FWviQT38@linutronix.de>
 <20250121164300.NOAtoArV@linutronix.de>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jan 21, 2025 at 8:43 AM Sebastian Andrzej Siewior wrote:
>
> On 2025-01-21 16:59:40 [+0100], Vlastimil Babka wrote:
> > I don't think it would work, or am I missing something?
> > The goal is to allow the operation (alloc, free) to opportunistically
> > succeed in e.g. NMI context, but only if we didn't interrupt anything
> > that holds the lock. Otherwise we must allow for failure - hence trylock.
> > (A possible extension that I mentioned is to also stop doing irqsave to
> > avoid its overhead, and thus operations from an irq context would also
> > be opportunistic.)
> > But if we detect the "trylock must fail" cases only using lockdep,
> > we'll deadlock without lockdep. So e.g. the "active" flag has to be there?
>
> You are right. I noticed that myself but didn't get to reply…
>
> > So yes this goes beyond the original purpose of local_lock. Do you think
> > it should be a different lock type then, which would mean the other
> > users of current local_lock that don't want the opportunistic nesting
> > via trylock would not inflict the "active" flag overhead?
> >
> > AFAICS the RT implementation of local_lock could then be shared for both
> > of these types, but I might be missing some nuance there.
>
> I was thinking about this over the weekend, and this implementation
> extends the data structure by 4 bytes and has a mandatory read/write
> on every lock/unlock operation. This is what makes it a bit different
> from the original.
>
> If local_lock_t is replaced with spinlock_t, then the data structure
> is still extended by four bytes (assuming no lockdep) and we have a
> mandatory read/write operation. The whole thing still does not work on
> PREEMPT_RT, but it isn't much different from what we have now. This is
> kind of my favorite. This could be further optimized to avoid the atomic
> operation given it is always local per-CPU memory. Maybe a
> local_spinlock_t :)

I think the concern of people with pitchforks is overblown.
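As an aside for readers following along: the failure mode discussed above can be modeled outside the kernel. The sketch below is purely illustrative (the type and helper names are made up, and a plain int stands in for the real per-CPU state). An unconditional lock taken again after "interrupting" a holder would deadlock; a trylock that checks the active flag fails gracefully instead.

```c
/* Illustrative user-space model of the "active" flag idea; not kernel code.
 * All names here are invented for the sketch. */
typedef struct {
	int active;	/* 1 while the (per-CPU) lock is held */
} model_trylock_t;

/* Trylock: if we "interrupted" a holder (active already 1), bail out
 * instead of deadlocking the way an unconditional re-lock would. */
static int model_trylock(model_trylock_t *l)
{
	if (l->active)
		return 0;	/* NMI hit a lock holder: caller must handle failure */
	l->active = 1;
	return 1;
}

static void model_unlock(model_trylock_t *l)
{
	l->active = 0;
}
```

So an allocation attempted from NMI context simply fails when the interrupted task already holds the lock, which is the opportunistic behavior described above, and it works without lockdep precisely because the flag is checked at run time.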
These are the only users of local_lock_t:

drivers/block/zram/zcomp.h:        local_lock_t lock;
drivers/char/random.c:             local_lock_t lock;
drivers/connector/cn_proc.c:       local_lock_t lock;
drivers/md/raid5.h:                local_lock_t lock;
kernel/softirq.c:                  local_lock_t lock;
mm/memcontrol.c:                   local_lock_t stock_lock;
mm/mlock.c:                        local_lock_t lock;
mm/slub.c:                         local_lock_t lock;
mm/swap.c:                         local_lock_t lock;
mm/swap.c:                         local_lock_t lock_irq;
mm/zsmalloc.c:                     local_lock_t lock;
net/bridge/br_netfilter_hooks.c:   local_lock_t bh_lock;
net/core/skbuff.c:                 local_lock_t bh_lock;
net/ipv4/tcp_sigpool.c:            local_lock_t bh_lock;

Networking is the one that really cares about performance, and there
'int active' adds 4 bytes but no run-time overhead, since it's using
local_lock_nested_bh(), which doesn't touch 'active'.

memcontrol.c and slub.c are the ones that need the new trylock logic
with the 'active' field to protect from NMI on !RT. They will change
to the new local_trylock_t anyway.

What's left is not perf critical. A single WRITE_ONCE in lock and
another WRITE_ONCE in unlock is really in the noise.
Hence I went with the simple approach you see in this patch.
I can go with a new local_trylock_t and the _Generic() trick.
The patch will look like this:

diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 8dd71fbbb6d2..ed4623e0c71a 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -15,6 +15,14 @@ typedef struct {
 #endif
 } local_lock_t;

+typedef struct {
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	dep_map;
+	struct task_struct	*owner;
+#endif
+	int active;
+} local_trylock_t;

 #define __local_lock_irqsave(lock, flags)			\
 	do {							\
+		local_lock_t *l;				\
 		local_irq_save(flags);				\
-		local_lock_acquire(this_cpu_ptr(lock));		\
+		l = (local_lock_t *)this_cpu_ptr(lock);		\
+		_Generic((lock),				\
+			local_trylock_t *:			\
+			({					\
+				lockdep_assert(((local_trylock_t *)l)->active == 0); \
+				WRITE_ONCE(((local_trylock_t *)l)->active, 1); \
+			}),					\
+			default: (void)0);			\
+		local_lock_acquire(l);				\
 	} while (0)
+
+#define __local_trylock_irqsave(lock, flags)			\
+	({							\
+		local_trylock_t *l;				\
+		local_irq_save(flags);				\
+		l = this_cpu_ptr(lock);				\
+		if (READ_ONCE(l->active) == 1) {		\
+			local_irq_restore(flags);		\
+			l = NULL;				\
+		} else {					\
+			WRITE_ONCE(l->active, 1);		\
+			local_trylock_acquire((local_lock_t *)l); \
+		}						\
+		!!l;						\
+	})

and use the new local_trylock_t where necessary. Like:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..8fe141e93a0b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1722,7 +1722,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 }

 struct memcg_stock_pcp {
-	local_lock_t stock_lock;
+	local_trylock_t stock_lock;

All lines where stock_lock is used will stay as-is. So no code churn.

The _Generic() above isn't pretty, but not horrible either.
I guess that's a decent trade-off.

Better ideas?
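For reference, _Generic() is standard C11: it selects one association at compile time based on the static type of the controlling expression, so only local_trylock_t users pay for the 'active' bookkeeping while plain local_lock_t users get a no-op that the compiler discards. A hedged user-space sketch of the same dispatch pattern (type and macro names are invented, and the GCC statement-expression from the patch is avoided for portability):

```c
/* Two lock flavors, as in the patch sketch; fields reduced to the minimum.
 * Names are illustrative, not the kernel's. */
typedef struct { int dummy;  } lock_plain_t;
typedef struct { int active; } lock_try_t;

/* Compile-time dispatch on the pointer type: the assignment branch is
 * selected (and evaluated) only for lock_try_t *; every other type
 * falls through to the no-op default, mirroring the _Generic() use in
 * __local_lock_irqsave() above. */
#define mark_active(lock)					\
	_Generic((lock),					\
		 lock_try_t *: ((lock_try_t *)(lock))->active = 1,	\
		 default: (void)0)
```

Because the selection happens at compile time, call sites using the plain type compile to nothing extra; only the type actually named in the association pays for the additional store.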