From: Axel Rasmussen <axelrasmussen@google.com>
Date: Fri, 21 Jun 2024 16:03:25 -0700
Subject: Re: [PATCH] mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer
To: Tetsuo Handa
Cc: linux-mm, Andrew Morton, Nicolas Saenz Julienne, LKML

On Thu, Jun 20, 2024 at 6:08 PM Tetsuo Handa wrote:
>
> Commit 2b5067a8143e ("mm: mmap_lock: add tracepoints around lock
> acquisition") introduced TRACE_MMAP_LOCK_EVENT() macro using
> preempt_disable() in order to let get_mm_memcg_path() return a percpu
> buffer exclusively used by normal, softirq, irq and NMI contexts
> respectively.
>
> Commit 832b50725373 ("mm: mmap_lock: use local locks instead of disabling
> preemption") replaced preempt_disable() with local_lock(&memcg_paths.lock),
> based on the argument that preempt_disable() has to be avoided because
> get_mm_memcg_path() might sleep if PREEMPT_RT=y.
>
> But syzbot started reporting
>
>   inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>
> and
>
>   inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
>
> messages, because local_lock() does not disable IRQs.
>
> We could replace local_lock() with local_lock_irqsave() in order to
> suppress these messages. But this patch instead replaces the percpu
> buffers with an on-stack buffer: each buffer returned by
> get_memcg_path_buf() is only 256 bytes, which is small enough to
> allocate from the current thread's kernel stack.
>
> Reported-by: syzbot
> Closes: https://syzkaller.appspot.com/bug?extid=40905bca570ae6784745
> Fixes: 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption")
> Signed-off-by: Tetsuo Handa
> ---
> Only compile tested.
>
>  mm/mmap_lock.c | 175 ++++++------------------------------------------
>  1 file changed, 20 insertions(+), 155 deletions(-)

No objections. Looking back all the way to the first version [1], the
buffers were already percpu rather than on the stack like this. IOW,
there was no on-list discussion about why this shouldn't go on the
stack.

It has been a while, but if memory serves I opted to do it that way
just out of paranoia about putting large buffers on the stack. But I
agree 256 bytes isn't all that large.

That v1 patch wasn't all that complex, but then again it didn't deal
with various edge cases properly :) so it has grown significantly more
complex over time. Reconsidering the approach seems reasonable now,
given how much code this removes.

This change looks straightforwardly correct to me. You can take:

Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>

[1]: https://lore.kernel.org/all/20200917154258.1a364cdf@gandalf.local.home/T/

>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 1854850b4b89..368b840e7508 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -19,14 +19,7 @@ EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released);
>
>  #ifdef CONFIG_MEMCG
>
> -/*
> - * Our various events all share the same buffer (because we don't want or need
> - * to allocate a set of buffers *per event type*), so we need to protect against
> - * concurrent _reg() and _unreg() calls, and count how many _reg() calls have
> - * been made.
> - */
> -static DEFINE_MUTEX(reg_lock);
> -static int reg_refcount; /* Protected by reg_lock. */
> +static atomic_t reg_refcount;
>
>  /*
>   * Size of the buffer for memcg path names. Ignoring stack trace support,
> @@ -34,136 +27,22 @@ static int reg_refcount; /* Protected by reg_lock. */
>   */
>  #define MEMCG_PATH_BUF_SIZE MAX_FILTER_STR_VAL
>
> -/*
> - * How many contexts our trace events might be called in: normal, softirq, irq,
> - * and NMI.
> - */
> -#define CONTEXT_COUNT 4
> -
> -struct memcg_path {
> -        local_lock_t lock;
> -        char __rcu *buf;
> -        local_t buf_idx;
> -};
> -static DEFINE_PER_CPU(struct memcg_path, memcg_paths) = {
> -        .lock = INIT_LOCAL_LOCK(lock),
> -        .buf_idx = LOCAL_INIT(0),
> -};
> -
> -static char **tmp_bufs;
> -
> -/* Called with reg_lock held. */
> -static void free_memcg_path_bufs(void)
> -{
> -        struct memcg_path *memcg_path;
> -        int cpu;
> -        char **old = tmp_bufs;
> -
> -        for_each_possible_cpu(cpu) {
> -                memcg_path = per_cpu_ptr(&memcg_paths, cpu);
> -                *(old++) = rcu_dereference_protected(memcg_path->buf,
> -                        lockdep_is_held(&reg_lock));
> -                rcu_assign_pointer(memcg_path->buf, NULL);
> -        }
> -
> -        /* Wait for inflight memcg_path_buf users to finish. */
> -        synchronize_rcu();
> -
> -        old = tmp_bufs;
> -        for_each_possible_cpu(cpu) {
> -                kfree(*(old++));
> -        }
> -
> -        kfree(tmp_bufs);
> -        tmp_bufs = NULL;
> -}
> -
>  int trace_mmap_lock_reg(void)
>  {
> -        int cpu;
> -        char *new;
> -
> -        mutex_lock(&reg_lock);
> -
> -        /* If the refcount is going 0->1, proceed with allocating buffers. */
> -        if (reg_refcount++)
> -                goto out;
> -
> -        tmp_bufs = kmalloc_array(num_possible_cpus(), sizeof(*tmp_bufs),
> -                                 GFP_KERNEL);
> -        if (tmp_bufs == NULL)
> -                goto out_fail;
> -
> -        for_each_possible_cpu(cpu) {
> -                new = kmalloc(MEMCG_PATH_BUF_SIZE * CONTEXT_COUNT, GFP_KERNEL);
> -                if (new == NULL)
> -                        goto out_fail_free;
> -                rcu_assign_pointer(per_cpu_ptr(&memcg_paths, cpu)->buf, new);
> -                /* Don't need to wait for inflights, they'd have gotten NULL. */
> -        }
> -
> -out:
> -        mutex_unlock(&reg_lock);
> +        atomic_inc(&reg_refcount);
>          return 0;
> -
> -out_fail_free:
> -        free_memcg_path_bufs();
> -out_fail:
> -        /* Since we failed, undo the earlier ref increment. */
> -        --reg_refcount;
> -
> -        mutex_unlock(&reg_lock);
> -        return -ENOMEM;
>  }
>
>  void trace_mmap_lock_unreg(void)
>  {
> -        mutex_lock(&reg_lock);
> -
> -        /* If the refcount is going 1->0, proceed with freeing buffers. */
> -        if (--reg_refcount)
> -                goto out;
> -
> -        free_memcg_path_bufs();
> -
> -out:
> -        mutex_unlock(&reg_lock);
> -}
> -
> -static inline char *get_memcg_path_buf(void)
> -{
> -        struct memcg_path *memcg_path = this_cpu_ptr(&memcg_paths);
> -        char *buf;
> -        int idx;
> -
> -        rcu_read_lock();
> -        buf = rcu_dereference(memcg_path->buf);
> -        if (buf == NULL) {
> -                rcu_read_unlock();
> -                return NULL;
> -        }
> -        idx = local_add_return(MEMCG_PATH_BUF_SIZE, &memcg_path->buf_idx) -
> -              MEMCG_PATH_BUF_SIZE;
> -        return &buf[idx];
> +        atomic_dec(&reg_refcount);
>  }
>
> -static inline void put_memcg_path_buf(void)
> -{
> -        local_sub(MEMCG_PATH_BUF_SIZE, &this_cpu_ptr(&memcg_paths)->buf_idx);
> -        rcu_read_unlock();
> -}
> -
> -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                  \
> -        do {                                                                  \
> -                const char *memcg_path;                                       \
> -                local_lock(&memcg_paths.lock);                                \
> -                memcg_path = get_mm_memcg_path(mm);                           \
> -                trace_mmap_lock_##type(mm,                                    \
> -                                       memcg_path != NULL ? memcg_path : "",  \
> -                                       ##__VA_ARGS__);                        \
> -                if (likely(memcg_path != NULL))                               \
> -                        put_memcg_path_buf();                                 \
> -                local_unlock(&memcg_paths.lock);                              \
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                    \
> +        do {                                                    \
> +                char buf[MEMCG_PATH_BUF_SIZE];                  \
> +                get_mm_memcg_path(mm, buf, sizeof(buf));        \
> +                trace_mmap_lock_##type(mm, buf, ##__VA_ARGS__); \
>          } while (0)
>
>  #else /* !CONFIG_MEMCG */
>
> @@ -185,37 +64,23 @@ void trace_mmap_lock_unreg(void)
>  #ifdef CONFIG_TRACING
>  #ifdef CONFIG_MEMCG
>  /*
> - * Write the given mm_struct's memcg path to a percpu buffer, and return a
> - * pointer to it. If the path cannot be determined, or no buffer was available
> - * (because the trace event is being unregistered), NULL is returned.
> - *
> - * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> - * disabled by the caller before calling us, and re-enabled only after the
> - * caller is done with the pointer.
> - *
> - * The caller must call put_memcg_path_buf() once the buffer is no longer
> - * needed. This must be done while preemption is still disabled.
> + * Write the given mm_struct's memcg path to a buffer. If the path cannot be
> + * determined or the trace event is being unregistered, an empty string is
> + * written.
>   */
> -static const char *get_mm_memcg_path(struct mm_struct *mm)
> +static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t buflen)
>  {
> -        char *buf = NULL;
> -        struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> +        struct mem_cgroup *memcg;
>
> +        buf[0] = '\0';
> +        /* No need to get path if no trace event is registered. */
> +        if (!atomic_read(&reg_refcount))
> +                return;
> +        memcg = get_mem_cgroup_from_mm(mm);
>          if (memcg == NULL)
> -                goto out;
> -        if (unlikely(memcg->css.cgroup == NULL))
> -                goto out_put;
> -
> -        buf = get_memcg_path_buf();
> -        if (buf == NULL)
> -                goto out_put;
> -
> -        cgroup_path(memcg->css.cgroup, buf, MEMCG_PATH_BUF_SIZE);
> -
> -out_put:
> +                return;
> +        if (memcg->css.cgroup)
> +                cgroup_path(memcg->css.cgroup, buf, buflen);
>          css_put(&memcg->css);
> -out:
> -        return buf;
>  }
>
>  #endif /* CONFIG_MEMCG */
> --
> 2.43.0
>
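For illustration, here is a minimal userspace sketch of the pattern the
patch lands on: a per-call on-stack buffer gated by an atomic
registration count, instead of shared percpu buffers behind a
local_lock(). All names below (get_path, trace_event, PATH_BUF_SIZE and
the userspace reg_refcount) are illustrative stand-ins, not the kernel
API.

/*
 * Minimal userspace sketch of the on-stack-buffer pattern; build with
 * any C11 compiler, e.g. cc -std=c11 sketch.c.
 */
#include <stdatomic.h>
#include <stdio.h>

#define PATH_BUF_SIZE 256

static atomic_int reg_refcount; /* how many tracers are registered */

static void get_path(char *buf, size_t buflen)
{
        buf[0] = '\0';
        /* Skip the lookup entirely while nothing is registered. */
        if (atomic_load(&reg_refcount) == 0)
                return;
        snprintf(buf, buflen, "/illustrative/cgroup/path");
}

static void trace_event(void)
{
        /*
         * The buffer lives on the caller's stack, so concurrent or
         * nested callers each get their own copy for free: no percpu
         * indexing, no local_lock(), nothing for lockdep to flag.
         */
        char buf[PATH_BUF_SIZE];

        get_path(buf, sizeof(buf));
        printf("event: '%s'\n", buf);
}

int main(void)
{
        trace_event();                      /* unregistered: empty path */
        atomic_fetch_add(&reg_refcount, 1); /* a tracer registers */
        trace_event();                      /* registered: path resolved */
        atomic_fetch_sub(&reg_refcount, 1);
        return 0;
}

Because each call owns its buffer, task, softirq, hardirq and NMI
contexts can no longer collide on shared storage, which is what lets
the local_lock() and the lockdep reports it triggered go away.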