From: Alessandro Carminati <acarmina@redhat.com>
To: "Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
"Andrey Konovalov" <andreyknvl@gmail.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Catalin Marinas" <catalin.marinas@arm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev,
Alessandro Carminati <alessandro.carminati@gmail.com>,
Juri Lelli <juri.lelli@redhat.com>,
Gabriele Paoloni <gpaoloni@redhat.com>
Subject: Re: [RFC] mm/kmemleak: Fix sleeping function called from invalid context at print message
Date: Tue, 19 Nov 2024 16:45:03 +0100 [thread overview]
Message-ID: <CAGegRW4hMhF=zPeRDCgt0g5uTnYvHLSnA7Kw5Lwdsrr5ruYkjQ@mail.gmail.com> (raw)
In-Reply-To: <20241118151351-5a465d40-b8d0-45e4-bf9b-6d3add8ce98b@linutronix.de>
[-- Attachment #1: Type: text/plain, Size: 14774 bytes --]
Hello Thomas,
Thanks for your suggestion.
On Mon, Nov 18, 2024 at 3:29 PM Thomas Weißschuh <
thomas.weissschuh@linutronix.de> wrote:
> On Fri, Nov 15, 2024 at 02:54:10PM +0000, Alessandro Carminati wrote:
> > Address a bug in the kernel that triggers a "sleeping function called
> from
> > invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> > specific conditions:
> > - CONFIG_PREEMPT_RT=y
> > - Set SELinux as the LSM for the system
> > - Set kptr_restrict to 1
> > - kmemleak buffer contains at least one item
> > Ensure the kmemleak buffer contains at least one item
> > Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
> > kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
> > where kmemleak_seq_show is executed in atomic context within the RT
> kernel.
> > However, the SELinux capability check in this function flow still relies
> on
> > regular spinlocks, leading to potential race conditions that trigger an
> > error when printing the kmemleak backtrace.
> > Move the backtrace printing out of the critical section. Use a stack
> > variable to store the backtrace pointers, avoiding spinlocks in the
> atomic
> > context.
> >
> > Implement delta encoding to minimize the stack memory footprint,
> > addressing the potentially large memory demands for storing these
> pointers
> > on 64-bit systems.
>
> The stacktrace is already stored in the stackdepot.
> Shouldn't it be possible to take a reference to the stackdepot entry
> inside the critical section and then use that reference outside of the
> critical section for printing?
>
Yes, it is indeed possible to do that.
However, kmemleak operates in such a way that entries can be deleted, for
example, if they are found to be false positives.
My concern was that using a reference to a potentially deleted entry could
cause problems. But after considering your suggestion, I verified that
stack_depot_put, which is used to delete a stack depot entry, does not
appear to be called when a kmemleak object is deleted.
This makes me question whether that is the intended behavior.
As things currently stand, it seems possible to remove all the code I added
to store the trace in the stack and instead directly use the stack_depot
handle.
I would appreciate feedback from kmemleak and stackdepot experts regarding
this approach.
>
> > Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
> > ---
> > ```
> > [ 159.247069] BUG: sleeping function called from invalid context at
> kernel/locking/spinlock_rt.c:48
> > [ 159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid:
> 136, name: cat
> > [ 159.247241] preempt_count: 1, expected: 0
> > [ 159.247277] RCU nest depth: 2, expected: 2
> > [ 159.247388] 6 locks held by cat/136:
> > [ 159.247438] #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at:
> seq_read_iter+0xb8/0xe30
> > [ 159.248835] #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at:
> kmemleak_seq_start+0x34/0x128
> > [ 159.249053] #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at:
> kmemleak_seq_show+0x3c/0x1e0
> > [ 159.249127] #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at:
> has_ns_capability_noaudit+0x8/0x1b0
> > [ 159.249205] #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at:
> avc_compute_av+0xc4/0x3d0
> > [ 159.249364] irq event stamp: 136660
> > [ 159.249407] hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>]
> _raw_spin_unlock_irqrestore+0xa8/0xd8
> > [ 159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>]
> _raw_spin_lock_irqsave+0x8c/0xb0
> > [ 159.249518] softirqs last enabled at (0): [<ffffafe6a5d50b28>]
> copy_process+0x11d8/0x3df8
> > [ 159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [ 159.249970] Preemption disabled at:
> > [ 159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> > [ 159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E
> 6.11.0-rt7+ #34
> > [ 159.250797] Tainted: [E]=UNSIGNED_MODULE
> > [ 159.250822] Hardware name: linux,dummy-virt (DT)
> > [ 159.251050] Call trace:
> > [ 159.251079] dump_backtrace+0xa0/0x128
> > [ 159.251132] show_stack+0x1c/0x30
> > [ 159.251156] dump_stack_lvl+0xe8/0x198
> > [ 159.251180] dump_stack+0x18/0x20
> > [ 159.251227] rt_spin_lock+0x8c/0x1a8
> > [ 159.251273] avc_perm_nonode+0xa0/0x150
> > [ 159.251316] cred_has_capability.isra.0+0x118/0x218
> > [ 159.251340] selinux_capable+0x50/0x80
> > [ 159.251363] security_capable+0x7c/0xd0
> > [ 159.251388] has_ns_capability_noaudit+0x94/0x1b0
> > [ 159.251412] has_capability_noaudit+0x20/0x30
> > [ 159.251437] restricted_pointer+0x21c/0x4b0
> > [ 159.251461] pointer+0x298/0x760
> > [ 159.251482] vsnprintf+0x330/0xf70
> > [ 159.251504] seq_printf+0x178/0x218
> > [ 159.251526] print_unreferenced+0x1a4/0x2d0
> > [ 159.251551] kmemleak_seq_show+0xd0/0x1e0
> > [ 159.251576] seq_read_iter+0x354/0xe30
> > [ 159.251599] seq_read+0x250/0x378
> > [ 159.251622] full_proxy_read+0xd8/0x148
> > [ 159.251649] vfs_read+0x190/0x918
> > [ 159.251672] ksys_read+0xf0/0x1e0
> > [ 159.251693] __arm64_sys_read+0x70/0xa8
> > [ 159.251716] invoke_syscall.constprop.0+0xd4/0x1d8
> > [ 159.251767] el0_svc+0x50/0x158
> > [ 159.251813] el0t_64_sync+0x17c/0x180
> > ```
> > I have considered three potential approaches to address this matter:
> >
> > 1. Remove Raw Pointer Printing
> > The simplest solution is to eliminate raw pointer printing from the
> report.
> > This approach involves minimal changes to the kernel code and is
> > straightforward to implement.
> >
> > While I am confident that omitting the raw address would result in
> > negligible information loss in most scenarios, some may perceive it as a
> > feature regression. Below is an example of the modification:
> > ```
> > - warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr);
> > + warn_or_seq_printf(seq, " %pS\n", ptr);
> > ```
> > This change may be acceptable since the %pS format outputs a hex string
> > if no kallsyms are available. However, it modifies the original behavior,
> > and in the kallsyms scenario, the raw pointer would no longer be present.
> >
> > 2. Modify SELinux to Avoid Sleeping Spinlocks
> > Another option is to alter the SELinux capability check to use
> > non-sleeping spinlocks.
> > However, this approach is not advisable. The SELinux capability check is
> > extensively used across the kernel and is far more critical than the
> > kmemleak reporting feature.
> > Adapting it to address this rare issue could unnecessarily introduce
> > latency across the entire kernel, particularly as kmemleak is rarely used
> > in production environments.
> >
> > 3. Move Stack Trace Printing Outside the Atomic Section
> > The third and preferred approach is to move the stack trace printing
> > outside the atomic section. This would preserve the current functionality
> > without modifying SELinux.
> >
> > The primary challenge here lies in making the backtrace pointers
> available
> > after exiting the critical section, as they are captured within it.
> > To address this, the backtrace pointers can be copied to a safe location,
> > enabling access once the raw_spinlock is released.
> >
> > Options for Creating a Safe Location for Backtrace Pointers
> > Several strategies have been considered for storing the backtrace
> pointers
> > safely:
> > * Dynamic Allocation
> > * Allocating memory with kmalloc cannot be done within a raw_spinlock
> > area. Using GFP_ATOMIC is also infeasible.
> > * Since the code that prints the message is inside a loop, executed
> > potentially multiple times, it is only within the raw_spinlock
> > section that we can determine whether allocation is needed.
> > * Allocating and deallocating memory on every loop iteration would be
> > prohibitively expensive.
> > * Global Data Section
> > * In this strategy, the message would be printed after exiting the
> > raw_spinlock protected section.
> > * However, this approach risks data corruption if another occurrence
> > of the issue arises before the first operation completes.
> > * Per-CPU Data
> > * The same concerns as with global data apply here. While data
> > corruption is less likely, it is not impossible.
> > * Stack
> > * Using the stack is the best option since each thread has its own
> > stack, ensuring data isolation. However, the size of the data poses
> > a challenge.
> > * Exporting a full stack trace pointer list requires considerable
> space.
> > A 32-level stack trace in a 64-bit system would require 256 bytes,
> > which is contrary to best practices for stack size management.
> >
> > To mitigate this, I propose using delta encoding to store the addresses.
> > This method reduces the size of each address from 8 bytes to 4 bytes on
> > 64-bit systems. While this introduces some complexity, it significantly
> > reduces memory usage and allows us to preserve the kmemleak reports in
> their
> > current form.
> > mm/kmemleak.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 71 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> > index 0400f5e8ac60..fc5869e09280 100644
> > --- a/mm/kmemleak.c
> > +++ b/mm/kmemleak.c
> > @@ -274,6 +274,44 @@ static void kmemleak_disable(void);
> > pr_warn(fmt, ##__VA_ARGS__); \
> > } while (0)
> >
> > +#define PTR_STORAGE_OP_OK -1
> > +#define PTR_STORAGE_OP_FAIL 0
> > +#define PTR_STORAGE_CAPACITY 32
> > +
> > +struct ptr_storage {
> > + unsigned long base;
> > + u32 data[PTR_STORAGE_CAPACITY];
> > + int nr_entries;
> > +};
> > +
> > +static int ptr_storage_insert(unsigned long p, struct ptr_storage *s)
> > +{
> > + unsigned long diff_data;
> > +
> > + if (s->nr_entries != 0) {
> > + diff_data = s->base - p;
> > + if (s->nr_entries < PTR_STORAGE_CAPACITY) {
> > + s->data[((s->nr_entries - 1))] = diff_data &
> 0xffffffff;
> > + s->nr_entries++;
> > + return PTR_STORAGE_OP_OK;
> > + }
> > + return PTR_STORAGE_OP_FAIL;
> > + }
> > + s->base = p;
> > + s->nr_entries++;
> > + return PTR_STORAGE_OP_OK;
> > +}
> > +
> > +static void *ptr_storage_get(struct ptr_storage *s, int item_no)
> > +{
> > + if (item_no < s->nr_entries && item_no > 0)
> > + return (void *)s->base - (s32)s->data[(item_no - 1)];
> > +
> > + if (item_no == 0)
> > + return (void *)s->base;
> > + return NULL;
> > +}
> > +
> > static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
> > int rowsize, int groupsize, const void
> *buf,
> > size_t len, bool ascii)
> > @@ -357,11 +395,13 @@ static bool unreferenced_object(struct
> kmemleak_object *object)
> > * print_unreferenced function must be called with the object->lock
> held.
> > */
> > static void print_unreferenced(struct seq_file *seq,
> > - struct kmemleak_object *object)
> > + struct kmemleak_object *object,
> > + struct ptr_storage *s)
> > {
> > int i;
> > unsigned long *entries;
> > unsigned int nr_entries;
> > + unsigned long tmp;
> >
> > nr_entries = stack_depot_fetch(object->trace_handle, &entries);
> > warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size
> %zu):\n",
> > @@ -372,8 +412,8 @@ static void print_unreferenced(struct seq_file *seq,
> > warn_or_seq_printf(seq, " backtrace (crc %x):\n",
> object->checksum);
> >
> > for (i = 0; i < nr_entries; i++) {
> > - void *ptr = (void *)entries[i];
> > - warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr);
> > + tmp = (unsigned long)entries[i];
> > + ptr_storage_insert(tmp, s);
> > }
> > }
> >
> > @@ -1625,6 +1665,10 @@ static void kmemleak_scan(void)
> > struct zone *zone;
> > int __maybe_unused i;
> > int new_leaks = 0;
> > + struct ptr_storage s = {0};
> > + bool do_print = false;
> > + void *tmp;
> > + int inx;
> >
> > jiffies_last_scan = jiffies;
> >
> > @@ -1783,12 +1827,20 @@ static void kmemleak_scan(void)
> > !(object->flags & OBJECT_REPORTED)) {
> > object->flags |= OBJECT_REPORTED;
> >
> > - if (kmemleak_verbose)
> > - print_unreferenced(NULL, object);
> > + if (kmemleak_verbose) {
> > + print_unreferenced(NULL, object, &s);
> > + do_print = true;
> > + }
> >
> > new_leaks++;
> > }
> > raw_spin_unlock_irq(&object->lock);
> > + if (kmemleak_verbose && do_print) {
> > + for (inx = 0; inx < s.nr_entries; inx++) {
> > + tmp = ptr_storage_get(&s, i);
> > + warn_or_seq_printf(NULL, " [<%pK>]
> %pS\n", tmp, tmp);
> > + }
> > + }
> > }
> > rcu_read_unlock();
> >
> > @@ -1939,11 +1991,23 @@ static int kmemleak_seq_show(struct seq_file
> *seq, void *v)
> > {
> > struct kmemleak_object *object = v;
> > unsigned long flags;
> > + struct ptr_storage s = {0};
> > + void *tmp;
> > + int i;
> > + bool do_print = false;
> >
> > raw_spin_lock_irqsave(&object->lock, flags);
> > - if ((object->flags & OBJECT_REPORTED) &&
> unreferenced_object(object))
> > - print_unreferenced(seq, object);
> > + if ((object->flags & OBJECT_REPORTED) &&
> unreferenced_object(object)) {
> > + print_unreferenced(seq, object, &s);
> > + do_print = true;
> > + }
> > raw_spin_unlock_irqrestore(&object->lock, flags);
> > + if (do_print) {
> > + for (i = 0; i < s.nr_entries; i++) {
> > + tmp = ptr_storage_get(&s, i);
> > + warn_or_seq_printf(seq, " [<%pK>] %pS\n", tmp,
> tmp);
> > + }
> > + }
> > return 0;
> > }
> >
> > --
> > 2.34.1
> >
>
>
--
---
172
[-- Attachment #2: Type: text/html, Size: 18016 bytes --]
next prev parent reply other threads:[~2024-11-19 15:45 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-15 14:54 Alessandro Carminati
2024-11-18 14:22 ` Thomas Weißschuh
2024-11-19 15:45 ` Alessandro Carminati [this message]
2024-11-19 16:11 ` Thomas Weißschuh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGegRW4hMhF=zPeRDCgt0g5uTnYvHLSnA7Kw5Lwdsrr5ruYkjQ@mail.gmail.com' \
--to=acarmina@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alessandro.carminati@gmail.com \
--cc=andreyknvl@gmail.com \
--cc=bigeasy@linutronix.de \
--cc=catalin.marinas@arm.com \
--cc=clrkwllms@kernel.org \
--cc=gpaoloni@redhat.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rt-devel@lists.linux.dev \
--cc=rostedt@goodmis.org \
--cc=thomas.weissschuh@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox