From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>,
	linux-trace-users@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: Bug: broken /proc/kcore in 6.13
Date: Fri, 17 Jan 2025 14:44:04 +0000
Message-ID: <4fd7e1a3-f7ff-4b9d-9a53-fb73795b5b3d@lucifer.local>
In-Reply-To: <20250117084038.79f40307@gandalf.local.home>

On Fri, Jan 17, 2025 at 08:40:38AM -0500, Steven Rostedt wrote:
>
> [ Cc'ing the proper folks ]
>
> -- Steve

Thanks Steve!

>
>
> On Fri, 17 Jan 2025 11:36:05 +0100
> Alexandre Ferrieux <alexandre.ferrieux@gmail.com> wrote:
>
> > Hi,
> >
> > Somewhere in the 6.13 branch (not bisected yet, sorry), it stopped being
> > possible to disassemble the running kernel from gdb through /proc/kcore.

Thanks for the report! Much appreciated.

I may try to bisect here also unless you're close to finding the commit that
broke this?

> >
> > More precisely:
> >
> >  - look up a function in /proc/kallsyms => 0xADDRESS
> >  - tell gdb to "core /proc/kcore"
> >  - tell gdb to "disass 0xADDRESS,+LENGTH" (no need for a symbol table)
> >
> >  * if the function is within the main kernel text, it is okay
> >  * if the function is within a module's text, an infinite loop happens:
> >
> >
> > Example:
> >
> >  # egrep -w ice_process_skb_fields\|ksys_write /proc/kallsyms
> >  ffffffffaf296c80 T ksys_write
> >  ffffffffc0b67180 t ice_process_skb_fields       [ice]
> >
> >  # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffaf296c80,+256" -ex quit
> >  ...
> >  Dump of assembler code from 0xffffffffaf296c80 to 0xffffffffaf296d80:
> >    ...
> >  End of assembler dump.
> >
> >  # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffc0b67180,+256" -ex quit
> >  ...
> >  Dump of assembler code from 0xffffffffc0b67180 to 0xffffffffc0b67280:
> >  (***NOTHING***)
> >  ^C <= inefficient, need kill -9
> >
> >
> > Ftrace (see below) shows in this case read_kcore_iter() calls vread_iter() in an
> > infinite loop:
> >
> >         while (true) {
> >                 read += vread_iter(iter, src, left);
> >                 if (read == tsz)
> >                         break;
> >
> >                 src += read;
> >                 left -= read;
> >
> >                 if (fault_in_iov_iter_writeable(iter, left)) {
> >                         ret = -EFAULT;
> >                         goto out;
> >                 }
> >         }
> >
> > As it turns out, in the offending situation, vread_iter() keeps returning 0,
> > with "read" staying at its initial value of 0, and "tsz" nonzero. As a
> > consequence, "src" stays stuck in a place where vread_iter() fails.
> >

Yikes, this is my fault. Sorry about that!

There was some discussion of the infinite loop at the time, on the understanding
that vread_iter() should never return 0 in this scenario (where we have already
identified the _category_ of kernel memory being accessed). That assumption has
now been shown to be false.

The fact that it can return 0 is rather problematic, and we need to patch this.
If this was possible in real scenarios in the past, we would probably also want
to backport a fix.

In any case, I think we need an explicit check here no matter the cause, so that
we can never loop like this. This was just an oversight at the time, given that
vread_iter() returning 0 is documented behaviour.
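
To make the idea concrete, here is a minimal user-space sketch of a read loop
that bails out as soon as a call makes no forward progress. It is not the
kernel code and not a proposed patch: fake_region, fake_vread() and
read_with_progress_check() are hypothetical names standing in for the
vread_iter() call site, the fault-in step is left out, and -EFAULT is only an
illustrative error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a readable area: only 'avail' bytes exist. */
struct fake_region {
	const char *data;
	size_t avail;
};

/*
 * Hypothetical stand-in for vread_iter(): copies what it can starting at
 * offset 'off' and returns 0 when nothing is readable there.
 */
static size_t fake_vread(const struct fake_region *r, char *dst,
			 size_t off, size_t count)
{
	size_t n;

	if (off >= r->avail)
		return 0;
	n = r->avail - off;
	if (n > count)
		n = count;
	memcpy(dst, r->data + off, n);
	return n;
}

/*
 * Reads 'tsz' bytes, or fails as soon as one call makes no progress,
 * so the loop is guaranteed to terminate.
 */
static long read_with_progress_check(const struct fake_region *r,
				     char *dst, size_t tsz)
{
	size_t done = 0;

	assert(r != NULL && dst != NULL);

	while (done < tsz) {
		size_t n = fake_vread(r, dst + done, done, tsz - done);

		if (n == 0)
			return -EFAULT;	/* illustrative error code only */
		done += n;
	}
	return (long)done;
}
```

With a fully readable region the function returns tsz; with an empty or short
region it returns the error instead of spinning.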

My instinct is to error out if this returns 0, because that would indicate that
the address is not part of the vmalloc area.

But then it seems add_modules_range() is just adding the module range under
category KCORE_VMALLOC despite it not being in the vmalloc range :/ which is
really odd. This was added a long time ago, so it is clearly not what triggered
this, but it is odd.
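
As a rough illustration of why the two ranges differ, the sketch below
classifies the addresses from the report. The bounds are assumptions, not
values read from the affected machine: they are the documented x86-64 4-level
paging defaults with CONFIG_RANDOMIZE_BASE enabled (1GB kernel image, so
modules start at 0xffffffffc0000000) and without vmalloc base randomization.
The enum and classify() are hypothetical helpers.

```c
#include <stdint.h>

/*
 * ASSUMED bounds: x86-64, 4-level paging, CONFIG_RANDOMIZE_BASE=y,
 * no randomization of the vmalloc base. Real systems may differ.
 */
#define ASSUMED_VMALLOC_START	0xffffc90000000000ULL
#define ASSUMED_VMALLOC_END	0xffffe8ffffffffffULL
#define ASSUMED_KTEXT_START	0xffffffff80000000ULL
#define ASSUMED_MODULES_START	0xffffffffc0000000ULL
#define ASSUMED_MODULES_END	0xfffffffffeffffffULL

enum region { REGION_OTHER, REGION_VMALLOC, REGION_KTEXT, REGION_MODULES };

/* Hypothetical helper: report which assumed range an address falls into. */
static enum region classify(uint64_t addr)
{
	if (addr >= ASSUMED_VMALLOC_START && addr <= ASSUMED_VMALLOC_END)
		return REGION_VMALLOC;
	if (addr >= ASSUMED_KTEXT_START && addr < ASSUMED_MODULES_START)
		return REGION_KTEXT;
	if (addr >= ASSUMED_MODULES_START && addr <= ASSUMED_MODULES_END)
		return REGION_MODULES;
	return REGION_OTHER;
}
```

Under these assumed bounds, ksys_write (0xffffffffaf296c80) lands in the kernel
text range and ice_process_skb_fields (0xffffffffc0b67180) in the module range,
and neither falls inside the vmalloc range.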

In any case, let me go have a look at this...

> > A cursory "git blame" shows that this interplay (vread_iter() legitimately
> > returning zero, and read_kcore_iter() *not* testing it) has been there from
> > quite some time. So, while this is arguably fragile, possibly the new situation
> > lies in the actual memory layout that triggers the failing path.
> >
> > Thanks for any insight, as this completely breaks debugging the running kernel
> > in 6.13.

Apologies again. Let's figure this out and get this fixed!

Cheers, Lorenzo

> >
> > -Alex
> >
> >
> > ------------
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 0/0   #P:48
> > #
> > #           TASK-PID     CPU#     TIMESTAMP  FUNCTION
> > #              | |         |         |         |
> >            <...>-3304    [045]    487.295283: kprobe_read_kcore_iter:
> > (read_kcore_iter+0x4/0xae0) pos=0x7fffc0b6b000
> >            <...>-3304    [045]    487.295298: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> >            <...>-3304    [045]    487.295326: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> >            <...>-3304    [045]    487.295329: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> >            <...>-3304    [045]    487.295338: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> >            <...>-3304    [045]    487.295339: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> >            <...>-3304    [045]    487.295345: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> >            <...>-3304    [045]    487.295347: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> >            <...>-3304    [045]    487.295352: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> >            <...>-3304    [045]    487.295353: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> > ...
> >
>


