From: Dmitry Vyukov <dvyukov@google.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>,
Alexander Potapenko <glider@google.com>,
Tejun Heo <tj@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
kasan-dev <kasan-dev@googlegroups.com>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: kasan: paging percpu + kasan causes a double fault
Date: Thu, 18 Jul 2019 17:51:37 +0200
Message-ID: <CACT4Y+YevDd-y4Au33=mr-0-UQPy8NR0vmG8zSiCfmzx6gTB-w@mail.gmail.com>
In-Reply-To: <20190708150532.GB17098@dennisz-mbp>

On Mon, Jul 8, 2019 at 5:05 PM Dennis Zhou <dennis@kernel.org> wrote:
>
> Hi Andrey, Alexander, and Dmitry,
>
> It was reported to me that when percpu is ran with param
> percpu_alloc=page or the embed allocation scheme fails and falls back to
> page that a double fault occurs.
>
> I don't know much about how kasan works, but a difference between the
> two is that we manually reserve vm area via vm_area_register_early().
> I guessed it had something to do with the stack canary or the irq_stack,
> and manually mapped the shadow vm area with kasan_add_zero_shadow(), but
> that didn't seem to do the trick.
>
> RIP resolves to the fixed_percpu_data declaration.
>
> Double fault below:
> [ 0.000000] PANIC: double fault, error_code: 0x0
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc7-00007-ge0afe6d4d12c-dirty #299
> [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> [ 0.000000] RIP: 0010:no_context+0x38/0x4b0
> [ 0.000000] Code: df 41 57 41 56 4c 8d bf 88 00 00 00 41 55 49 89 d5 41 54 49 89 f4 55 48 89 fd 4c8
> [ 0.000000] RSP: 0000:ffffc8ffffffff28 EFLAGS: 00010096
> [ 0.000000] RAX: dffffc0000000000 RBX: ffffc8ffffffff50 RCX: 000000000000000b
> [ 0.000000] RDX: fffff52000000030 RSI: 0000000000000003 RDI: ffffc90000000130
> [ 0.000000] RBP: ffffc900000000a8 R08: 0000000000000001 R09: 0000000000000000
> [ 0.000000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
> [ 0.000000] R13: fffff52000000030 R14: 0000000000000000 R15: ffffc90000000130
> [ 0.000000] FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.000000] CR2: ffffc8ffffffff18 CR3: 0000000002e0d001 CR4: 00000000000606b0
> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.000000] Call Trace:
> [ 0.000000] Kernel panic - not syncing: Machine halted.
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc7-00007-ge0afe6d4d12c-dirty #299
> [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> [ 0.000000] Call Trace:
> [ 0.000000] <#DF>
> [ 0.000000] dump_stack+0x5b/0x90
> [ 0.000000] panic+0x17e/0x36e
> [ 0.000000] ? __warn_printk+0xdb/0xdb
> [ 0.000000] ? spurious_kernel_fault_check+0x1a/0x60
> [ 0.000000] df_debug+0x2e/0x39
> [ 0.000000] do_double_fault+0x89/0xb0
> [ 0.000000] double_fault+0x1e/0x30
> [ 0.000000] RIP: 0010:no_context+0x38/0x4b0
> [ 0.000000] Code: df 41 57 41 56 4c 8d bf 88 00 00 00 41 55 49 89 d5 41 54 49 89 f4 55 48 89 fd 4c8
> [ 0.000000] RSP: 0000:ffffc8ffffffff28 EFLAGS: 00010096
> [ 0.000000] RAX: dffffc0000000000 RBX: ffffc8ffffffff50 RCX: 000000000000000b
> [ 0.000000] RDX: fffff52000000030 RSI: 0000000000000003 RDI: ffffc90000000130
> [ 0.000000] RBP: ffffc900000000a8 R08: 0000000000000001 R09: 0000000000000000
> [ 0.000000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
> [ 0.000000] R13: fffff52000000030 R14: 0000000000000000 R15: ffffc90000000130

Hi Dennis,

I don't have much useful info, but a naive question: could you stop
using percpu_alloc=page with KASAN? That should resolve the problem :)
We could even add a runtime check that clearly says that this
combination does not work.

I see that setup_per_cpu_areas() is called after kasan_init(), which is
called from setup_arch(). So KASAN should already have mapped the final
shadow at that point.

The only potential reason that I see is that setup_per_cpu_areas() maps
the percpu region at an address that is not covered/expected by
kasan_init(). Where is the page-based percpu area mapped? Is that range
covered by kasan_init()?

Otherwise, seeing the full stack trace of the fault may shed some light.
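
To make the "is it covered by kasan_init" question concrete, here is a
minimal sketch of the generic KASAN address-to-shadow arithmetic. It
assumes x86_64 generic KASAN with a scale shift of 3 and a shadow offset
of 0xdffffc0000000000 (the value that RAX holds in the dump above). The
helper names are illustrative, and treating RDX as a shadow address is an
assumption, not something the trace confirms.

```python
# Sketch of generic KASAN shadow arithmetic (assumed x86_64 constants).
KASAN_SHADOW_SCALE_SHIFT = 3                 # one shadow byte per 8 bytes
KASAN_SHADOW_OFFSET = 0xdffffc0000000000     # assumption: matches RAX in the dump
MASK64 = (1 << 64) - 1                       # emulate 64-bit wraparound


def mem_to_shadow(addr: int) -> int:
    """Return the shadow address that tracks the given kernel address."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def shadow_to_mem(shadow: int) -> int:
    """Return the first kernel address tracked by the given shadow byte."""
    return ((shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) & MASK64


if __name__ == "__main__":
    gs_base = 0xffffc90000000000   # GS base from the dump
    rdx = 0xfffff52000000030       # RDX from the dump
    print(hex(mem_to_shadow(gs_base)))   # 0xfffff52000000000
    print(hex(shadow_to_mem(rdx)))       # 0xffffc90000000180
```

Under these assumptions, the shadow for the GS base is 0xfffff52000000000,
and RDX would correspond to an address 0x180 bytes into that region. The
open question is then whether kasan_init() set up usable shadow for that
range.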