linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"sbsiddha@gmail.com" <sbsiddha@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>, Kernel Team <Kernel-team@fb.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@intel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andy Lutomirski <luto@amacapital.net>
Subject: Re: [PATCH] x86/mm: Do not split_large_page() for set_kernel_text_rw()
Date: Mon, 26 Aug 2019 04:40:23 +0000	[thread overview]
Message-ID: <164D1F08-80F7-4E13-94FC-78F33B3E299F@fb.com> (raw)
In-Reply-To: <20190823093637.GH2369@hirez.programming.kicks-ass.net>

Cc: Steven Rostedt and Suresh Siddha

Hi Peter, 

> On Aug 23, 2019, at 2:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
>> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
>> split_large_page() for all kernel text pages. This means a single kprobe
>> will put all kernel text in 4k pages:
>> 
>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>  0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
>> 
>>  root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
>>  root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
>> 
>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>  0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
>> 
>> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
>> in static_protections().
>> 
>> Two helper functions set_text_rw() and set_text_ro() are added to flip
>> _PAGE_RW bit for kernel text.
>> 
>> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
> 
> ARGH; so this is because ftrace flips the whole kernel range to RW and
> back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
> violation.

Thanks for your comments. Yes, it is related to ftrace, as we have
CONFIG_KPROBES_ON_FTRACE. However, after digging around, I am not sure
what is the expected behavior. 

Kernel text region has two mappings to it. For x86_64 and four-level 
page table, there are: 

	1. kernel identity mapping, from 0xffff888000100000; 
	2. kernel text mapping, from 0xffffffff81000000, 

Per comments in arch/x86/mm/init_64.c:set_kernel_text_rw():

        /*
         * Make the kernel identity mapping for text RW. Kernel text
         * mapping will always be RO. Refer to the comment in
         * static_protections() in pageattr.c
         */
	set_memory_rw(start, (end - start) >> PAGE_SHIFT);

kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on kernel identity
mapping. 

However, my experiment shows that kprobe actually operates on the 
kernel text mapping (0xffffffff81000000-). It is the same w/ and w/o 
CONFIG_KPROBES_ON_FTRACE. Therefore, I am not sure whether the comment
is out-dated (10-year old), or the kprobe is doing something wrong. 


More information about the issue we are looking at. 

We found with 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/ 
CONFIG_KPROBES_ON_FTRACE), a single kprobe will split _all_ PMDs in 
kernel text mapping into pte-mapped pages. This increases iTLB 
miss rate from about 300 per million instructions to about 700 per
million instructions (for the application I test with). 

Per bisect, we found this behavior happens after commit 585948f4f695 
("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I 
proposed this PATCH to fix/workaround this issue. However, per
Peter's comment and my study of the code, this doesn't seem the 
real problem or the only here. 

I also tested that the PMD split issue doesn't happen w/o 
CONFIG_KPROBES_ON_FTRACE. 


In summary, I have the following questions:

1. Which mapping should kprobe work on? Kernel identity mapping or 
   kernel text mapping?
2. FTRACE causes split of PMD mapped kernel text. How should we fix
   this? 

Thanks,
Song





  reply	other threads:[~2019-08-26  4:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23  5:23 Song Liu
2019-08-23  9:36 ` Peter Zijlstra
2019-08-26  4:40   ` Song Liu [this message]
2019-08-26  9:23     ` Peter Zijlstra
2019-08-26 15:08       ` Song Liu
2019-08-26 20:50         ` Song Liu
2019-08-26 11:33   ` Steven Rostedt
2019-08-26 12:44     ` Peter Zijlstra
2019-08-26 15:41     ` Nadav Amit
2019-08-26 15:56       ` Steven Rostedt
2019-08-26 16:09         ` Nadav Amit
2019-08-26 15:56       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=164D1F08-80F7-4E13-94FC-78F33B3E299F@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=dave.hansen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbsiddha@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox