linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: Tycho Andersen <tycho@docker.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel-hardening@lists.openwall.com,
	Marco Benatto <marco.antonio.780@gmail.com>,
	Juerg Haefliger <juerg.haefliger@canonical.com>,
	x86@kernel.org
Subject: Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)
Date: Wed, 20 Sep 2017 16:21:15 -0700	[thread overview]
Message-ID: <97475308-1f3d-ea91-5647-39231f3b40e5@intel.com> (raw)
In-Reply-To: <20170920223452.vam3egenc533rcta@smitten>

On 09/20/2017 03:34 PM, Tycho Andersen wrote:
>> I really have to wonder whether there are better ret2dir defenses than
>> this.  The allocator just seems like the *wrong* place to be doing this
>> because it's such a hot path.
> 
> This might be crazy, but what if we defer flushing of the kernel
> ranges until just before we return to userspace? We'd still manipulate
> the prot/xpfo bits for the pages, but then just keep a list of which
> ranges need to be flushed, and do the right thing before we return.
> This leaves a little window between the actual allocation and the
> flush, but userspace would need another thread in its threadgroup to
> predict the next allocation, write the bad stuff there, and do the
> exploit all in that window.

I think the common case is still that you enter the kernel, allocate a
single page (or very few) and then exit.  So, you don't really reduce
the total number of flushes.

Just think of this in terms of IPIs to do the remote TLB flushes.  A CPU
can do roughly 1 million page faults and allocations a second.  Say you
have a 2-socket x 28-core x 2 hyperthead system = 112 CPU threads.
That's 111M IPI interrupts/second, just for the TLB flushes, *ON* *EACH*
*CPU*.

I think the only thing that will really help here is if you batch the
allocations.  For instance, you could make sure that the per-cpu-pageset
lists always contain either all kernel or all user data.  Then remap the
entire list at once and do a single flush after the entire list is consumed.

>> Why do you even bother keeping large pages around?  Won't the entire
>> kernel just degrade to using 4k everywhere, eventually?
> 
> Isn't that true of large pages in general? Is there something about
> xpfo that makes this worse? I thought this would only split things if
> they had already been split somewhere else, and the protection can't
> apply to the whole huge page.

Even though the kernel gives out 4k pages, it still *maps* them in the
kernel linear direct map with the largest size available.  My 16GB
laptop, for instance, has 3GB of 2MB transparent huge pages, but the
rest is used as 4k pages.  Yet, from /proc/meminfo:

DirectMap4k:      665280 kB
DirectMap2M:    11315200 kB
DirectMap1G:     4194304 kB

Your code pretty much forces 4k pages coming out of the allocator to be
mapped with 4k mappings.
>>> +inline void xpfo_flush_kernel_tlb(struct page *page, int order)
>>> +{
>>> +	int level;
>>> +	unsigned long size, kaddr;
>>> +
>>> +	kaddr = (unsigned long)page_address(page);
>>> +
>>> +	if (unlikely(!lookup_address(kaddr, &level))) {
>>> +		WARN(1, "xpfo: invalid address to flush %lx %d\n", kaddr, level);
>>> +		return;
>>> +	}
>>> +
>>> +	switch (level) {
>>> +	case PG_LEVEL_4K:
>>> +		size = PAGE_SIZE;
>>> +		break;
>>> +	case PG_LEVEL_2M:
>>> +		size = PMD_SIZE;
>>> +		break;
>>> +	case PG_LEVEL_1G:
>>> +		size = PUD_SIZE;
>>> +		break;
>>> +	default:
>>> +		WARN(1, "xpfo: unsupported page level %x\n", level);
>>> +		return;
>>> +	}
>>> +
>>> +	flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
>>> +}
>>
>> I'm not sure flush_tlb_kernel_range() is the best primitive to be
>> calling here.
>>
>> Let's say you walk the page tables and find level=PG_LEVEL_1G.  You call
>> flush_tlb_kernel_range(), you will be above
>> tlb_single_page_flush_ceiling, and you will do a full TLB flush.  But,
>> with a 1GB page, you could have just used a single INVLPG and skipped
>> the global flush.
>>
>> I guess the cost of the IPI is way more than the flush itself, but it's
>> still a shame to toss the entire TLB when you don't have to.
> 
> Ok, do you think it's worth making a new helper for others to use? Or
> should I just keep the logic in this function?

I'd just leave it in place.  Most folks already have a PTE when they do
the invalidation, so this is a bit of a weirdo.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-09-20 23:21 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-07 17:35 [PATCH v6 00/11] Add support for eXclusive Page Frame Ownership Tycho Andersen
2017-09-07 17:35 ` [PATCH v6 01/11] mm: add MAP_HUGETLB support to vm_mmap Tycho Andersen
2017-09-08  7:42   ` Christoph Hellwig
2017-09-07 17:36 ` [PATCH v6 02/11] x86: always set IF before oopsing from page fault Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO) Tycho Andersen
2017-09-07 18:33   ` Ralph Campbell
2017-09-07 18:50     ` Tycho Andersen
2017-09-08  7:51   ` Christoph Hellwig
2017-09-08 14:58     ` Tycho Andersen
2017-09-09 15:35   ` Laura Abbott
2017-09-11 15:03     ` Tycho Andersen
2017-09-11  7:24   ` Yisheng Xie
2017-09-11 14:50     ` Tycho Andersen
2017-09-11 16:03       ` Juerg Haefliger
2017-09-11 16:59         ` Tycho Andersen
2017-09-12  8:05         ` Yisheng Xie
2017-09-12 14:36           ` Tycho Andersen
2017-09-12 18:13             ` Tycho Andersen
2017-09-14  6:15               ` Yisheng Xie
2017-09-20 23:46               ` Dave Hansen
2017-09-21  0:02                 ` Tycho Andersen
2017-09-21  0:04                   ` Dave Hansen
2017-09-11 18:32   ` Tycho Andersen
2017-09-11 21:54     ` Marco Benatto
2017-09-20 15:48   ` Dave Hansen
2017-09-20 22:34     ` Tycho Andersen
2017-09-20 23:21       ` Dave Hansen [this message]
2017-09-21  0:09         ` Tycho Andersen
2017-09-21  0:27           ` Dave Hansen
2017-09-21  1:37             ` Tycho Andersen
2017-11-10  1:09             ` Tycho Andersen
2017-11-13 22:20               ` Dave Hansen
2017-11-13 22:46                 ` Dave Hansen
2017-11-15  0:33                   ` [kernel-hardening] " Tycho Andersen
2017-11-15  0:37                     ` Dave Hansen
2017-11-15  0:42                       ` Tycho Andersen
2017-11-15  3:44                   ` Matthew Wilcox
2017-11-15  7:00                     ` Dave Hansen
2017-11-15 14:58                       ` Matthew Wilcox
2017-11-15 16:20                         ` [kernel-hardening] " Tycho Andersen
2017-11-15 21:34                           ` Matthew Wilcox
2017-09-21  0:03   ` Dave Hansen
2017-09-21  0:28   ` Dave Hansen
2017-09-21  1:04     ` Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 04/11] swiotlb: Map the buffer if it was unmapped by XPFO Tycho Andersen
2017-09-07 18:10   ` Christoph Hellwig
2017-09-07 18:44     ` Tycho Andersen
2017-09-08  7:13       ` Christoph Hellwig
2017-09-07 17:36 ` [PATCH v6 05/11] arm64/mm: Add support for XPFO Tycho Andersen
2017-09-08  7:53   ` Christoph Hellwig
2017-09-08 17:24     ` Tycho Andersen
2017-09-14 10:41       ` Julien Grall
2017-09-14 11:29         ` Juergen Gross
2017-09-14 18:22   ` [kernel-hardening] " Mark Rutland
2017-09-18 21:27     ` Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 06/11] xpfo: add primitives for mapping underlying memory Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 07/11] arm64/mm, xpfo: temporarily map dcache regions Tycho Andersen
2017-09-14 18:25   ` Mark Rutland
2017-09-18 21:29     ` Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 08/11] arm64/mm: Add support for XPFO to swiotlb Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 09/11] arm64/mm: disable section/contiguous mappings if XPFO is enabled Tycho Andersen
2017-09-09 15:38   ` Laura Abbott
2017-09-07 17:36 ` [PATCH v6 10/11] mm: add a user_virt_to_phys symbol Tycho Andersen
2017-09-08  7:55   ` Christoph Hellwig
2017-09-08 15:44     ` Kees Cook
2017-09-11  7:36       ` Christoph Hellwig
2017-09-14 18:34   ` [kernel-hardening] " Mark Rutland
2017-09-18 20:56     ` Tycho Andersen
2017-09-07 17:36 ` [PATCH v6 11/11] lkdtm: Add test for XPFO Tycho Andersen
2017-09-07 19:08   ` Kees Cook
2017-09-10  0:57   ` kbuild test robot
2017-09-11 10:34 ` [PATCH v6 00/11] Add support for eXclusive Page Frame Ownership Yisheng Xie
2017-09-11 15:02   ` Tycho Andersen
2017-09-12  7:07     ` Yisheng Xie
2017-09-12  7:40       ` Juerg Haefliger
2017-09-12  8:11         ` Yisheng Xie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=97475308-1f3d-ea91-5647-39231f3b40e5@intel.com \
    --to=dave.hansen@intel.com \
    --cc=juerg.haefliger@canonical.com \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=marco.antonio.780@gmail.com \
    --cc=tycho@docker.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox