From: Aaron Lu <aaron.lu@intel.com>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"Hansen, Dave" <dave.hansen@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
"song@kernel.org" <song@kernel.org>
Subject: Re: [RFC PATCH 1/4] x86/mm/cpa: restore global bit when page is present
Date: Thu, 11 Aug 2022 20:28:37 +0800
Message-ID: <3516813d-5d2a-821a-81e8-1ed78ad63561@intel.com>
In-Reply-To: <YvToWsNqXudd6cSN@hyeyoo>
On 8/11/2022 7:30 PM, Hyeonggon Yoo wrote:
> On Thu, Aug 11, 2022 at 08:16:08AM +0000, Lu, Aaron wrote:
>> On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
>>> On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
>>>> For configs that don't have PTI enabled, or CPUs that don't need
>>>> Meltdown mitigation, the current kernel can lose the GLOBAL bit after a
>>>> page goes through a cycle of present -> not present -> present.
>>>>
>>>> It happens like this (__vunmap() does this in vm_remove_mappings()):
>>>> original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
>>>> set_memory_np(page, 1): 0x8000000000000062 (NX/D/A/RW) loses G and P
>>>> set_memory_p(page, 1): 0x8000000000000063 (NX/D/A/RW/P) restores P
>>>>
>>>> In the end, this page's protection no longer has the Global bit set,
>>>> which creates a problem for the small-mapping merge feature.
>>>>
>>>> For this reason, restore the Global bit when the page is present on
>>>> systems that do not have PTI enabled.
>>>>
>>>> (pgprot_clear_protnone_bits() deserves a better name if this patch is
>>>> acceptable, but first I would like to get some feedback on whether this
>>>> is the right way to solve this, so I haven't bothered with the name yet.)
>>>>
>>>> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
>>>> ---
>>>> arch/x86/mm/pat/set_memory.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>>> index 1abd5438f126..33657a54670a 100644
>>>> --- a/arch/x86/mm/pat/set_memory.c
>>>> +++ b/arch/x86/mm/pat/set_memory.c
>>>> @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
>>>>  	 */
>>>>  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
>>>>  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
>>>> +	else
>>>> +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
>>>>
>>>>  	return prot;
>>>>  }
>>>
>>> IIUC this makes it impossible to set _PAGE_GLOBAL when PTI is on.
>>>
>>
>> Yes. Is this a problem?
>> I think that is the intended behaviour when PTI is on: not to enable
>> the Global bit on kernel mappings.
>
> Please note that I'm not an expert on PTI.
>
> But AFAIK with PTI, at least everything (kernel part) mapped in the user page
> table is mapped as global when PGE is supported.
>
> I'm not sure "the Global bit is never used for the kernel part when PTI is
> enabled" is true.
>
> Also, commit d1440b23c922d ("x86/mm: Factor out pageattr _PAGE_GLOBAL setting") that introduced
> pgprot_clear_protnone_bits() says:
>
> This unconditional setting of _PAGE_GLOBAL is a problem when we have
> PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
> not.
>
> This updated version of the code says:
> 1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
> 2. Never set _PAGE_GLOBAL implicitly
> 3. Allow _PAGE_GLOBAL to be in cpa.set_mask
> 4. Allow _PAGE_GLOBAL to be inherited from previous PTE
>
Thanks for this info, I'll need to take a closer look at PTI.
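
To make the cycle from the commit message concrete, here is a minimal user-space sketch of it. This is a model, not kernel code: model_set_np() and model_set_p() are hypothetical helpers that only flip the Present bit and then apply the pgprot_clear_protnone_bits() logic, the pte_mask parameter stands in for __default_kernel_pte_mask, and TLB flushing and large-page splitting are ignored entirely.

```c
#include <stdint.h>

/* x86 PTE bits mentioned in this thread (positions per the x86 page-table format). */
#define PAGE_PRESENT  (1ULL << 0)
#define PAGE_RW       (1ULL << 1)
#define PAGE_ACCESSED (1ULL << 5)
#define PAGE_DIRTY    (1ULL << 6)
#define PAGE_GLOBAL   (1ULL << 8)
#define PAGE_NX       (1ULL << 63)

/*
 * Model of pgprot_clear_protnone_bits(). 'pte_mask' is an assumed stand-in
 * for __default_kernel_pte_mask; 'patched' selects the else branch that the
 * RFC patch adds.
 */
static inline uint64_t clear_protnone_bits(uint64_t prot, uint64_t pte_mask,
					   int patched)
{
	if (!(prot & PAGE_PRESENT))
		prot &= ~PAGE_GLOBAL;		/* never keep G on a non-present entry */
	else if (patched)
		prot |= PAGE_GLOBAL & pte_mask;	/* RFC: restore G if the mask allows it */
	return prot;
}

/* Hypothetical one-PTE model of set_memory_np(): clear P, then fix up G. */
static inline uint64_t model_set_np(uint64_t prot, uint64_t pte_mask, int patched)
{
	return clear_protnone_bits(prot & ~PAGE_PRESENT, pte_mask, patched);
}

/* Hypothetical one-PTE model of set_memory_p(): set P, then fix up G. */
static inline uint64_t model_set_p(uint64_t prot, uint64_t pte_mask, int patched)
{
	return clear_protnone_bits(prot | PAGE_PRESENT, pte_mask, patched);
}
```

Starting from 0x8000000000000163 with a mask that contains the Global bit, the unpatched model ends the cycle at 0x8000000000000063, while the patched model returns to 0x8000000000000163. With the Global bit removed from the mask, the patched model also ends at 0x8000000000000063, which is the behaviour being questioned above for the PTI case.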
>>> Maybe it would be less intrusive to make
>>> set_direct_map_default_noflush() replace the protection bits
>>> with PAGE_KERNEL, as it's only called for the direct map, and the
>>> function's purpose is to reset permissions to the default:
>>>
>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>> index 1abd5438f126..0dd4433c1382 100644
>>> --- a/arch/x86/mm/pat/set_memory.c
>>> +++ b/arch/x86/mm/pat/set_memory.c
>>> @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
>>>
>>>  int set_direct_map_default_noflush(struct page *page)
>>>  {
>>> -	return __set_pages_p(page, 1);
>>> +	unsigned long tempaddr = (unsigned long) page_address(page);
>>> +	struct cpa_data cpa = {
>>> +		.vaddr = &tempaddr,
>>> +		.pgd = NULL,
>>> +		.numpages = 1,
>>> +		.mask_set = PAGE_KERNEL,
>>> +		.mask_clr = __pgprot(~0),
>
> Nah, this sets _PAGE_ENC unconditionally, which should be evaluated.
> Maybe a less intrusive way would be:
> 		.mask_set = __pgprot(_PAGE_PRESENT |
> 				(_PAGE_GLOBAL & __default_kernel_pte_mask)),
> 		.mask_clr = __pgprot(0),
>
>>> +		.flags = 0};
>>> +
>>> +	return __change_page_attr_set_clr(&cpa, 0);
>>>  }
>>
>> Looks reasonable to me, and it is indeed less intrusive. I'm only
>> concerned that there might be other paths that also go through present ->
>> not present -> present which this change cannot cover.
>>
>
> AFAIK the only other paths going through present->not present->present
> (using CPA) are when DEBUG_PAGEALLOC is used.
>
> Do we care about direct map fragmentation when using DEBUG_PAGEALLOC?
>
No, the direct mapping does not use large page mappings when DEBUG_PAGEALLOC is enabled.
>>>
>>> set_direct_map_{invalid,default}_noflush() is the exact reason
>>> why the direct map becomes split after vmalloc/vfree with special
>>> permissions.
>>
>> Yes, I agree, because it can lose the G bit after the whole cycle when PTI
>> is not on. When PTI is on, there is no such problem because the G bit is
>> not there initially.
>>
>> Thanks,
>> Aaron
>
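
For reference, the effect of the revised mask_set/mask_clr suggestion quoted above can be sketched in user space as follows. The sketch assumes that CPA applies mask_clr before mask_set when building the new protection. struct cpa_masks, cpa_apply() and default_noflush_masks() are hypothetical names for illustration, and pte_mask again stands in for __default_kernel_pte_mask. The later pgprot_clear_protnone_bits() step is not modelled.

```c
#include <stdint.h>

#define PAGE_PRESENT (1ULL << 0)
#define PAGE_GLOBAL  (1ULL << 8)

/* Simplified stand-in for struct cpa_data: only the two masks matter here. */
struct cpa_masks {
	uint64_t mask_set;
	uint64_t mask_clr;
};

/* Assumed CPA behaviour: clear the bits in mask_clr, then set those in mask_set. */
static inline uint64_t cpa_apply(uint64_t old_prot, struct cpa_masks m)
{
	return (old_prot & ~m.mask_clr) | m.mask_set;
}

/*
 * Masks from the revised suggestion: set Present, set Global only if the
 * kernel PTE mask allows it, and clear nothing (so _PAGE_ENC and the other
 * bits are inherited from the existing PTE).
 */
static inline struct cpa_masks default_noflush_masks(uint64_t pte_mask)
{
	struct cpa_masks m = {
		.mask_set = PAGE_PRESENT | (PAGE_GLOBAL & pte_mask),
		.mask_clr = 0,
	};
	return m;
}
```

Applied to the not-present value 0x8000000000000062 from the commit message, this yields 0x8000000000000163 when the mask contains the Global bit and 0x8000000000000063 when it does not.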