From: Aaron Lu <aaron.lu@intel.com>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"song@kernel.org" <song@kernel.org>
Subject: Re: [RFC PATCH 1/4] x86/mm/cpa: restore global bit when page is present
Date: Thu, 11 Aug 2022 20:28:37 +0800
Message-ID: <3516813d-5d2a-821a-81e8-1ed78ad63561@intel.com>
In-Reply-To: <YvToWsNqXudd6cSN@hyeyoo>

On 8/11/2022 7:30 PM, Hyeonggon Yoo wrote:
> On Thu, Aug 11, 2022 at 08:16:08AM +0000, Lu, Aaron wrote:
>> On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
>>> On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
>>>> For configs that don't have PTI enabled or CPUs that don't need
>>>> Meltdown mitigation, the current kernel can lose the GLOBAL bit after
>>>> a page goes through a cycle of present -> not present -> present.
>>>>
>>>> It happens like this (__vunmap() does this in vm_remove_mappings()):
>>>> original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
>>>> set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
>>>> set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P only
>>>>
>>>> In the end, this page's protection no longer has the Global bit set,
>>>> which creates a problem for the small-mapping merge feature.
>>>>
>>>> For this reason, restore Global bit for systems that do not have PTI
>>>> enabled if page is present.
>>>>
>>>> (pgprot_clear_protnone_bits() deserves a better name if this patch is
>>>> acceptable, but first I would like some feedback on whether this is
>>>> the right way to solve this, so I didn't bother with the name yet.)
>>>>
>>>> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
>>>> ---
>>>>  arch/x86/mm/pat/set_memory.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>>> index 1abd5438f126..33657a54670a 100644
>>>> --- a/arch/x86/mm/pat/set_memory.c
>>>> +++ b/arch/x86/mm/pat/set_memory.c
>>>> @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
>>>>  	 */
>>>>  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
>>>>  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
>>>> +	else
>>>> +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
>>>>  
>>>>  	return prot;
>>>>  }
>>>
>>> IIUC, this makes it impossible to set _PAGE_GLOBAL when PTI is on.
>>>
>>
>> Yes. Is this a problem?
>> I think that is the intended behaviour when PTI is on: not to enable
>> the Global bit on kernel mappings.
> 
> Please note that I'm not an expert on PTI.
> 
> But AFAIK, with PTI, at least everything (the kernel part) that is mapped
> into the user page table is mapped as global when PGE is supported.
> 
> I'm not sure that "the Global bit is never used for the kernel part when
> PTI is enabled" is true.
>
> Also, commit d1440b23c922d ("x86/mm: Factor out pageattr _PAGE_GLOBAL setting") that introduced
> pgprot_clear_protnone_bits() says:
> 	
> 	This unconditional setting of _PAGE_GLOBAL is a problem when we have
> 	PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
> 	not.
> 
> 	This updated version of the code says:
> 	1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
> 	2. Never set _PAGE_GLOBAL implicitly
> 	3. Allow _PAGE_GLOBAL to be in cpa.set_mask
> 	4. Allow _PAGE_GLOBAL to be inherited from previous PTE
>

Thanks for this info, I'll need to take a closer look at PTI.
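
To make the cycle from the patch description and the rules quoted above
easier to reason about, here is a minimal user-space sketch. It is only a
model under stated assumptions, not the kernel code: the model_* helpers are
hypothetical names, and default_mask stands in for __default_kernel_pte_mask
(assumed to be all ones without PTI and to have _PAGE_GLOBAL cleared with
PTI). The bit positions are the architectural x86-64 PTE bits.

```c
#include <stdint.h>
#include <stdio.h>

/* x86-64 PTE flag bits (architectural positions). */
#define PAGE_PRESENT  (1ULL << 0)
#define PAGE_RW       (1ULL << 1)
#define PAGE_ACCESSED (1ULL << 5)
#define PAGE_DIRTY    (1ULL << 6)
#define PAGE_GLOBAL   (1ULL << 8)
#define PAGE_NX       (1ULL << 63)

/* Model of the current helper: drop G when not present, never add it. */
static uint64_t clear_protnone_bits(uint64_t prot)
{
	if (!(prot & PAGE_PRESENT))
		prot &= ~PAGE_GLOBAL;
	return prot;
}

/* Model of the RFC change: also restore G on present pages, subject to
 * default_mask (assumption: stands in for __default_kernel_pte_mask). */
static uint64_t clear_protnone_bits_rfc(uint64_t prot, uint64_t default_mask)
{
	if (!(prot & PAGE_PRESENT))
		prot &= ~PAGE_GLOBAL;
	else
		prot |= PAGE_GLOBAL & default_mask;
	return prot;
}

/* Hypothetical helper: what set_memory_np() does to one PTE value. */
static uint64_t model_set_np(uint64_t prot)
{
	return clear_protnone_bits(prot & ~PAGE_PRESENT);
}

/* Hypothetical helper: set_memory_p() with the current code. */
static uint64_t model_set_p(uint64_t prot)
{
	return clear_protnone_bits(prot | PAGE_PRESENT);
}

/* Hypothetical helper: set_memory_p() with the RFC change applied. */
static uint64_t model_set_p_rfc(uint64_t prot, uint64_t default_mask)
{
	return clear_protnone_bits_rfc(prot | PAGE_PRESENT, default_mask);
}

#ifdef DEMO_MAIN
int main(void)
{
	uint64_t orig = PAGE_NX | PAGE_GLOBAL | PAGE_DIRTY | PAGE_ACCESSED |
			PAGE_RW | PAGE_PRESENT;
	uint64_t np = model_set_np(orig);

	printf("original:        %#llx\n", (unsigned long long)orig);
	printf("after np:        %#llx\n", (unsigned long long)np);
	printf("after p:         %#llx\n", (unsigned long long)model_set_p(np));
	printf("after p (RFC):   %#llx\n",
	       (unsigned long long)model_set_p_rfc(np, ~0ULL));
	printf("after p (RFC, G masked): %#llx\n",
	       (unsigned long long)model_set_p_rfc(np, ~PAGE_GLOBAL));
	return 0;
}
#endif
```

Building with -DDEMO_MAIN prints the values for each step. In this model the
RFC variant restores G only when default_mask still contains it, which is the
behaviour questioned above for the PTI case.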

>>> Maybe it would be less intrusive to make
>>> set_direct_map_default_noflush() replace the protection bits
>>> with PAGE_KERNEL, as it's only called for the direct map and the
>>> function's purpose is to reset permissions to the default:
>>>
>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>> index 1abd5438f126..0dd4433c1382 100644
>>> --- a/arch/x86/mm/pat/set_memory.c
>>> +++ b/arch/x86/mm/pat/set_memory.c
>>> @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
>>>
>>>  int set_direct_map_default_noflush(struct page *page)
>>>  {
>>> -       return __set_pages_p(page, 1);
>>> +       unsigned long tempaddr = (unsigned long) page_address(page);
>>> +       struct cpa_data cpa = {
>>> +                       .vaddr = &tempaddr,
>>> +                       .pgd = NULL,
>>> +                       .numpages = 1,
>>> +                       .mask_set = PAGE_KERNEL,
>>> +                       .mask_clr = __pgprot(~0),
> 
> Nah, this sets _PAGE_ENC unconditionally, which should be evaluated.
> Maybe a less intrusive way would be:
>                        .mask_set = __pgprot(_PAGE_PRESENT |
>                                    (_PAGE_GLOBAL & __default_kernel_pte_mask)),
>                        .mask_clr = __pgprot(0),
> 
>>> +                       .flags = 0};
>>> +
>>> +       return __change_page_attr_set_clr(&cpa, 0);
>>>  }
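
The difference between the two mask choices discussed above can be sketched
in user space. This is only an illustration under assumptions: CPA is modelled
as "clear mask_clr, then set mask_set" on a single PTE value, later steps such
as static protections are left out, PAGE_ENC_HYP is a hypothetical position
for _PAGE_ENC (the real bit is CPU dependent), and the two *_masks() helpers
are made-up names.

```c
#include <stdint.h>

#define PAGE_PRESENT  (1ULL << 0)
#define PAGE_RW       (1ULL << 1)
#define PAGE_ACCESSED (1ULL << 5)
#define PAGE_DIRTY    (1ULL << 6)
#define PAGE_GLOBAL   (1ULL << 8)
#define PAGE_NX       (1ULL << 63)
/* Hypothetical stand-in for _PAGE_ENC; the real position varies by CPU. */
#define PAGE_ENC_HYP  (1ULL << 51)

struct cpa_masks {
	uint64_t mask_set;
	uint64_t mask_clr;
};

/* Model of how CPA combines the masks: clear first, then set. */
static uint64_t apply_masks(uint64_t old, struct cpa_masks m)
{
	return (old & ~m.mask_clr) | m.mask_set;
}

/* First suggestion: wipe everything and install a PAGE_KERNEL-like value.
 * default_mask stands in for __default_kernel_pte_mask (assumption). */
static struct cpa_masks replace_all_masks(uint64_t default_mask)
{
	struct cpa_masks m = {
		.mask_set = (PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED |
			     PAGE_DIRTY | PAGE_GLOBAL | PAGE_NX |
			     PAGE_ENC_HYP) & default_mask,
		.mask_clr = ~0ULL,
	};
	return m;
}

/* Second suggestion: only add P and, if allowed, G; keep all other bits. */
static struct cpa_masks minimal_masks(uint64_t default_mask)
{
	struct cpa_masks m = {
		.mask_set = PAGE_PRESENT | (PAGE_GLOBAL & default_mask),
		.mask_clr = 0,
	};
	return m;
}
```

For a not-present page that was deliberately left without the encryption
bit, apply_masks() with replace_all_masks() turns that bit on, while
minimal_masks() leaves it alone and only restores P (and G when the default
mask permits it).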
>>
>> Looks reasonable to me and it is indeed less intrusive. I'm only
>> concerned that there might be other paths that also go through
>> present -> not present -> present which this change cannot cover.
>>
> 
> AFAIK the only other paths that go through present -> not present -> present
> (using CPA) exist when DEBUG_PAGEALLOC is used.
> 
> Do we care about direct map fragmentation when using DEBUG_PAGEALLOC?
> 

No, the direct mapping does not use large page mappings when
DEBUG_PAGEALLOC is enabled.

>>>
>>> set_direct_map_{invalid,default}_noflush() is the exact reason
>>> why the direct map becomes split after vmalloc/vfree with special
>>> permissions.
>>
>> Yes, I agree, because it can lose the G bit after the whole cycle when
>> PTI is not on. When PTI is on, there is no such problem because the G
>> bit is not there initially.
>>
>> Thanks,
>> Aaron
> 

