linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Cc: Kai Huang <kai.huang@linux.intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
Date: Wed, 18 Jul 2018 16:03:53 -0700	[thread overview]
Message-ID: <95ce19cb-332c-44f5-b3a1-6cfebd870127@intel.com> (raw)
In-Reply-To: <20180717112029.42378-6-kirill.shutemov@linux.intel.com>

I asked about this before and it still isn't covered in the description:
You were specifically asked (maybe in person at LSF/MM?) not to modify
allocator to pass the keyid around.  Please specifically mention how
this design addresses that feedback in the patch description.

You were told, "don't change the core allocator", so I think you just
added new functions that wrap the core allocator and called them from
the majority of sites that call into the core allocator.  Personally, I
think that misses the point of the original request.

Do I have a better way?  Nope, not really.

> +/*
> + * Encrypted page has to be cleared once keyid is set, not on allocation.
> + */
> +static inline bool encrypted_page_needs_zero(int keyid, gfp_t *gfp_mask)
> +{
> +	if (!keyid)
> +		return false;
> +
> +	if (*gfp_mask & __GFP_ZERO) {
> +		*gfp_mask &= ~__GFP_ZERO;
> +		return true;
> +	}
> +
> +	return false;
> +}

Shouldn't this be zero_page_at_alloc()?

Otherwise, it gets confusing about whether the page needs zeroing at
*all*, vs at alloc vs. free.

> +static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order)
> +{
> +	if (nid == NUMA_NO_NODE)
> +		nid = numa_mem_id();
> +
> +	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
> +}

We have an innumerable number of (__)?alloc_pages* functions.  This adds
two more.  I'm not a big fan of making this worse.

Do I have a better idea?  Not really.  The best I have is to start being
more careful about all of the arguments and actually formalize the list
of things that we need to succeed in an allocation in a struct
alloc_args or something.

>  #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
>  #define alloc_page_vma(gfp_mask, vma, addr)			\
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index f2b4abbca55e..fede9bfa89d9 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -38,9 +38,15 @@ static inline struct page *new_page_nodemask(struct page *page,
>  	unsigned int order = 0;
>  	struct page *new_page = NULL;
>  
> -	if (PageHuge(page))
> +	if (PageHuge(page)) {
> +		/*
> +		 * HugeTLB doesn't support encryption. We shouldn't see
> +		 * such pages.
> +		 */
> +		WARN_ON(page_keyid(page));
>  		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
>  				preferred_nid, nodemask);
> +	}

Shouldn't we be returning NULL?  Seems like failing the allocation is
much less likely to result in bad things happening.

>  	if (PageTransHuge(page)) {
>  		gfp_mask |= GFP_TRANSHUGE;
> @@ -50,8 +56,8 @@ static inline struct page *new_page_nodemask(struct page *page,
>  	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
>  		gfp_mask |= __GFP_HIGHMEM;
>  
> -	new_page = __alloc_pages_nodemask(gfp_mask, order,
> -				preferred_nid, nodemask);
> +	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
> +				preferred_nid, nodemask, page_keyid(page));

Needs a comment please.  It's totally non-obvious that this is the
migration case from the context, new_page_nodemask()'s name, or the name
of 'page'.

	/* Allocate a page with the same KeyID as the source page */

> diff --git a/mm/compaction.c b/mm/compaction.c
> index faca45ebe62d..fd51aa32ad96 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1187,6 +1187,7 @@ static struct page *compaction_alloc(struct page *migratepage,
>  	list_del(&freepage->lru);
>  	cc->nr_freepages--;
>  
> +	prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
>  	return freepage;
>  }

Comments, please.

Why is this here?  What other code might need prep_encrypted_page()?

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 581b729e05a0..ce7b436444b5 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -921,22 +921,28 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
>  /* page allocation callback for NUMA node migration */
>  struct page *alloc_new_node_page(struct page *page, unsigned long node)
>  {
> -	if (PageHuge(page))
> +	if (PageHuge(page)) {
> +		/*
> +		 * HugeTLB doesn't support encryption. We shouldn't see
> +		 * such pages.
> +		 */
> +		WARN_ON(page_keyid(page));
>  		return alloc_huge_page_node(page_hstate(compound_head(page)),
>  					node);
> -	else if (PageTransHuge(page)) {
> +	} else if (PageTransHuge(page)) {
>  		struct page *thp;
>  
> -		thp = alloc_pages_node(node,
> +		thp = alloc_pages_node_keyid(node, page_keyid(page),
>  			(GFP_TRANSHUGE | __GFP_THISNODE),
>  			HPAGE_PMD_ORDER);
>  		if (!thp)
>  			return NULL;
>  		prep_transhuge_page(thp);
>  		return thp;
> -	} else
> -		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
> -						    __GFP_THISNODE, 0);
> +	} else {
> +		return __alloc_pages_node_keyid(node, page_keyid(page),
> +				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
> +	}
>  }
>  
>  /*
> @@ -2013,9 +2019,16 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>  {
>  	struct mempolicy *pol;
>  	struct page *page;
> -	int preferred_nid;
> +	bool zero = false;
> +	int keyid, preferred_nid;
>  	nodemask_t *nmask;
>  
> +	keyid = vma_keyid(vma);
> +	if (keyid && (gfp & __GFP_ZERO)) {
> +		zero = true;
> +		gfp &= ~__GFP_ZERO;
> +	}

Comments, please.  'zero' should be 'deferred_zero', at least.

Also, can't we hide this a _bit_ better?

	if (deferred_page_zero(vma))
		gfp &= ~__GFP_ZERO;

Then, later:

	deferred_page_prep(vma, page, order);

and hide everything in deferred_page_zero() and deferred_page_prep().


> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3697,6 +3697,39 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> +#ifndef CONFIG_NUMA
> +struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
> +		struct vm_area_struct *vma, unsigned long addr,
> +		int node, bool hugepage)
> +{
> +	struct page *page;
> +	bool need_zero;
> +	int keyid = vma_keyid(vma);
> +
> +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> +	page = alloc_pages(gfp_mask, order);
> +	prep_encrypted_page(page, order, keyid, need_zero);
> +
> +	return page;
> +}
> +#endif

Is there *ever* a VMA-based allocation that doesn't need zeroing?

> +struct page * __alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order)
> +{
> +	struct page *page;
> +	bool need_zero;
> +
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> +	VM_WARN_ON(!node_online(nid));
> +
> +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> +	page = __alloc_pages(gfp_mask, order, nid);
> +	prep_encrypted_page(page, order, keyid, need_zero);
> +
> +	return page;
> +}
> +
>  #ifdef CONFIG_LOCKDEP
>  static struct lockdep_map __fs_reclaim_map =
>  	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> @@ -4401,6 +4434,20 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  }
>  EXPORT_SYMBOL(__alloc_pages_nodemask);
>  
> +struct page *
> +__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
> +		int preferred_nid, nodemask_t *nodemask, int keyid)
> +{
> +	struct page *page;
> +	bool need_zero;
> +
> +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> +	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
> +	prep_encrypted_page(page, order, keyid, need_zero);
> +	return page;
> +}
> +EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);

That looks like three duplicates of the same code, wrapping three more
allocator variants.  Do we really have no other alternatives?  Can you
please go ask the folks that gave you the feedback about the allocator
modifications and ask them if this is OK explicitly?

  reply	other threads:[~2018-07-18 23:03 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 01/19] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 02/19] mm: Do not use zero page in encrypted pages Kirill A. Shutemov
2018-07-18 17:36   ` Dave Hansen
2018-07-19  7:16     ` Kirill A. Shutemov
2018-07-19 13:58       ` Dave Hansen
2018-07-20 12:16         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
2018-07-18 17:38   ` Dave Hansen
2018-07-19  7:32     ` Kirill A. Shutemov
2018-07-19 14:02       ` Dave Hansen
2018-07-20 12:23         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
2018-07-18 17:43   ` Dave Hansen
2018-07-17 11:20 ` [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
2018-07-18 23:03   ` Dave Hansen [this message]
2018-07-19  8:27     ` Kirill A. Shutemov
2018-07-19 14:05       ` Dave Hansen
2018-07-20 12:25         ` Kirill A. Shutemov
2018-07-26 14:25       ` Michal Hocko
2018-07-17 11:20 ` [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
2018-07-18 23:11   ` Dave Hansen
2018-07-19  8:59     ` Kirill A. Shutemov
2018-07-19 14:13       ` Dave Hansen
2018-07-20 12:29         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
2018-07-18 23:13   ` Dave Hansen
2018-07-19  9:54     ` Kirill A. Shutemov
2018-07-19 14:19       ` Dave Hansen
2018-07-20 12:31         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
2018-07-18 23:19   ` Dave Hansen
2018-07-19 10:21     ` Kirill A. Shutemov
2018-07-19 12:37       ` Thomas Gleixner
2018-07-19 13:12         ` Kirill A. Shutemov
2018-07-19 13:18           ` Thomas Gleixner
2018-07-19 13:23             ` Kirill A. Shutemov
2018-07-19 13:40               ` Thomas Gleixner
2018-07-20 12:34                 ` Kirill A. Shutemov
2018-07-20 13:17                   ` Thomas Gleixner
2018-07-20 13:40                     ` Kirill A. Shutemov
2018-07-19 14:23       ` Dave Hansen
2018-07-20 12:34         ` Kirill A. Shutemov
2018-07-31  0:08   ` Kai Huang
2018-07-17 11:20 ` [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
2018-07-18 23:30   ` Dave Hansen
2018-07-20 12:42     ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
2018-07-18 23:38   ` Dave Hansen
2018-07-23  9:45     ` Kirill A. Shutemov
2018-07-23 17:22       ` Alison Schofield
2018-07-17 11:20 ` [PATCHv5 11/19] x86/mm: Implement vma_keyid() Kirill A. Shutemov
2018-07-18 23:40   ` Dave Hansen
2018-07-23  9:47     ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
2018-07-18 23:53   ` Dave Hansen
2018-07-23  9:50     ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 13/19] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 14/19] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 15/19] x86/mm: Detect MKTME early Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 16/19] x86/mm: Calculate direct mapping size Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
2018-07-19  0:01   ` Dave Hansen
2018-07-23 10:04     ` Kirill A. Shutemov
2018-07-23 12:25       ` Dave Hansen
2018-07-17 11:20 ` [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
2018-07-18 22:21   ` Thomas Gleixner
2018-07-23 10:12     ` Kirill A. Shutemov
2018-07-26 17:26       ` Dave Hansen
2018-07-27 13:49         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
2018-08-15  7:48   ` Pavel Machek
2018-08-17  9:24     ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=95ce19cb-332c-44f5-b3a1-6cfebd870127@intel.com \
    --to=dave.hansen@intel.com \
    --cc=hpa@zytor.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=kai.huang@linux.intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox