From: Punit Agrawal <punit.agrawal@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: will.deacon@arm.com, mark.rutland@arm.com, linux-mm@kvack.org,
	David Woods <dwoods@mellanox.com>,
	linux-arm-kernel@lists.infradead.org,
	Steve Capper <steve.capper@arm.com>
Subject: Re: [PATCH v6 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
Date: Fri, 18 Aug 2017 11:30:22 +0100	[thread overview]
Message-ID: <87shgpnp6p.fsf@e105922-lin.cambridge.arm.com> (raw)
In-Reply-To: <20170817180311.uwrz64g3bkwfdkrn@armageddon.cambridge.arm.com> (Catalin Marinas's message of "Thu, 17 Aug 2017 19:03:11 +0100")

Catalin Marinas <catalin.marinas@arm.com> writes:

> On Thu, Aug 10, 2017 at 06:09:01PM +0100, Punit Agrawal wrote:
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -68,6 +68,62 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>>  	return CONT_PTES;
>>  }
>>  
>> +/*
>> + * Changing some bits of contiguous entries requires us to follow a
>> + * Break-Before-Make approach, breaking the whole contiguous set
>> + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
>> + * "Misprogramming of the Contiguous bit", page D4-1762.
>> + *
>> + * This helper performs the break step.
>> + */
>> +static pte_t get_clear_flush(struct mm_struct *mm,
>> +			     unsigned long addr,
>> +			     pte_t *ptep,
>> +			     unsigned long pgsize,
>> +			     unsigned long ncontig)
>> +{
>> +	unsigned long i, saddr = addr;
>> +	struct vm_area_struct vma = { .vm_mm = mm };
>> +	pte_t orig_pte = huge_ptep_get(ptep);
>> +
>> +	/*
>> +	 * If we already have a faulting entry then we don't need
>> +	 * to break before make (there won't be a tlb entry cached).
>> +	 */
>> +	if (!pte_present(orig_pte))
>> +		return orig_pte;
>
> I first thought we could relax this check to pte_valid() as we don't
> care about the PROT_NONE case for hardware page table updates. However,
> I realised that we call this where we expect the pte to be entirely
> cleared but we simply skip it if !present (e.g. swap entry). Is this
> correct?

I've checked back and come to the conclusion that get_clear_flush() will
not get called with swap entries.

In the case of huge_ptep_get_and_clear() below, the callers
(__unmap_hugepage_range() and hugetlb_change_protection()) check for
swap entries before calling. Similarly 

I'll relax the check to pte_valid().
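
To make the distinction between the two predicates concrete, here is a
minimal userspace sketch (not kernel code). The model_* names, the
model_swap_entry() helper and the bit positions are assumptions made up
for illustration; the only point is that a PROT_NONE entry counts as
present but not valid, while a swap-like entry is neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bit layout for illustration; not copied from the kernel headers. */
#define MODEL_PTE_VALID     (UINT64_C(1) << 0)
#define MODEL_PTE_PROT_NONE (UINT64_C(1) << 58)

typedef uint64_t model_pte_t;

/* Valid: the hardware walker may use (and cache) this entry. */
static bool model_pte_valid(model_pte_t pte)
{
	return (pte & MODEL_PTE_VALID) != 0;
}

/* Present: valid, or a PROT_NONE mapping that software still tracks. */
static bool model_pte_present(model_pte_t pte)
{
	return (pte & (MODEL_PTE_VALID | MODEL_PTE_PROT_NONE)) != 0;
}

/* Hypothetical helper: build a swap-like entry with both bits clear. */
static model_pte_t model_swap_entry(uint64_t payload)
{
	model_pte_t pte = payload << 2;

	pte &= ~(MODEL_PTE_VALID | MODEL_PTE_PROT_NONE);
	assert(!model_pte_present(pte));
	return pte;
}
```

In this model an early return on !valid skips PROT_NONE entries as well
as swap-like ones, whereas an early return on !present skips only the
latter.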

>
>> +
>> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
>> +
>> +		/*
>> +		 * If HW_AFDBM is enabled, then the HW could turn on
>> +		 * the dirty bit for any page in the set, so check
>> +		 * them all.  All hugetlb entries are already young.
>> +		 */
>> +		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +	}
>> +
>> +	flush_tlb_range(&vma, saddr, addr);
>> +	return orig_pte;
>> +}
>
> It would be better if you do something like
>
> 	bool valid = pte_valid(orig_pte);
> 	...
> 	if (valid)
> 		flush_tlb_range(...);

With the above change of pte_present() to pte_valid() this isn't needed
anymore.
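
For comparison, the shape suggested in the quoted text (always clear the
entries, flush only when the original entry was valid) can be modelled
in userspace roughly as below. This is a sketch under stated
assumptions, not the patch: the model_* names, the two-bit pte layout
and the flush counter are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bit layout for illustration only. */
#define MODEL_PTE_VALID (UINT64_C(1) << 0)
#define MODEL_PTE_DIRTY (UINT64_C(1) << 1)

typedef uint64_t model_pte_t;

/* Counts simulated TLB invalidations so the behaviour can be observed. */
static unsigned int model_tlb_flushes;

static void model_flush_tlb_range(void)
{
	model_tlb_flushes++;
}

/* Clear the whole contiguous set; flush only if the first entry was valid. */
static model_pte_t model_get_clear_flush(model_pte_t *ptep, size_t ncontig)
{
	model_pte_t orig_pte;
	bool valid;
	size_t i;

	assert(ptep != NULL && ncontig > 0);
	orig_pte = ptep[0];
	valid = (orig_pte & MODEL_PTE_VALID) != 0;

	for (i = 0; i < ncontig; i++) {
		model_pte_t pte = ptep[i];

		ptep[i] = 0;
		/* Any entry in the set may carry the dirty state. */
		if (pte & MODEL_PTE_DIRTY)
			orig_pte |= MODEL_PTE_DIRTY;
	}

	if (valid)
		model_flush_tlb_range();
	return orig_pte;
}
```

With this shape the entries are cleared in every case, and only the
TLB invalidation is conditional.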

>
>> +
>> +static void clear_flush(struct mm_struct *mm,
>> +			     unsigned long addr,
>> +			     pte_t *ptep,
>> +			     unsigned long pgsize,
>> +			     unsigned long ncontig)
>> +{
>> +	unsigned long i, saddr = addr;
>> +	struct vm_area_struct vma = { .vm_mm = mm };
>> +
>> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
>> +		pte_clear(mm, addr, ptep);
>> +
>> +	flush_tlb_range(&vma, saddr, addr);
>> +}
>> +
>>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  			    pte_t *ptep, pte_t pte)
>>  {
>> @@ -93,6 +149,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  	dpfn = pgsize >> PAGE_SHIFT;
>>  	hugeprot = pte_pgprot(pte);
>>  
>> +	clear_flush(mm, addr, ptep, pgsize, ncontig);
>> +
>>  	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>>  		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
>>  			 pte_val(pfn_pte(pfn, hugeprot)));
>> @@ -194,7 +252,7 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>>  pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>>  			      unsigned long addr, pte_t *ptep)
>>  {
>> -	int ncontig, i;
>> +	int ncontig;
>>  	size_t pgsize;
>>  	pte_t orig_pte = huge_ptep_get(ptep);
>>  
>> @@ -202,17 +260,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>>  		return ptep_get_and_clear(mm, addr, ptep);
>>  
>>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>> -	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> -		/*
>> -		 * If HW_AFDBM is enabled, then the HW could
>> -		 * turn on the dirty bit for any of the page
>> -		 * in the set, so check them all.
>> -		 */
>> -		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
>> -			orig_pte = pte_mkdirty(orig_pte);
>> -	}
>>  
>> -	return orig_pte;
>> +	return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>>  }
>
> E.g. here you don't always clear the pte if a swap entry.
>
>>  
>>  int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> @@ -222,6 +271,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  	int ncontig, i, changed = 0;
>>  	size_t pgsize = 0;
>>  	unsigned long pfn = pte_pfn(pte), dpfn;
>> +	pte_t orig_pte;
>>  	pgprot_t hugeprot;
>>  
>>  	if (!pte_cont(pte))
>> @@ -229,12 +279,18 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  
>>  	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
>>  	dpfn = pgsize >> PAGE_SHIFT;
>> -	hugeprot = pte_pgprot(pte);
>>  
>> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>> -		changed |= ptep_set_access_flags(vma, addr, ptep,
>> -				pfn_pte(pfn, hugeprot), dirty);
>> -	}
>> +	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
>> +	if (!pte_same(orig_pte, pte))
>> +		changed = 1;
>> +
>> +	/* Make sure we don't lose the dirty state */
>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	hugeprot = pte_pgprot(pte);
>> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
>> +		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
>>  
>>  	return changed;
>>  }
>> @@ -244,6 +300,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>  {
>>  	int ncontig, i;
>>  	size_t pgsize;
>> +	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
>
> I'm not particularly fond of too many function calls in the variable
> initialisation part. I would rather keep pte_wrprotect further down
> where you also make it "dirty".
>
>> +	unsigned long pfn = pte_pfn(pte), dpfn;
>> +	pgprot_t hugeprot;
>>  
>>  	if (!pte_cont(*ptep)) {
>>  		ptep_set_wrprotect(mm, addr, ptep);
>> @@ -251,14 +310,21 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>  	}
>>  
>>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>> -		ptep_set_wrprotect(mm, addr, ptep);
>> +	dpfn = pgsize >> PAGE_SHIFT;
>> +
>> +	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>
> Can you not just set pte here instead of deriving it from *ptep
> early on?
>
> 	pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
> 	pte = pte_wrprotect(pte);
>

I've simplified this locally now.
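
As a rough illustration of the simplified flow (take the pte from the
break step, write-protect it, then rewrite the whole set), here is a
userspace model. It is only a sketch: the model_* names and bit layout
are assumptions, each entry is taken to cover one page so the pfn step
is 1, and the TLB invalidation is left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bit layout for illustration only. */
#define MODEL_PTE_VALID (UINT64_C(1) << 0)
#define MODEL_PTE_DIRTY (UINT64_C(1) << 1)
#define MODEL_PTE_WRITE (UINT64_C(1) << 2)
#define MODEL_PFN_SHIFT 12

typedef uint64_t model_pte_t;

/* Break step: clear every entry, folding any dirty bit into the result. */
static model_pte_t model_get_clear_flush(model_pte_t *ptep, size_t ncontig)
{
	model_pte_t orig_pte = ptep[0];
	size_t i;

	for (i = 0; i < ncontig; i++) {
		if (ptep[i] & MODEL_PTE_DIRTY)
			orig_pte |= MODEL_PTE_DIRTY;
		ptep[i] = 0;
	}
	/* A real implementation would invalidate the TLB range here. */
	return orig_pte;
}

/* Make step: write-protect the returned pte and reinstall the set. */
static void model_set_wrprotect(model_pte_t *ptep, size_t ncontig)
{
	model_pte_t pte, prot;
	uint64_t pfn;
	size_t i;

	assert(ptep != NULL && ncontig > 0);
	pte = model_get_clear_flush(ptep, ncontig);
	pte &= ~MODEL_PTE_WRITE;

	pfn = pte >> MODEL_PFN_SHIFT;
	prot = pte & ((UINT64_C(1) << MODEL_PFN_SHIFT) - 1);
	for (i = 0; i < ncontig; i++)
		ptep[i] = ((pfn + i) << MODEL_PFN_SHIFT) | prot;
}
```

Because the dirty state is already folded into the value returned by the
break step, no separate dirty check is needed before the make step in
this model.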

I'll run through a few tests and post a new version.

Thanks for the review.
Punit

>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	hugeprot = pte_pgprot(pte);
>> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
>> +		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
>>  }
