linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: James Houghton <jthoughton@google.com>,
	Steve Capper <steve.capper@arm.com>,
	Will Deacon <will@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs
Date: Tue, 5 Dec 2023 14:43:01 +0000	[thread overview]
Message-ID: <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com> (raw)
In-Reply-To: <20231204172646.2541916-1-jthoughton@google.com>

On 04/12/2023 17:26, James Houghton wrote:
> It is currently possible for a userspace application to enter a page
> fault loop when using HugeTLB pages implemented with contiguous PTEs
> when HAFDBS is not available. This happens because:
> 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
>    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

Hi James,

Do you know how this happens?

AFAIK, this is the set of valid bit combinations, and
PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
to understand how this is happening and prevent it?

/*
 * PTE bits configuration in the presence of hardware Dirty Bit Management
 * (PTE_WRITE == PTE_DBM):
 *
 * Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
 *   0      0      |   1           0          0
 *   0      1      |   1           1          0
 *   1      0      |   1           0          1
 *   1      1      |   0           1          x
 *
 * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
 * the page fault mechanism. Checking the dirty status of a pte becomes:
 *
 *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
 */


Thanks,
Ryan


> 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
>    the memory access on a system without HAFDBS, we will get a page
>    fault.
> 3. HugeTLB will check if it needs to update the dirty bits on the PTE.
>    For contiguous PTEs, it will check to see if the pgprot bits need
>    updating. In this case, HugeTLB wants to write a sequence of
>    sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
>    to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
>    so it thinks no update is necessary.
> 
> Please see this[1] reproducer.
> 
> I think (though I may be wrong) that both step (1) and step (3) are
> buggy.
> 
> The first patch in this series fixes step (3); instead of checking if
> pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty
> and pte_sw_dirty separately.
> 
> The second patch in this series makes step (1) less likely to occur.
> Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> PTE with the following steps (showing the relevant VMA flags and pgprot
> bits):
> i.   Create a valid, writable contiguous PTE.
>        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>        PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> ii.  mprotect the VMA to PROT_NONE.
>        VMA vmflags:     VM_SHARED
>        VMA pgprot bits: PTE_RDONLY
>        PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
>        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>        PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
> 
> Applying either one of the two patches in this patchset will fix the
> particular issue with HugeTLB pages implemented with contiguous PTEs.
> It's possible that only one of these patches should be taken, or that
> the right fix is something else entirely.
> 
> [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
> 
> James Houghton (2):
>   arm64: hugetlb: Distinguish between hw and sw dirtiness in
>     __cont_access_flags_changed
>   arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
> 
>  arch/arm64/include/asm/pgtable.h | 6 ++++++
>  arch/arm64/mm/hugetlbpage.c      | 5 ++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> 
> base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4



  parent reply	other threads:[~2023-12-05 14:43 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04 17:26 James Houghton
2023-12-04 17:26 ` [PATCH 1/2] arm64: hugetlb: Distinguish between hw and sw dirtiness in __cont_access_flags_changed James Houghton
2023-12-04 17:26 ` [PATCH 2/2] arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify James Houghton
2023-12-06 10:26   ` Ryan Roberts
2023-12-11 18:42   ` Will Deacon
2023-12-11 19:01     ` James Houghton
2023-12-05 14:43 ` Ryan Roberts [this message]
2023-12-05 17:54   ` [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs James Houghton
2023-12-06 10:24     ` Ryan Roberts
2023-12-06 21:01       ` James Houghton
2023-12-12 17:22 ` (subset) " Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=songmuchun@bytedance.com \
    --cc=steve.capper@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox