linux-mm.kvack.org archive mirror
From: Alistair Popple <apopple@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH mm-unstable v1] mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
Date: Fri, 21 Apr 2023 10:30:42 +1000
Message-ID: <87v8hqdqlb.fsf@nvidia.com>
In-Reply-To: <20230418142113.439494-1-david@redhat.com>


David Hildenbrand <david@redhat.com> writes:

> Staring at the comment "Recheck VMA as permissions can change since
> migration started" in remove_migration_pte() can result in confusion,
> because if the source PTE/PMD indicates write permissions, then there
> should be no need to check VMA write permissions when restoring migration
> entries or PTE-mapping a PMD.

Thanks David, I have oft wondered about that but not stared at it to the
point of confusion. The change looks correct to me so feel free to add:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

That tag is for the mm/migrate.c parts. Also, presumably if
mprotect(PROT_READ) were a problem, then mprotect(PROT_NONE) would also
need some kind of special handling, which I don't see.

> Commit d3cb8bf6081b ("mm: migrate: Close race between migration completion
> and mprotect") introduced the maybe_mkwrite() handling in
> remove_migration_pte() in 2014, stating that a race between mprotect() and
> migration finishing would be possible, and that we could end up with
> a writable PTE that should be readable.
>
> However, mprotect() code first updates vma->vm_flags / vma->vm_page_prot
> and then walks the page tables to (a) set all present writable PTEs to
> read-only and (b) convert all writable migration entries to readable
> migration entries. While walking the page tables and modifying the
> entries, migration code has to grab the PT locks to synchronize against
> concurrent page table modifications.
>
> Assuming migration finds a writable migration entry (while holding the
> PT lock) and replaces it with a writable present PTE, mprotect() code
> cannot have reached that writable migration entry yet (it would have
> converted it into a readable migration entry). Instead, mprotect() will
> wait for the PT lock and then convert the now-present writable PTE into a
> read-only PTE. As mprotect() has not finished yet, the outcome is the same
> as if migration had not happened: a writable PTE is converted to a
> read-only PTE.
>
> So it's fine to rely on the writability information in the source
> PTE/PMD and not recheck against the VMA, as long as we're holding the PT
> lock to synchronize with anyone who concurrently wants to downgrade write
> permissions (like mprotect()) by first adjusting vma->vm_flags /
> vma->vm_page_prot and then walking the page tables to adjust the page
> table entries.
>
> Running test cases that should reveal such races -- mprotect(PROT_READ)
> racing with page migration or THP splitting -- for multiple hours did
> not reveal an issue with this cleanup.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>
> This is a follow-up cleanup to [1]:
> 	[PATCH v1 RESEND 0/6] mm: (pte|pmd)_mkdirty() should not
> 	unconditionally allow for write access
>
> I wanted to be a bit careful and write some test cases to convince myself
> that I am not missing something important. Of course, there is still the
> possibility that my test cases are buggy ;)
>
> Test cases I'm running:
> 	https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_migration.c
> 	https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_thp_split.c
>
>
> [1] https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com
>
> ---
>  mm/huge_memory.c | 4 ++--
>  mm/migrate.c     | 5 +----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c23fa39dec92..624671aaa60d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  		} else {
>  			entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
>  			if (write)
> -				entry = maybe_mkwrite(entry, vma);
> +				entry = pte_mkwrite(entry);
>  			if (anon_exclusive)
>  				SetPageAnonExclusive(page + i);
>  			if (!young)
> @@ -3271,7 +3271,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
>  	if (pmd_swp_soft_dirty(*pvmw->pmd))
>  		pmde = pmd_mksoft_dirty(pmde);
>  	if (is_writable_migration_entry(entry))
> -		pmde = maybe_pmd_mkwrite(pmde, vma);
> +		pmde = pmd_mkwrite(pmde);
>  	if (pmd_swp_uffd_wp(*pvmw->pmd))
>  		pmde = pmd_mkuffd_wp(pmde);
>  	if (!is_migration_entry_young(entry))
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5d95e09b1618..02cace7955d4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -213,16 +213,13 @@ static bool remove_migration_pte(struct folio *folio,
>  		if (pte_swp_soft_dirty(*pvmw.pte))
>  			pte = pte_mksoft_dirty(pte);
>  
> -		/*
> -		 * Recheck VMA as permissions can change since migration started
> -		 */
>  		entry = pte_to_swp_entry(*pvmw.pte);
>  		if (!is_migration_entry_young(entry))
>  			pte = pte_mkold(pte);
>  		if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
>  			pte = pte_mkdirty(pte);
>  		if (is_writable_migration_entry(entry))
> -			pte = maybe_mkwrite(pte, vma);
> +			pte = pte_mkwrite(pte);
>  		else if (pte_swp_uffd_wp(*pvmw.pte))
>  			pte = pte_mkuffd_wp(pte);


