linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/4] thp: fix MADV_DONTNEED vs. numa balancing race
Date: Tue, 23 May 2017 14:42:01 +0200	[thread overview]
Message-ID: <744fe557-3aaf-0010-4b94-5f53c9f28f89@suse.cz> (raw)
In-Reply-To: <20170516202919.GA2843@redhat.com>

On 05/16/2017 10:29 PM, Andrea Arcangeli wrote:
> On Wed, Apr 12, 2017 at 03:33:35PM +0200, Vlastimil Babka wrote:
>>
>> pmdp_invalidate() does:
>>
>>         pmd_t entry = *pmdp;
>>         set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
>>
>> so it's not atomic and if CPU sets dirty or accessed in the middle of
>> this, they will be lost?
> 
> I agree it looks like the dirty bit can be lost. Furthermore this also
> loses a MMU notifier invalidate that will lead to corruption at the
> secondary MMU level (which will keep using the old protection
> permission, potentially keeping writing to a wrprotected page).

Oh, I didn't paste the whole function, just the pmd manipulation.
There's also a third line:

flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);

so there's no missing invalidate, AFAICS? Sorry for the confusion.

>>
>> But I don't see how the other invalidate caller
>> __split_huge_pmd_locked() deals with this either. Andrea, any idea?
> 
> The original code I wrote did this in __split_huge_page_map to create
> the "entry" to establish in the pte pagetables:
> 
>     	       entry = mk_pte(page + i, vma->vm_page_prot);
> 	       entry = maybe_mkwrite(pte_mkdirty(entry),
> 	       	       		   vma);
> 
> For anonymous memory the dirty bit is only meaningful for swapping,
> and THP couldn't be swapped so setting it unconditional avoided any
> issue with the pmdp_invalidate; pmdp_establish.

Yeah, but now we are going to swap THP's, and we have shmem THP's...

> pmdp_invalidate is needed primarily to avoid aliasing of two different
> TLB translation pointing from the same virtual address to the the same
> physical address that triggered machine checks (while needing to keep
> the pmd huge at all times, back then it was also splitting huge,
> splitting is a software bit so userland could still access the data,
> splitting bit only blocked kernel code to manipulate on it similar to
> what migration entry does right now upstream, except those prevent
> userland to access the page during split which is less efficient than
> the splitting bit, but at least it's only used for the physical split,
> back then there was no difference between virtual and physical split
> and physical split is less frequent than the virtual one right now).

This took me a while to grasp, but I think I understand now :)

> It looks like this needs a pmdp_populate that atomically grabs the
> value of the pmd and returns it like pmdp_huge_get_and_clear_notify
> does

pmdp_huge_get_and_clear_notify() is now gone...

> and a _notify variant to use "freeze" is false (if freeze is true
> the MMU notifier invalidate must have happened when the pmd was set to
> a migration entry). If pmdp_populate_notify (freeze==true)
> /pmd_populate (freeze==false) would return the old pmd value
> atomically with xchg() (just instead of setting it to 0 we should set
> it to the mknotpresent one), then we can set the dirty bit on the ptes
> (__split_huge_pmd_locked) or in the pmd itself in the change_huge_pmd
> accordingly.

I think the confusion was partially caused by the comment at the
original caller of pmdp_invalidate():

we first mark the
* current pmd notpresent (atomically because here the pmd_trans_huge
* and pmd_trans_splitting must remain set at all times on the pmd
* until the split is complete for this pmd),

It says "atomically" but in fact that only means that the pmd_trans_huge
and pmd_trans_splitting flags are not temporarily cleared at any point
of time, right? It's not a true atomic swap.

> If the "dirty" flag information is obtained by the pmd read before
> calling pmdp_invalidate is not ok (losing _notify also not ok).

Right.

> Thanks!
> Andrea
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-05-23 12:42 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-02 15:10 [PATCH 0/4] thp: fix few MADV_DONTNEED races Kirill A. Shutemov
2017-03-02 15:10 ` [PATCH 1/4] thp: reduce indentation level in change_huge_pmd() Kirill A. Shutemov
2017-04-12 11:37   ` Vlastimil Babka
2017-03-02 15:10 ` [PATCH 2/4] thp: fix MADV_DONTNEED vs. numa balancing race Kirill A. Shutemov
2017-03-03 17:17   ` Dave Hansen
2017-04-12 13:33   ` Vlastimil Babka
2017-05-16 14:54     ` Vlastimil Babka
2017-05-16 20:29     ` Andrea Arcangeli
2017-05-23 12:42       ` Vlastimil Babka [this message]
2017-06-09  8:21         ` Vlastimil Babka
2017-03-02 15:10 ` [PATCH 3/4] thp: fix MADV_DONTNEED vs. MADV_FREE race Kirill A. Shutemov
2017-03-03  5:35   ` Hillf Danton
2017-03-03 10:26     ` Kirill A. Shutemov
2017-03-06  1:44       ` Minchan Kim
2017-03-07 14:04         ` Kirill A. Shutemov
2017-03-08  6:17           ` Minchan Kim
2017-03-06  2:49   ` Aneesh Kumar K.V
2017-03-07 13:52     ` Kirill A. Shutemov
2017-03-02 15:10 ` [PATCH 4/4] thp: fix MADV_DONTNEED vs clear soft dirty race Kirill A. Shutemov
2017-03-03 22:29   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=744fe557-3aaf-0010-4b94-5f53c9f28f89@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox