From: Nadav Amit <namit@vmware.com>
To: Byungchul Park <byungchul@sk.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
"kernel_team@skhynix.com" <kernel_team@skhynix.com>,
Andrew Morton <akpm@linux-foundation.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"xhao@linux.alibaba.com" <xhao@linux.alibaba.com>,
"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
"hughd@google.com" <hughd@google.com>,
"willy@infradead.org" <willy@infradead.org>,
"david@redhat.com" <david@redhat.com>,
"peterz@infradead.org" <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration
Date: Thu, 9 Nov 2023 10:16:57 +0000
Message-ID: <C47A7C40-BE3E-4F0F-B854-D40D4795A236@vmware.com>
In-Reply-To: <20231108041208.GA40954@system.software.com>
> On Nov 8, 2023, at 6:12 AM, Byungchul Park <byungchul@sk.com> wrote:
>
> On Mon, Oct 30, 2023 at 09:51:30PM +0900, Byungchul Park wrote:
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 6c264d2f969c..75dc48b6e15f 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -3359,6 +3359,19 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>>> 	if (vmf->page)
>>>> 		folio = page_folio(vmf->page);
>>>>
>>>> +	/*
>>>> +	 * This folio has its read copy to prevent inconsistency while
>>>> +	 * deferring TLB flushes. However, the problem might arise if
>>>> +	 * it's going to become writable.
>>>> +	 *
>>>> +	 * To prevent it, give up the deferring TLB flushes and perform
>>>> +	 * TLB flush right away.
>>>> +	 */
>>>> +	if (folio && migrc_pending_folio(folio)) {
>>>> +		migrc_unpend_folio(folio);
>>>> +		migrc_try_flush_free_folios(NULL);
>>>
>>> So many potential function calls… Probably they should have been combined
>>> into one and at least migrc_pending_folio() should have been an inline
>>> function in the header.
>>
>> I will try to change it as you mention.
>>
>>>> +	}
>>>> +
>>>
>>> What about mprotect? I thought David has changed it so it can set writable
>>> PTEs.
>>
>> I will check it out.
>
> I found mprotect stuff is already performing TLB flushes needed for it.
> So some redundant TLB flushes might happen by migrc but it's not that
> harmful I think. Thanks.
Let me explain the scenario I am concerned with. Assume page P is RO and is
being migrated from Psrc to Pdst. Pointer “p” points to P. Initially (*p == 0).
Let’s also assume we have an atomic variable “a”. Initially (a == 0).

I hope I got the migration function names right, but the problem itself
should be clear regardless.
CPU0                      CPU1                  CPU2            CPU3
----                      ----                  ----            ----
                          (user-mode)           (user-mode)
                          Access *p
                          [Psrc cached in TLB]
migrate_pages_batch()
-> migrate_folio_unmap()
[ PTE updated,
  still no flush ]
                                                                mprotect(p,
                                                                         RW)
                                                                [ Psrc is
                                                                  RW ]
                                                                [ flush
                                                                  deferred]
                                                *p = 1 # Pdst
                                                xchg(&a, 1)
                                                mfence
                          if (a == 1)
                            assert(*p == 1);
Now at this point the assertion might fail: CPU2 wrote into Pdst, whereas
CPU1 still reads from Psrc through its stale TLB entry. Based on the x86
memory model, userspace would not expect this outcome to be possible, so it
can lead to bugs.
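
The userspace half of the scenario above can be sketched as a small litmus
test. This is only an illustration under assumptions: the names data,
writer(), reader() and run_litmus() are hypothetical, the reader spins on "a"
instead of checking it once, and nothing here triggers migration or
mprotect(). It only shows the ordering guarantee that userspace relies on and
that a deferred flush could break.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data;        /* stands in for *p, initially 0 */
static atomic_int a;    /* the atomic variable "a", initially 0 */

/* Writer thread: store the data, then publish it through the flag. */
static void *writer(void *arg)
{
	(void)arg;
	data = 1;                  /* *p = 1 */
	atomic_exchange(&a, 1);    /* xchg(&a, 1), sequentially consistent */
	return NULL;
}

/* Reader thread: once the flag is visible, the data must be visible too. */
static void *reader(void *arg)
{
	int *seen = arg;

	while (atomic_load(&a) != 1)
		;                  /* wait until the flag is observed */
	*seen = data;              /* userspace expects to read 1 here */
	return NULL;
}

/* Runs one round and returns the value the reader observed for data. */
int run_litmus(void)
{
	pthread_t w, r;
	int seen = -1;

	data = 0;
	atomic_store(&a, 0);
	pthread_create(&r, NULL, reader, &seen);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

With coherent mappings, run_litmus() returns 1 on every round. In the
scenario above the reader would instead load from Psrc through a stale TLB
entry and could observe 0 even after seeing (a == 1).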