Date: Fri, 10 Nov 2023 12:13:47 +0900
From: Byungchul Park
To: Nadav Amit
Cc: Linux Kernel Mailing List, linux-mm, "kernel_team@skhynix.com",
 Andrew Morton, "ying.huang@intel.com", "xhao@linux.alibaba.com",
 "mgorman@techsingularity.net", "hughd@google.com", "willy@infradead.org",
 "david@redhat.com", "peterz@infradead.org", Andy Lutomirski,
 Thomas Gleixner, "mingo@redhat.com", "bp@alien8.de",
 "dave.hansen@linux.intel.com"
Subject: Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration
Message-ID: <20231110031347.GA62514@system.software.com>
References: <20231030072540.38631-1-byungchul@sk.com>
 <20231030072540.38631-3-byungchul@sk.com>
 <63C530D3-3A1D-4BE9-8AA7-EFF5B895BE80@vmware.com>
 <20231030125129.GD81877@system.software.com>
 <20231108041208.GA40954@system.software.com>
 <20231110010201.GA72073@system.software.com>
In-Reply-To: <20231110010201.GA72073@system.software.com>

On Fri, Nov 10, 2023 at 10:02:01AM +0900, Byungchul Park wrote:
> On Thu, Nov 09, 2023 at 10:16:57AM +0000, Nadav Amit wrote:
> >
> >
> > > On Nov 8, 2023, at 6:12 AM, Byungchul Park wrote:
> > >
> > > !! External Email
> > >
> > > On Mon, Oct 30, 2023 at 09:51:30PM +0900, Byungchul Park wrote:
> > >>>> diff --git a/mm/memory.c b/mm/memory.c
> > >>>> index 6c264d2f969c..75dc48b6e15f 100644
> > >>>> --- a/mm/memory.c
> > >>>> +++ b/mm/memory.c
> > >>>> @@ -3359,6 +3359,19 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> > >>>>  	if (vmf->page)
> > >>>>  		folio = page_folio(vmf->page);
> > >>>>
> > >>>> +	/*
> > >>>> +	 * This folio has its read copy to prevent inconsistency while
> > >>>> +	 * deferring TLB flushes. However, the problem might arise if
> > >>>> +	 * it's going to become writable.
> > >>>> +	 *
> > >>>> +	 * To prevent it, give up the deferring TLB flushes and perform
> > >>>> +	 * TLB flush right away.
> > >>>> +	 */
> > >>>> +	if (folio && migrc_pending_folio(folio)) {
> > >>>> +		migrc_unpend_folio(folio);
> > >>>> +		migrc_try_flush_free_folios(NULL);
> > >>>
> > >>> So many potential function calls… Probably they should have been combined
> > >>> into one and at least migrc_pending_folio() should have been an inline
> > >>> function in the header.
> > >>
> > >> I will try to change it as you mention.
> > >>
> > >>>> +	}
> > >>>> +
> > >>>
> > >>> What about mprotect? I thought David has changed it so it can set writable
> > >>> PTEs.
> > >>
> > >> I will check it out.
> > >
> > > I found mprotect stuff is already performing TLB flushes needed for it.
> > > So some redundant TLB flushes might happen by migrc but it's not that
> > > harmful I think. Thanks.
> >
> > Let me explain the scenario I am concerned with. Assume page P is RO, and
> > moves from Psrc to Pdst. Pointer “p” points to P. Initially (*p == 0).
> >
> > Let’s also assume we also have an atomic variable “a”. Initially (a == 0).
> >
> > I hope I got the migration function names right, but I hope the problem
> > itself can be clear regardless.
> >
> > CPU0 CPU1 CPU2 CPU3
> > ---- ---- ---- ----
> > (user-mode) (user-mode)
> >
> > Access *p
> > [Psrc cached in TLB]
> >
> > migrate_pages_batch()
> > -> migrate_folio_unmap()
> >
> > [ PTE updated,
> > still no flush ]
> >
> > mprotect(p,
> > RW)
>
> Here,
>
> mprotect()
>    do_mprotect_pkey()
>       tlb_finish_mmu()
>          tlb_flush_mmu()
>
> I thought TLB flush for mprotect() is performed by tlb_flush_mmu() so
> any cached TLB entries on other CPUs can have chance to update. Could
> you correct me if I get it wrong? Thanks.

I guess you were trying to tell me that the x86 MMU automatically keeps
consistency based on cached TLB entries. Right? If so, I should do
something on that path. If not, it's not problematic. Thoughts?

	Byungchul

> 	Byungchul
> >
> > [ Psrc is
> > RW ]
> >
> > [ flush
> > deferred]
> >
> >
> > *p = 1 # Pdst
> >
> > xchg(&a, 1)
> > mfence
> > if (a == 1)
> >   assert(*p == 1);
> >
> >
> >
> > Now at this point the assertion might fail. CPU2 wrote into Pdst, whereas
> > CPU1 reads from Psrc. But based on x86 memory model, userspace might not
> > expect this scenario to be possible, hence leading to bugs.
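
For reference, the userspace side of the scenario quoted above could be
sketched roughly as below. This is only an illustration under stated
assumptions, not code from the patch: struct shared, writer(), reader()
and run_message_passing() are hypothetical names, an ordinary int stands
in for the word that "p" points to in page P, and C11 atomics stand in
for the xchg and mfence in the diagram. The migration and mprotect()
steps happen in the kernel and are not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state for the sketch (not from the patch). */
struct shared {
	int *p;		/* points into page P, initially *p == 0 */
	atomic_int a;	/* flag "a" from the scenario, initially 0 */
	int violation;	/* set when the reader sees a == 1 but *p != 1 */
};

/* Writer side of the diagram: *p = 1, then xchg(&a, 1). */
static void *writer(void *arg)
{
	struct shared *s = arg;

	*s->p = 1;			/* store that would land in Pdst */
	atomic_exchange(&s->a, 1);	/* sequentially consistent exchange */
	return NULL;
}

/* Reader side of the diagram: mfence, then if (a == 1) check *p. */
static void *reader(void *arg)
{
	struct shared *s = arg;

	atomic_thread_fence(memory_order_seq_cst);
	/* *p is read only after a == 1 has been observed. */
	if (atomic_load(&s->a) == 1 && *s->p != 1)
		s->violation = 1;
	return NULL;
}

/*
 * Runs the writer/reader pair 'iterations' times.
 * Returns the number of runs where the reader saw a == 1 but *p != 1,
 * or -1 if a thread could not be created.
 */
int run_message_passing(int iterations)
{
	int violations = 0;

	assert(iterations > 0);
	for (int i = 0; i < iterations; i++) {
		int page_word = 0;
		struct shared s = { .p = &page_word, .violation = 0 };
		pthread_t w, r;

		atomic_init(&s.a, 0);
		if (pthread_create(&w, NULL, writer, &s))
			return -1;
		if (pthread_create(&r, NULL, reader, &s)) {
			pthread_join(w, NULL);
			return -1;
		}
		pthread_join(w, NULL);
		pthread_join(r, NULL);
		violations += s.violation;
	}
	return violations;
}
```

Userspace written this way expects run_message_passing() to return 0,
because a reader that observes a == 1 must also observe the earlier
store to *p. The concern raised in the thread is that a reader still
translating through a stale TLB entry for Psrc could break that
expectation.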