Date: Fri, 14 Jun 2024 10:57:45 +0900
From: Byungchul Park
To: Dave Hansen
Cc: David Hildenbrand, Byungchul Park, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
Message-ID: <20240614015745.GA47085@system.software.com>
References: <20240531092001.30428-1-byungchul@sk.com> <20240531092001.30428-10-byungchul@sk.com> <26dc4594-430b-483c-a26c-7e68bade74b0@redhat.com> <20240603093505.GA12549@system.software.com> <20240604015348.GB26609@system.software.com>
In-Reply-To: <20240604015348.GB26609@system.software.com>

On Tue, Jun 04, 2024 at 10:53:48AM +0900, Byungchul Park wrote:
> On Mon, Jun 03, 2024 at 06:23:46AM -0700, Dave Hansen wrote:
> > On 6/3/24 02:35, Byungchul Park wrote:
> > ...> In luf's point of view, the points where the deferred flush should be
> > > performed are simply:
> > >
> > >    1. when changing the vma maps, that might be luf'ed.
> > >    2. when updating data of the pages, that might be luf'ed.
> >
> > It's simple, but the devil is in the details as always.
>
> Agree with that.
>
> > > All we need to do is to identify the points:
> > >
> > >    1. when changing the vma maps, that might be luf'ed.
> > >
> > >       a) mmap and munmap i.e. fault handler or unmap_region().
> > >       b) permission to writable i.e. mprotect or fault handler.
> > >       c) what I'm missing.
> >
> > I'd say it even more generally: anything that installs a PTE which is
> > inconsistent with the original PTE.  That, of course, includes writes.
> > But it also includes crazy things that we do like uprobes.  Take a look
> > at __replace_page().
> >
> > I think the page_vma_mapped_walk() checks plus the ptl keep LUF at bay
> > there.  But it needs some really thorough review.
> >
> > But the bigger concern is that, if there was a problem, I can't think of
> > a systematic way to find it.
> >
> > >    2. when updating data of the pages, that might be luf'ed.
> > >
> > >       a) updating files through vfs e.g. file_end_write().
> > >       b) updating files through writable maps i.e. 1-a) or 1-b).
> > >       c) what I'm missing.
> >
> > Filesystems or block devices that change content without a "write" from
> > the local system.  Network filesystems and block devices come to mind.
>
> AFAIK, every network filesystem eventually "updates" its connected local
> filesystem.  It could still be handled at the point where the local
> filesystem is updated.

To cover clients of network filesystems, and anything else that uses the
page cache, the call sites of struct address_space_operations's
write_end() seem to be the best place to handle that.

At the same time, of course, I should limit the target of luf to
'folio_mapping(folio) != NULL' for file pages.

	Byungchul

> > I honestly don't know what all the rules are around these, but they
> > could certainly be troublesome.
> >
> > There appear to be some interactions for NFS between file locking and
> > page cache flushing.
> >
> > But, stepping back ...
> >
> > I'd honestly be a lot more comfortable if there was even a debugging LUF
>
> I'd better provide a method for better debugging.  Lemme know whatever
> it is we need.
>
> > mode that enforced a rule that said:
>
> Why "debugging mode"?  The following rules should be enforced always.
>
> > 1. A LUF'd PTE can't be rewritten until after a luf_flush() occurs
>
> "luf_flush() should be followed when.." is more correct because
> "luf_flush() -> another luf -> the pte gets rewritten" can happen.  So
> it should be "the pte gets rewritten -> another luf by any chance ->
> luf_flush()", that is still safe.
>
> > 2. A LUF'd page's position in the page cache can't be replaced until
> >    after a luf_flush()
>
> "luf_flush() should be followed when.." is more correct too.
>
> These two rules are exactly the same as what I described, but more
> specific.  I like your way of describing the rules.
>
> 	Byungchul
>
> > or *some* other independent set of rules that can tell us when something
> > goes wrong.  That uprobes code, for instance, seems like it will work.
> > But I can also imagine writing it ten other ways where it would break
> > when combined with LUF.
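
To make rule 1 as quoted above a bit more concrete, here is a minimal
user-space sketch of what a checking mode might look like.  It is only a
toy model under stated assumptions: struct toy_pte and every toy_*()
helper are hypothetical names made up for illustration, none of them come
from the patch set or from the kernel, and it models the strict "no
rewrite until after a flush" wording rather than the "flush should
follow" variant.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, not kernel code: one "PTE" plus a flag that
 * records whether it was unmapped with its TLB flush still deferred.
 */
struct toy_pte {
	unsigned long val;	/* current contents of the entry */
	bool luf_pending;	/* unmapped by LUF, flush not done yet */
};

/* Model of a LUF'd unmap: clear the entry, leave the flush pending. */
static void toy_luf_unmap(struct toy_pte *pte)
{
	pte->val = 0;
	pte->luf_pending = true;
}

/* Model of luf_flush(): the deferred flush has now been performed. */
static void toy_luf_flush(struct toy_pte *pte)
{
	pte->luf_pending = false;
}

/*
 * Checked install: refuse to rewrite an entry whose flush is still
 * pending.  Returns true if the write was allowed, false if it would
 * have violated the rule (the entry is left untouched in that case).
 */
static bool toy_set_pte_checked(struct toy_pte *pte, unsigned long val)
{
	if (pte->luf_pending)
		return false;
	pte->val = val;
	return true;
}
```

In this model, calling toy_set_pte_checked() between toy_luf_unmap() and
toy_luf_flush() reports a violation, and the same call succeeds once the
flush has happened.  A real check would presumably warn rather than
return a bool, but that is left open here.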