From: Muhammad Usama Anjum
To: Peter Xu
Cc: Muhammad Usama Anjum, linux-mm@kvack.org, Paul Gofman, Alexander Viro, Shuah Khan, Christian Brauner, Yang Shi, Vlastimil Babka, "Liam R . Howlett", Yun Zhou, Cyrill Gorcunov, Michał Mirosław, Andrew Morton, Suren Baghdasaryan, Andrei Vagin, Alex Sierra, Matthew Wilcox, Pasha Tatashin, Danylo Mocherniuk, Axel Rasmussen, "Gustavo A . R . Silva", David Hildenbrand, Dan Williams, linux-kernel@vger.kernel.org, Mike Rapoport, linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org, Greg KH, kernel@collabora.com, Nadav Amit
Subject: Re: [PATCH RESEND v15 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
Date: Wed, 24 May 2023 19:16:02 +0500
Message-ID: <8947d94d-8229-f8ee-e981-9b73462ecb94@collabora.com>
References: <20230420060156.895881-1-usama.anjum@collabora.com> <20230420060156.895881-3-usama.anjum@collabora.com> <0edfaf12-66f2-86d3-df1c-f5dff10fb743@collabora.com>

On 5/24/23 6:55 PM, Peter Xu wrote:
...
>>> What is the steps of the test? Is it as simple as "writeprotect",
>>> "unprotect", then write all pages in a single thread?
>>>
>>> Is UFFDIO_WRITEPROTECT sent in one range covering all pages?
>>>
>>> Maybe you can attach the test program here too.
>>
>> I'd not attached the test earlier as I thought that you wouldn't be
>> interested in running the test. I've attached it now. The test has multiple
>
> Thanks. No plan to run it, just to make sure I understand why such a
> difference.
>
>> threads where one thread tries to get status of flags and reset them, while
>> other threads write to that memory. In main(), we call the pagemap_scan
>> ioctl to get status of flags and reset the memory area as well. While in N
>> threads, the memory is written.
>>
>> I usually run the test by following where memory area is of 100000 * pages:
>> ./win2_linux 8 100000 1 1 0
>>
>> I'm running tests on real hardware. The results are pretty consistent. I'm
>> also testing only on x86_64. PM_SCAN_OP_WP wins every time as compared to
>> UFFDIO_WRITEPROTECT.
>
> If it's multi-threaded test especially when the ioctl runs together with
> the writers, then I'd assume it's caused by writers frequently need to
> flush tlb (when writes during UFFDIO_WRITEPROTECT), the flush target could
> potentially also include the core running the main thread who is also
> trying to reprotect because they run on the same mm.
>
> This makes me think that your current test case probably is the worst case
> of Nadav's patch 6ce64428d6 because (1) the UFFDIO_WRITEPROTECT covers a
> super large range, and (2) there're a _lot_ of concurrent writers during
> the ioctl, so all of them will need to trigger a tlb flush, and that tlb
> flush will further slow down the ioctl sender.
>
> While I think that's the optimal case sometimes, of having minimum tlb
> flush on the ioctl(UFFDIO_WRITEPROTECT), so maybe it makes sense somewhere
> else where concurrent writers are not that much. I'll need to rethink a bit
> on all these to find out whether we can have a good way for both..
>
> For now, if your workload is mostly exactly like your test case, maybe you
> can have your pagemap version of WP-only op there, making sure tlb flush is
> within the pgtable lock critical section (so you should be safe even
> without Nadav's patch). If so, I'd appreciate you can add some comment
> somewhere about such difference of using pagemap WP-only and
> ioctl(UFFDIO_WRITEPROTECT), though. In short, functional-wise they should
> be the same, but trivial detail difference on performance as TBD (maybe one
> day we can have a good approach for all and make them aligned again, but
> maybe that also doesn't need to block your work).

Thank you for understanding what I've been trying to convey. We are going to
translate a Windows syscall to this new ioctl, so it is very difficult to pin
down the exact use cases: applications must be using this syscall in several
different ways. The one thing we know for sure is that we want the best
possible performance, which we are getting by adding the WP-only op. I'll add
it and send v16. I think that we are almost there.

>

-- 
BR,
Muhammad Usama Anjum