From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C50FFC38142 for ; Mon, 23 Jan 2023 13:15:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 487C16B0071; Mon, 23 Jan 2023 08:15:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 410006B0072; Mon, 23 Jan 2023 08:15:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2B1926B0073; Mon, 23 Jan 2023 08:15:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 1530A6B0071 for ; Mon, 23 Jan 2023 08:15:25 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id E43131C5E9D for ; Mon, 23 Jan 2023 13:15:24 +0000 (UTC) X-FDA: 80386110168.26.34DD508 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by imf30.hostedemail.com (Postfix) with ESMTP id 059C88001B for ; Mon, 23 Jan 2023 13:15:21 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=j3gTk8oD; dmarc=pass (policy=none) header.from=collabora.com; spf=pass (imf30.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674479722; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=apkzZbsx4WUkzKrESGAsqk1qjGQ3iXGkOEH2wEtSDxg=; b=GOJ3FIJ3vF0kfyM+gTrZixJH142uH17D3fzgIyjO8tUuKP6EybWp7tAftPggi4kiqW1tg2 chnU0f9KoW2IodQ67rODjZ35RvTn8/MHZvbfU6YCqhyfN8ApfV4NRym4ChdMfab7Ib39D2 hFol0oeBbkXw4mFPpg8IOqq3BJ9MRVQ= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=j3gTk8oD; dmarc=pass (policy=none) header.from=collabora.com; spf=pass (imf30.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674479722; a=rsa-sha256; cv=none; b=Fogj3rWmEKi2Hke/GLwaMLLeW5zhvkQdJit1cGIqJedXxF7nfiz2PHx/mknjofhars9kPq IOwJ/bYdOhrUIikFR7dyhK1GQuDhB5pd4r3nXLZ+Z3b1jocc4ZjJ6EmlaId9GnBGeti3FB 984vjV99eNS1t8GSgypRr7gaUXJxUSg= Received: from [192.168.10.12] (unknown [39.45.186.163]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id CACAE66029A5; Mon, 23 Jan 2023 13:15:07 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1674479720; bh=eZUiVTNJd98+gtT/nMH4QDYraaSsAMNKpFlsv+UCbg8=; h=Date:Cc:Subject:To:References:From:In-Reply-To:From; b=j3gTk8oDtmYafOYZGKoyCrDpDCujE9melTSWyjfS9iFaunWaHVPhVh70RN3pSH/uq W9Pi/tpMVFBy2oNRDbebNXvPn1/2Voeqf+dIR6i5qiVqpuDz+xppbeK+OO8SZONXxI XowHQIQqYjM2tYb3UW34ZPC/zDudGOTS9mKQQ5o/HUxKXbGMeRkTtPZrOCr8NOPnaL 2TSCQc3Jn21b0/n5FlbOd7SHwHPSUqvo4GBQHh/d4OEgoqXWYFtkKvGZNF1X3JZu16 WoDRWlugm1tN/UsoYje24rujYkiLTYN1kW2/gEBdJQfKDgWZySqTGO3oB84r0tEbk0 eGD9cQYgDYfkA== Message-ID: <0eb79bb3-7384-11c6-a380-c027f09305f2@collabora.com> Date: Mon, 23 Jan 2023 18:15:00 +0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 Cc: Muhammad Usama Anjum , David Hildenbrand , Andrew Morton , =?UTF-8?B?TWljaGHFgiBNaXJvc8WC?= =?UTF-8?Q?aw?= , Paul Gofman , Cyrill Gorcunov , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Mike Rapoport , Nadav Amit , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Subject: Re: [PATCH v7 0/4] Implement IOCTL to get and/or the clear info about PTEs Content-Language: en-US To: Peter Xu , Andrei Vagin , Danylo Mocherniuk References: <20230109064519.3555250-1-usama.anjum@collabora.com> From: Muhammad Usama Anjum In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 059C88001B X-Stat-Signature: c3pp5a8ccydgmmxhjssh1o6tqjp5gcnd X-HE-Tag: 1674479721-92545 X-HE-Meta: U2FsdGVkX19WbY2hF87XVWro6NpUPzSQ2845+CZXEXZydx0Zmv3OY653tyusIIC21pHlO2RyAg+hlNzYlGiqWOUA7IGcr9FqyY7lz/+zCYRSQ+l4wu7wgm2J0UTKiLJjalDvfcJxC9Rr2Q7/+AHtQ21jnNlOi/KWKCMErWrpgrwXgamKnYCj679mZRyIUkdBB/zByc2i4yWXrzIBt+pZru+sXuUP4PJ2V3gwwxyHHbI+bL+Q7esLAYkbOv4ca91vyI/2BCZJHKsoXsaxrnl6qaux87qiQDKF2j9kCVEI+lFXMvCueO1FF4E0U8U980V1n6+OgvpYehITYXVagBsmj5OcKu25viAj/8U4Kh4KmNFtYKa2DSayWba7PT0feq1owXSfjV16rVCR9o9uQ0eWIaEBGhGuEKD9CJYF2atXrJxiBGv1WKC0lRu+7bOBsMmEBpp3ZwDKm6IxdKJfhDpyk9O+XeRMoqzpvt+IENfzodUGM5WzVB4mWseeBQ0lCthzfjIDokm1308tbiX49HmOuLyRg83yrzlROBtt6feSXcWpJyDJlhdVP4n9nkFUbV6y9mgnKo8devGh98lsgXi3x8MHP3Glkp3iwZHR3q7H99p0pDKUyjx6KxaxY0zG/wvRFWChvbFCkoUaHPjwogmdw55X3eD6Y+l5FOoMjr/mvbn3waJmK+uny1hc0xo2tDw8ESxyX0io9QiYSLUNDoqy0NV3hWzx5svoETajTjtOBCtDOL0DpfXKFIlqNdhBAiYK2+ktrr+gJHVtvT4KdFK3mLieSBAGCcQGBxlGudj56RdkUd4cvRd1hrweRqiGGTaAdpJmlECfC1RDMJehPU2JC9K9OlHdEsCztMJ2iDyY8UhAE6iDNEoixLy0Bpg00ihk57WsHWfCyt5r+hEsBZZboJ0IPmv202aUfQags/C1REvbHnUrtfXrq9xgR6lrmCBMgCjmcvsxaMUn3N76ybJ doTYBpm/ XhWNPMLtFlnv2/UP2lfnDudKbpauljUUz5t6fiPz2Gdrbv1Ff20nyvNvbnSTHt5RRCayUH79NDtkRKoIPdRHspRMG4WBfbE7YOlNpf/tWtOiIIkJbdol6rFetcawj6KBqQqCPcK64lSgpBm6CqhnsN65QFMqHxwTZ8AYIXxL2bQxjBcn+YZkFdQ1s5MnZyP9nUMbTXl3lg88wS5L3GvMgoLpN0+/PgfJmI/eyby1OHitawA7+erNDSrLc+XMzfzkULL7vMQfZToaaPOHKuVnHUil/KM4tTICFZCrdLgk7bw0m7e8f0+W2oqsY+sByMRdblvtxnYbveUFnkPPruRgb9h6Wxy6i/qdKyxPRjvQWnzgop1sGuGp/iGTVWA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 1/19/23 3:12 AM, Peter Xu wrote: > On Mon, Jan 09, 2023 at 11:45:15AM +0500, Muhammad Usama Anjum wrote: >> *Changes in v7:* >> - Add uffd wp async >> - Update the IOCTL to use uffd under the hood instead of soft-dirty >> flags >> >> Stop using the soft-dirty flags for finding which pages have been >> written to. It is too delicate and wrong as it shows more soft-dirty >> pages than the actual soft-dirty pages. There is no interest in >> correcting it [A][B] as this is how the feature was written years ago. >> It shouldn't be updated to changed behaviour. Peter Xu has suggested >> using the async version of the UFFD WP [C] as it is based inherently >> on the PTEs. >> >> So in this patch series, I've added a new mode to the UFFD which is >> asynchronous version of the write protect. When this variant of the >> UFFD WP is used, the page faults are resolved automatically by the >> kernel. The pages which have been written-to can be found by reading >> pagemap file (!PM_UFFD_WP). This feature can be used successfully to >> find which pages have been written to from the time the pages were >> write protected. This works just like the soft-dirty flag without >> showing any extra pages which aren't soft-dirty in reality. >> >> [A] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.com >> [B] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.com >> [C] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n >> >> *Changes in v6:* >> - Updated the interface and made cosmetic changes >> >> *Cover Letter in v5:* >> Hello, > > Please consider either drop the cover letter below this point or rephrase, > otherwise many of them are not true anymore and it can confuse the > reviewers. I'll remove. > > I have a few high level comments/questions here, please bare with me if any > of them are already discussed by others in the old versions; I'd be happy > to read them when there's a pointer to the relevant answers. > > Firstly, doc update is more than welcomed to explain the new interface > first (before throwing the code..). That can be done in pagemap.rst on > pagemap changes, or userfaultfd.rst on userfaultfd. Okay. I'll add the documentation in next version or after the series has been accepted. Initially I'd added the documentation. But the code kept on changing so much that I had to spend considerable time on updating the documentation. I know it is better to add documentation with the patches. I'll try to add it. > > Besides, can you provide more justification on the new pagemap-side > interface design? > > It seems it came from the Windows API GetWriteWatch(), but it's definitely > not exactly that. Let me spell some points out.. Initially, we just wanted a way to emulate Windows API GetWriteWatch(). So we had added `max_pages` in the IOCTL arguments which is optional and can be used to specify how many pages we want to find of our interest. There was only one set of flags to be matched with the pages. > > There're four kinds of masks (required/anyof/excluded/return). Are they > all needed? Why this is a good interface design? Then, CRIU developers Andrea [1] and Danylo [2], asked to include all these different kinds of masks. I'd thought of these masks as fancy filter inside the kernel. But there wasn't anyone else to review. So I'd included them to move forward. Please let me know your thoughts after reading emails from [1]. > > I saw you used page_region structure to keep the information. I think you > wanted to have a densed output, especially if counting in the "return mask" > above it starts to make more sense. If with a very limited return mask it > means many of the (continuous) page information can be merged into a single > page_region struct when the kernel is scanning. Correct. > > However, at the meantime the other three masks (required/anyof/excluded) > made me quite confused - it means you wanted to somehow filter the pages > and only some of them will get collected. The thing is for a continuous > page range if any of the page got skipped due to the masks (e.g. not in > "required" or in "excluded") it also means it can never be merged into > previous page_region either. That seems to be against the principle of > having densed output. The filtering is being done. But the output can still be condensed regardless. There isn't that randomness in the page flags of the consecutive pages. > > I hope you can help clarify what's the major use case here. > > There's also the new interface to do atomic "fetch + update" on wrprotected > pages. Is that just for efficiency or is the accuracy required in some of > the applications? "Atomic fetch and update/clear" or "Atomic fetch Written-to status and clear it" is needed to support GetWriteWatch() and there is no already present way to perform this operation atomically. We want efficiency and accuracy both to get good performance/speed. So this IOCTL is needed to achieve: 1) New functionality which isn't already present 2) Most efficient and accurate method to perform the operation (it isn't possible through soft-dirty feature) > > Thanks, > [1] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com [2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com -- BR, Muhammad Usama Anjum