From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94099C04A94 for ; Fri, 11 Aug 2023 15:30:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E02416B0071; Fri, 11 Aug 2023 11:30:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D8BB26B0072; Fri, 11 Aug 2023 11:30:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C2C936B0074; Fri, 11 Aug 2023 11:30:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id B064F6B0071 for ; Fri, 11 Aug 2023 11:30:32 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 67355A1219 for ; Fri, 11 Aug 2023 15:30:32 +0000 (UTC) X-FDA: 81112210704.12.A179F57 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by imf02.hostedemail.com (Postfix) with ESMTP id B3E578000D for ; Fri, 11 Aug 2023 15:30:27 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=VPpGNHxd; spf=pass (imf02.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com; dmarc=pass (policy=quarantine) header.from=collabora.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691767828; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MQ3PACL+JESe20YCwTLbSwDhdRc2xblPMrLq8i73HzA=; b=CLCMETGASaZ8uQVVdtnkJtkS6ImGY+KeNq5mIyYI9Z7a8DjNrRvV6wGtgXdgAwB6ismGFg kYdyGBmuo1D8q0DyXans8QrKVO05m+O64SspMR3eAEgwD+ENnPdsbuBqBttLgUGfWenJ3E UsY3wvyM1VqvUefdqbZCjllkI+FglHE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691767828; a=rsa-sha256; cv=none; b=T4ysMhvjvJfIvYWY/0sJYLH1lFU9TTDK5MZXeX+gS7lpFs1UZRrltsiXYi9oyhkbnjMax8 0ngA1S0gC94Wo/50wszVgPN88ABMqn3d/USCSvZR3GxbglGfY7b/iiCXcaSSP2RcHpW7zi JvDpLD+XiC9ZdBu9Hvsg0OSKwCIim5Q= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=VPpGNHxd; spf=pass (imf02.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com; dmarc=pass (policy=quarantine) header.from=collabora.com Received: from [192.168.100.7] (unknown [39.34.188.71]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id E82DE6607247; Fri, 11 Aug 2023 16:30:19 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1691767826; bh=aJAfOnVKYAKCLBtRxHoBmsF+Y76g4uMvYo6dubi5kTY=; h=Date:Cc:Subject:To:References:From:In-Reply-To:From; b=VPpGNHxdfBiX4UxVZhw3GdcPgkC07HQiubODJEhjpBXuXjBCVk5FaHUlGceWK2TGH qmcDLHGvP+P8PR02oZGeuUl81bjyZm2YS6FnDLmYieh5TYtP60iCTHkD+GmBNdgJy3 2Q0jOokybr34srHpePoKISH1sYtvhQNY5YSe0ZyKWVPkC07MVczYSsseFijDYdiYgN KLSpkRKlarFlv3Jao1uhuzMmSe6UtQxRaR/v6k0tZeLwHGzYLAS6wphE/8hZVBOwim WZIIyr7Rykr981nA6X9gV7eOY3FVLXdHAmaojYBKHkRyU6+qkN17jJ+DMDVqYWwHhd ltFX7a5JDownw== Message-ID: Date: Fri, 11 Aug 2023 20:30:16 +0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.1 Cc: Muhammad Usama Anjum , =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= , Peter Xu , David Hildenbrand , Andrew Morton , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Mike Rapoport , Nadav Amit , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Subject: Re: [PATCH v28 2/6] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs Content-Language: en-US To: =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= References: <20230809061603.1969154-1-usama.anjum@collabora.com> <20230809061603.1969154-3-usama.anjum@collabora.com> <97de19a3-bba2-9260-7741-cd5b6f4581e9@collabora.com> From: Muhammad Usama Anjum In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Stat-Signature: 49jerwjothojz7p57yt6uhxr1r5qiraz X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: B3E578000D X-Rspam-User: X-HE-Tag: 1691767827-880941 X-HE-Meta: U2FsdGVkX1/dsCHxbIxB9yySMZJhYru1uoJ2KnK6f/HyyVDsa/6r7Wv+GXH9nHBHnOjwYIm6LT34r/mDgOrCSLD4MX+PfLvocQLn3dS7AP0KQMVYLId5iGhF80UExyr55unlk319QTaGEHlFO3LOjuq190UowxgHdq4BzHV7aAuQkP7tRIoyOJ82AlVdyzKuO8zCKqq94/1k2ymdZKON4C6/iWCWLZsIETU4Ak7PQnb58d33bM+cANxTOpe85lxDX0xJgL5ujcLQVtPSnLviZvB6i3tRzhvVGDM6hzQLWvQy+8CqwkzYMuO8anuokmlgm9h/Ku7tslv4ag+IMQ/RhD48jVIByZPOOuPXoL9tmzP/3g1oxx3QgQr0QScVT0OOt6gxWUKfiz8CsBE9VCzx12i6yHZokFet4+mJ+ctTTUk3zh//lFTY2vMZ6LBtLac1taQektFDK5ZxBhvmYvdFsV1TfOMvpROOeM4+PVC3zO5JTOrymsQxSm5VSulWbGkQoftG4U2nef2yrM1cb00ixVtlgjtGBvIBSfqfAiPdyfViFICzis7NuGjRT9t94CNOihIAU7lLrW0l2LKfmFwNaGiilNSziLhhhNY0N3ovA0y2GEzZwgoWeU7x4LTpZ22EAtWWWiiOSPnO51YKsfV4SbVDBppJ5NyV2GHBgbBgEFhboriSXKwy4GHuCO7ucGeT5FGMoQc4rqdK3pMoG4be0+qg3DZd9nCU0c1GaWqtD4y5mJNyId/JyHJN5bCfiRHLZwzKPO71s7Iq9IfhKFsOE/klJjeogU5/kQXnhyKDHW8cqVH9r8eKhL12reHeMB1WL3vogxGsIy7TQ7OdY6HC7POp9g5Hi/41APOAP2W+djs99ezu/qCkx2oGHjtbppbH0kBQuyUDkSu1IcEOSWsAs/+DjxhVwU2N3ENLw4l2fVdbJB3mm44R44ajd5GvXdf8sR6qHws+0YNCHOnbfqy lJHsA8q1 Wd77mq1MAU8YJx2T+tpbrZba7IiMAsDPsNpFedNZvByx7za2KuRbeQyc8WySv25dZpifNcPZ+DnuPcPw16vnrK2bREd8F8K2CThPIB2ku9Jbk6wFn7cUkLL3UusCzVDm108mhKAwt0deIczE1w6/niF65yvoTpN7oXVFKZ4d+uUQxDxCjzBHLN9G8NngvVYEaahZT9IM0o7GzieUSYDXU/Mb8rnw9xNQI2AfNlKo9Nchk5Z5kS3HcDklyKh3m++HXr5lELU85BRmuQ8rmCfhn+nkAnm22zFsZ6Ze2d375APaqLsRNjEog3WbOJA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 8/11/23 6:32 PM, Michał Mirosław wrote: > On Fri, Aug 11, 2023 at 05:02:44PM +0500, Muhammad Usama Anjum wrote: >> On 8/10/23 10:32 PM, Michał Mirosław wrote: >>> On Wed, 9 Aug 2023 at 08:16, Muhammad Usama Anjum >>> wrote: > [...] >>>> --- a/fs/proc/task_mmu.c >>>> +++ b/fs/proc/task_mmu.c >>> [...] >>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>> +static unsigned long pagemap_thp_category(pmd_t pmd) >>>> +{ >>>> + unsigned long categories = PAGE_IS_HUGE; >>>> + >>>> + if (pmd_present(pmd)) { >>>> + categories |= PAGE_IS_PRESENT; >>>> + if (!pmd_uffd_wp(pmd)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + if (is_zero_pfn(pmd_pfn(pmd))) >>>> + categories |= PAGE_IS_PFNZERO; >>>> + } else if (is_swap_pmd(pmd)) { >>>> + categories |= PAGE_IS_SWAPPED; >>>> + if (!pmd_swp_uffd_wp(pmd)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + } >>>> + >>>> + return categories; >>>> +} >>> I guess THPs can't be file-backed currently, but can we somehow mark >>> this assumption so it can be easily found if the capability arrives? >> Yeah, THPs cannot be file backed. Lets not care for some feature which may >> not arrive in several years or eternity. > > Yes, it might not arrive. But please add at least a comment, so that it > is clearly visible that lack if PAGE_IS_FILE here is intentional. I'll add a comment. > >>>> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ >>>> + >>>> +#ifdef CONFIG_HUGETLB_PAGE >>>> +static unsigned long pagemap_hugetlb_category(pte_t pte) >>>> +{ >>>> + unsigned long categories = PAGE_IS_HUGE; >>>> + >>>> + if (pte_present(pte)) { >>>> + categories |= PAGE_IS_PRESENT; >>>> + if (!huge_pte_uffd_wp(pte)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + if (!PageAnon(pte_page(pte))) >>>> + categories |= PAGE_IS_FILE; >>>> + if (is_zero_pfn(pte_pfn(pte))) >>>> + categories |= PAGE_IS_PFNZERO; >>>> + } else if (is_swap_pte(pte)) { >>>> + categories |= PAGE_IS_SWAPPED; >>>> + if (!pte_swp_uffd_wp_any(pte)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + } >>> >>> BTW, can a HugeTLB page be file-backed and swapped out? >> Accourding to pagemap_hugetlb_range(), file-backed HugeTLB page cannot be >> swapped. > > Here too a comment that leaving out this case is intentional would be useful. Sure. > >>> [...] >>>> + walk_start = p.arg.start; >>>> + for (; walk_start < p.arg.end; walk_start = p.arg.walk_end) { > [...[ >>>> + ret = mmap_read_lock_killable(mm); >>>> + if (ret) >>>> + break; >>>> + ret = walk_page_range(mm, walk_start, p.arg.end, >>>> + &pagemap_scan_ops, &p); >>>> + mmap_read_unlock(mm); > [...] >>>> + if (ret != -ENOSPC || p.arg.vec_len - 1 == 0 || >>>> + p.found_pages == p.arg.max_pages) >>>> + break; >>> >>> The second condition is equivalent to `p.arg.vec_len == 1`, but why is >>> that an ending condition? Isn't the last entry enough to gather one >>> more range? (The walk could have returned -ENOSPC due to buffer full >>> and after flushing it could continue with the last free entry.) >> Now we are walking the entire range walk_page_range(). We don't break loop >> when we get -ENOSPC as this error may only mean that the temporary buffer >> is full. So we need check if max pages have been found or output buffer is >> full or ret is 0 or any other error. When p.arg.vec_len = 1 is end >> condition as the last entry is in cur. As we have walked over the entire >> range, cur must be full after which the walk returned. >> >> So current condition is necessary. I've double checked it. I'll change it >> to `p.arg.vec_len == 1`. > > If we have walked the whole range, then the loop will end anyway due to > `walk_start < walk_end` not held in the `for()`'s condition. Sorry, for not explaining to-the-point. Why would we walk the entire range when we should recognize that the output buffer is full and break the loop? I've test cases written for this case. If I remove `p.arg.vec_len == 1` check, there is infinite loop for walking. So we are doing correct thing here. > > [...] >>>> +/* >>>> + * struct pm_scan_arg - Pagemap ioctl argument >>>> + * @size: Size of the structure >>>> + * @flags: Flags for the IOCTL >>>> + * @start: Starting address of the region >>>> + * @end: Ending address of the region >>>> + * @walk_end Address where the scan stopped (written by kernel). >>>> + * walk_end == end informs that the scan completed on entire range. >>> >>> Can we ensure this holds also for the tagged pointers? >> No, we cannot. > > So this need explanation in the comment here. (Though I'd still like to > know how the address tags are supposed to be used from someone that > knows them.) I've looked at some documentations (presentations/talks) about tags. Tags is more like userspace feature. Kernel should just ignore them for our use case. I'll add comment. > > Best Regards > Michał Mirosław -- BR, Muhammad Usama Anjum