From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D746C6FD20 for ; Fri, 24 Mar 2023 06:32:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B89886B0072; Fri, 24 Mar 2023 02:32:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B38F96B0074; Fri, 24 Mar 2023 02:32:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A01456B0075; Fri, 24 Mar 2023 02:32:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 8C9CD6B0072 for ; Fri, 24 Mar 2023 02:32:44 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id A658C808EF for ; Fri, 24 Mar 2023 06:32:43 +0000 (UTC) X-FDA: 80602823406.24.9B8EB3B Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by imf18.hostedemail.com (Postfix) with ESMTP id 79D371C0008 for ; Fri, 24 Mar 2023 06:32:41 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=hdM8mvf1; spf=pass (imf18.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com; dmarc=pass (policy=quarantine) header.from=collabora.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679639561; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MNpwb3d4UbWT1eoYhmMfX+XGP9X/D8e+jZ1yRi0/kjY=; b=JWytleNownLYEhH0qkJL+k8BMFQH0TuZchm1z4IDOvALO5GhBJByaasTtB96MVc0+goco/ CHK+L5YriecTplvJTVzoztT6iYQ0BOxdK4yIaOKCetz74gD6HMthdUN7NO19y1tvNEi+RP 9MgiNB1ubqCWCOECtLjv23WXk6i/QSA= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=hdM8mvf1; spf=pass (imf18.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com; dmarc=pass (policy=quarantine) header.from=collabora.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679639562; a=rsa-sha256; cv=none; b=ZfR/HLWi0WLZFEfnC7qFwCkHMzQU8gTjj2ZvMskAmLPi6/apCT6/G0mNfcRNHJOJ+l+XAB ytLBLba05U/evArIUgfreKO5jjw30lKbBQygqHMO7CvMMX1F0RINprIDG4P911elUCnZNh xMDfaEblVEifhBBqhOQtv1nFcBY/LSY= Received: from [192.168.10.39] (unknown [39.37.168.222]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 9FA0E660310B; Fri, 24 Mar 2023 06:32:33 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1679639559; bh=IMA5rGCkVAVmMkQFMSp4zzFkYv6syLIF9LAmjyndPbE=; h=Date:Cc:Subject:To:References:From:In-Reply-To:From; b=hdM8mvf1+XN0U8kmYynVlztFJRmNGyTmYHC8/v6zAF+VHcsOMQVlIR8U5RZbNJr+u Vx4ZMLRnf0Z3+TRBnlrlsVQUdQ+e2QyA/Z1ELTRRUjTnnBBKkMwm90rDwgBoB9WOtN 3piW2bW93Jh0bxcegtwgyJiT9B6N17c+JiPElngLMkL4VYYTD5iqp2lPZ+v/Um2W1X IyRdQ6ZVVNWUMpuDsUmklcliu+izZHUxqG6l7Qb1NjpyY0oA5OK31BWFJyc3bUiC8q OG3E9oUTPlSh5YmAmF+djmZdct/7XU0l5NM36RxMzkWDwH2tbElKTEvbOrJtW3OryG ZqFEMYkoUb7GQ== Message-ID: Date: Fri, 24 Mar 2023 11:32:27 +0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.2 Cc: Muhammad Usama Anjum , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Mike Rapoport , Nadav Amit , linux-stable , David Hildenbrand Subject: Re: [PATCH] mm/hugetlb: Fix uffd wr-protection for CoW optimization path Content-Language: en-US To: Peter Xu References: <20230321191840.1897940-1-peterx@redhat.com> <44aae7fc-fb1f-b38e-bc17-504abf054e3f@redhat.com> From: Muhammad Usama Anjum In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: 6q1ifwbhrn4n4k33kafgdx8ib8q85d1b X-Rspamd-Queue-Id: 79D371C0008 X-HE-Tag: 1679639561-259907 X-HE-Meta: U2FsdGVkX1+G06pA3s6oRqqqKV3mrlR5QV3Ed9oAXePY9glOr5qydI6k7FL3Si3lTVGy0iu8absU2UXLGjL/h6PoJqYY7A6So+w70pBh1fgvXCUc0uUgw6i5pg8d5dvzX6sYDPckMoobL6NpisD85XDIFN0+D/nTnftCkFIeEV9nqKGs5r3kXbkU1H0YAc0SfkVvlRddkf6B+qyoFjrjhcRbO4NoEAJLSp+lw9F+4uuXReKAfREouASBywKMxSyWxak1xE+TVj7v76TAhw0HWA8JNxZcIXKpdoaBFO4FSEBWTzhqTwuc8AgcKZs4QvE7z5iZEe83nPlNoGbfM0MAomNrDYJz3QdQ/SdeaST6IMM69Ftfy8ghywbQauqxmZKcCC4DeJPrG7Jo5sefQc7C+td34kCdzynQeU4U58cEE14VorqQNzN4KPgr/r5Rjj0Y8uJWhD7qxXncBoGsLbVGj2BYnG8XyoHAQ/XFWMXVEba3zX9oBw4mslpMnnRJIIIc15QPxI72tPEAC1tPoOFXQVVabrjSxUo1d2g58VSbYEE1KVFPvOmWhR52QZWkpTBvSwDnduixZNfGHLqQuJdJKu4WDglB9f3TSPyuRQSR7mHGulnN8PHKc7qAymbXvYHhOj0nNBaEvKTd6KU7RJQ9FpH+rdr0elcTvVzEovRb+7GQYGWKTcDLm/E4NoFvzYL1qNyQenhgpI7hsAfBfZiyPZ528Cxpwh9dtKGao8VCtEwiqJ+6GGJSULjq3irp/WAA3Wm9s8c37Y8/tBlW7de8ZNbyD/eayH1OYpRWkLkfYeQRkE8e0NpXCqImL2gbprtC7nGozIWKUWO7aLJYzaxKM9kJgmFJJG1fJu2bBnozfIvkzWDrtyrd/0c/gwbjNHUQSSW07sQK7KAVTwbxEUuD516YhNJfIZA3RY1lpk6r1yy4AVCeI2crpLTqAg0RjPRBQffNJaPOqzlzB4DSf9V 6wKdTpmX plb7fu458sOw4sSOZMAdKZFDULVi3wpctm8kyc2krpFsQKK3RbCnMUo3bS6ceZaJN9Co7V4fV5uHYe4Ryyw0o2y7MtDII0B4xDvuG4tEee55yD6wh3yOAaO/BdHWxuRPOIAVbP6g3oTVD/6F7jQHLQUB6eXmDdUcTNbfrmTUaPAU0y9f/D9MuKGe52Uj6INEWZkWY950xv2VwKJhRj35YbO1+Ar0K6G9mIf9RvEaYq2p//PE/qt9lP5Wj695qEHnU/dPhfFRJzM44P0F9XdctNask6p4FkfU+9vjJ/dYJo2Oihh8qbtplb+LTbBPcT1eURC7uUThPqI4h9OFbHYsDUagNTzifSAzlAXXWhhOT6tqe5A/FzfeF2czObroQ7rBw+5GoCSh/Eus8XE3H/inTvdwu1RuM/BiUs5YGp8qtuZVcF9nJ24ujn0KHTLBB97vBA7WPsPbPkOZ962A= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 3/24/23 3:11 AM, Peter Xu wrote: > On Thu, Mar 23, 2023 at 08:33:07PM +0500, Muhammad Usama Anjum wrote: >> Hi Peter, >> >> Sorry for late reply. >> >> On 3/22/23 12:50 AM, Peter Xu wrote: >>> On Tue, Mar 21, 2023 at 08:36:35PM +0100, David Hildenbrand wrote: >>>> On 21.03.23 20:18, Peter Xu wrote: >>>>> This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be >>>>> writable even with uffd-wp bit set. It only happens with all these >>>>> conditions met: (1) hugetlb memory (2) private mapping (3) original mapping >>>>> was missing, then (4) being wr-protected (IOW, pte marker installed). Then >>>>> write to the page to trigger. >>>>> >>>>> Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before >>>>> even reaching hugetlb_wp() to avoid taking more locks that userfault won't >>>>> need. However there's one CoW optimization path for missing hugetlb page >>>>> that can trigger hugetlb_wp() inside hugetlb_no_page(), that can bypass the >>>>> userfaultfd-wp traps. >>>>> >>>>> A few ways to resolve this: >>>>> >>>>> (1) Skip the CoW optimization for hugetlb private mapping, considering >>>>> that private mappings for hugetlb should be very rare, so it may not >>>>> really be helpful to major workloads. The worst case is we only skip the >>>>> optimization if userfaultfd_wp(vma)==true, because uffd-wp needs another >>>>> fault anyway. >>>>> >>>>> (2) Move the userfaultfd-wp handling for hugetlb from hugetlb_fault() >>>>> into hugetlb_wp(). The major cons is there're a bunch of locks taken >>>>> when calling hugetlb_wp(), and that will make the changeset unnecessarily >>>>> complicated due to the lock operations. >>>>> >>>>> (3) Carry over uffd-wp bit in hugetlb_wp(), so it'll need to fault again >>>>> for uffd-wp privately mapped pages. >>>>> >>>>> This patch chose option (3) which contains the minimum changeset (simplest >>>>> for backport) and also make sure hugetlb_wp() itself will start to be >>>>> always safe with uffd-wp ptes even if called elsewhere in the future. >>>>> >>>>> This patch will be needed for v5.19+ hence copy stable. >>>>> >>>>> Reported-by: Muhammad Usama Anjum >>>>> Cc: linux-stable >>>>> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") >>>>> Signed-off-by: Peter Xu >>>>> --- >>>>> mm/hugetlb.c | 8 +++++--- >>>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>>>> index 8bfd07f4c143..22337b191eae 100644 >>>>> --- a/mm/hugetlb.c >>>>> +++ b/mm/hugetlb.c >>>>> @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, >>>>> struct folio *pagecache_folio, spinlock_t *ptl) >>>>> { >>>>> const bool unshare = flags & FAULT_FLAG_UNSHARE; >>>>> - pte_t pte; >>>>> + pte_t pte, newpte; >>>>> struct hstate *h = hstate_vma(vma); >>>>> struct page *old_page; >>>>> struct folio *new_folio; >>>>> @@ -5622,8 +5622,10 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, >>>>> mmu_notifier_invalidate_range(mm, range.start, range.end); >>>>> page_remove_rmap(old_page, vma, true); >>>>> hugepage_add_new_anon_rmap(new_folio, vma, haddr); >>>>> - set_huge_pte_at(mm, haddr, ptep, >>>>> - make_huge_pte(vma, &new_folio->page, !unshare)); >>>>> + newpte = make_huge_pte(vma, &new_folio->page, !unshare); >>>>> + if (huge_pte_uffd_wp(pte)) >>>>> + newpte = huge_pte_mkuffd_wp(newpte); >>>>> + set_huge_pte_at(mm, haddr, ptep, newpte); >>>>> folio_set_hugetlb_migratable(new_folio); >>>>> /* Make the old page be freed below */ >>>>> new_folio = page_folio(old_page); >>>> >>>> Looks correct to me. Do we have a reproducer? >>> >>> I used a reproducer for the async mode I wrote (patch 2 attached, need to >>> change to VM_PRIVATE): >>> >>> https://lore.kernel.org/all/ZBNr4nohj%2FTw4Zhw@x1n/ >>> >>> I don't think kernel kselftest can trigger it because we don't do strict >>> checks yet with uffd-wp bits. I've already started looking into cleanup >>> the test cases and I do plan to add new tests to cover this. >>> >>> Meanwhile, let's also wait for an ack from Muhammad. Even though the async >>> mode is not part of the code base, it'll be a good test for verifying every >>> single uffd-wp bit being set or cleared as expected. >> I've tested by applying this patch. But the bug is still there. Just like >> Peter has mentioned, we are using our in progress patches related to >> pagemap_scan ioctl and userfaultd wp async patches to reproduce it. >> >> To reproduce please build kernel and run pagemap_ioctl test in mm in >> hugetlb_mem_reproducer branch: >> https://gitlab.collabora.com/usama.anjum/linux-mainline/-/tree/hugetlb_mem_reproducer >> >> In case you have any question on how to reproduce, please let me know. I'll >> try to provide a cleaner alternative. > > Hmm, I think my current fix is incomplete if not wrong. The root cause > should still be valid, however I overlooked another path: > > if (page_mapcount(old_page) == 1 && PageAnon(old_page)) { > if (!PageAnonExclusive(old_page)) > page_move_anon_rmap(old_page, vma); > if (likely(!unshare)) > set_huge_ptep_writable(vma, haddr, ptep); > > delayacct_wpcopy_end(); > return 0; > } > > We should bail out early in this path, and it'll be even easier we always > bail out hugetlb_wp() as long as uffd-wp is detected because userfault > should always be handled before any decision to CoW. > > v2 attached.. Please give it another shot. This attached v2 works. Please add: Tested-by: Muhammad Usama Anjum > > Thanks, > -- BR, Muhammad Usama Anjum