Date: Tue, 31 Jan 2023 13:40:13 +0500
Cc: Muhammad Usama Anjum, David Hildenbrand, Andrew Morton,
 Michał Mirosław, Andrei Vagin, Danylo Mocherniuk, Paul Gofman,
 Cyrill Gorcunov, Alexander Viro, Shuah Khan, Christian Brauner,
 Yang Shi, Vlastimil Babka, Liam R. Howlett, Yun Zhou,
 Suren Baghdasaryan, Alex Sierra, Matthew Wilcox, Pasha Tatashin,
 Mike Rapoport, Nadav Amit, Axel Rasmussen, Gustavo A. R. Silva,
 Dan Williams, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH,
 kernel@collabora.com
Subject: Re: [PATCH v8 1/4] userfaultfd: Add UFFD WP Async support
To: Peter Xu
References: <20230124084323.1363825-1-usama.anjum@collabora.com>
 <20230124084323.1363825-2-usama.anjum@collabora.com>
 <1968dff9-f48a-3290-a15b-a8b739f31ed2@collabora.com>
From: Muhammad Usama Anjum

On 1/31/23 2:27 AM, Peter Xu wrote:
> On Mon, Jan 30, 2023 at 01:38:16PM +0500, Muhammad Usama Anjum wrote:
>> On 1/27/23 8:32 PM, Peter Xu wrote:
>>> On Fri, Jan 27, 2023 at 11:47:14AM +0500, Muhammad Usama Anjum wrote:
>>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>>> index 4000e9f017e0..8c03b133d483 100644
>>>>>> --- a/mm/memory.c
>>>>>> +++ b/mm/memory.c
>>>>>> @@ -3351,6 +3351,18 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>>>>>
>>>>>>  	if (likely(!unshare)) {
>>>>>>  		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
>>>>>> +			if (userfaultfd_wp_async(vma)) {
>>>>>> +				/*
>>>>>> +				 * Nothing needed (cache flush, TLB invalidations,
>>>>>> +				 * etc.) because we're only removing the uffd-wp bit,
>>>>>> +				 * which is completely invisible to the user. This
>>>>>> +				 * falls through to possible CoW.
>>>>>
>>>>> Here it says it falls through to CoW, but..
>>>>>
>>>>>> +				 */
>>>>>> +				pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>>> +				set_pte_at(vma->vm_mm, vmf->address, vmf->pte,
>>>>>> +					   pte_clear_uffd_wp(*vmf->pte));
>>>>>> +				return 0;
>>>>>
>>>>> ... it's not doing so. The original lines should do:
>>>>>
>>>>> https://lore.kernel.org/all/Y8qq0dKIJBshua+X@x1n/
>>>
>>> [1]
>>>
>>>>>
>>>>> Side note: you cannot modify the pgtable after releasing the pgtable lock.
>>>>> It's racy.
>>>> If I don't unlock and return after removing the UFFD_WP flag in the async
>>>> wp case, the target just gets stuck. Maybe the pte lock is not unlocked in
>>>> some path.
>>>>
>>>> If I unlock and don't return, the crash happens.
>>>>
>>>> So I'd put the unlock and return here. Please comment on the patch below
>>>> and on what you think should be done; I must have missed something.
>>>
>>> Have you tried to just use exactly what I suggested in [1]? I'll paste it
>>> again:
>>>
>>> ---8<---
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 4000e9f017e0..09aab434654c 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3351,8 +3351,20 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>>
>>>  	if (likely(!unshare)) {
>>>  		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
>>> -			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>> -			return handle_userfault(vmf, VM_UFFD_WP);
>>> +			if (userfaultfd_uffd_wp_async(vma)) {
>>> +				/*
>>> +				 * Nothing needed (cache flush, TLB
>>> +				 * invalidations, etc.) because we're only
>>> +				 * removing the uffd-wp bit, which is
>>> +				 * completely invisible to the user.
>>> +				 * This falls through to possible CoW.
>>> +				 */
>>> +				set_pte_at(vma->vm_mm, vmf->address, vmf->pte,
>>> +					   pte_clear_uffd_wp(*vmf->pte));
>>> +			} else {
>>> +				pte_unmap_unlock(vmf->pte, vmf->ptl);
>>> +				return handle_userfault(vmf, VM_UFFD_WP);
>>> +			}
>>>  		}
>>> ---8<---
>>>
>>> Note that there's no "return" and no unlock. The lock is used in the
>>> follow-up write fault resolution and is released later.
>> I've tried out the exact patch above. It doesn't work: the pages keep
>> their WP flag even after the fault has been resolved in do_wp_page() when
>> the page is written to.
>>
>> So I'd added pte_unmap_unlock() and return 0 here. This makes the patch
>> work. Maybe you can try this on your end to see what I'm seeing here?
>
> Oh, maybe it's because it didn't update orig_pte. If you want, you can try
> again by changing:
>
>         set_pte_at(vma->vm_mm, vmf->address, vmf->pte,
>                    pte_clear_uffd_wp(*vmf->pte));
>
> into:
>
>    pte_t pte = pte_clear_uffd_wp(*vmf->pte);
>    set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>    /* Update this to be prepared for following up CoW handling */
>    vmf->orig_pte = pte;
>
It works.
>>
>>>
>>> Meanwhile, please fully digest how the pgtable lock is used in this path
>>> before moving forward with any such changes.
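To make sure we are looking at the same thing, the pte path I'm testing now
is your pasted hunk plus the orig_pte update, roughly as below (sketch only,
hand-applied on top of next-20230105, using the userfaultfd_wp_async() helper
from my series):

---8<---
	if (likely(!unshare)) {
		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
			if (userfaultfd_wp_async(vma)) {
				pte_t pte;

				/*
				 * Nothing needed (cache flush, TLB
				 * invalidations, etc.) because we're only
				 * removing the uffd-wp bit, which is
				 * completely invisible to the user.
				 */
				pte = pte_clear_uffd_wp(*vmf->pte);
				set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
				/*
				 * Keep orig_pte in sync so the CoW handling
				 * below sees the cleared bit; the ptl stays
				 * held and is released later in this path.
				 */
				vmf->orig_pte = pte;
			} else {
				pte_unmap_unlock(vmf->pte, vmf->ptl);
				return handle_userfault(vmf, VM_UFFD_WP);
			}
		}
---8<---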
>>>
>>>>
>>>>>
>>>>>> +			}
>>>>>>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>>>  			return handle_userfault(vmf, VM_UFFD_WP);
>>>>>>  		}
>>>>>> @@ -4812,8 +4824,21 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>>>>>>
>>>>>>  	if (vma_is_anonymous(vmf->vma)) {
>>>>>>  		if (likely(!unshare) &&
>>>>>> -		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>>>>>> -			return handle_userfault(vmf, VM_UFFD_WP);
>>>>>> +		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd)) {
>>>>>> +			if (userfaultfd_wp_async(vmf->vma)) {
>>>>>> +				/*
>>>>>> +				 * Nothing needed (cache flush, TLB invalidations,
>>>>>> +				 * etc.) because we're only removing the uffd-wp bit,
>>>>>> +				 * which is completely invisible to the user. This
>>>>>> +				 * falls through to possible CoW.
>>>>>> +				 */
>>>>>> +				set_pmd_at(vmf->vma->vm_mm, vmf->address, vmf->pmd,
>>>>>> +					   pmd_clear_uffd_wp(*vmf->pmd));
>>>>>
>>>>> This is for THP, not hugetlb.
>>>>>
>>>>> Clearing the uffd-wp bit here for the whole pmd is wrong to me, because we
>>>>> track writes at small page size only. We should just split.
>>>> Detecting that the fault is async wp and then just splitting the PMD doesn't
>>>> work. The snippet below is working right now, but admittedly it resolves the
>>>> fault for the whole PMD, which it would be highly desirable to avoid by
>>>> splitting correctly. Can you please take a look at the UFFD side and suggest
>>>> the changes? It would be much appreciated. I'm attaching WIP v9 patches for
>>>> you to apply on linux-next (next-20230105); the pagemap_ioctl selftest can
>>>> be run to test things after making changes.
>>>
>>> Can you elaborate why the thp split didn't work? Or if you want, I can look
>>> into this and provide the patch to enable uffd async mode.
>> Sorry, I was doing it the wrong way. Splitting the page does work. What do
>> you think about the following:
>>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3351,6 +3351,17 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>
>>  	if (likely(!unshare)) {
>>  		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
>> +			if (userfaultfd_wp_async(vma)) {
>> +				/*
>> +				 * Nothing needed (cache flush, TLB invalidations,
>> +				 * etc.) because we're only removing the uffd-wp bit,
>> +				 * which is completely invisible to the user.
>> +				 */
>> +				set_pte_at(vma->vm_mm, vmf->address, vmf->pte,
>> +					   pte_clear_uffd_wp(*vmf->pte));
>> +				pte_unmap_unlock(vmf->pte, vmf->ptl);
>> +				return 0;
>
> Please give it a shot with the above to see whether we can avoid the
> "return 0" here.
>
>> +			}
>>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>  			return handle_userfault(vmf, VM_UFFD_WP);
>>  		}
>> @@ -4812,8 +4823,13 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>>
>>  	if (vma_is_anonymous(vmf->vma)) {
>>  		if (likely(!unshare) &&
>> -		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>> +		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd)) {
>> +			if (userfaultfd_wp_async(vmf->vma)) {
>> +				__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
>> +				return 0;
>
> Same here, I hope it'll work for you if you just goto __split_huge_pmd()
> right below and return with VM_FAULT_FALLBACK. It avoids one more round of
> fault, just like the pte case above.
>
It works as well.
>> +			}
>>  			return handle_userfault(vmf, VM_UFFD_WP);
>> +		}
>>  		return do_huge_pmd_wp_page(vmf);
>>  	}
>

-- 
BR,
Muhammad Usama Anjum
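P.S. For reference, the huge pmd side I'm testing after your suggestion looks
roughly like the following (sketch only; the "split" label is just my
shorthand for jumping to the existing __split_huge_pmd() + VM_FAULT_FALLBACK
tail of wp_huge_pmd()):

---8<---
	if (vma_is_anonymous(vmf->vma)) {
		if (likely(!unshare) &&
		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd)) {
			if (userfaultfd_wp_async(vmf->vma)) {
				/*
				 * Writes are tracked at pte granularity, so
				 * don't clear uffd-wp on the whole pmd; split
				 * it and let the fault be resolved per pte.
				 */
				goto split;
			}
			return handle_userfault(vmf, VM_UFFD_WP);
		}
		return do_huge_pmd_wp_page(vmf);
	}
	...
split:
	/* COW or write-notify handled on pte level: split pmd. */
	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);

	return VM_FAULT_FALLBACK;
---8<---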