From: "Huang, Ying"
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrea Arcangeli, Pengfei Xu, Nadav Amit, David Hildenbrand, Andrew Morton, Miaohe Lin
Subject: Re: [PATCH 2/2] mm: Fix a few rare cases of using swapin error pte marker
Date: Thu, 15 Dec 2022 15:12:13 +0800
Message-ID: <87bko5cf8y.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <20221214200453.1772655-3-peterx@redhat.com> (Peter Xu's message of "Wed, 14 Dec 2022 15:04:53 -0500")
References: <20221214200453.1772655-1-peterx@redhat.com> <20221214200453.1772655-3-peterx@redhat.com>

Peter Xu writes:

> This patch should
> harden commit 15520a3f0469 ("mm: use pte markers for swap
> errors") in a few corner cases of using pte markers for swapin errors.
>
> 1. Propagate swapin errors across fork()s: if there are swapin errors in
>    the parent mm, after fork()s the child should sigbus too when an error
>    page is accessed.
>
> 2. Fix a rare race condition in pte_marker_clear() where a uffd-wp pte
>    marker can be quickly switched to a swapin error.
>
> 3. Explicitly ignore swapin error pte markers in change_protection().
>
> I mostly don't worry about (2) or (3) at all, but we should still have them.
> Case (1) is special because it can potentially cause silent data corruption
> in the child when the parent has a swapin error triggered by swapoff, but
> since swapin errors are rare already, it's probably not easy to trigger
> either.
>
> Currently there is a priority difference between the uffd-wp bit and the
> swapin error entry, in which the swapin error always has higher
> priority (e.g. we don't need to wr-protect a swapin error pte marker).
>
> If a 3rd bit is introduced, we'll probably need to consider a more
> involved approach, where we may need to start operating on the bits.
> Let's leave that for later.
>
> This patch is tested with case (1) explicitly: before the patch, the child
> reads corrupted data if there are existing swapin error pte markers; with
> the patch applied, the child is rightfully killed.
>
> We don't need to copy stable for this one since 15520a3f0469 just landed as
> part of v6.2-rc1; only the "Fixes" tag is applied.
>
> Fixes: 15520a3f0469 ("mm: use pte markers for swap errors")
> Signed-off-by: Peter Xu
> ---
>  mm/hugetlb.c  | 3 +++
>  mm/memory.c   | 8 ++++++--
>  mm/mprotect.c | 8 +++++++-
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f5f445c39dbc..1e8e4eb10328 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4884,6 +4884,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  				entry = huge_pte_clear_uffd_wp(entry);
>  			set_huge_pte_at(dst, addr, dst_pte, entry);
>  		} else if (unlikely(is_pte_marker(entry))) {
> +			/* No swap on hugetlb */
> +			WARN_ON_ONCE(
> +			    is_swapin_error_entry(pte_to_swp_entry(entry)));
>  			/*
>  			 * We copy the pte marker only if the dst vma has
>  			 * uffd-wp enabled.
> diff --git a/mm/memory.c b/mm/memory.c
> index 032ef700c3e8..3e836fecd035 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -828,7 +828,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  			return -EBUSY;
>  		return -ENOENT;
>  	} else if (is_pte_marker_entry(entry)) {
> -		if (userfaultfd_wp(dst_vma))
> +		if (is_swapin_error_entry(entry) || userfaultfd_wp(dst_vma))

Should we do this in [1/2]? It appears that we introduce an issue in
[1/2] and fix it in [2/2]?

Best Regards,
Huang, Ying

>  			set_pte_at(dst_mm, addr, dst_pte, pte);
>  		return 0;
>  	}
> @@ -3625,8 +3625,12 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
>  	/*
>  	 * Be careful so that we will only recover a special uffd-wp pte into a
>  	 * none pte. Otherwise it means the pte could have changed, so retry.
> +	 *
> +	 * This should also cover the case where e.g. the pte changed
> +	 * quickly from a PTE_MARKER_UFFD_WP into PTE_MARKER_SWAPIN_ERROR.
> +	 * So is_pte_marker() check is not enough to safely drop the pte.
>  	 */
> -	if (is_pte_marker(*vmf->pte))
> +	if (pte_same(vmf->orig_pte, *vmf->pte))
>  		pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	return 0;
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 093cb50f2fc4..a6f905211327 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -245,7 +245,13 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>  					newpte = pte_swp_mksoft_dirty(newpte);
>  				if (pte_swp_uffd_wp(oldpte))
>  					newpte = pte_swp_mkuffd_wp(newpte);
> -			} else if (pte_marker_entry_uffd_wp(entry)) {
> +			} else if (is_pte_marker_entry(entry)) {
> +				/*
> +				 * Ignore swapin errors unconditionally,
> +				 * because any access should sigbus anyway.
> +				 */
> +				if (is_swapin_error_entry(entry))
> +					continue;
>  				/*
>  				 * If this is uffd-wp pte marker and we'd like
>  				 * to unprotect it, drop it; the next page
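For readers following the three cases in the commit message, here is a minimal userspace model of the decisions this patch changes. It is a sketch under stated assumptions, not kernel code: a pte is reduced to a word of marker bits, and `model_pte`, `MODEL_MARKER_*`, `copy_marker_old/new()`, `marker_clear_old/new()` and `mprotect_marker_new()` are hypothetical names chosen to mirror copy_nonpresent_pte(), pte_marker_clear() and change_pte_range() in the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker bits for this model only; a real pte encodes
 * markers inside a swap entry, which is not modelled here. */
#define MODEL_MARKER_UFFD_WP      (1u << 0)
#define MODEL_MARKER_SWAPIN_ERROR (1u << 1)

/* A non-present pte reduced to its marker bits; 0 means pte_none. */
typedef struct {
	uint32_t marker;
} model_pte;

static bool model_pte_same(model_pte a, model_pte b)
{
	return a.marker == b.marker;
}

/* (1) fork(): does the child's pte get a copy of the parent's marker? */
static bool copy_marker_old(model_pte src, bool dst_uffd_wp)
{
	(void)src;		/* old rule never looked at the marker type */
	return dst_uffd_wp;
}

static bool copy_marker_new(model_pte src, bool dst_uffd_wp)
{
	/* A swapin error must survive fork() so the child also gets SIGBUS. */
	return (src.marker & MODEL_MARKER_SWAPIN_ERROR) || dst_uffd_wp;
}

/* (2) pte_marker_clear(): orig is the value seen when the fault started,
 * *pte is the value read again under the page table lock. */
static void marker_clear_old(model_pte *pte, model_pte orig)
{
	(void)orig;
	if (pte->marker != 0)	/* "is it any marker at all?" */
		pte->marker = 0;
}

static void marker_clear_new(model_pte *pte, model_pte orig)
{
	if (model_pte_same(orig, *pte))	/* unchanged since the fault? */
		pte->marker = 0;
}

/* (3) change_protection() on a marker pte; uffd_wp_resolve == unprotect. */
static model_pte mprotect_marker_new(model_pte pte, bool uffd_wp_resolve)
{
	if (pte.marker & MODEL_MARKER_SWAPIN_ERROR)
		return pte;	/* ignored: any access should SIGBUS anyway */
	if ((pte.marker & MODEL_MARKER_UFFD_WP) && uffd_wp_resolve)
		pte.marker = 0;	/* unprotect: drop the uffd-wp marker */
	return pte;
}
```

Under this model, the old rules lose a swapin error marker in two ways: fork() into a vma without uffd-wp copies nothing, and pte_marker_clear() wipes the pte even after a uffd-wp marker has been replaced by a swapin error. The new rules keep the error in both cases, and change_protection() leaves it alone. Comparing against the value seen at fault time (the pte_same() check), rather than only asking whether the pte is some marker, is what closes the race in case (2).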