From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B16BD5AE70 for ; Thu, 7 Nov 2024 07:55:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EF9E86B008C; Thu, 7 Nov 2024 02:54:59 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E829C6B0098; Thu, 7 Nov 2024 02:54:59 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFC6C6B0099; Thu, 7 Nov 2024 02:54:59 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id AD7586B008C for ; Thu, 7 Nov 2024 02:54:59 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 18668AB4CE for ; Thu, 7 Nov 2024 07:54:59 +0000 (UTC) X-FDA: 82758536910.12.3163006 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) by imf20.hostedemail.com (Postfix) with ESMTP id 54B871C0011 for ; Thu, 7 Nov 2024 07:54:12 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=hxgWOB6d; spf=pass (imf20.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1730965872; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MuiVOdCzD8MGwxjYTAfGUY1kmdjFbYxSBTn1qFARbjk=; b=t/ZkBu6xu+U7usVYq5dS+QOz6PHm+Coq0Wr2MlTYqToZkJw7PmAdeSjq9vf76tPmXZWzaH vAau9XSo1z4B0vTywbKrCrj1lxehcmPIqn0MkKAmJSXjnt7MQKoRhZvIp+fkr0/heTReQu sCB0BgQyMXrdDe9R3GsmOmKkYfMeRik= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=hxgWOB6d; spf=pass (imf20.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1730965872; a=rsa-sha256; cv=none; b=q7POHEmMCO5eC601oO3UckyJMU6i6XWeKvwIiY2/cbJp2Tu6VJbAkNcsrOEd+/gSOtJfw6 iPQ83KuzR33dBAKWOhSSgUHKndUK6dLhhcArJuziMKquS0MVfGFq23VxiKat1Ts5ZZToro QyPEciTFeSV+aZUNSYdtIyLjjb2jUto= Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-20c70abba48so7070725ad.0 for ; Wed, 06 Nov 2024 23:54:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1730966095; x=1731570895; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=MuiVOdCzD8MGwxjYTAfGUY1kmdjFbYxSBTn1qFARbjk=; b=hxgWOB6dk5/igOv2UcGacKgiEalCFbl6sreTW3jUMI+Mx5E+Rv9ULZzSGgWR0i9BpF lWLqhqx9Q7MfFmpBiqP67RxWwqxIp2VOyZkiINZb63PZw4d7GLKsEoVeuMLEElHG3F6+ 8K5NektvVRWYBNyrZ16HKsBle2bDOL7rh8iUWtm8TA4msZ8CahOksJdJ/0FrO/Q86nqH mJLK3vinE8OUuz2JIurrc6fntey5w79dI/80rPNvHwYNwRv7weA5C40WrBHhOB+jrTR9 SUUzNO0l7nGoOoAzOEpxcxY6hF1KrrBM98Fh1TuyBcxapetqk4QjFms11gPlCxBnJlXq xDHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730966095; x=1731570895; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MuiVOdCzD8MGwxjYTAfGUY1kmdjFbYxSBTn1qFARbjk=; b=UITjXMfAK/761SnV4QtF9AT3ovsRbn/Q+iuvXdxS9odgzRW7f2+nq676TyWXu9PRx4 toXzCivumCLaSD3PDHO8Jtaa7nOpzY/tF7c3eT263tDfWezeWUlvAvYqhG1e7/DJWadv NObEJKy61JbiVTQP57VKl5s5I9/CkkUJDV8l+5lQvLjvFOs/dhAnx12OCPzmS75K+GL6 jxBUbMjinberOYeitPd3hWkLv2mygxHwcCXCzOBaJSdgsjmfCzM3uvielGt8DcCemm38 vwzfEnxQkKlWU/2qlHmZFXcFZ/bx8d5PclGjvU7tQkWeQiy79bK95O7d1YrwbLsqNEZV hblw== X-Forwarded-Encrypted: i=1; AJvYcCXKReWONz3s/Se81/NyH/MGbdmHtWTPRz/SUNS3jJW/dICbelDoo1HHrF5gffowh2IibjZXPamgxg==@kvack.org X-Gm-Message-State: AOJu0Yz7zhUF89kHj6cr46nPCtZE2B/J+UygWRHS0Sf8mSjal3UWfKoG j50biX8PKNDqS8085rMpvAqUlptu/u4X9+N/lFWIf1a2YvzttHkD7GURBa5C4Ys= X-Google-Smtp-Source: AGHT+IEDCJ0LwE6wTKixD1rDG+hhGWKgZxlUEHyzcCQxACaZidAI+oJODjzm9sVFhmu3GH0AGRU0lA== X-Received: by 2002:a17:902:e805:b0:202:cbf:2d6f with SMTP id d9443c01a7336-2111b0181b5mr305660305ad.57.1730966094743; Wed, 06 Nov 2024 23:54:54 -0800 (PST) Received: from [10.84.149.95] ([63.216.146.178]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-21177e87537sm6333055ad.270.2024.11.06.23.54.49 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 06 Nov 2024 23:54:54 -0800 (PST) Message-ID: <8c27c1f8-9573-4777-8397-929a15e67f60@bytedance.com> Date: Thu, 7 Nov 2024 15:54:47 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/7] mm: khugepaged: retract_page_tables() use pte_offset_map_rw_nolock() Content-Language: en-US To: Jann Horn Cc: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com, peterx@redhat.com, catalin.marinas@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org References: <4c3f4aa29f38c013c4529a43bce846a3edd31523.1730360798.git.zhengqi.arch@bytedance.com> From: Qi Zheng In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 54B871C0011 X-Stat-Signature: oqcbt7qyqh684uxyewndbmhcaoxiyiyg X-Rspam-User: X-HE-Tag: 1730966052-886169 X-HE-Meta: U2FsdGVkX18BIxGsNSol5Un6kk74yiCk1jjQtZTTJ0cA6jmUOmxp81INE4cEgijZDbCn0KYEvqdW+5sAdkoNKrB1xze8f00A83TKHD72WP2Pi7uw2RfRHvtT7EU/R4tivjtqoI4wx/F7I6QvMKbSqIM5yNK3ycgVZdUvx9raHg8E3WrlBTdSqRoIy4sHmGNSyuuvGwDCcrFX1y+EBItc4Mzz4s4x8OJFX5/9+LSRiBGxo2wtCilD2/adFNhP1hJnQ+Yn2rDj3KManBrM/VuLalUD0EspoaO+4jQnuZDlacJ9a1LMwaMnWrlCYY4U7ibbWumCrRvtE/IhlFmf9JSiSSN9K5uq3Kz1RU47hNxCO983TXpWmg0FMBx6IogeQX1sVMjZuCVpg6t/cLi48G2yKVJdilP14guKoOMA56OX6OeREUBxFzj6Lxt3i8ZTHb41MQBZMsGaWKoHkh+96DK9IBvNtU5iG+rSohABxVOfrGOcOKOu+/6ASPV7haz2G40WTcvSQ/u3FSS6/g7Regzfg0NwahMBTFI54Lvz0lA3uxGfS/gD/oqsZUZ4BrJxgCY5RcO99hSQKFzc04JGG0zZL2RI4HXD55dzaIfvNA9vnkdCmlaCTy2rEdsVprPoT2Osho5ljjOsV6Y9UWwWoD1geoof9nt1kW+bALWlhNnW0pqN/z3r9WN8tfAwQRjAMEMjEusj9Cgox3ixRmSWI1y2FPxBxXJ01IU7jNzPJO045Ps6F1C2VzEfI9wlVkF9j0DbTxCHZv/crpahlphgqPpEEMO7qrsr88YSEu3AWe2nWSPKs+rPUlzxj8+qQuBTmSbai7qm7RBtUL6K8ldw7weLR77Y3W9J8eSNDj0oe0CVm2uG7LqW+5uzAEV0D5UULX306v+EB5eoBXIYDrp3tCmE7spper235gBnp9d3FMIu+GPYcP8YRVRsc4QZwvA1rLgBoPTNuPcqMNZ9D+uRbSi jodEb78G tNuDSDmxahfh6A4/Aj0VZQWnoCnOIr/9ZZQrRiFVsuww5slqgsZRTDn2xTcXwRYV6IL5JnRrWZFB/U7g7/SvupYeB+TAQum2VLU2r9RjgOHIdDVsNDtOlvxB4CrDVpQ8BxfkhihqO9eKSS8s0vOQqoU6IYu7+Y/xYuEgfvi0tTIxZtbNk0KT/zgcZ2F20Zqxk0DQGs7ib58u+n5O8NuK5xyWiJiWllyGvxg9pDpSDm3Y3yNC7heHD2JyZXd/MdnVl3D6hA/tM231ictBbIP6z/nVfZEEMZYCKwn7zRU3X72B7ISXNXgaZhnFDLt/pc1dFfZF8KwkGGsWIkW2XXtyoEI6cwwvRwAynS2kt1azQqOGfDskY8NOT1LmU8nwhTv/4B+pDkR0P70f8NhOzZTgVFernkerKNTR+ukWLP4+jQUNaIrizPDUDkibt8pSk4D7++Z9wkouBeELhDZpPLzDmbUXD7w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Jann, On 2024/11/7 05:48, Jann Horn wrote: > On Thu, Oct 31, 2024 at 9:14 AM Qi Zheng wrote: >> In retract_page_tables(), we may modify the pmd entry after acquiring the >> pml and ptl, so we should also check whether the pmd entry is stable. > > Why does taking the PMD lock not guarantee that the PMD entry is stable? Because the pmd entry may have changed before taking the pmd lock, so we need to recheck it after taking the pmd or pte lock. > >> Using pte_offset_map_rw_nolock() + pmd_same() to do it, and then we can >> also remove the calling of the pte_lockptr(). >> >> Signed-off-by: Qi Zheng >> --- >> mm/khugepaged.c | 17 ++++++++++++++++- >> 1 file changed, 16 insertions(+), 1 deletion(-) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index 6f8d46d107b4b..6d76dde64f5fb 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -1721,6 +1721,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) >> spinlock_t *pml; >> spinlock_t *ptl; >> bool skipped_uffd = false; >> + pte_t *pte; >> >> /* >> * Check vma->anon_vma to exclude MAP_PRIVATE mappings that >> @@ -1756,11 +1757,25 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) >> addr, addr + HPAGE_PMD_SIZE); >> mmu_notifier_invalidate_range_start(&range); >> >> + pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pgt_pmd, &ptl); >> + if (!pte) { >> + mmu_notifier_invalidate_range_end(&range); >> + continue; >> + } >> + >> pml = pmd_lock(mm, pmd); > > I don't understand why you're mapping the page table before locking > the PMD. Doesn't that just mean we need more error checking > afterwards? The main purpose is to obtain the pmdval. If we don't use pte_offset_map_rw_nolock, we should pay attention to recheck pmd entry before pte_lockptr(), like this: pmdval = pmdp_get_lockless(pmd); pmd_lock recheck pmdval pte_lockptr(mm, pmd) Otherwise, it may cause the system to crash. Consider the following situation: CPU 0 CPU 1 zap_pte_range --> clear pmd entry free pte page (by RCU) retract_page_tables --> pmd_lock pte_lockptr(mm, pmd) <-- BOOM!! So maybe calling pte_offset_map_rw_nolock() is more convenient. Thanks, Qi > > >> - ptl = pte_lockptr(mm, pmd); >> if (ptl != pml) >> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); >> >> + if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) { >> + pte_unmap_unlock(pte, ptl); >> + if (ptl != pml) >> + spin_unlock(pml); >> + mmu_notifier_invalidate_range_end(&range); >> + continue; >> + } >> + pte_unmap(pte); >> + >> /* >> * Huge page lock is still held, so normally the page table >> * must remain empty; and we have already skipped anon_vma >> -- >> 2.20.1 >>