From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7f22c46c-2119-4de6-9d58-efcab05b5751@bytedance.com>
Date: Thu, 5 Sep 2024 14:41:43 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2 07/14] mm: khugepaged: collapse_pte_mapped_thp() use
 pte_offset_map_rw_nolock()
Content-Language: en-US
To: Muchun Song
Cc: David Hildenbrand, Hugh Dickins, Matthew Wilcox,
 "Vlastimil Babka (SUSE)", Andrew Morton, Mike Rapoport, Vishal Moola,
 Peter Xu, Ryan Roberts, christophe.leroy2@cs-soprasteria.com, LKML,
 Linux Memory Management List, linux-arm-kernel@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org
References: <24be821f-a95f-47f1-879a-c392a79072cc@linux.dev>
 <05955456-8743-448A-B7A4-BC45FABEA628@linux.dev>
From: Qi Zheng
In-Reply-To: <05955456-8743-448A-B7A4-BC45FABEA628@linux.dev>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2024/9/5 14:32, Muchun Song wrote:
>
>
>> On Aug 30, 2024, at 14:54, Qi Zheng wrote:
>>
>>
>>
>> On 2024/8/29 16:10, Muchun Song wrote:
>>> On 2024/8/22 15:13, Qi Zheng wrote:
>>>> In collapse_pte_mapped_thp(), we may modify the pte and pmd entry after
>>>> acquiring the ptl, so convert it to using pte_offset_map_rw_nolock(). At
>>>> this time, the write lock of mmap_lock is not held, and the pte_same()
>>>> check is not performed after the ptl is held. So we should get pgt_pmd
>>>> and do a pmd_same() check after the ptl is held.
>>>>
>>>> For the case where the ptl is released first and then the pml is acquired,
>>>> the PTE page may have been freed, so we must do a pmd_same() check before
>>>> reacquiring the ptl.
>>>>
>>>> Signed-off-by: Qi Zheng
>>>> ---
>>>>  mm/khugepaged.c | 16 +++++++++++++++-
>>>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 53bfa7f4b7f82..15d3f7f3c65f2 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1604,7 +1604,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>          if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
>>>>                  pml = pmd_lock(mm, pmd);
>>>> -        start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
>>>> +        start_pte = pte_offset_map_rw_nolock(mm, pmd, haddr, &pgt_pmd, &ptl);
>>>>          if (!start_pte)         /* mmap_lock + page lock should prevent this */
>>>>                  goto abort;
>>>>          if (!pml)
>>>> @@ -1612,6 +1612,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>          else if (ptl != pml)
>>>>                  spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>>> +        if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd))))
>>>> +                goto abort;
>>>> +
>>>>          /* step 2: clear page table and adjust rmap */
>>>>          for (i = 0, addr = haddr, pte = start_pte;
>>>>               i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
>>>> @@ -1657,6 +1660,16 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>          /* step 4: remove empty page table */
>>>>          if (!pml) {
>>>>                  pml = pmd_lock(mm, pmd);
>>>> +                /*
>>>> +                 * We called pte_unmap() and released the ptl before
>>>> +                 * acquiring the pml, which means we left the RCU critical
>>>> +                 * section, so the PTE page may have been freed. We must
>>>> +                 * therefore do a pmd_same() check before reacquiring the ptl.
>>>> +                 */
>>>> +                if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
>>>> +                        spin_unlock(pml);
>>>> +                        goto pmd_change;
>>> Seems we forgot to flush the TLB since we've cleared some pte entries?
>>
>> See the comment above ptep_clear():
>>
>> /*
>>  * Must clear entry, or a racing truncate may re-remove it.
>>  * TLB flush can be left until pmdp_collapse_flush() does it.
>>  * PTE dirty? Shmem page is already dirty; file is read-only.
>>  */
>>
>> The TLB flush was handed over to pmdp_collapse_flush(). If a
>
> But you skipped pmdp_collapse_flush().

I skip it only in the !pmd_same() case, at which point the pmd entry must
already have been cleared by another thread, and that thread is responsible
for flushing the TLB:

CPU 0                           CPU 1
                                pmd_clear
                                spin_unlock
                                flushing tlb
spin_lock
if (!pmd_same)
    goto pmd_change;
pmdp_collapse_flush

Did I miss something?

>
>> concurrent thread frees the PTE page at this time, the TLB will
>> also be flushed after pmd_clear().
>>
>>>> +                }
>>>>                  if (ptl != pml)
>>>>                          spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>>>          }
>>>> @@ -1688,6 +1701,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>          pte_unmap_unlock(start_pte, ptl);
>>>>          if (pml && pml != ptl)
>>>>                  spin_unlock(pml);
>>>> +pmd_change:
>>>>          if (notified)
>>>>                  mmu_notifier_invalidate_range_end(&range);
>>>> drop_folio:
>
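For anyone following along who wants to see the locking idiom under discussion
in isolation, below is a minimal userspace sketch (plain C with pthreads, not
kernel code and not part of this patch) of the "snapshot, drop lock, reacquire,
revalidate" pattern: a value is captured under one lock, that lock is dropped,
and after taking the other lock the snapshot is compared with the current value
before doing anything further, bailing out if another thread changed it in
between. That is the role pmd_same() against pgt_pmd plays above. The names
pml, ptl, pmd_entry and entry_same() here are illustrative stand-ins, not the
kernel APIs, and the sketch only shows the control flow (it ignores, for
example, why the kernel read needs pmdp_get_lockless()).

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins for the pmd entry and the two locks (pml/ptl). */
static pthread_mutex_t pml = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;
static uint64_t pmd_entry = 0x1000;     /* pretend this refers to a PTE page */

/* Analogue of pmd_same(): has the entry changed since the snapshot? */
static bool entry_same(uint64_t snap, uint64_t cur)
{
        return snap == cur;
}

static void collapse_like_path(void)
{
        uint64_t snap;

        pthread_mutex_lock(&ptl);
        snap = pmd_entry;               /* snapshot taken while ptl is held */
        /* ... work on the "PTE page" under ptl ... */
        pthread_mutex_unlock(&ptl);     /* window: another thread may clear
                                           the entry and free what it maps */

        pthread_mutex_lock(&pml);
        if (!entry_same(snap, pmd_entry)) {
                /*
                 * Someone else already tore the entry down (and, in the
                 * kernel case, is responsible for the TLB flush); bail out
                 * instead of touching a structure that may have been freed.
                 */
                pthread_mutex_unlock(&pml);
                return;
        }
        /* Entry unchanged since the snapshot; safe to continue under pml. */
        printf("entry still %#llx, proceeding\n",
               (unsigned long long)pmd_entry);
        pthread_mutex_unlock(&pml);
}

int main(void)
{
        collapse_like_path();
        return 0;
}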