From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5088CC3DA49 for ; Tue, 30 Jul 2024 09:10:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B8FE96B0085; Tue, 30 Jul 2024 05:10:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B3FB66B0088; Tue, 30 Jul 2024 05:10:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9DFDA6B0089; Tue, 30 Jul 2024 05:10:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 801426B0085 for ; Tue, 30 Jul 2024 05:10:16 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 32EC8801B5 for ; Tue, 30 Jul 2024 09:10:15 +0000 (UTC) X-FDA: 82395847590.03.C4735F8 Received: from mail-pf1-f176.google.com (mail-pf1-f176.google.com [209.85.210.176]) by imf01.hostedemail.com (Postfix) with ESMTP id 9850F40009 for ; Tue, 30 Jul 2024 09:10:12 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=buULE5HN; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf01.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.210.176 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722330569; a=rsa-sha256; cv=none; b=VpJSjk6PPNqDJRZLOTq2O016wFrWRz1oqUEdktduEbm8577Yuo76Ct8Is+b9v9oEWIlLM0 qP0d6G+7/hqbzQwyCPQNVl+YjnL16UtzAbaCaVlHZ2MKvUuV9HSH3IkOaTebH8bv4FZXNZ H/JU4LEMyUjrCfNnbGQFph/JnOrj5hI= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=buULE5HN; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf01.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.210.176 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722330569; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=KSclq4pThTSCwT9YqJqbII2VQ5xJ9oUgsdFqEmgr39M=; b=IJigsy3bymD0nIVqBPbluzAXGwHs+m1ZEllvIl3Qbp6rNpsrC1uQJFsAIKaoyyE2gjqJGH pYIDIjSbz0LpdECpyIu/9DBXFVhAfM2XhL+G1jwz2Lr2j+DEPfz/a3CCn9ZXFn9VblhHfI /ak6PZZviSk1TFVNd1JN85lecTHlDUA= Received: by mail-pf1-f176.google.com with SMTP id d2e1a72fcca58-70d2c0ee69fso266725b3a.2 for ; Tue, 30 Jul 2024 02:10:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1722330611; x=1722935411; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=KSclq4pThTSCwT9YqJqbII2VQ5xJ9oUgsdFqEmgr39M=; b=buULE5HNgtO5SgtidjQetxOOWg3cKa/iRDaVDBuIFrAZh/qQYsDXSOajl3OodbJi7T DwOFNegbHU3wSBu2ZXPVlAYmqHz07m+12u9mPYI5s1cJKD0414XaLv2R+ZBXLsIczckX Swga6zbxqU7gWAQO4khKFyYoqCwKumtxME79OFIb5AStqRFIyh/RSKSwmT++0s8I579V 9hzapkzN4MzhS++KIXY1ik7fMgVG/YUv0zAqKyciBZgqVHqQrNH1WRWEnWbq7fMqifoT /NJAO2Uxbfw/KhxIdyAPA1+VrS11gAAAJrRxAdaz8FO+kGCG9Ag9+NgcZQt3B0CuMsZW HXCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722330611; x=1722935411; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=KSclq4pThTSCwT9YqJqbII2VQ5xJ9oUgsdFqEmgr39M=; b=NKWjIf/7sxVvbvoXBMkrUcDtLfbVBMWRAnrg18sAkR2uRtiznOY9LmLeaGMq98Wt5z mQV0QY6kbeGBQRLIYpBbDBprElBYW2QTF7LcMLQsclhs38Fu3l2CF4nLYaB8ZjQtcwoE g3Wdw/+lMk7Bkb+vak7dKZCLqa3E6sV8f7TD3kUjgftYCiUweOCBA+kEiHOMRP+Fxe8E rpi8v9OaWPbtNbli89D07RH8lUCB14jz/TFT8cX5kbV/1rPZGwJmwT/SvJvjai+4XYsb H5t+GhnKQFsBZCE/2GD8n0uLh49uR9fU4OfMoTf22sZEidBgun4Gkc4AumiL+QDZfAja P7Iw== X-Forwarded-Encrypted: i=1; AJvYcCWURdB9AMACc5FEa+zjKUanf5oMlS05DIgiUn3Ruuqeke9uyjINKTaMkN96lkt9Irz5YAaBvjVgoanfsCn2wQ0ZYWQ= X-Gm-Message-State: AOJu0YzAuNW0UpgXadonanslfynJmtXDodcDBTnRwFzKLw8X6taoKX/D qJANQiSUSId2NV40Fz+EjaqhBEe8VepPQTvZUR6kOWbayy1UMCwneMpUnxmTtKs= X-Google-Smtp-Source: AGHT+IH44yi8Et9gv9K/PUkEjphadp45oqr8JhJ+BDFwdfghMnYs0seqPjSnpcyasrI9OPdJ+JNyBw== X-Received: by 2002:a05:6a20:43a2:b0:1c3:b15b:f86b with SMTP id adf61e73a8af0-1c476f37bfcmr13171480637.0.1722330610947; Tue, 30 Jul 2024 02:10:10 -0700 (PDT) Received: from [10.255.168.175] ([139.177.225.232]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2cdb74e83b2sm12043744a91.46.2024.07.30.02.10.07 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 30 Jul 2024 02:10:10 -0700 (PDT) Message-ID: Date: Tue, 30 Jul 2024 17:10:05 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer Content-Language: en-US To: David Hildenbrand Cc: Peter Xu , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Muchun Song , Oscar Salvador References: <20240725183955.2268884-1-david@redhat.com> <20240725183955.2268884-2-david@redhat.com> <9e671388-a5c6-4de1-8c85-b7af8aee7f44@redhat.com> <12bae4c3-5dda-4798-9f6a-3ac040111551@bytedance.com> <47fe3480-a4a4-465b-8972-c6507c7a76ee@redhat.com> From: Qi Zheng In-Reply-To: <47fe3480-a4a4-465b-8972-c6507c7a76ee@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 9850F40009 X-Stat-Signature: ja3ngoc67e767k7n3p8c77jyaopfhtwj X-Rspam-User: X-HE-Tag: 1722330612-252237 X-HE-Meta: U2FsdGVkX197bac43acSbhbb9xU/52/YMZlUwUbZ6z8N699GSfcJi2ivhbVZvT1NBSj7kKNqlB6iVvZvPXPdxC7SayBdOHsypx7o0UGohNshcij/NDE4uzGKIDIl61ZyHjrGSvp6AqkY33eCnezwqVh3QjcrTCEItHKxLI+ZWqvVZb7tVN/crFDENIHbKGQhYYJM8cigAQj/1BpnPLGz9jpFdXDpCU+1gyNFyhphvx3y8PmT8kc2ivvz26vEsp9tDT2Ek3XO5G25+pcepympFNwKYQmI+02kyjZTstLXgNJ8UbZxnB93mKrYHAoXlgcz2becBv/02s08xGufDH70l/lRPF6NYJVrr8nI9/NEGZRSn2STLe09WlRF06EUjRUwLcE+FNnFRFdxuKvKMtKFFO5eeZwf5UU15sDbX9Op2c07jNk4yjL2bf1OIiedhb+15LtNydK3Mexdfqqnxrox5HLiqCAsmigyh1kZ8lhTwqCpAWoDtM4SBAnKsEIPX2Lh19Ele5J1qTOP6ci62Ff6cnGKtK73xN3QNq9E6NqXHJ/4UkgUaxA+fsrzfRg3ISjggp4W4uRgirLW3AUF9m8gzYweAjxWxvvsE4ILLgp2f4aAEu8M/OhrTVI7PvRDypuAd2nqWaOvXYuKNWbvoF5ErVmzGOgLBVXuIQfdFtbfXO0pPX3SPK5WE7UniTOHZqMHUo9lb1CjRufi2NnuRZq+YBonwCmqnKcqWdcpOSj8Ln+vRm2L2ZlxIqOFml5FLAEkg6gwD5z0ax0po/6nVk72nZe6L/y7GrK09Z9iEK5SW5x93rsZ1HnDe4IEORG1+XdPjjzSyB64GWy8KP1lP/QDC/2p2r1i4nlqvCbtUmaL0Qj7217SY3kcpUDhjayAgZPHk6z+onutn6djfW/wV0SbRaBWt05uzhg0PcYPFj7KEoVQG4Ko6ZdB1Dt3Dhjf08Kvjg06BEHreXzaJHDdIoD jItDsX3i dCWQ2JkSQjUywzImYq7hNCsScBawo89wePwAfXrSZqPhdsa3xranz1mtDGYD9Z5MCUgVbY3BNgSoyGtODL+zwrq28dP36lAKLHVmylGX5xCwapdzXxZbSZTFJECkRN1lqA6bqiRu09ErR8yg5Nt3QOj4/C4dybI3kevUfLzUuMYrxThjJABTDeel59G3Hdktkz55ozPrpmBNFxbznngCDeKIh5MOFHu/mBY1Nn1DQD5qQEuqvCQ7pZzxYBFhZgTLCKuAXMO+U9NrgObjzyP0eyv+BsHzfob0PfeRVfA0U+JrUItIGBEeh/Qu0SvsJ/0qDrjoh2u5KNFZ1gPMWawMsjv77VwNmnykMGsUdEMee4NH+O6Gth9Mo2B91Jdr8hgoHjeIu0G2z5aJ1NuxaV2QaS25izog+r61PsmaKMLtppATU33ywohYIx78Ish1VoruMwkgk/VoafekQ+9GBtSpAciBXtFTDE4ZuYDSV5YBFyizQTNab8W5lCPkH4g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi David, On 2024/7/30 16:40, David Hildenbrand wrote: >>> ... also because I still want to understand why the PTL of the PMD table >>> is required at all. What if we lock it first and somebody else wants to >>> lock it after us while we already ripped it out? Sure there must be some >>> reason for the lock, I just don't understand it yet :/. >> >> For pmd lock, I think this is needed to clear the pmd entry >> (pmdp_collapse_flush()). For pte lock, there should be the following two >> reasons: >> > > Thanks for the details. > > My current understanding correct that removing *empty* page tables is > currently only possible if we can guarantee that nothing would try > modifying the > page tables after we drop the PTE page table lock, but we would be happy > if someone > would walk an empty page table while we remove it as long as the access > is read-only. > > In retract_page_tables() I thought that would be guaranteed as we > prevent refaults > from the page cache and exclude any uffd + anon handling using the big > hammer > (if any could be active, disallow zapping the page table). > > What I am still not quite getting is what happens if someone would grab > the PTE page > table lock after we released it -- and how that really protects us here. > > > I assume it's the > > spin_lock(ptl); > if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) { > ... > > handling in __pte_offset_map_lock() that guarantees that. Yes. > > > That indeed means that pte_offset_map_nolock() requires similar checks > after > obtaining the PTL (for the cases where we are not holding the PMD table > lock > and can be sure that we are the one ripping out that table right now). Yes, if the PTE page will be freed concurrently, then the pmd entry needs to be rechecked after acquiring the PTL. So I made pte_offset_map_nolock() return pmdval in my patch[1], and then we can do the following: start_pte = pte_offset_map_nolock(mm, pmd, &pmdval, addr, &ptl); spin_lock(ptl) if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) { ... [1]. https://lore.kernel.org/lkml/7f5233f9f612c7f58abf218852fb1042d764940b.1719570849.git.zhengqi.arch@bytedance.com/ > > >> 1. release it after clearing pmd entry, then we can capture the changed >>      pmd in pte_offset_map_lock() etc after holding this pte lock. >>      (This is also what I did in my patchset) > > Yes, I get it now. > >> >> 2. As mentioned in the comments, we may be concurrent with >>      userfaultfd_ioctl(), but we do not hold the read lock of mmap (or >>      read lock of vma), so the VM_UFFD_WP may be set. Therefore, we need >>      to hold the pte lock to check whether a new pte entry has been >>      inserted. >>      (See commit[1] for more details) > > > Yes, I see we tried to document that magic and it is still absolutely > confusing :) > > But at least now it's clearer to me why retract_page_tables() uses > slightly different > locking than collapse_pte_mapped_thp(). > > Maybe we should look into letting collapse_pte_mapped_thp() use a > similar approach as > retract_page_tables() to at least make it more consistent. That topic is > also touched > in a98460494b16. Agree, more consistency is better. Thanks, Qi >