From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1E66C3DA4A for ; Fri, 16 Aug 2024 09:22:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 139808D005B; Fri, 16 Aug 2024 05:22:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0EA498D0002; Fri, 16 Aug 2024 05:22:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EF3548D005B; Fri, 16 Aug 2024 05:22:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CF9ED8D0002 for ; Fri, 16 Aug 2024 05:22:08 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 8108C121141 for ; Fri, 16 Aug 2024 09:22:08 +0000 (UTC) X-FDA: 82457567136.16.A392C12 Received: from mail-pg1-f178.google.com (mail-pg1-f178.google.com [209.85.215.178]) by imf03.hostedemail.com (Postfix) with ESMTP id DD8F420017 for ; Fri, 16 Aug 2024 09:22:05 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=iiP+fMNm; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf03.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.215.178 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723800044; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xtCmVrlxGAafWsgKl5/YAKGcZ+iOciR4h5L3r6w+vzE=; b=4FKHinu/vNL/MeNzN3ySuAz9dOYqDCl9tqsJa3PDELbnFjvQCMWW5X+er15Xo+Zjq810On 9TdPKbZ5CiuI8GtOJay7KwZgT04m/o1HUEx7YgLP2fxgIHnHjvQvSkUnbM/4imhEk5zoHU FM7DlEbsUqJHwTGQ635EzfCY0nx2z+Q= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723800044; a=rsa-sha256; cv=none; b=B4/T/+PE8ru8ovaHk51TCDDIptra3+B+hisymvmeqpfd3BcBEfelyxDwpyHGr15iK7mshi WHd0RTbfeBFLFo/XFZZ4+Jvrgsk280mwbQGL+d+CcAEK/ug1NrLzEcYy9BpA8AyCRA6RPO Y2p1B6tnVAFXcXLE/YjASYTqzBEftj0= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=iiP+fMNm; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf03.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.215.178 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com Received: by mail-pg1-f178.google.com with SMTP id 41be03b00d2f7-75abb359fa5so237623a12.0 for ; Fri, 16 Aug 2024 02:22:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1723800124; x=1724404924; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=xtCmVrlxGAafWsgKl5/YAKGcZ+iOciR4h5L3r6w+vzE=; b=iiP+fMNm5UlDCTKnRuzZpAzG9w5y1eQ36em+5BvZsz6OyfSijGsG0bRH6LkvK4weoS 88i9A4+j3YKx1TYK8qvB4NK1EH73wnVxk/OBoynJZMheEyvl6GRr55UerZaNnWSc2Y8o T4UMv07H/jXIAeLHfIrCS8JaAX07PJz/PAxZyP+AYSeQaErZ5OIi45AOwg2pLGN7uFc2 lk870thJfSZbicoxBz8jv6vQArtFBCMq4EegRgSSuar0wb0x194vYpCXZHHbEwHt1EaS 3IWTCDe9Ztf4/YUs/pt/Gpg4W9MmQgVsxGZJ8+NiMR7xqZvp1LYaiELJ+Rcj+eDZtICK KjVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723800124; x=1724404924; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xtCmVrlxGAafWsgKl5/YAKGcZ+iOciR4h5L3r6w+vzE=; b=HBowHHqtepmdwE7lSef5blwebAZ8EO8B4qqbmz4xu5LKFVOnRKnprtqCGbeC0CqHI9 upcTGxTS9RyezgXXeMXbnHTeN6erCEGw+StzGdLsHE4q2RzIynQ9JQgHhpq/nMDunCgZ bhGwU4LG0FWIXulWd4BWmkqVOsP6Bsehy3byf735gVKesOifFNvUlS0qM7mn+UTBhiov CQqO8dibBvHRHsybe/Nk9w918NyiZ050fa+ViC3FM7EP1cWkCDU2wGtZ4CvR7UE8eUWJ ZyuRdb9piBp1OAVRqhE0P+FSHmQIU2MLq/1txj8Z58EJWfed1CBZJa75THg1Ge24q5Dk x3QQ== X-Forwarded-Encrypted: i=1; AJvYcCUniZOHKafmhE/rsmOKQ6NWhyGGVj4Y88g1jgvo4ckd243Mk0lc/I3guK8oBDhWKlAA1EBpbf5Ing==@kvack.org X-Gm-Message-State: AOJu0YzyBiLAnLxcfU/5t+/i9lFxqoGlnhMtcdWaeyEnWqJFahYa65k9 qMr3z8I7WTUhl/+ZSKzT+1TbQVWv6qM4PDVsY4pk4aGDKu00sqFYx0UHfTQC+N4= X-Google-Smtp-Source: AGHT+IGT9XwetaYpGzKxUO2gEEcbAPqXHSAiilZKZwdN/lLLvJXKcLaqrWJkuf3lLTP329x8W0xo8w== X-Received: by 2002:a05:6a21:3282:b0:1c4:c4cc:fa49 with SMTP id adf61e73a8af0-1c905075655mr1544270637.7.1723800124099; Fri, 16 Aug 2024 02:22:04 -0700 (PDT) Received: from [10.4.217.215] ([139.177.225.242]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-201f03795edsm21965795ad.153.2024.08.16.02.21.59 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 16 Aug 2024 02:22:03 -0700 (PDT) Message-ID: Date: Fri, 16 Aug 2024 17:21:57 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval Content-Language: en-US To: David Hildenbrand Cc: hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, the arch/x86 maintainers References: <0e8e0503-5796-4b61-bb5b-249e285f5d21@redhat.com> <39281a4d-d896-46fd-80a5-8cd547d1625f@bytedance.com> <0f467510-a0d0-4a98-8517-43813fa4c131@redhat.com> <3e8253c4-9181-4027-84ee-28e1fc488f61@bytedance.com> From: Qi Zheng In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: DD8F420017 X-Stat-Signature: zud3w7zb9hkojtj7higopqd3bmdq9ijw X-Rspam-User: X-HE-Tag: 1723800125-859844 X-HE-Meta: U2FsdGVkX1+zhVSerFSmtk8yWkdhQvDfMT2RPMCpayaJYNA3bWDixMYd3dSmTk0Qa4/KORzC0hyxdKd06LhU8QNMf1GDfr7uru7BINz5xph54ups6wUS6xw5qgZ2lI7UwWiboDYZH/NiruMZlZ+w7/OgLzj+ZZMnTXhZUh7hWGb+v4aoLEsp+O2B5Lk84NexrrMCmfyV625eWn9C9th4809xo0j177WdG47YOw5B5LxBVR0zPf2cCLX/1jsmwbuNxjmntNO1BmpCvD4Y2TrosspzA9x52IhZk5HpEcQCidFdemBDf7zuo/bMGxGiny/ey4BkHGtKxG+gylfpBOVZ/U6c0vVuWXXCyOVApht9fY0nEXwz+3bb6theshjzIDEM/Pd/DhLBhw8lKVc603QxNvDjBJlCWAhghYNKTJ9AV66KFGkGSW+831apLVluQdQodCi04YeMHTtiyrNaXfZDP2S7xUm6T0qMFHUUf+zsZ2FIiBwZCmWwi8t2TjCPrDzxVdQ7s+Vu0Tw9rKE3JxKW8Z9hnkFgGUYutrnNH4XYhOFGExHZf/yS8//DGPuCg1DAaqI98nISCanWqiUgBjI42RhZ6Howy9OphvC7ZaLkzBRPMucxYEK0TJXCn2Gl5yvSMYquaBsoB7Bv4smAIAV+Rq1F3FF1J1QCcyQp9d9+9EztkQAt4WSVtEnI8DcQfO3cgBVHDeta/UtXwQhswbqmH/nU0h7XUjZAYGsg6A3xreCw+FwJoIHhvPF99P+prBNRlBqvvrKj7k/XEJoI4DyECeMi3bIPEj0mF8121DvmKSKyo1bSx4Z3N91BXDs2zCyAJwFeCCQDhBGeS3rZFtP1j8VedZiHjn7n/MG7eta/01IGTFpWvw+JObA7n0B5l5tbv6zr1XihSre6jSbqYozr281J8inyjDOw73RO0hMkkq4ecTUUgCYyHwnZM/+MYmHztT8/CLVfb6QHCFzNfw/ C1H/O6xG 1HHkMXn2B6r1bdDWB/6dk0mZtAyS/QOSTpA0cUaLjALybOstovtu2cZH0IT56xPgaCbg3uvfFaI6Xukk2PNamUdLYW0+9T84CAIODQX54bTjlCVPtSiifOQwJQdr308gIRzq4RkraWL1Q/G5nvnDvfApcYnBinHjY+tC0oj8vTGFrrY5oAYAViZDHsGKv0jFJv2iMgMwm8GC5wNldaXzOKWZeFH81kBgT4muc8Q/3ZY3ifs56hE1e7GA0ulvAtUIQb01h2hWCz+dPI8zm7xmxDh8uA0YCuJuv2cPNA3AuU1oqoaxr4orjvj89wI6KR20idHRIYvgqSyItR5h7FzaydKoswj9sUhWXnk8sJxBLGX53AxPqBDDDHdoC9n7FOO18ynNO X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi David, On 2024/8/16 16:59, David Hildenbrand wrote: > On 12.08.24 08:21, Qi Zheng wrote: >> Hi David, >> >> On 2024/8/10 00:54, David Hildenbrand wrote: >>> On 07.08.24 05:08, Qi Zheng wrote: >>>> Hi David, >>>> >>>> On 2024/8/6 22:16, David Hildenbrand wrote: >>>>> On 06.08.24 04:40, Qi Zheng wrote: >>>>>> Hi David, >>>>>> >>>>>> On 2024/8/5 22:43, David Hildenbrand wrote: >>>>>>> On 05.08.24 14:55, Qi Zheng wrote: >>>>>>>> Make pte_offset_map_nolock() return pmdval so that we can >>>>>>>> recheck the >>>>>>>> *pmd once the lock is taken. This is a preparation for freeing >>>>>>>> empty >>>>>>>> PTE pages, no functional changes are expected. >>>>>>> >>>>>>> Skimming the patches, only patch #4 updates one of the callsites >>>>>>> (collapse_pte_mapped_thp). >>>>>> >>>>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry() >>>>>> also used the pmdval returned by pte_offset_map_nolock(). >>>>> >>>>> Right, and I am questioning if only touching these two is sufficient, >>>>> and how we can make it clearer when someone actually has to recheck >>>>> the >>>>> PMD. >>>>> >>>>>> >>>>>>> >>>>>>> Wouldn't we have to recheck if the PMD val changed in more cases >>>>>>> after >>>>>>> taking the PTL? >>>>>>> >>>>>>> If not, would it make sense to have a separate function that >>>>>>> returns the >>>>>>> pmdval and we won't have to update each and every callsite? >>>>>> >>>>>> pte_offset_map_nolock() had already obtained the pmdval previously, >>>>>> just >>>>>> hadn't returned it. And updating those callsite is simple, so I think >>>>>> there may not be a need to add a separate function. >>>>> >>>>> Let me ask this way: why is retract_page_tables() and >>>>> reclaim_pgtables_pmd_entry() different to the other ones, and how >>>>> would >>>>> someone using pte_offset_map_nolock() know what's to do here? >>>> >>>> If we acuqire the PTL (PTE or PMD lock) after calling >>>> pte_offset_map_nolock(), it means we may be modifying the corresponding >>>> pte or pmd entry. In that case, we need to perform a pmd_same() check >>>> after holding the PTL, just like in pte_offset_map_lock(), to prevent >>>> the possibility of the PTE page being reclaimed at that time. >>> >>> Okay, what I thought. >>> >>>> >>>> If we call pte_offset_map_nolock() and do not need to acquire the PTL >>>> afterwards, it means we are only reading the PTE page. In this case, >>>> the >>>> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE >>>> page >>>> cannot be reclaimed. >>>> >>>>> >>>>> IIUC, we must check the PMDVAL after taking the PTL in case >>>>> >>>>> (a) we want to modify the page table to turn pte_none() entries to >>>>>        !pte_none(). Because it could be that the page table was >>>>> removed and >>>>>        now is all pte_none() >>>>> >>>>> (b) we want to remove the page table ourselves and want to check if it >>>>>        has already been removed. >>>>> >>>>> Is that it? >>>> >>>> Yes. >>>> >>>>> >>>>> So my thinking is if another function variant can make that clearer. >>>> >>>> OK, how about naming it pte_offset_map_before_lock? >>> >>> That's the issue with some of the code: for example in >>> filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and >>> conditionally take the PTL. But we won't be modifying the pages tables. >>> >>> Maybe something like: >>> >>> pte_offset_map_readonly_nolock() >>> >>> and >>> >>> pte_offset_map_maywrite_nolock() >>> >>> The latter would require you to pass the PMD pointer such that you have >>> to really mess up to ignore what to do with it (check PMD same or not >>> check PMD same if you really know what you are douing). >>> >>> The first would not take a PMD pointer at all, because there is no >>> need to. >> >> These two function names LGTM. Will do in the next version. > > That is probably something you can send as a separate patch independent > of this full series. I think it makes sense. I am analyzing whether the existing path of calling pte_offset_map_nolock + spin_lock will be concurrent with the path of reclaiming PTE pages in THP. If so, it actually needs to be fixed first. > > Then we might also get more review+thoughts from other folks! I hope so! Thanks, Qi >