From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B530EC3DA7F for ; Mon, 12 Aug 2024 06:21:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 52A966B008C; Mon, 12 Aug 2024 02:21:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4D9096B009A; Mon, 12 Aug 2024 02:21:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 379A66B009E; Mon, 12 Aug 2024 02:21:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 191646B008C for ; Mon, 12 Aug 2024 02:21:17 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 8B27D41ABE for ; Mon, 12 Aug 2024 06:21:16 +0000 (UTC) X-FDA: 82442596152.09.0A4CC66 Received: from mail-oa1-f50.google.com (mail-oa1-f50.google.com [209.85.160.50]) by imf01.hostedemail.com (Postfix) with ESMTP id AFBBB40012 for ; Mon, 12 Aug 2024 06:21:13 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=YI2x+SCk; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf01.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.160.50 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723443597; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uQlCdV65LEz1kHk47bwY6iHdSZ8qzDbLv2CS0ac6kI8=; b=fFc183Hw3rE5ZvwC/uTZL3Zs7OVNl3QkmfEu5ti7KxhtURM5AqztvIUTYwh+37CeiMR47Z ZkLhnYJNl1CBY/1BuCjHFF7VM9VZNNyqCTo7lqEXz8UTHJr9XIIzBjJ+Q5D8paXCXR0WXt ggSlJmX0d+XwaolruD3qOOnL+HHNWkg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723443597; a=rsa-sha256; cv=none; b=YSrqKlc8YEUTlZ9XW52BwKRMaDhQL6LqvVEMQfC6VOMPSvz1TF/PwqL11/6QuM2xAXCSaa zMqVx4eRcQUETo7oEk0XBpM269bhXREfdkD13d3ByEIozAEhbTlwxUm20Xmj/syUZXULge xljrc+jBQOUofUotsDlRiwZ7Z/rmGoU= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=YI2x+SCk; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf01.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.160.50 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com Received: by mail-oa1-f50.google.com with SMTP id 586e51a60fabf-2644c1b07b1so443048fac.1 for ; Sun, 11 Aug 2024 23:21:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1723443672; x=1724048472; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=uQlCdV65LEz1kHk47bwY6iHdSZ8qzDbLv2CS0ac6kI8=; b=YI2x+SCkyccBydDJEYFpnDIfME/qirDQTgs+Rzz+hZZtit3sx1786R8MMQEmr4gWDM bB/lBhKz+jlaspvnEIVU551pm7d/pYFRohayPFSRWvPVdUzPj42ut70lviTCoiAu9a1L Z81l0LBvI2TURE4oJ5uWid3jeMyzDGfPId7nacYtqB94frup3VQEkIMbn+b3gVF3xYl9 FhQybbSOwfARoyBIIQJnm/Eeord+MoOs6uLq9TQMw+LNpxcVxX6kVZgYU8diXiJCC1hz ffWD6mSxQsqWlOnlkvh5X5KFPhnn0+7ZlEvjHxTpXx5RhsR1XivWtVtsd3jnHcfE4k/H AVBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723443672; x=1724048472; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uQlCdV65LEz1kHk47bwY6iHdSZ8qzDbLv2CS0ac6kI8=; b=AspipzD5bPneGtea3y0aUq+BbtHeILAeJY8ZNHgW2uUUZGn204aIW9P/lq/REQr7nB bwhLRWUBwJNhfzzme7GzibRaN/LsSuvPAYISowV9I0gxaedO9inPovgywutJw9hZfGol pi+BvMR54+ZylXLwjwvGLRRCdqT6gnj7FWkh5pPaBVb5DW2BewZpAhF/aYswARq7pE/G q3gx3qMO3pQTYCiAI1zhnYMmxcXX/IGNvyWMi70FLJf4kUybS6HalwKglLPB8Zu+6uYf SiIU08v5IuxxObBDFVCy0BU+6D1GIuWDbq2XVZaYceLHLC4m2d+PsmcRDq6lVB4ZaxvE GcFg== X-Forwarded-Encrypted: i=1; AJvYcCUhuxe9m8p3ixyRBqNR1AmwfuwQ2qKw4HCiIdMLct04fcJ0MrLdZph6lw2vKsAb09NIje+bvnkpCw==@kvack.org X-Gm-Message-State: AOJu0YxhO8TncFxc55Q5LOTEduNdq5EQLXd8nNPxv9qm6SlNnvCPPFAB JFE7ogF66gijAhEi9iouL8BeteSx9d3SJKmneTfR3IMO6BhQMseClxi2Ko+iNM8= X-Google-Smtp-Source: AGHT+IFmB5BoVq6D14NtSCtkb6BEh80Rryr97W+EDEWN3kH2nMQUF8wgurMj4XoRqHWBBaDrHk6YBQ== X-Received: by 2002:a05:6870:658d:b0:260:e5e1:2411 with SMTP id 586e51a60fabf-26c62f1afa6mr5545965fac.6.1723443672322; Sun, 11 Aug 2024 23:21:12 -0700 (PDT) Received: from [10.4.217.215] ([139.177.225.242]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-7c3dbe12e80sm3506101a12.24.2024.08.11.23.21.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 11 Aug 2024 23:21:11 -0700 (PDT) Message-ID: <3e8253c4-9181-4027-84ee-28e1fc488f61@bytedance.com> Date: Mon, 12 Aug 2024 14:21:04 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval Content-Language: en-US To: David Hildenbrand Cc: hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, the arch/x86 maintainers References: <0e8e0503-5796-4b61-bb5b-249e285f5d21@redhat.com> <39281a4d-d896-46fd-80a5-8cd547d1625f@bytedance.com> <0f467510-a0d0-4a98-8517-43813fa4c131@redhat.com> From: Qi Zheng In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: AFBBB40012 X-Stat-Signature: 4wi7ftdyxnbhha7gk9kbhdjmtorcor7g X-Rspam-User: X-HE-Tag: 1723443673-386890 X-HE-Meta: U2FsdGVkX1/k/6sdRlm9waSBUhdX/I7IFau4x0/lFVc6gllxRT7110ejw+66dBY0z6LkdC5pRA9Q4RhVEQW3dz8eZSkmIvEmD79qnCpC0VCmXdyjkZ7iP523fwS4zPoHKM8+3TrMk1VphVHJa0X85z9n5mcLkMot3v3ebGPcDX6/iDBYoltHkqhbOKOS/LX6lCl++GBIakGUrnjLlN7/toKqqQnZLbMAwuki/U563JJVtAhx8MUnAND6lF5ifwRdxS8PHGURIFD0AQ5lnWT1WtCKiNxcb6vycTu9JDklO2j/c0e580RDOOTRkb9ScrDd5paI5G2etSBr2TvNsfHSNuXKw8br6hxA3uD4F1PXwGfREGEbkbqgP9r+ghnhZix/ch4A7hjKLLyZw1GRedUSB8E8qQh8CnrbJp9LJVdMzDYSop48UUvyAwxWf3vDHCbZ3uA1jeVQAt5OjsuNBMX0JaWpsQhiXmO4rtTRHklzA+ULFkZCMHMzyGhCfteorbLCXS8U5wyaaZY+lcTpnabx5gMJSqnwQjXul68IlG3/dhwyqlPLqiotVnASVWTpC2MX6z9hkAx3hbCIwILF9DStElMb8wlDL/vmokxSfQu+n6mikJuYPRRyN1jq6js1aa9x69oV2ZU6hygqJD7y6INCdSrP78ckuNwQCuZ24uxhlw9IQ4OF9KpbN2XnR3hUf14KJyewHuTPHOYY26ZWxmvMnorY6NoGPObHJS3UMvuZ40d/6FJdMUe4WRt7ZV4SjvtT8RF2u+Iu/zxOsDiLN+RKLpcJpSikEnCgPVCVdISkPkejcBNsyCthNYvskqt18qOb7huQLMrTZnAFToNNwTdQETrvXAE0WPYy7MD1tiZEjyZ7qxDUQFdV7c/5vzRCfyhIBgdXQmhdtol8pChbhKDXM1KErlPm/vIqbOUxI2+vzLWeWNZi2JT9Cr6ReEebfpRm04bAwUnmkC4eAx0HW8o zl1vrp1f H6fbCXq51R4wAbgwdTaLdf+A27sZ1nYUYYhDAUUDmBzOLnL68B0kFa+InqDqTrgluQRjbZoLw5Z911oQcMFum4NS8/Ed3sSQpimp0jcozNrrODnBxBjdnSDizyv9okx8kjWX/RUX8HLKqoq07qJOz7LXnvO4hZnAdJGHXiRejn2xNKCC38Txj48k0KSJ1/I9zTaPUylfvcximTCzX2tMSjrSPR6MzhCyRdKix2VQ/qwR8Yf5ktphc7I7Zps00LinZ2P3CSpx+CrRg1QM8w1AdHFh7WZjNv8QCev0Xgq2BPwj/ylRV0aoXDHrgIPrbJb71U7e289KzFTy3mJpXIGkPeZqkkdrw6f1piujxscdKY6XcT83qerX4uVAOXHsBo8+VhXEV X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi David, On 2024/8/10 00:54, David Hildenbrand wrote: > On 07.08.24 05:08, Qi Zheng wrote: >> Hi David, >> >> On 2024/8/6 22:16, David Hildenbrand wrote: >>> On 06.08.24 04:40, Qi Zheng wrote: >>>> Hi David, >>>> >>>> On 2024/8/5 22:43, David Hildenbrand wrote: >>>>> On 05.08.24 14:55, Qi Zheng wrote: >>>>>> Make pte_offset_map_nolock() return pmdval so that we can recheck the >>>>>> *pmd once the lock is taken. This is a preparation for freeing empty >>>>>> PTE pages, no functional changes are expected. >>>>> >>>>> Skimming the patches, only patch #4 updates one of the callsites >>>>> (collapse_pte_mapped_thp). >>>> >>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry() >>>> also used the pmdval returned by pte_offset_map_nolock(). >>> >>> Right, and I am questioning if only touching these two is sufficient, >>> and how we can make it clearer when someone actually has to recheck the >>> PMD. >>> >>>> >>>>> >>>>> Wouldn't we have to recheck if the PMD val changed in more cases after >>>>> taking the PTL? >>>>> >>>>> If not, would it make sense to have a separate function that >>>>> returns the >>>>> pmdval and we won't have to update each and every callsite? >>>> >>>> pte_offset_map_nolock() had already obtained the pmdval previously, >>>> just >>>> hadn't returned it. And updating those callsite is simple, so I think >>>> there may not be a need to add a separate function. >>> >>> Let me ask this way: why is retract_page_tables() and >>> reclaim_pgtables_pmd_entry() different to the other ones, and how would >>> someone using pte_offset_map_nolock() know what's to do here? >> >> If we acuqire the PTL (PTE or PMD lock) after calling >> pte_offset_map_nolock(), it means we may be modifying the corresponding >> pte or pmd entry. In that case, we need to perform a pmd_same() check >> after holding the PTL, just like in pte_offset_map_lock(), to prevent >> the possibility of the PTE page being reclaimed at that time. > > Okay, what I thought. > >> >> If we call pte_offset_map_nolock() and do not need to acquire the PTL >> afterwards, it means we are only reading the PTE page. In this case, the >> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE page >> cannot be reclaimed. >> >>> >>> IIUC, we must check the PMDVAL after taking the PTL in case >>> >>> (a) we want to modify the page table to turn pte_none() entries to >>>       !pte_none(). Because it could be that the page table was >>> removed and >>>       now is all pte_none() >>> >>> (b) we want to remove the page table ourselves and want to check if it >>>       has already been removed. >>> >>> Is that it? >> >> Yes. >> >>> >>> So my thinking is if another function variant can make that clearer. >> >> OK, how about naming it pte_offset_map_before_lock? > > That's the issue with some of the code: for example in > filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and > conditionally take the PTL. But we won't be modifying the pages tables. > > Maybe something like: > > pte_offset_map_readonly_nolock() > > and > > pte_offset_map_maywrite_nolock() > > The latter would require you to pass the PMD pointer such that you have > to really mess up to ignore what to do with it (check PMD same or not > check PMD same if you really know what you are douing). > > The first would not take a PMD pointer at all, because there is no need to. These two function names LGTM. Will do in the next version. Thanks, Qi >