Date: Tue, 23 May 2023 21:54:14 -0700 (PDT)
From: Hugh Dickins
To: Qi Zheng
cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe,
    Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin,
    Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 29/31] mm/memory: handle_pte_fault() use pte_offset_map_nolock()
Message-ID: <9cccc47c-d9c7-3071-098-4edb54b178a@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> <5f10e87-c413-eb92-fc6-541e52c1f6be@google.com>

On Mon, 22 May 2023, Qi Zheng wrote:
> On 2023/5/22 13:26, Hugh Dickins wrote:
> > handle_pte_fault() use pte_offset_map_nolock() to get the vmf.ptl which
> > corresponds to vmf.pte, instead of pte_lockptr() being used later, when
> > there's a chance that the pmd entry might have changed, perhaps to none,
> > or to a huge pmd, with no split ptlock in its struct page.
> >
> > Remove its pmd_devmap_trans_unstable() call: pte_offset_map_nolock()
> > will handle that case by failing.  Update the "morph" comment above,
> > looking forward to when shmem or file collapse to THP may not take
> > mmap_lock for write (or not at all).
> >
> > do_numa_page() use the vmf->ptl from handle_pte_fault() at first, but
> > refresh it when refreshing vmf->pte.
> >
> > do_swap_page()'s pte_unmap_same() (the thing that takes ptl to verify a
> > two-part PAE orig_pte) use the vmf->ptl from handle_pte_fault() too; but
> > do_swap_page() is also used by anon THP's __collapse_huge_page_swapin(),
> > so adjust that to set vmf->ptl by pte_offset_map_nolock().
> >
> > Signed-off-by: Hugh Dickins
> > ---
> >  mm/khugepaged.c |  6 ++++--
> >  mm/memory.c     | 38 +++++++++++++-------------------------
> >  2 files changed, 17 insertions(+), 27 deletions(-)
> >
...
> > diff --git a/mm/memory.c b/mm/memory.c
> > index c7b920291a72..4ec46eecefd3 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
...
> > @@ -4897,27 +4897,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> >  		vmf->pte = NULL;
> >  		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
> >  	} else {
> > -		/*
> > -		 * If a huge pmd materialized under us just retry later.  Use
> > -		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
> > -		 * of pmd_trans_huge() to ensure the pmd didn't become
> > -		 * pmd_trans_huge under us and then back to pmd_none, as a
> > -		 * result of MADV_DONTNEED running immediately after a huge pmd
> > -		 * fault in a different thread of this mm, in turn leading to a
> > -		 * misleading pmd_trans_huge() retval. All we have to ensure is
> > -		 * that it is a regular pmd that we can walk with
> > -		 * pte_offset_map() and we can do that through an atomic read
> > -		 * in C, which is what pmd_trans_unstable() provides.
> > -		 */
> > -		if (pmd_devmap_trans_unstable(vmf->pmd))
> > -			return 0;
> >  		/*
> >  		 * A regular pmd is established and it can't morph into a huge
> > -		 * pmd from under us anymore at this point because we hold the
> > -		 * mmap_lock read mode and khugepaged takes it in write mode.
> > -		 * So now it's safe to run pte_offset_map().
> > +		 * pmd by anon khugepaged, since that takes mmap_lock in write
> > +		 * mode; but shmem or file collapse to THP could still morph
> > +		 * it into a huge pmd: just retry later if so.
> >  		 */
> > -		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> > +		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> > +						 vmf->address, &vmf->ptl);
> > +		if (unlikely(!vmf->pte))
> > +			return 0;
>
> Just jump to the retry label below?

Shrug.  Could do.  But again I saw no reason to optimize this path,
the pmd_devmap_trans_unstable() treatment sets a good enough example.

Hugh
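
For readers following the thread outside the kernel tree, the pattern
under discussion (fetch the pte pointer and its matching lock in one
step, and give up with 0 if the pmd is no longer a regular page table)
can be sketched in userspace C.  This is only an illustration under
stated assumptions: every mock_* name, the three-state pmd enum and the
return convention of the mock handler are hypothetical stand-ins, not
the kernel's types or the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum mock_pmd_state { MOCK_PMD_NONE, MOCK_PMD_TABLE, MOCK_PMD_HUGE };

#define MOCK_PTRS_PER_PTE 4

struct mock_pmd {
	enum mock_pmd_state state;
	int ptes[MOCK_PTRS_PER_PTE];	/* stand-in for a page table page */
	int lock;			/* stand-in for the split ptlock */
};

struct mock_fault {
	struct mock_pmd *pmd;
	size_t index;
	int *pte;
	int *ptl;
};

/*
 * Mimics the contract described in the commit message: hand back the
 * pte pointer and the lock that corresponds to it together, or fail
 * with NULL if the pmd is none or huge.  No lock is taken here.
 */
static int *mock_pte_offset_map_nolock(struct mock_pmd *pmd, size_t index,
					int **ptlp)
{
	if (pmd->state != MOCK_PMD_TABLE)
		return NULL;
	*ptlp = &pmd->lock;
	return &pmd->ptes[index];
}

/*
 * Assumption of this mock: 0 means "nothing handled, let the access
 * fault again" and 1 means "handled".  The early return on a NULL pte
 * is the path the review comment asks about.
 */
static int mock_handle_pte_fault(struct mock_fault *vmf)
{
	assert(vmf != NULL && vmf->pmd != NULL);
	assert(vmf->index < MOCK_PTRS_PER_PTE);

	vmf->pte = mock_pte_offset_map_nolock(vmf->pmd, vmf->index, &vmf->ptl);
	if (!vmf->pte)
		return 0;	/* pmd changed under us: just retry later */

	*vmf->pte = 1;		/* stand-in for resolving the fault */
	return 1;
}
```

In this sketch a caller sees 0 whenever the pmd has turned into
MOCK_PMD_NONE or MOCK_PMD_HUGE between lookups, which is the cheap
"retry later" behaviour the reply above considers good enough for a
rarely taken path.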