Date: Thu, 9 Nov 2023 17:27:44 +0000
From: Matthew Wilcox
To: "zhangpeng (AS)"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    akpm@linux-foundation.org, lstoakes@gmail.com, hughd@google.com,
    david@redhat.com, fengwei.yin@intel.com, vbabka@suse.cz,
    peterz@infradead.org, mgorman@suse.de, mingo@redhat.com,
    riel@redhat.com, ying.huang@intel.com, hannes@cmpxchg.org,
    Nanyong Sun, Kefeng Wang
Subject: Re: [Question]: major faults are still triggered after mlockall when numa balancing
References: <9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com>
In-Reply-To: <9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com>

On Thu, Nov 09, 2023 at 09:47:24PM +0800, zhangpeng (AS) wrote:
> There is a stage in numa fault which will set pte as 0 in do_numa_page() :
> ptep_modify_prot_start() will clear the vmf->pte, until
> ptep_modify_prot_commit() assign a value to the
> vmf->pte.
[...]
> Our problem scenario is as follows:
>
> task 1                              task 2
> ------                              ------
> /* scan global variables */
> do_numa_page()
>   spin_lock(vmf->ptl)
>   ptep_modify_prot_start()
>   /* set vmf->pte as null */
>                                     /* Access global variables */
>                                     handle_pte_fault()
>                                       /* no pte lock */
>                                       do_pte_missing()
>                                         do_fault()
>                                           do_read_fault()
>   ptep_modify_prot_commit()
>   /* ptep update done */
>   pte_unmap_unlock(vmf->pte, vmf->ptl)
>                                             do_fault_around()
>                                             __do_fault()
>                                               filemap_fault()
>                                                 /* page cache is not available
>                                                    and a major fault is triggered */
>                                                 do_sync_mmap_readahead()
>                                                 /* page_not_uptodate and goto
>                                                    out_retry. */
>
> Is there any way to avoid such a major fault?

Yes, this looks like a bug.  It seems to me that the easiest way to fix
this is not to zero the pte but to make it protnone?  That would send
task 2 into do_numa_page() instead of do_pte_missing(): it would take
the ptl, check pte_same(), see that the pte has changed and goto out,
which ends up retrying the fault.

I'm not an expert at page table manipulation, so I'll let somebody who
is propose an actual patch.  Or would you like to try it yourself?
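One way to see why the protnone suggestion closes the window is a small, hypothetical user-space model in C. It is not kernel code: every name in it (model_pte, model_prot_start(), replay_race() and so on) is invented for illustration, the pte is reduced to three states, and the interleaving from the diagram is replayed sequentially rather than with real threads. It only asks which path task 2 takes when its lockless read lands between ptep_modify_prot_start() and ptep_modify_prot_commit().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the race in the diagram; none of these
 * names are kernel APIs. A "pte" is reduced to one of three states and each
 * task's steps are replayed in the order shown in the diagram.
 */
enum model_pte {
	MODEL_PTE_NONE,		/* zeroed entry: looks like an unpopulated pte */
	MODEL_PTE_PROTNONE,	/* page present, access revoked for NUMA hinting */
	MODEL_PTE_PRESENT,	/* normal, accessible mapping */
};

enum model_path {
	PATH_NO_FAULT,		/* access succeeds without faulting */
	PATH_PTE_MISSING,	/* do_pte_missing() -> do_fault(): may reach filemap_fault() */
	PATH_NUMA_PAGE,		/* do_numa_page(): waits for the ptl, rechecks with pte_same() */
};

struct model_mm {
	enum model_pte pte;
	bool ptl_held;
};

/* Task 1: spin_lock(ptl) + ptep_modify_prot_start(), with a chosen transient value. */
static void model_prot_start(struct model_mm *mm, bool transient_protnone)
{
	assert(mm->pte == MODEL_PTE_PROTNONE && !mm->ptl_held);
	mm->ptl_held = true;
	mm->pte = transient_protnone ? MODEL_PTE_PROTNONE : MODEL_PTE_NONE;
}

/* Task 1: ptep_modify_prot_commit() + pte_unmap_unlock(). */
static void model_prot_commit(struct model_mm *mm)
{
	assert(mm->ptl_held);
	mm->pte = MODEL_PTE_PRESENT;
	mm->ptl_held = false;
}

/* Task 2: classify a lockless snapshot of the pte, as handle_pte_fault() would. */
static enum model_path model_task2_fault(enum model_pte snapshot)
{
	switch (snapshot) {
	case MODEL_PTE_NONE:
		return PATH_PTE_MISSING;
	case MODEL_PTE_PROTNONE:
		return PATH_NUMA_PAGE;
	default:
		return PATH_NO_FAULT;
	}
}

/*
 * Replay the interleaving: task 1 starts the update, task 2 samples the pte
 * without the lock, then task 1 commits and unlocks. On the NUMA path, task 2
 * then gets the ptl and compares against its snapshot (the role vmf->orig_pte
 * plays); a mismatch means "goto out" and the access is retried, which is
 * what *retried reports.
 */
enum model_path replay_race(bool transient_protnone, bool *retried)
{
	struct model_mm mm = { .pte = MODEL_PTE_PROTNONE, .ptl_held = false };
	enum model_pte orig_pte;
	enum model_path path;

	model_prot_start(&mm, transient_protnone);
	orig_pte = mm.pte;
	path = model_task2_fault(orig_pte);
	model_prot_commit(&mm);

	*retried = (path == PATH_NUMA_PAGE) && (mm.pte != orig_pte);
	return path;
}
```

Under these assumptions, replay_race(false, &retried) returns PATH_PTE_MISSING, the do_pte_missing() path that can reach filemap_fault() and a major fault, while replay_race(true, &retried) returns PATH_NUMA_PAGE with retried set, matching the pte_same() and goto out sequence described above. The model says nothing about whether a protnone transient is safe for every architecture's ptep_modify_prot_start(); that is the part an actual patch would have to work out.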