From: Yang Shi
Date: Thu, 9 Nov 2023 14:54:15 -0800
Subject: Re: [Question]: major faults are still triggered after mlockall when numa balancing
To: "zhangpeng (AS)"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Matthew Wilcox, lstoakes@gmail.com, hughd@google.com, david@redhat.com, fengwei.yin@intel.com, vbabka@suse.cz, peterz@infradead.org, mgorman@suse.de, mingo@redhat.com, riel@redhat.com, ying.huang@intel.com, hannes@cmpxchg.org, Nanyong Sun, Kefeng Wang
References: <9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com>
In-Reply-To: <9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com>

On Thu, Nov 9, 2023 at 5:48 AM zhangpeng (AS) wrote:
>
> Hi everyone,
>
> There is a performance issue that has been bothering us recently.
> This problem can reproduce in the latest mainline version (Linux 6.6).
>
> We use mlockall(MCL_CURRENT | MCL_FUTURE) in the user mode process
> to avoid performance problems caused by major fault.
>
> There is a stage in numa fault which will set pte as 0 in do_numa_page() :
> ptep_modify_prot_start() will clear the vmf->pte, until
> ptep_modify_prot_commit() assign a value to the vmf->pte.
>
> For the data segment of the user-mode program, the global variable area
> is a private mapping. After the pagecache is loaded, the private
> anonymous page is generated after the COW is triggered. Mlockall can
> lock COW pages (anonymous pages), but the original file pages cannot
> be locked and may be reclaimed. If the global variable (private anon page)
> is accessed when vmf->pte is zero which is concurrently set by numa fault,
> a file page fault will be triggered.
>
> At this time, the original private file page may have been reclaimed.
> If the page cache is not available at this time, a major fault will be
> triggered and the file will be read, causing additional overhead.
>
> Our problem scenario is as follows:
>
> task 1                                 task 2
> ------                                 ------
> /* scan global variables */
> do_numa_page()
>   spin_lock(vmf->ptl)
>   ptep_modify_prot_start()
>   /* set vmf->pte as null */
>                                        /* Access global variables */
>                                        handle_pte_fault()
>                                          /* no pte lock */
>                                          do_pte_missing()
>                                            do_fault()
>                                              do_read_fault()
>   ptep_modify_prot_commit()
>   /* ptep update done */
>   pte_unmap_unlock(vmf->pte, vmf->ptl)
>                                                do_fault_around()
>                                                __do_fault()
>                                                  filemap_fault()
>                                                    /* page cache is not available
>                                                    and a major fault is triggered */
>                                                    do_sync_mmap_readahead()
>                                                    /* page_not_uptodate and goto
>                                                    out_retry. */
>
> Is there any way to avoid such a major fault?

IMHO I don't think it is a bug. The man page quoted by Willy says "All
mapped pages are guaranteed to be resident in RAM when the call returns
successfully", but the later COW already unmapped the file page, right?
The PTE now points to the COW'ed anon page, so the file page is no
longer mapped by this process and is not covered by that guarantee.

Hypothetically, if we kept the file page mlocked after it was unmapped,
munlock() would never munlock it, and it would stay mlocked in memory
forever.
>
> --
> Best Regards,
> Peng
>
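
For anyone who wants to watch the major-fault counter from user space while
testing this, here is a minimal sketch. It does not reproduce the race with
do_numa_page(); it only shows the mlockall() call from the report and one way
to read the per-process major-fault count with getrusage(). The helper names
(major_faults, lock_all_memory, touch_globals_and_count) and the
global_counter variable are made up for illustration and are not from the
original report.

```c
#include <assert.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical global living in the data segment (private file mapping;
 * the first write triggers COW and creates an anonymous page). */
volatile unsigned long global_counter = 1;

/* Number of major faults taken by this process so far, or -1 on error. */
long major_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/* Lock current and future mappings, as in the report.
 * Returns 0 on success, -1 on failure (for example when
 * RLIMIT_MEMLOCK is too low for an unprivileged process). */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Write to the global repeatedly and return how many major faults
 * happened in between, or -1 if the counter could not be read. */
long touch_globals_and_count(int iterations)
{
    long before, after;

    assert(iterations >= 0);

    before = major_faults();
    for (int i = 0; i < iterations; i++)
        global_counter += (unsigned long)i;
    after = major_faults();

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling lock_all_memory() once at startup and then touch_globals_and_count()
in the hot path gives a delta that should stay at 0 if no major fault was
taken; a non-zero delta under NUMA balancing would be consistent with the
scenario described above, though other causes are possible.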