From: "Huang, Ying"
To: Baolin Wang
Cc: David Hildenbrand, John Hubbard, Mel Gorman, Peter Zijlstra
Subject: Re: [RFC PATCH] mm: support large folio numa balancing
In-Reply-To: (Baolin Wang's message of "Tue, 14 Nov 2023 19:11:56 +0800")
References: <606d2d7a-d937-4ffe-a6f2-dfe3ae5a0c91@redhat.com> <871qctf89m.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 15 Nov 2023 10:58:32 +0800
Message-ID: <87sf57en8n.fsf@yhuang6-desk2.ccr.corp.intel.com>

Baolin Wang writes:

> On 11/14/2023 9:12 AM, Huang, Ying wrote:
>> David Hildenbrand writes:
>>
>>> On 13.11.23 11:45, Baolin Wang wrote:
>>>> Currently, the file pages already support large folio, and supporting for
>>>> anonymous pages is also under discussion[1]. Moreover, the numa balancing
>>>> code are converted to use a folio by previous thread[2], and the migrate_pages
>>>> function also already supports the large folio migration.
>>>> So now I did not see any reason to continue restricting NUMA balancing for
>>>> large folio.
>>>
>>> I recall John wanted to look into that. CCing him.
>>>
>>> I'll note that the "head page mapcount" heuristic to detect sharers will
>>> now strike on the PTE path and make us believe that a large folios is
>>> exclusive, although it isn't.
>> Even 4k folio may be shared by multiple processes/threads. So, numa
>> balancing uses a multi-stage node selection algorithm (mostly
>> implemented in should_numa_migrate_memory()) to identify shared folios.
>> I think that the algorithm needs to be adjusted for PTE mapped large
>> folio for shared folios.
>
> Not sure I get you here. In should_numa_migrate_memory(), it will use
> last CPU id, last PID and group numa faults to determine if this page
> can be migrated to the target node. So for large folio, a precise
> folio sharers check can make the numa faults of a group more accurate,
> which is enough for should_numa_migrate_memory() to make a decision?

A large folio that is mapped by multiple processes may be accessed from
only one remote NUMA node, in which case we still want to migrate it.
Conversely, a large folio that is mapped by one process but accessed by
multiple threads on multiple NUMA nodes may not be migrated.

> Could you provide a more detailed description of the algorithm you
> would like to change for large folio? Thanks.

I haven't thought about that thoroughly, so please evaluate the
algorithm yourself. For example, two sub-pages of a shared PTE-mapped
large folio may be accessed together by one task. This may cause the
folio to be migrated wrongly. One possible solution is to restore all
other PTE mappings of the large folio in do_numa_page() as the first
step. This resembles the behavior of PMD-mapped THP.

>> And, as a performance improvement patch, some performance data needs to
>
> Do you have some benchmark recommendation? I know the the autonuma can
> not support large folio now.

There is autonuma-benchmark, and specjbb has been used by some people
before.

>> be provided. And, the effect of shared folio detection needs to be
>> tested too

--
Best Regards,
Huang, Ying
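
For illustration only, here is one possible reading of the sub-page
example above, written as a toy user-space model in C. It is not kernel
code: struct toy_folio, toy_hinting_fault() and toy_task_touches_two()
are hypothetical names, and the "two consecutive faults from the same
node" check is only a rough stand-in for the multi-stage filter in
should_numa_migrate_memory(). The sketch assumes that, with per-PTE
hinting faults, one task touching two sub-pages looks like two
independent faults from the same node, and that restoring all PTE
mappings on the first fault collapses them into one.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_folio {
    int nid;          /* node the folio currently resides on */
    int last_nid;     /* node that took the previous hinting fault */
    int nr_protected; /* sub-pages still marked for hinting faults */
};

/*
 * Simplified two-stage filter (an assumption, not the kernel logic):
 * ask for migration only when two consecutive hinting faults on the
 * folio come from the same remote node.
 *
 * With restore_all set, the first fault clears the hinting protection
 * of every sub-page, so later accesses in the same scan do not fault.
 */
bool toy_hinting_fault(struct toy_folio *f, int fault_nid, bool restore_all)
{
    bool migrate;

    if (f->nr_protected == 0)
        return false; /* nothing is protected: the access does not fault */

    migrate = fault_nid != f->nid && f->last_nid == fault_nid;
    f->last_nid = fault_nid;
    f->nr_protected = restore_all ? 0 : f->nr_protected - 1;
    return migrate;
}

/*
 * One task running on task_nid touches two sub-pages back to back.
 * Returns true if either access ends up requesting a migration.
 */
bool toy_task_touches_two(struct toy_folio *f, int task_nid, bool restore_all)
{
    bool first = toy_hinting_fault(f, task_nid, restore_all);
    bool second = toy_hinting_fault(f, task_nid, restore_all);

    return first || second;
}
```

In this model, a folio on node 0 whose previous fault came from node 0
is asked to migrate as soon as a single task on node 1 touches two of
its sub-pages, even though nothing indicates that the other sharers
have moved. With restore_all set, the same access pattern produces only
one fault and no migration request.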