From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: "Zi Yan"
Cc: David Hildenbrand, Andrew Morton
Subject: Re: [PATCH v1 2/2] mm/migrate: move NUMA hinting fault folio isolation + checks under PTL
In-Reply-To: (Zi Yan's message of "Wed, 26 Jun 2024 13:37:15 -0400")
References: <20240620212935.656243-1-david@redhat.com>
	<20240620212935.656243-3-david@redhat.com>
	<0F4ABC1D-7A26-4AE2-BCAA-3EA906FB13A7@nvidia.com>
	<9af34a6b-ca56-4a64-8aa6-ade65f109288@redhat.com>
Date: Mon, 01 Jul 2024 16:32:58 +0800
Message-ID: <87ed8do6kl.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
"Zi Yan" writes:

> On Wed Jun 26, 2024 at 12:49 PM EDT, David Hildenbrand wrote:
>> On 21.06.24 22:48, Zi Yan wrote:
>> > On 21 Jun 2024, at 16:18, David Hildenbrand wrote:
>> >
>> >> On 21.06.24 15:44, Zi Yan wrote:
>> >>> On 20 Jun 2024, at 17:29, David Hildenbrand wrote:
>> >>>
>> >>>> Currently we always take a folio reference even if migration will
>> >>>> not even be tried or isolation failed, requiring us to grab+drop
>> >>>> an additional reference.
>> >>>>
>> >>>> Further, we end up calling folio_likely_mapped_shared() while the
>> >>>> folio might have already been unmapped, because after we dropped
>> >>>> the PTL, that can easily happen. We want to stop touching mapcounts
>> >>>> and friends from such context, and only call
>> >>>> folio_likely_mapped_shared() while the folio is still mapped:
>> >>>> mapcount information is pretty much stale and unreliable otherwise.
>> >>>>
>> >>>> So let's move checks into numamigrate_isolate_folio(), rename that
>> >>>> function to migrate_misplaced_folio_prepare(), and call that
>> >>>> function from callsites where we call migrate_misplaced_folio(),
>> >>>> but still with the PTL held.
>> >>>>
>> >>>> We can now stop taking temporary folio references, and really only
>> >>>> take a reference if folio isolation succeeded. Doing the
>> >>>> folio_likely_mapped_shared() + folio isolation under PT lock is now
>> >>>> similar to how we handle MADV_PAGEOUT.
>> >>>>
>> >>>> While at it, combine the folio_is_file_lru() checks.
>> >>>>
>> >>>> Signed-off-by: David Hildenbrand
>> >>>> ---
>> >>>>  include/linux/migrate.h |  7 ++++
>> >>>>  mm/huge_memory.c        |  8 ++--
>> >>>>  mm/memory.c             |  9 +++--
>> >>>>  mm/migrate.c            | 81 +++++++++++++++++++----------------------
>> >>>>  4 files changed, 55 insertions(+), 50 deletions(-)
>> >>>
>> >>> LGTM. Reviewed-by: Zi Yan
>> >>>
>> >>> One nit below:
>> >>>
>> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> >>>> index fc27dabcd8e3..4b2817bb2c7d 100644
>> >>>> --- a/mm/huge_memory.c
>> >>>> +++ b/mm/huge_memory.c
>> >>>> @@ -1688,11 +1688,13 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>> >>>>  	if (node_is_toptier(nid))
>> >>>>  		last_cpupid = folio_last_cpupid(folio);
>> >>>>  	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
>> >>>> -	if (target_nid == NUMA_NO_NODE) {
>> >>>> -		folio_put(folio);
>> >>>> +	if (target_nid == NUMA_NO_NODE)
>> >>>> +		goto out_map;
>> >>>> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
>> >>>> +		flags |= TNF_MIGRATE_FAIL;
>> >>>>  		goto out_map;
>> >>>>  	}
>> >>>> -
>> >>>> +	/* The folio is isolated and isolation code holds a folio reference.
>> >>>> +	 */
>> >>>>  	spin_unlock(vmf->ptl);
>> >>>>  	writable = false;
>> >>>>
>> >>>> diff --git a/mm/memory.c b/mm/memory.c
>> >>>> index 118660de5bcc..4fd1ecfced4d 100644
>> >>>> --- a/mm/memory.c
>> >>>> +++ b/mm/memory.c
>> >>>
>> >>>> @@ -5345,10 +5343,13 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>> >>>>  	else
>> >>>>  		last_cpupid = folio_last_cpupid(folio);
>> >>>>  	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
>> >>>> -	if (target_nid == NUMA_NO_NODE) {
>> >>>> -		folio_put(folio);
>> >>>> +	if (target_nid == NUMA_NO_NODE)
>> >>>> +		goto out_map;
>> >>>> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
>> >>>> +		flags |= TNF_MIGRATE_FAIL;
>> >>>>  		goto out_map;
>> >>>>  	}
>> >>>
>> >>> These two locations are repeated code, maybe just merge the ifs into
>> >>> numa_migrate_prep(). Feel free to ignore if you are not going to send
>> >>> another version. :)
>> >>
>> >> I went back and forth a couple of times and
>> >>
>> >> a) Didn't want to move numa_migrate_prep() into
>> >> migrate_misplaced_folio_prepare(), because having that code in
>> >> mm/migrate.c felt a bit odd.
>> >
>> > I agree after checking the actual code, since the code is just
>> > updating NUMA fault stats and checking where the folio should be.
>> >
>> >> b) Didn't want to move migrate_misplaced_folio_prepare() because I
>> >> enjoy seeing the migrate_misplaced_folio_prepare() and
>> >> migrate_misplaced_folio() calls in the same caller context.
>> >>
>> >> I also considered renaming numa_migrate_prep(), but wasn't really
>> >> able to come up with a good name.
>> >
>> > How about numa_migrate_check()? Since it tells whether a folio should
>> > be migrated or not.
>> >
>> >> But maybe a) is not too bad?
>> >>
>> >> We'd have migrate_misplaced_folio_prepare() consume &flags and
>> >> &target_nid, and perform the "flags |= TNF_MIGRATE_FAIL;" internally.
>> >>
>> >> What would be your take?
>> >
>> > I would either rename numa_migrate_prep() or just do nothing. I have
>> > to admit that the "prep" and "prepare" in both function names
>> > motivated me to propose the merge, but now the actual code tells me
>> > they should be separate.
>>
>> Let's leave it like that for now. Renaming to numa_migrate_check()
>> makes sense, and likely moving more numa handling stuff in there.
>>
>> But I have yet to figure out why some of the memory.c vs. huge_memory.c
>> code differences exist, so we can unify them.
>>
>> For example, why did 33024536bafd9 introduce slightly different
>> last_cpupid handling in do_huge_pmd_numa_page(), whereby there seems to
>> be some subtle difference in handling NUMA_BALANCING_MEMORY_TIERING?
>> Maybe I am missing something obvious. :)
>
> It seems to me that a sysctl_numa_balancing_mode &
> NUMA_BALANCING_MEMORY_TIERING check is missing in
> do_huge_pmd_numa_page(). So the
>
>	if (node_is_toptier(nid))
>
> should be
>
>	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
>	    node_is_toptier(nid))
>
> to be consistent with other checks. Adding Ying to confirm.

Yes, it should be so. Sorry for my mistake and the resulting confusion.

> I also think a function like
>
>	static inline bool folio_has_cpupid(struct folio *folio)
>	{
>		return !(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
>		       node_is_toptier(folio_nid(folio));
>	}
>
> would be better than the existing checks.

Yes, this looks better. Even better, we can add some comments to the
function too.

--
Best Regards,
Huang, Ying