From: Donet Tom <donettom@linux.ibm.com>
To: David Hildenbrand, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton
Subject: Re: [PATCH v1 2/2] mm/migrate: move NUMA hinting fault folio isolation + checks under PTL
Date: Fri, 21 Jun 2024 23:17:23 +0530
Message-ID: <4474a212-7fc0-46b8-9f5b-bae878970b6b@linux.ibm.com>
References: <20240620212935.656243-1-david@redhat.com> <20240620212935.656243-3-david@redhat.com>
In-Reply-To: <20240620212935.656243-3-david@redhat.com>
On 6/21/24 02:59, David Hildenbrand wrote:
> Currently we always take a folio reference even if migration will not
> even be tried or isolation failed, requiring us to grab+drop an additional
> reference.
>
> Further, we end up calling folio_likely_mapped_shared() while the folio
> might have already been unmapped, because after we dropped the PTL, that
> can easily happen. We want to stop touching mapcounts and friends from
> such context, and only call folio_likely_mapped_shared() while the folio
> is still mapped: mapcount information is pretty much stale and unreliable
> otherwise.
>
> So let's move checks into numamigrate_isolate_folio(), rename that
> function to migrate_misplaced_folio_prepare(), and call that function
> from callsites where we call migrate_misplaced_folio(), but still with
> the PTL held.
>
> We can now stop taking temporary folio references, and really only take
> a reference if folio isolation succeeded. Doing the
> folio_likely_mapped_shared() + folio isolation under PT lock is now similar
> to how we handle MADV_PAGEOUT.
>
> While at it, combine the folio_is_file_lru() checks.
>
> Signed-off-by: David Hildenbrand
> ---
>  include/linux/migrate.h |  7 ++++
>  mm/huge_memory.c        |  8 ++--
>  mm/memory.c             |  9 +++--
>  mm/migrate.c            | 81 +++++++++++++++++++----------------------
>  4 files changed, 55 insertions(+), 50 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index f9d92482d117..644be30b69c8 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -139,9 +139,16 @@ const struct movable_operations *page_movable_ops(struct page *page)
>  }
>
>  #ifdef CONFIG_NUMA_BALANCING
> +int migrate_misplaced_folio_prepare(struct folio *folio,
> +		struct vm_area_struct *vma, int node);
>  int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  		int node);
>  #else
> +static inline int migrate_misplaced_folio_prepare(struct folio *folio,
> +		struct vm_area_struct *vma, int node)
> +{
> +	return -EAGAIN; /* can't migrate now */
> +}
>  static inline int migrate_misplaced_folio(struct folio *folio,
>  		struct vm_area_struct *vma, int node)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index fc27dabcd8e3..4b2817bb2c7d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1688,11 +1688,13 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>  	if (node_is_toptier(nid))
>  		last_cpupid = folio_last_cpupid(folio);
>  	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
> -	if (target_nid == NUMA_NO_NODE) {
> -		folio_put(folio);
> +	if (target_nid == NUMA_NO_NODE)
> +		goto out_map;
> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
> +		flags |= TNF_MIGRATE_FAIL;
>  		goto out_map;
>  	}
> -
> +	/* The folio is isolated and isolation code holds a folio reference. */
>  	spin_unlock(vmf->ptl);
>  	writable = false;
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 118660de5bcc..4fd1ecfced4d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5207,8 +5207,6 @@ int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>
> -	folio_get(folio);
> -
>  	/* Record the current PID acceesing VMA */
>  	vma_set_access_pid_bit(vma);
>
> @@ -5345,10 +5343,13 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>  	else
>  		last_cpupid = folio_last_cpupid(folio);
>  	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
> -	if (target_nid == NUMA_NO_NODE) {
> -		folio_put(folio);
> +	if (target_nid == NUMA_NO_NODE)
> +		goto out_map;
> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
> +		flags |= TNF_MIGRATE_FAIL;
>  		goto out_map;
>  	}
> +	/* The folio is isolated and isolation code holds a folio reference. */
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	writable = false;
>  	ignore_writable = true;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 0307b54879a0..27f070f64f27 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2530,9 +2530,37 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
>  	return __folio_alloc_node(gfp, order, nid);
>  }
>
> -static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
> +/*
> + * Prepare for calling migrate_misplaced_folio() by isolating the folio if
> + * permitted. Must be called with the PTL still held.
> + */
> +int migrate_misplaced_folio_prepare(struct folio *folio,
> +		struct vm_area_struct *vma, int node)
>  {
>  	int nr_pages = folio_nr_pages(folio);
> +	pg_data_t *pgdat = NODE_DATA(node);
> +
> +	if (folio_is_file_lru(folio)) {
> +		/*
> +		 * Do not migrate file folios that are mapped in multiple
> +		 * processes with execute permissions as they are probably
> +		 * shared libraries.
> +		 *
> +		 * See folio_likely_mapped_shared() on possible imprecision
> +		 * when we cannot easily detect if a folio is shared.
> +		 */
> +		if ((vma->vm_flags & VM_EXEC) &&
> +		    folio_likely_mapped_shared(folio))
> +			return -EACCES;
> +
> +		/*
> +		 * Do not migrate dirty folios as not all filesystems can move
> +		 * dirty folios in MIGRATE_ASYNC mode which is a waste of
> +		 * cycles.
> +		 */
> +		if (folio_test_dirty(folio))
> +			return -EAGAIN;
> +	}
>
>  	/* Avoid migrating to a node that is nearly full */
>  	if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
> @@ -2550,65 +2578,37 @@ static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
>  		 * further.
>  		 */
>  		if (z < 0)
> -			return 0;
> +			return -EAGAIN;
>
>  		wakeup_kswapd(pgdat->node_zones + z, 0,
>  			      folio_order(folio), ZONE_MOVABLE);
> -		return 0;
> +		return -EAGAIN;
>  	}
>
>  	if (!folio_isolate_lru(folio))
> -		return 0;
> +		return -EAGAIN;
>
>  	node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
>  			    nr_pages);
> -
> -	/*
> -	 * Isolating the folio has taken another reference, so the
> -	 * caller's reference can be safely dropped without the folio
> -	 * disappearing underneath us during migration.
> -	 */
> -	folio_put(folio);
> -	return 1;
> +	return 0;
>  }
>
>  /*
>   * Attempt to migrate a misplaced folio to the specified destination
> - * node. Caller is expected to have an elevated reference count on
> - * the folio that will be dropped by this function before returning.
> + * node. Caller is expected to have isolated the folio by calling
> + * migrate_misplaced_folio_prepare(), which will result in an
> + * elevated reference count on the folio. This function will un-isolate the
> + * folio, dereferencing the folio before returning.
>   */
>  int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  			    int node)
>  {
>  	pg_data_t *pgdat = NODE_DATA(node);
> -	int isolated;
>  	int nr_remaining;
>  	unsigned int nr_succeeded;
>  	LIST_HEAD(migratepages);
>  	int nr_pages = folio_nr_pages(folio);
>
> -	/*
> -	 * Don't migrate file folios that are mapped in multiple processes
> -	 * with execute permissions as they are probably shared libraries.
> -	 *
> -	 * See folio_likely_mapped_shared() on possible imprecision when we
> -	 * cannot easily detect if a folio is shared.
> -	 */
> -	if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
> -	    (vma->vm_flags & VM_EXEC))
> -		goto out;
> -
> -	/*
> -	 * Also do not migrate dirty folios as not all filesystems can move
> -	 * dirty folios in MIGRATE_ASYNC mode which is a waste of cycles.
> -	 */
> -	if (folio_is_file_lru(folio) && folio_test_dirty(folio))
> -		goto out;
> -
> -	isolated = numamigrate_isolate_folio(pgdat, folio);
> -	if (!isolated)
> -		goto out;
> -
>  	list_add(&folio->lru, &migratepages);
>  	nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
>  				     NULL, node, MIGRATE_ASYNC,
> @@ -2620,7 +2620,6 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  					folio_is_file_lru(folio), -nr_pages);
>  			folio_putback_lru(folio);
>  		}
> -		isolated = 0;
>  	}
>  	if (nr_succeeded) {
>  		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> @@ -2629,11 +2628,7 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  				    nr_succeeded);
>  	}
>  	BUG_ON(!list_empty(&migratepages));
> -	return isolated ? 0 : -EAGAIN;
> -
> -out:
> -	folio_put(folio);
> -	return -EAGAIN;
> +	return nr_remaining ? -EAGAIN : 0;
>  }
>  #endif /* CONFIG_NUMA_BALANCING */
>  #endif /* CONFIG_NUMA */

Hi David,

I have tested these patches on my system. My system has 2 DRAM nodes
and 1 PMEM node. I tested page migration between DRAM nodes and page
promotion from PMEM to DRAM. Both are working fine. Below are the
results.
Migration results
=============
numa_pte_updates 18977
numa_huge_pte_updates 0
numa_hint_faults 18504
numa_hint_faults_local 2116
numa_pages_migrated 16388

Promotion Results
===============
nr_sec_page_table_pages 0
nr_iommu_pages 0
nr_swapcached 0
pgpromote_success 16386
pgpromote_candidate 0

Tested-by: Donet Tom <donettom@linux.ibm.com>

Thanks
Donet