From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <05b89531-4103-49cb-bbbf-5a2cfaa3106b@linux.alibaba.com>
Date: Fri, 21 Jun 2024 12:07:12 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v1 2/2] mm/migrate: move NUMA hinting fault folio isolation + checks under PTL
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>
References: <20240620212935.656243-1-david@redhat.com> <20240620212935.656243-3-david@redhat.com>
From: Baolin Wang <baolin.wang@linux.alibaba.com>
In-Reply-To: <20240620212935.656243-3-david@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2024/6/21 05:29, David Hildenbrand wrote:
> Currently we always take a folio reference even if migration will not
> even be tried or isolation failed, requiring us to grab+drop an additional
> reference.
> 
> Further, we end up calling folio_likely_mapped_shared() while the folio
> might have already been unmapped, because after we dropped the PTL, that
> can easily happen. We want to stop touching mapcounts and friends from
> such context, and only call folio_likely_mapped_shared() while the folio
> is still mapped: mapcount information is pretty much stale and unreliable
> otherwise.
> 
> So let's move checks into numamigrate_isolate_folio(), rename that
> function to migrate_misplaced_folio_prepare(), and call that function
> from callsites where we call migrate_misplaced_folio(), but still with
> the PTL held.
> 
> We can now stop taking temporary folio references, and really only take
> a reference if folio isolation succeeded. Doing the
> folio_likely_mapped_shared() + golio isolation under PT lock is now similar
> to how we handle MADV_PAGEOUT.

s/golio/folio

Makes sense to me.
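Just to double-check my reading, the new caller flow (using do_numa_page()
as the example; a simplified sketch based on the hunks below, not the
literal code) is:

	/* PTL still held: mapcount-based checks are still meaningful */
	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
	if (target_nid == NUMA_NO_NODE)
		goto out_map;
	/* isolation (not the caller) now takes the only extra reference */
	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
		flags |= TNF_MIGRATE_FAIL;
		goto out_map;
	}
	/* drop the PTL only after the folio has been isolated */
	pte_unmap_unlock(vmf->pte, vmf->ptl);
	migrate_misplaced_folio(folio, vma, target_nid);

so the temporary folio_get()/folio_put() pair in numa_migrate_prep() and
the error paths goes away entirely. Nice.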
Feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> 
> While at it, combine the folio_is_file_lru() checks.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/migrate.h |  7 ++++
>  mm/huge_memory.c        |  8 ++--
>  mm/memory.c             |  9 +++--
>  mm/migrate.c            | 81 +++++++++++++++++++----------------------
>  4 files changed, 55 insertions(+), 50 deletions(-)
> 
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index f9d92482d117..644be30b69c8 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -139,9 +139,16 @@ const struct movable_operations *page_movable_ops(struct page *page)
>  }
>  
>  #ifdef CONFIG_NUMA_BALANCING
> +int migrate_misplaced_folio_prepare(struct folio *folio,
> +		struct vm_area_struct *vma, int node);
>  int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  		int node);
>  #else
> +static inline int migrate_misplaced_folio_prepare(struct folio *folio,
> +		struct vm_area_struct *vma, int node)
> +{
> +	return -EAGAIN; /* can't migrate now */
> +}
>  static inline int migrate_misplaced_folio(struct folio *folio,
>  		struct vm_area_struct *vma, int node)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index fc27dabcd8e3..4b2817bb2c7d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1688,11 +1688,13 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>  	if (node_is_toptier(nid))
>  		last_cpupid = folio_last_cpupid(folio);
>  	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
> -	if (target_nid == NUMA_NO_NODE) {
> -		folio_put(folio);
> +	if (target_nid == NUMA_NO_NODE)
> +		goto out_map;
> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
> +		flags |= TNF_MIGRATE_FAIL;
>  		goto out_map;
>  	}
> -
> +	/* The folio is isolated and isolation code holds a folio reference. */
>  	spin_unlock(vmf->ptl);
>  	writable = false;
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 118660de5bcc..4fd1ecfced4d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5207,8 +5207,6 @@ int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  
> -	folio_get(folio);
> -
>  	/* Record the current PID acceesing VMA */
>  	vma_set_access_pid_bit(vma);
>  
> @@ -5345,10 +5343,13 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>  	else
>  		last_cpupid = folio_last_cpupid(folio);
>  	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
> -	if (target_nid == NUMA_NO_NODE) {
> -		folio_put(folio);
> +	if (target_nid == NUMA_NO_NODE)
> +		goto out_map;
> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
> +		flags |= TNF_MIGRATE_FAIL;
>  		goto out_map;
>  	}
> +	/* The folio is isolated and isolation code holds a folio reference. */
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	writable = false;
>  	ignore_writable = true;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 0307b54879a0..27f070f64f27 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2530,9 +2530,37 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
>  	return __folio_alloc_node(gfp, order, nid);
>  }
>  
> -static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
> +/*
> + * Prepare for calling migrate_misplaced_folio() by isolating the folio if
> + * permitted. Must be called with the PTL still held.
> + */
> +int migrate_misplaced_folio_prepare(struct folio *folio,
> +		struct vm_area_struct *vma, int node)
>  {
>  	int nr_pages = folio_nr_pages(folio);
> +	pg_data_t *pgdat = NODE_DATA(node);
> +
> +	if (folio_is_file_lru(folio)) {
> +		/*
> +		 * Do not migrate file folios that are mapped in multiple
> +		 * processes with execute permissions as they are probably
> +		 * shared libraries.
> +		 *
> +		 * See folio_likely_mapped_shared() on possible imprecision
> +		 * when we cannot easily detect if a folio is shared.
> +		 */
> +		if ((vma->vm_flags & VM_EXEC) &&
> +		    folio_likely_mapped_shared(folio))
> +			return -EACCES;
> +
> +		/*
> +		 * Do not migrate dirty folios as not all filesystems can move
> +		 * dirty folios in MIGRATE_ASYNC mode which is a waste of
> +		 * cycles.
> +		 */
> +		if (folio_test_dirty(folio))
> +			return -EAGAIN;
> +	}
>  
>  	/* Avoid migrating to a node that is nearly full */
>  	if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
> @@ -2550,65 +2578,37 @@ static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
>  	 * further.
>  	 */
>  	if (z < 0)
> -		return 0;
> +		return -EAGAIN;
>  
>  	wakeup_kswapd(pgdat->node_zones + z, 0,
>  			folio_order(folio), ZONE_MOVABLE);
> -	return 0;
> +	return -EAGAIN;
>  }
>  
>  	if (!folio_isolate_lru(folio))
> -		return 0;
> +		return -EAGAIN;
>  
>  	node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
>  			nr_pages);
> -
> -	/*
> -	 * Isolating the folio has taken another reference, so the
> -	 * caller's reference can be safely dropped without the folio
> -	 * disappearing underneath us during migration.
> -	 */
> -	folio_put(folio);
> -	return 1;
> +	return 0;
>  }

(just a nit: returning a boolean seems more readable; rough sketch at the
end of this mail)

> 
>  /*
>   * Attempt to migrate a misplaced folio to the specified destination
> - * node. Caller is expected to have an elevated reference count on
> - * the folio that will be dropped by this function before returning.
> + * node. Caller is expected to have isolated the folio by calling
> + * migrate_misplaced_folio_prepare(), which will result in an
> + * elevated reference count on the folio. This function will un-isolate the
> + * folio, dereferencing the folio before returning.
>   */
>  int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  		int node)
>  {
>  	pg_data_t *pgdat = NODE_DATA(node);
> -	int isolated;
>  	int nr_remaining;
>  	unsigned int nr_succeeded;
>  	LIST_HEAD(migratepages);
>  	int nr_pages = folio_nr_pages(folio);
>  
> -	/*
> -	 * Don't migrate file folios that are mapped in multiple processes
> -	 * with execute permissions as they are probably shared libraries.
> -	 *
> -	 * See folio_likely_mapped_shared() on possible imprecision when we
> -	 * cannot easily detect if a folio is shared.
> -	 */
> -	if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
> -	    (vma->vm_flags & VM_EXEC))
> -		goto out;
> -
> -	/*
> -	 * Also do not migrate dirty folios as not all filesystems can move
> -	 * dirty folios in MIGRATE_ASYNC mode which is a waste of cycles.
> -	 */
> -	if (folio_is_file_lru(folio) && folio_test_dirty(folio))
> -		goto out;
> -
> -	isolated = numamigrate_isolate_folio(pgdat, folio);
> -	if (!isolated)
> -		goto out;
> -
>  	list_add(&folio->lru, &migratepages);
>  	nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
>  				     NULL, node, MIGRATE_ASYNC,
> @@ -2620,7 +2620,6 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  					folio_is_file_lru(folio), -nr_pages);
>  			folio_putback_lru(folio);
>  		}
> -		isolated = 0;
>  	}
>  	if (nr_succeeded) {
>  		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> @@ -2629,11 +2628,7 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  				nr_succeeded);
>  	}
>  	BUG_ON(!list_empty(&migratepages));
> -	return isolated ? 0 : -EAGAIN;
> -
> -out:
> -	folio_put(folio);
> -	return -EAGAIN;
> +	return nr_remaining ? -EAGAIN : 0;
>  }
>  #endif /* CONFIG_NUMA_BALANCING */
>  #endif /* CONFIG_NUMA */
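For the nit above, something like the following is what I had in mind
(a completely untested sketch: it loses the -EACCES vs. -EAGAIN
distinction, and the kswapd wakeup on a nearly-full node is elided, so
feel free to ignore if you prefer keeping distinct error codes):

bool migrate_misplaced_folio_prepare(struct folio *folio,
		struct vm_area_struct *vma, int node)
{
	int nr_pages = folio_nr_pages(folio);
	pg_data_t *pgdat = NODE_DATA(node);

	if (folio_is_file_lru(folio)) {
		/* mapped shared + executable: likely a shared library */
		if ((vma->vm_flags & VM_EXEC) &&
		    folio_likely_mapped_shared(folio))
			return false;
		/* dirty file folios can't be moved in MIGRATE_ASYNC mode */
		if (folio_test_dirty(folio))
			return false;
	}

	/* avoid migrating to a node that is nearly full */
	if (!migrate_balanced_pgdat(pgdat, nr_pages))
		return false;

	if (!folio_isolate_lru(folio))
		return false;

	node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
			nr_pages);
	return true;
}

Callers would then just check "if (!migrate_misplaced_folio_prepare(...))"
instead of testing for an errno. Anyway, no strong opinion from my side.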