From: Yang Shi <shy828301@gmail.com>
To: mgorman@suse.de, kirill.shutemov@linux.intel.com, ziy@nvidia.com,
	mhocko@suse.com, ying.huang@intel.com, hughd@google.com,
	hca@linux.ibm.com, gor@linux.ibm.com, borntraeger@de.ibm.com,
	akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 4/6] mm: thp: refactor NUMA fault handling
Date: Mon, 29 Mar 2021 11:33:10 -0700
Message-Id: <20210329183312.178266-5-shy828301@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210329183312.178266-1-shy828301@gmail.com>
References: <20210329183312.178266-1-shy828301@gmail.com>

When THP NUMA fault support was added, THP migration was not supported yet,
so an ad hoc THP migration was implemented in the NUMA fault handling.
Since v4.14 THP migration has been supported, so it no longer makes much
sense to keep a separate THP migration implementation rather than using the
generic migration code.

This patch reworks the NUMA fault handling to use the generic migration
implementation to migrate the misplaced page. There is no functional
change.

After the refactor the flow of NUMA fault handling looks just like its
PTE counterpart (see the condensed sketch after this message):
  Acquire ptl
  Restore the PMD
  Prepare for migration (elevate the page refcount)
  Release ptl
  Isolate the page from the LRU and elevate the page refcount
  Migrate the misplaced THP

In the old code the anon_vma lock was needed to serialize THP migration
against THP split, but the THP code has been reworked a lot since then and
the anon_vma lock no longer seems to be required to avoid that race. The
page refcount elevation while holding the ptl should prevent the THP from
being split.
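To make the new shape easier to review, here is the refactored
do_huge_pmd_numa_page() reassembled linearly from the added lines of the
mm/huge_memory.c hunks below. This is a condensed sketch only (the
TNF_NO_GROUP and TNF_MIGRATE_FAIL bookkeeping is elided); the hunks in the
diff are authoritative:

	vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
	{
		struct vm_area_struct *vma = vmf->vma;
		pmd_t pmd = vmf->orig_pmd;
		unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
		bool was_writable = pmd_savedwrite(pmd);
		int page_nid = NUMA_NO_NODE, target_nid, last_cpupid = -1;
		int flags = 0;
		struct page *page;

		/* Acquire ptl; bail out if the PMD changed under us */
		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
		if (unlikely(!pmd_same(pmd, *vmf->pmd))) {
			spin_unlock(vmf->ptl);
			return 0;
		}

		/* Restore the PMD so other CPUs stop taking NUMA faults */
		pmd = pmd_modify(pmd, vma->vm_page_prot);
		pmd = pmd_mkyoung(pmd);
		if (was_writable)
			pmd = pmd_mkwrite(pmd);
		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);

		/*
		 * Prepare for migration: numa_migrate_prep() elevates the
		 * page refcount while the ptl is still held.
		 */
		page = vm_normal_page_pmd(vma, haddr, pmd);
		if (!page) {
			spin_unlock(vmf->ptl);
			goto out;
		}
		page_nid = page_to_nid(page);
		last_cpupid = page_cpupid_last(page);
		target_nid = numa_migrate_prep(page, vma, haddr, page_nid,
					       &flags);

		/* Release ptl */
		spin_unlock(vmf->ptl);
		if (target_nid == NUMA_NO_NODE) {
			put_page(page);
			goto out;
		}

		/*
		 * Isolate the page from the LRU and migrate it through the
		 * generic migrate_pages() path; "true" marks it compound.
		 */
		if (migrate_misplaced_page(page, vma, target_nid, true)) {
			flags |= TNF_MIGRATED;
			page_nid = target_nid;
		}

	out:
		if (page_nid != NUMA_NO_NODE)
			task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
					flags);
		return 0;
	}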
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/migrate.h |  23 ------
 mm/huge_memory.c        | 132 ++++++++----------------------
 mm/migrate.c            | 173 ++++++---------------------------
 3 files changed, 57 insertions(+), 271 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 6abd34986cc5..6c8640e9af4f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -100,15 +100,10 @@ static inline void __ClearPageMovable(struct page *page)
 #endif
 
 #ifdef CONFIG_NUMA_BALANCING
-extern bool pmd_trans_migrating(pmd_t pmd);
 extern int migrate_misplaced_page(struct page *page,
 				  struct vm_area_struct *vma, int node,
 				  bool compound);
 #else
-static inline bool pmd_trans_migrating(pmd_t pmd)
-{
-	return false;
-}
 static inline int migrate_misplaced_page(struct page *page,
 					 struct vm_area_struct *vma, int node,
 					 bool compound)
@@ -117,24 +112,6 @@ static inline int migrate_misplaced_page(struct page *page,
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
-#if defined(CONFIG_NUMA_BALANCING) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
-extern int migrate_misplaced_transhuge_page(struct mm_struct *mm,
-			struct vm_area_struct *vma,
-			pmd_t *pmd, pmd_t entry,
-			unsigned long address,
-			struct page *page, int node);
-#else
-static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
-			struct vm_area_struct *vma,
-			pmd_t *pmd, pmd_t entry,
-			unsigned long address,
-			struct page *page, int node)
-{
-	return -EAGAIN;
-}
-#endif /* CONFIG_NUMA_BALANCING && CONFIG_TRANSPARENT_HUGEPAGE*/
-
-
 #ifdef CONFIG_MIGRATION
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 53f3843ce72a..157c63b0fd95 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1419,94 +1419,20 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	pmd_t pmd = vmf->orig_pmd;
-	struct anon_vma *anon_vma = NULL;
 	struct page *page;
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
-	int page_nid = NUMA_NO_NODE, this_nid = numa_node_id();
+	int page_nid = NUMA_NO_NODE;
 	int target_nid, last_cpupid = -1;
-	bool page_locked;
 	bool migrated = false;
-	bool was_writable;
+	bool was_writable = pmd_savedwrite(pmd);
 	int flags = 0;
 
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
-	if (unlikely(!pmd_same(pmd, *vmf->pmd)))
-		goto out_unlock;
-
-	/*
-	 * If there are potential migrations, wait for completion and retry
-	 * without disrupting NUMA hinting information. Do not relock and
-	 * check_same as the page may no longer be mapped.
-	 */
-	if (unlikely(pmd_trans_migrating(*vmf->pmd))) {
-		page = pmd_page(*vmf->pmd);
-		if (!get_page_unless_zero(page))
-			goto out_unlock;
-		spin_unlock(vmf->ptl);
-		put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
-		goto out;
-	}
-
-	page = pmd_page(pmd);
-	BUG_ON(is_huge_zero_page(page));
-	page_nid = page_to_nid(page);
-	last_cpupid = page_cpupid_last(page);
-	count_vm_numa_event(NUMA_HINT_FAULTS);
-	if (page_nid == this_nid) {
-		count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
-		flags |= TNF_FAULT_LOCAL;
-	}
-
-	/* See similar comment in do_numa_page for explanation */
-	if (!pmd_savedwrite(pmd))
-		flags |= TNF_NO_GROUP;
-
-	/*
-	 * Acquire the page lock to serialise THP migrations but avoid dropping
-	 * page_table_lock if at all possible
-	 */
-	page_locked = trylock_page(page);
-	target_nid = mpol_misplaced(page, vma, haddr);
-	if (target_nid == NUMA_NO_NODE) {
-		/* If the page was locked, there are no parallel migrations */
-		if (page_locked)
-			goto clear_pmdnuma;
-	}
-
-	/* Migration could have started since the pmd_trans_migrating check */
-	if (!page_locked) {
-		page_nid = NUMA_NO_NODE;
-		if (!get_page_unless_zero(page))
-			goto out_unlock;
+	if (unlikely(!pmd_same(pmd, *vmf->pmd))) {
 		spin_unlock(vmf->ptl);
-		put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
 		goto out;
 	}
 
-	/*
-	 * Page is misplaced. Page lock serialises migrations. Acquire anon_vma
-	 * to serialises splits
-	 */
-	get_page(page);
-	spin_unlock(vmf->ptl);
-	anon_vma = page_lock_anon_vma_read(page);
-
-	/* Confirm the PMD did not change while page_table_lock was released */
-	spin_lock(vmf->ptl);
-	if (unlikely(!pmd_same(pmd, *vmf->pmd))) {
-		unlock_page(page);
-		put_page(page);
-		page_nid = NUMA_NO_NODE;
-		goto out_unlock;
-	}
-
-	/* Bail if we fail to protect against THP splits for any reason */
-	if (unlikely(!anon_vma)) {
-		put_page(page);
-		page_nid = NUMA_NO_NODE;
-		goto clear_pmdnuma;
-	}
-
 	/*
 	 * Since we took the NUMA fault, we must have observed the !accessible
 	 * bit. Make sure all other CPUs agree with that, to avoid them
@@ -1533,38 +1459,44 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 					      haddr + HPAGE_PMD_SIZE);
 	}
 
-	/*
-	 * Migrate the THP to the requested node, returns with page unlocked
-	 * and access rights restored.
-	 */
-	spin_unlock(vmf->ptl);
-
-	migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
-				vmf->pmd, pmd, vmf->address, page, target_nid);
-	if (migrated) {
-		flags |= TNF_MIGRATED;
-		page_nid = target_nid;
-	} else
-		flags |= TNF_MIGRATE_FAIL;
-
-	goto out;
-clear_pmdnuma:
-	BUG_ON(!PageLocked(page));
-	was_writable = pmd_savedwrite(pmd);
+	/* Restore the PMD */
 	pmd = pmd_modify(pmd, vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
 		pmd = pmd_mkwrite(pmd);
 	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
 	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
-	unlock_page(page);
-out_unlock:
+
+	page = vm_normal_page_pmd(vma, haddr, pmd);
+	if (!page) {
+		spin_unlock(vmf->ptl);
+		goto out;
+	}
+
+	/* See similar comment in do_numa_page for explanation */
+	if (!was_writable)
+		flags |= TNF_NO_GROUP;
+
+	page_nid = page_to_nid(page);
+	last_cpupid = page_cpupid_last(page);
+	target_nid = numa_migrate_prep(page, vma, haddr, page_nid,
+				       &flags);
+	spin_unlock(vmf->ptl);
 
-out:
-	if (anon_vma)
-		page_unlock_anon_vma_read(anon_vma);
+	if (target_nid == NUMA_NO_NODE) {
+		put_page(page);
+		goto out;
+	}
+
+	migrated = migrate_misplaced_page(page, vma, target_nid, true);
+	if (migrated) {
+		flags |= TNF_MIGRATED;
+		page_nid = target_nid;
+	} else
+		flags |= TNF_MIGRATE_FAIL;
 
+out:
 	if (page_nid != NUMA_NO_NODE)
 		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, flags);
diff --git a/mm/migrate.c b/mm/migrate.c
index 9c4ae5132919..86325c750c14 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2066,6 +2066,23 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	return newpage;
 }
 
+static struct page *alloc_misplaced_dst_page_thp(struct page *page,
+						 unsigned long data)
+{
+	int nid = (int) data;
+	struct page *newpage;
+
+	newpage = alloc_pages_node(nid, (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
+				   HPAGE_PMD_ORDER);
+	if (!newpage)
+		goto out;
+
+	prep_transhuge_page(newpage);
+
+out:
+	return newpage;
+}
+
 static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 {
 	int page_lru;
@@ -2104,12 +2121,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 	return 1;
 }
 
-bool pmd_trans_migrating(pmd_t pmd)
-{
-	struct page *page = pmd_page(pmd);
-	return PageLocked(page);
-}
-
 static inline bool is_shared_exec_page(struct vm_area_struct *vma,
 				       struct page *page)
 {
@@ -2133,6 +2144,12 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	int isolated;
 	int nr_remaining;
 	LIST_HEAD(migratepages);
+	new_page_t *new;
+
+	if (compound)
+		new = alloc_misplaced_dst_page_thp;
+	else
+		new = alloc_misplaced_dst_page;
 
 	/*
 	 * Don't migrate file pages that are mapped in multiple processes
@@ -2153,9 +2170,8 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 		goto out;
 
 	list_add(&page->lru, &migratepages);
-	nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_page,
-				     NULL, node, MIGRATE_ASYNC,
-				     MR_NUMA_MISPLACED);
+	nr_remaining = migrate_pages(&migratepages, *new, NULL, node,
+				     MIGRATE_ASYNC, MR_NUMA_MISPLACED);
 	if (nr_remaining) {
 		if (!list_empty(&migratepages)) {
 			list_del(&page->lru);
@@ -2174,145 +2190,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	return 0;
 }
 #endif /* CONFIG_NUMA_BALANCING */
-
-#if defined(CONFIG_NUMA_BALANCING) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
-/*
- * Migrates a THP to a given target node. page must be locked and is unlocked
- * before returning.
- */
-int migrate_misplaced_transhuge_page(struct mm_struct *mm,
-				struct vm_area_struct *vma,
-				pmd_t *pmd, pmd_t entry,
-				unsigned long address,
-				struct page *page, int node)
-{
-	spinlock_t *ptl;
-	pg_data_t *pgdat = NODE_DATA(node);
-	int isolated = 0;
-	struct page *new_page = NULL;
-	int page_lru = page_is_file_lru(page);
-	unsigned long start = address & HPAGE_PMD_MASK;
-
-	if (is_shared_exec_page(vma, page))
-		goto out;
-
-	new_page = alloc_pages_node(node,
-		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
-		HPAGE_PMD_ORDER);
-	if (!new_page)
-		goto out_fail;
-	prep_transhuge_page(new_page);
-
-	isolated = numamigrate_isolate_page(pgdat, page);
-	if (!isolated) {
-		put_page(new_page);
-		goto out_fail;
-	}
-
-	/* Prepare a page as a migration target */
-	__SetPageLocked(new_page);
-	if (PageSwapBacked(page))
-		__SetPageSwapBacked(new_page);
-
-	/* anon mapping, we can simply copy page->mapping to the new page: */
-	new_page->mapping = page->mapping;
-	new_page->index = page->index;
-	/* flush the cache before copying using the kernel virtual address */
-	flush_cache_range(vma, start, start + HPAGE_PMD_SIZE);
-	migrate_page_copy(new_page, page);
-	WARN_ON(PageLRU(new_page));
-
-	/* Recheck the target PMD */
-	ptl = pmd_lock(mm, pmd);
-	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
-		spin_unlock(ptl);
-
-		/* Reverse changes made by migrate_page_copy() */
-		if (TestClearPageActive(new_page))
-			SetPageActive(page);
-		if (TestClearPageUnevictable(new_page))
-			SetPageUnevictable(page);
-
-		unlock_page(new_page);
-		put_page(new_page);		/* Free it */
-
-		/* Retake the callers reference and putback on LRU */
-		get_page(page);
-		putback_lru_page(page);
-		mod_node_page_state(page_pgdat(page),
-			 NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
-
-		goto out_unlock;
-	}
-
-	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
-	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-
-	/*
-	 * Overwrite the old entry under pagetable lock and establish
-	 * the new PTE. Any parallel GUP will either observe the old
-	 * page blocking on the page lock, block on the page table
-	 * lock or observe the new page. The SetPageUptodate on the
-	 * new page and page_add_new_anon_rmap guarantee the copy is
-	 * visible before the pagetable update.
-	 */
-	page_add_anon_rmap(new_page, vma, start, true);
-	/*
-	 * At this point the pmd is numa/protnone (i.e. non present) and the TLB
-	 * has already been flushed globally. So no TLB can be currently
-	 * caching this non present pmd mapping. There's no need to clear the
-	 * pmd before doing set_pmd_at(), nor to flush the TLB after
-	 * set_pmd_at(). Clearing the pmd here would introduce a race
-	 * condition against MADV_DONTNEED, because MADV_DONTNEED only holds the
-	 * mmap_lock for reading. If the pmd is set to NULL at any given time,
-	 * MADV_DONTNEED won't wait on the pmd lock and it'll skip clearing this
-	 * pmd.
-	 */
-	set_pmd_at(mm, start, pmd, entry);
-	update_mmu_cache_pmd(vma, address, &entry);
-
-	page_ref_unfreeze(page, 2);
-	mlock_migrate_page(new_page, page);
-	page_remove_rmap(page, true);
-	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
-
-	spin_unlock(ptl);
-
-	/* Take an "isolate" reference and put new page on the LRU. */
-	get_page(new_page);
-	putback_lru_page(new_page);
-
-	unlock_page(new_page);
-	unlock_page(page);
-	put_page(page);			/* Drop the rmap reference */
-	put_page(page);			/* Drop the LRU isolation reference */
-
-	count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
-	count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
-
-	mod_node_page_state(page_pgdat(page),
-			NR_ISOLATED_ANON + page_lru,
-			-HPAGE_PMD_NR);
-	return isolated;
-
-out_fail:
-	count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR);
-	ptl = pmd_lock(mm, pmd);
-	if (pmd_same(*pmd, entry)) {
-		entry = pmd_modify(entry, vma->vm_page_prot);
-		set_pmd_at(mm, start, pmd, entry);
-		update_mmu_cache_pmd(vma, address, &entry);
-	}
-	spin_unlock(ptl);
-
-out_unlock:
-	unlock_page(page);
-out:
-	put_page(page);
-	return 0;
-}
-#endif /* CONFIG_NUMA_BALANCING */
-
 #endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_DEVICE_PRIVATE
-- 
2.26.2