From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
	mike.kravetz@oracle.com, sidhartha.kumar@oracle.com,
	naoya.horiguchi@nec.com, jane.chu@oracle.com, david@redhat.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Mon, 13 Mar 2023 20:45:22 +0800
Message-Id: <20230313124526.1207490-2-fengwei.yin@intel.com>
In-Reply-To: <20230313124526.1207490-1-fengwei.yin@intel.com>
References: <20230313124526.1207490-1-fengwei.yin@intel.com>

This prepares for the batched rmap update for large folios. There is no
need to handle hugetlb inside the per-PTE loop; just handle hugetlb in a
dedicated function and bail out early.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 121 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ba901c416785..3a2e3ccb8031 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1441,6 +1441,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+	bool ret = true, anon = folio_test_anon(folio);
+
+	/*
+	 * The try_to_unmap() is only passed a hugetlb page
+	 * in the case where the hugetlb page is poisoned.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+	/*
+	 * huge_pmd_unshare may unmap an entire PMD page.
+	 * There is no way of knowing exactly which PMDs may
+	 * be cached for this mm, so we must flush them all.
+	 * start/end were already adjusted in caller
+	 * (try_to_unmap_one) to cover this range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
+	/*
+	 * To call huge_pmd_unshare, i_mmap_rwsem must be
+	 * held in write mode.  Caller needs to explicitly
+	 * do this outside rmap routines.
+	 *
+	 * We also must hold hugetlb vma_lock in write mode.
+	 * Lock order dictates acquiring vma_lock BEFORE
+	 * i_mmap_rwsem.  We can only try lock here and fail
+	 * if unsuccessful.
+	 */
+	if (!anon) {
+		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (!hugetlb_vma_trylock_write(vma)) {
+			ret = false;
+			goto out;
+		}
+		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+			hugetlb_vma_unlock_write(vma);
+			flush_tlb_range(vma,
+				range.start, range.end);
+			mmu_notifier_invalidate_range(mm,
+				range.start, range.end);
+			/*
+			 * The ref count of the PMD page was
+			 * dropped which is part of the way map
+			 * counting is done for shared PMDs.
+			 * Return 'true' here.  When there is
+			 * no other sharing, huge_pmd_unshare
+			 * returns false and we will unmap the
+			 * actual page and drop map count
+			 * to zero.
+			 */
+			goto out;
+		}
+		hugetlb_vma_unlock_write(vma);
+	}
+	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (huge_pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	/* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+	pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+	set_huge_pte_at(mm, address, pvmw.pte, pteval);
+	hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+	/*
+	 * No need to call mmu_notifier_invalidate_range() it has be
+	 * done above for all cases requiring it to happen under page
+	 * table lock before mmu_notifier_invalidate_range_end()
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	page_remove_rmap(&folio->page, vma, true);
+	/* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+	 * necessary to call mlock_drain_local().
+	 */
+	folio_put(folio);
+
+out:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1504,86 +1601,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			ret = try_to_unmap_one_hugetlb(folio, vma, range,
+							pvmw, address, flags);
+
+			/* no need to loop for hugetlb */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio)) {
-			bool anon = folio_test_anon(folio);
-
-			/*
-			 * The try_to_unmap() is only passed a hugetlb page
-			 * in the case where the hugetlb page is poisoned.
-			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		if (should_defer_flush(mm, flags)) {
 			/*
-			 * huge_pmd_unshare may unmap an entire PMD page.
-			 * There is no way of knowing exactly which PMDs may
-			 * be cached for this mm, so we must flush them all.
-			 * start/end were already adjusted above to cover this
-			 * range.
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the folio.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
 			 */
-			flush_cache_range(vma, range.start, range.end);
+			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode.  Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem.  We can only try lock here and fail
-			 * if unsuccessful.
-			 */
-			if (!anon) {
-				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
-					mmu_notifier_invalidate_range(mm,
-						range.start, range.end);
-					/*
-					 * The ref count of the PMD page was
-					 * dropped which is part of the way map
-					 * counting is done for shared PMDs.
-					 * Return 'true' here.  When there is
-					 * no other sharing, huge_pmd_unshare
-					 * returns false and we will unmap the
-					 * actual page and drop map count
-					 * to zero.
-					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
-				}
-				hugetlb_vma_unlock_write(vma);
-			}
-			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
-			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-			/* Nuke the page table entry. */
-			if (should_defer_flush(mm, flags)) {
-				/*
-				 * We clear the PTE but do not flush so potentially
-				 * a remote CPU could still be writing to the folio.
-				 * If the entry was previously clean then the
-				 * architecture must guarantee that a clear->dirty
-				 * transition on a cached TLB entry is written through
-				 * and traps if the PTE is unmapped.
-				 */
-				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-			} else {
-				pteval = ptep_clear_flush(vma, address, pvmw.pte);
-			}
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
 		/*
@@ -1602,14 +1650,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
-				set_huge_pte_at(mm, address, pvmw.pte, pteval);
-			} else {
-				dec_mm_counter(mm, mm_counter(&folio->page));
-				set_pte_at(mm, address, pvmw.pte, pteval);
-			}
-
+			dec_mm_counter(mm, mm_counter(&folio->page));
+			set_pte_at(mm, address, pvmw.pte, pteval);
 		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no
-- 
2.30.2
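
As an aside, the control-flow change above can be shown with a minimal,
self-contained C sketch. The types and helpers below (struct mapping,
unmap_one_special(), unmap_one_entry()) are hypothetical stand-ins, not the
kernel's folio/rmap APIs; the sketch only mirrors the shape of the
refactoring: the special case is handled once by a dedicated helper and the
caller bails out of its per-entry loop early, instead of branching on the
special case in every iteration.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a mapped folio being walked entry by entry. */
struct mapping {
	bool special;     /* stand-in for folio_test_hugetlb() */
	int nr_entries;   /* stand-in for the per-PTE walk */
};

/* Dedicated helper: handles the whole special-case mapping in one shot. */
static bool unmap_one_special(struct mapping *m)
{
	(void)m;
	printf("special mapping handled in one shot\n");
	return true;
}

/* Generic per-entry handling for the common case. */
static bool unmap_one_entry(struct mapping *m, int idx)
{
	(void)m;
	printf("entry %d unmapped\n", idx);
	return true;
}

static bool unmap_mapping(struct mapping *m)
{
	bool ret = true;

	for (int i = 0; i < m->nr_entries; i++) {
		if (m->special) {
			/* Handle once and bail out early; no need to keep looping. */
			ret = unmap_one_special(m);
			break;
		}
		ret = unmap_one_entry(m, i);
		if (!ret)
			break;
	}
	return ret;
}

int main(void)
{
	struct mapping huge = { .special = true, .nr_entries = 512 };
	struct mapping normal = { .special = false, .nr_entries = 4 };

	unmap_mapping(&huge);    /* one call to the dedicated helper */
	unmap_mapping(&normal);  /* per-entry loop */
	return 0;
}

Keeping the special case out of the loop body is what lets the follow-up
patches batch the per-PTE work for large folios without re-checking for
hugetlb on every entry.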