From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
	mike.kravetz@oracle.com, sidhartha.kumar@oracle.com,
	naoya.horiguchi@nec.com, jane.chu@oracle.com, david@redhat.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Mon, 13 Mar 2023 20:45:22 +0800
Message-Id: <20230313124526.1207490-2-fengwei.yin@intel.com>
In-Reply-To: <20230313124526.1207490-1-fengwei.yin@intel.com>
References: <20230313124526.1207490-1-fengwei.yin@intel.com>

This prepares for the batched rmap update for large folios. There is no
need to handle hugetlb inside the per-PTE loop; just handle hugetlb in a
dedicated function and bail out early.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 121 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ba901c416785..3a2e3ccb8031 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1441,6 +1441,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+	bool ret = true, anon = folio_test_anon(folio);
+
+	/*
+	 * The try_to_unmap() is only passed a hugetlb page
+	 * in the case where the hugetlb page is poisoned.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+	/*
+	 * huge_pmd_unshare may unmap an entire PMD page.
+	 * There is no way of knowing exactly which PMDs may
+	 * be cached for this mm, so we must flush them all.
+	 * start/end were already adjusted in caller
+	 * (try_to_unmap_one) to cover this range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
+	/*
+	 * To call huge_pmd_unshare, i_mmap_rwsem must be
+	 * held in write mode.  Caller needs to explicitly
+	 * do this outside rmap routines.
+	 *
+	 * We also must hold hugetlb vma_lock in write mode.
+	 * Lock order dictates acquiring vma_lock BEFORE
+	 * i_mmap_rwsem.  We can only try lock here and fail
+	 * if unsuccessful.
+	 */
+	if (!anon) {
+		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (!hugetlb_vma_trylock_write(vma)) {
+			ret = false;
+			goto out;
+		}
+		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+			hugetlb_vma_unlock_write(vma);
+			flush_tlb_range(vma,
+				range.start, range.end);
+			mmu_notifier_invalidate_range(mm,
+				range.start, range.end);
+			/*
+			 * The ref count of the PMD page was
+			 * dropped which is part of the way map
+			 * counting is done for shared PMDs.
+			 * Return 'true' here.  When there is
+			 * no other sharing, huge_pmd_unshare
+			 * returns false and we will unmap the
+			 * actual page and drop map count
+			 * to zero.
+			 */
+			goto out;
+		}
+		hugetlb_vma_unlock_write(vma);
+	}
+	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (huge_pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	/* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+	pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+	set_huge_pte_at(mm, address, pvmw.pte, pteval);
+	hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+	/*
+	 * No need to call mmu_notifier_invalidate_range() it has be
+	 * done above for all cases requiring it to happen under page
+	 * table lock before mmu_notifier_invalidate_range_end()
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	page_remove_rmap(&folio->page, vma, true);
+	/* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+	 * necessary to call mlock_drain_local().
+	 */
+	folio_put(folio);
+
+out:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1504,86 +1601,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			ret = try_to_unmap_one_hugetlb(folio, vma, range,
+							pvmw, address, flags);
+
+			/* no need to loop for hugetlb */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio)) {
-			bool anon = folio_test_anon(folio);
-
-			/*
-			 * The try_to_unmap() is only passed a hugetlb page
-			 * in the case where the hugetlb page is poisoned.
-			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		if (should_defer_flush(mm, flags)) {
 			/*
-			 * huge_pmd_unshare may unmap an entire PMD page.
-			 * There is no way of knowing exactly which PMDs may
-			 * be cached for this mm, so we must flush them all.
-			 * start/end were already adjusted above to cover this
-			 * range.
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the folio.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
 			 */
-			flush_cache_range(vma, range.start, range.end);
+			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode.  Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem.  We can only try lock here and fail
-			 * if unsuccessful.
-			 */
-			if (!anon) {
-				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
-					mmu_notifier_invalidate_range(mm,
-						range.start, range.end);
-					/*
-					 * The ref count of the PMD page was
-					 * dropped which is part of the way map
-					 * counting is done for shared PMDs.
-					 * Return 'true' here.  When there is
-					 * no other sharing, huge_pmd_unshare
-					 * returns false and we will unmap the
-					 * actual page and drop map count
-					 * to zero.
-					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
-				}
-				hugetlb_vma_unlock_write(vma);
-			}
-			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
-			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-			/* Nuke the page table entry. */
-			if (should_defer_flush(mm, flags)) {
-				/*
-				 * We clear the PTE but do not flush so potentially
-				 * a remote CPU could still be writing to the folio.
-				 * If the entry was previously clean then the
-				 * architecture must guarantee that a clear->dirty
-				 * transition on a cached TLB entry is written through
-				 * and traps if the PTE is unmapped.
-				 */
-				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-			} else {
-				pteval = ptep_clear_flush(vma, address, pvmw.pte);
-			}
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
 		/*
@@ -1602,14 +1650,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
-				set_huge_pte_at(mm, address, pvmw.pte, pteval);
-			} else {
-				dec_mm_counter(mm, mm_counter(&folio->page));
-				set_pte_at(mm, address, pvmw.pte, pteval);
-			}
-
+			dec_mm_counter(mm, mm_counter(&folio->page));
+			set_pte_at(mm, address, pvmw.pte, pteval);
 		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no
-- 
2.30.2
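
As an aside, the control-flow change above can be shown with a minimal,
self-contained C sketch. The types and helpers below (struct mapping,
unmap_one_special(), unmap_one_entry()) are hypothetical stand-ins, not the
kernel's folio/rmap APIs; the sketch only mirrors the shape of the
refactoring: the special case is handled once by a dedicated helper and the
caller bails out of its per-entry loop early, instead of branching on the
special case in every iteration.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a mapped folio being walked entry by entry. */
struct mapping {
	bool special;     /* stand-in for folio_test_hugetlb() */
	int nr_entries;   /* stand-in for the per-PTE walk */
};

/* Dedicated helper: handles the whole special-case mapping in one shot. */
static bool unmap_one_special(struct mapping *m)
{
	(void)m;
	printf("special mapping handled in one shot\n");
	return true;
}

/* Generic per-entry handling for the common case. */
static bool unmap_one_entry(struct mapping *m, int idx)
{
	(void)m;
	printf("entry %d unmapped\n", idx);
	return true;
}

static bool unmap_mapping(struct mapping *m)
{
	bool ret = true;

	for (int i = 0; i < m->nr_entries; i++) {
		if (m->special) {
			/* Handle once and bail out early; no need to keep looping. */
			ret = unmap_one_special(m);
			break;
		}
		ret = unmap_one_entry(m, i);
		if (!ret)
			break;
	}
	return ret;
}

int main(void)
{
	struct mapping huge = { .special = true, .nr_entries = 512 };
	struct mapping normal = { .special = false, .nr_entries = 4 };

	unmap_mapping(&huge);    /* one call to the dedicated helper */
	unmap_mapping(&normal);  /* per-entry loop */
	return 0;
}

Keeping the special case out of the loop body is what lets the follow-up
patches batch the per-PTE work for large folios without re-checking for
hugetlb on every entry.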