From: Aravinda Prasad
To: damon@lists.linux.dev, linux-mm@kvack.org, sj@kernel.org, linux-kernel@vger.kernel.org
Cc: aravinda.prasad@intel.com, s2322819@ed.ac.uk, sandeep4.kumar@intel.com, ying.huang@intel.com, dave.hansen@intel.com, dan.j.williams@intel.com, sreenivas.subramoney@intel.com, antti.kervinen@intel.com, alexander.kanevskiy@intel.com
Subject: [PATCH v2 2/3] mm/damon: profiling enhancement
Date: Mon, 18 Mar 2024 18:58:47 +0530
Message-Id: <20240318132848.82686-3-aravinda.prasad@intel.com>
In-Reply-To: <20240318132848.82686-1-aravinda.prasad@intel.com>
References: <20240318132848.82686-1-aravinda.prasad@intel.com>

This patch adds a profiling enhancement to DAMON. Given a region's
sampling_addr and the region's bounds, it picks the highest possible
page table level such that the address range covered by one entry at
that level (P*D) lies entirely within the region's bounds. Once a level
is picked, the accessed bit is cleared and checked at that level. If
the architecture does not support accessed bits on non-leaf entries,
or no higher level fits, the PTE level is used as before.

Because an entry at a higher level of the page table tree covers a
larger address space, an accessed bit set in such an entry implies that
one or more pages in the region were accessed. This helps to quickly
identify hot regions when regions are large (e.g., several GB), which
is common for applications with a large memory footprint.
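The level-selection rule can be illustrated with a small, self-contained C sketch. This is not the kernel implementation. The names pick_level, pt_level and LVL_* are hypothetical. The shift values assume x86-64 with 4 KiB pages and 4-level paging. The sketch also ignores three things that pick_profile_level() handles: the arch_has_hw_nonleaf_pmd_young() check, 5-level paging, and the PTI restriction.

A level is accepted only if the naturally aligned block that one entry at that level covers, and that contains the sampled address, fits inside [start, end).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: page table levels and the span covered by one
 * entry at each level, assuming x86-64, 4 KiB pages, 4-level paging.
 */
enum pt_level { LVL_PTE, LVL_PMD, LVL_PUD, LVL_PGD, NR_LEVELS };

static const unsigned int level_shift[NR_LEVELS] = {
	12,	/* PTE entry maps 4 KiB */
	21,	/* PMD entry covers 2 MiB */
	30,	/* PUD entry covers 1 GiB */
	39,	/* PGD entry covers 512 GiB */
};

/*
 * Return the highest level whose entry-sized, naturally aligned block
 * containing addr lies entirely within the region [start, end).
 */
static enum pt_level pick_level(uint64_t start, uint64_t end, uint64_t addr)
{
	enum pt_level level = LVL_PTE;

	/* The region must be non-empty and contain the sampled address. */
	assert(start <= addr && addr < end);

	for (int l = LVL_PMD; l < NR_LEVELS; l++) {
		uint64_t size = (uint64_t)1 << level_shift[l];
		uint64_t blk_start = addr & ~(size - 1);	/* align down */
		uint64_t blk_end = blk_start + size;

		/*
		 * Blocks nest: if this level's block sticks out of the
		 * region, every higher level's block does too.
		 */
		if (blk_start < start || blk_end > end)
			break;
		level = (enum pt_level)l;
	}
	return level;
}
```

Example: take the region [1 GiB, 5 GiB) and a sampling address of 2 GiB + 12345. The 2 MiB and 1 GiB blocks around the address fit, but the 512 GiB block does not, so the sketch returns LVL_PUD. A region smaller than 2 MiB always falls back to LVL_PTE. Stopping at the first failing level is equivalent to the early returns in pick_profile_level().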

Signed-off-by: Alan Nair
Signed-off-by: Sandeep Kumar
Signed-off-by: Aravinda Prasad
---
 mm/damon/vaddr.c | 233 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 221 insertions(+), 12 deletions(-)

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 381559e4a1fa..daa1a2aedab6 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -52,6 +52,53 @@ static struct mm_struct *damon_get_mm(struct damon_target *t)
 	return mm;
 }
 
+/* Pick the highest possible page table profiling level for addr
+ * in the region defined by start and end
+ */
+static int pick_profile_level(unsigned long start, unsigned long end,
+		unsigned long addr)
+{
+	/* Start with PTE and check if higher levels can be picked */
+	int level = 0;
+
+	if (!arch_has_hw_nonleaf_pmd_young())
+		return level;
+
+	/* Check if PMD or higher can be picked, else use PTE */
+	if (pmd_addr_start(addr, (start) - 1) < start
+			|| pmd_addr_end(addr, (end) + 1) > end)
+		return level;
+
+	level++;
+	/* Check if PUD or higher can be picked, else use PMD */
+	if (pud_addr_start(addr, (start) - 1) < start
+			|| pud_addr_end(addr, (end) + 1) > end)
+		return level;
+
+	if (pgtable_l5_enabled()) {
+		level++;
+		/* Check if P4D or higher can be picked, else use PUD */
+		if (p4d_addr_start(addr, (start) - 1) < start
+				|| p4d_addr_end(addr, (end) + 1) > end)
+			return level;
+	}
+
+	level++;
+	/* Check if PGD can be picked, else return PUD level */
+	if (pgd_addr_start(addr, (start) - 1) < start
+			|| pgd_addr_end(addr, (end) + 1) > end)
+		return level;
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	/* Do not pick PGD level if PTI is enabled */
+	if (static_cpu_has(X86_FEATURE_PTI))
+		return level;
+#endif
+
+	/* Return PGD level */
+	return ++level;
+}
+
 /*
  * Functions for the initial monitoring target regions construction
  */
@@ -387,16 +434,90 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #define damon_mkold_hugetlb_entry NULL
 #endif /* CONFIG_HUGETLB_PAGE */
 
-static const struct mm_walk_ops damon_mkold_ops = {
-	.pmd_entry = damon_mkold_pmd_entry,
+
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+static int damon_mkold_pmd(pmd_t *pmd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+
+	if (!pmd_present(*pmd))
+		return 0;
+
+	ptl = pmd_lock(walk->mm, pmd);
+	pmdp_clear_young_notify(walk->vma, addr, pmd);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_mkold_pud(pud_t *pud, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+
+	if (!pud_present(*pud))
+		return 0;
+
+	ptl = pud_lock(walk->mm, pud);
+	pudp_clear_young_notify(walk->vma, addr, pud);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_mkold_p4d(p4d_t *p4d, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+
+	if (!p4d_present(*p4d))
+		return 0;
+
+	spin_lock(&mm->page_table_lock);
+	p4dp_clear_young_notify(walk->vma, addr, p4d);
+	spin_unlock(&mm->page_table_lock);
+
+	return 0;
+}
+
+static int damon_mkold_pgd(pgd_t *pgd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+
+	if (!pgd_present(*pgd))
+		return 0;
+
+	spin_lock(&mm->page_table_lock);
+	pgdp_clear_young_notify(walk->vma, addr, pgd);
+	spin_unlock(&mm->page_table_lock);
+
+	return 0;
+}
+#endif
+
+static const struct mm_walk_ops damon_mkold_ops[] = {
+	{.pmd_entry = damon_mkold_pmd_entry,
 	.hugetlb_entry = damon_mkold_hugetlb_entry,
-	.walk_lock = PGWALK_RDLOCK,
+	.walk_lock = PGWALK_RDLOCK},
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+	{.pmd_entry = damon_mkold_pmd},
+	{.pud_entry = damon_mkold_pud},
+	{.p4d_entry = damon_mkold_p4d},
+	{.pgd_entry = damon_mkold_pgd},
+#endif
 };
 
-static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
+static void damon_va_mkold(struct mm_struct *mm, struct damon_region *r)
 {
+	unsigned long addr = r->sampling_addr;
+	int profile_level;
+
+	profile_level = pick_profile_level(r->ar.start, r->ar.end, addr);
+
 	mmap_read_lock(mm);
-	walk_page_range(mm, addr, addr + 1, &damon_mkold_ops, NULL);
+	walk_page_range(mm, addr, addr + 1, &damon_mkold_ops[profile_level], NULL);
 	mmap_read_unlock(mm);
 }
 
@@ -409,7 +530,7 @@ static void __damon_va_prepare_access_check(struct mm_struct *mm,
 {
 	r->sampling_addr = damon_rand(r->ar.start, r->ar.end);
 
-	damon_va_mkold(mm, r->sampling_addr);
+	damon_va_mkold(mm, r);
 }
 
 static void damon_va_prepare_access_checks(struct damon_ctx *ctx)
@@ -531,22 +652,110 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #define damon_young_hugetlb_entry NULL
 #endif /* CONFIG_HUGETLB_PAGE */
 
-static const struct mm_walk_ops damon_young_ops = {
-	.pmd_entry = damon_young_pmd_entry,
+
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+static int damon_young_pmd(pmd_t *pmd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!pmd_present(*pmd))
+		return 0;
+
+	ptl = pmd_lock(walk->mm, pmd);
+	if (pmd_young(*pmd) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << PMD_SHIFT);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_young_pud(pud_t *pud, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!pud_present(*pud))
+		return 0;
+
+	ptl = pud_lock(walk->mm, pud);
+	if (pud_young(*pud) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << PUD_SHIFT);
+	spin_unlock(ptl);
+
+	return 0;
+}
+
+static int damon_young_p4d(p4d_t *p4d, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!p4d_present(*p4d))
+		return 0;
+
+	spin_lock(&mm->page_table_lock);
+	if (p4d_young(*p4d) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << P4D_SHIFT);
+	spin_unlock(&mm->page_table_lock);
+
+	return 0;
+}
+
+static int damon_young_pgd(pgd_t *pgd, unsigned long addr,
+		unsigned long next, struct mm_walk *walk)
+{
+	struct damon_young_walk_private *priv = walk->private;
+
+	if (!pgd_present(*pgd))
+		return 0;
+
+	spin_lock(&pgd_lock);
+	if (pgd_young(*pgd) || mmu_notifier_test_young(walk->mm, addr))
+		priv->young = true;
+
+	*priv->folio_sz = (1UL << PGDIR_SHIFT);
+	spin_unlock(&pgd_lock);
+
+	return 0;
+}
+#endif
+
+static const struct mm_walk_ops damon_young_ops[] = {
+	{.pmd_entry = damon_young_pmd_entry,
 	.hugetlb_entry = damon_young_hugetlb_entry,
-	.walk_lock = PGWALK_RDLOCK,
+	.walk_lock = PGWALK_RDLOCK},
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+	{.pmd_entry = damon_young_pmd},
+	{.pud_entry = damon_young_pud},
+	{.p4d_entry = damon_young_p4d},
+	{.pgd_entry = damon_young_pgd},
+#endif
 };
 
-static bool damon_va_young(struct mm_struct *mm, unsigned long addr,
+static bool damon_va_young(struct mm_struct *mm, struct damon_region *r,
 		unsigned long *folio_sz)
 {
+	unsigned long addr = r->sampling_addr;
+	int profile_level;
 	struct damon_young_walk_private arg = {
 		.folio_sz = folio_sz,
 		.young = false,
 	};
 
+	profile_level = pick_profile_level(r->ar.start, r->ar.end, addr);
+
 	mmap_read_lock(mm);
-	walk_page_range(mm, addr, addr + 1, &damon_young_ops, &arg);
+	walk_page_range(mm, addr, addr + 1, &damon_young_ops[profile_level], &arg);
 	mmap_read_unlock(mm);
 	return arg.young;
 }
@@ -577,7 +786,7 @@ static void __damon_va_check_access(struct mm_struct *mm,
 		return;
 	}
 
-	last_accessed = damon_va_young(mm, r->sampling_addr, &last_folio_sz);
+	last_accessed = damon_va_young(mm, r, &last_folio_sz);
 	damon_update_region_access_rate(r, last_accessed, attrs);
 
 	last_addr = r->sampling_addr;
-- 
2.21.3