From: Raghavendra K T <raghavendra.kt@amd.com>
To: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	"David Hildenbrand" <david@redhat.com>, <rppt@kernel.org>,
	Bharata B Rao <bharata@amd.com>,
	Disha Talreja <dishaa.talreja@amd.com>,
	Raghavendra K T <raghavendra.kt@amd.com>
Subject: [PATCH V2 2/3] sched/numa: Enhance vma scanning logic
Date: Wed, 1 Feb 2023 13:32:21 +0530
Message-ID: <5f0872657ddb164aa047a2231f8dc1086fe6adf6.1675159422.git.raghavendra.kt@amd.com>
In-Reply-To: <cover.1675159422.git.raghavendra.kt@amd.com>

During NUMA scanning, make sure only the relevant vmas of a task
are scanned.

Before:
All the tasks of a process participate in scanning a vma, even if
they never access that vma in its lifespan.

Now:
Except for the first few unconditional scans, a task that has not
touched a vma no longer scans it (excluding false positives caused
by PID collisions).
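
The PID collisions mentioned above follow from folding a PID into a
small, fixed number of bits. A minimal userspace sketch, assuming a
64-bit mask (BITS_PER_LONG on a 64-bit kernel); pid_slot() and
pids_collide() are hypothetical names used only for illustration and
are not part of this patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a 64-bit mask, as BITS_PER_LONG would be on a 64-bit kernel. */
#define MASK_BITS 64u

/* Hypothetical helper: fold a PID into a bit index in [0, MASK_BITS). */
static unsigned int pid_slot(int pid)
{
	assert(pid >= 0);
	return (unsigned int)pid % MASK_BITS;
}

/* Two distinct PIDs cannot be told apart in the mask if they share a slot. */
static bool pids_collide(int a, int b)
{
	return a != b && pid_slot(a) == pid_slot(b);
}
```

For instance, PIDs 1000 and 1064 both fold to slot 40, so a fault taken
by one of them would make the vma look accessed to the other as well.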

Logic used:
1) During a NUMA hinting fault, 6 bits of the PID are used to set a
   bit in the vma's numab accessing_pids mask, which remembers the
   PIDs that accessed the vma. (Thanks Mel)

2) Subsequently, in the scan path, scanning of a vma is skipped if
   the current PID has not accessed that vma.

3) The first two scans are allowed unconditionally to preserve the
   earlier scanning behaviour.
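
The three steps above can be modelled outside the kernel roughly as
follows. This is only a sketch under stated assumptions: struct
vma_model, model_mark_access() and model_should_scan() are hypothetical
stand-ins for the vma numab state, the fault-path hook and the scan-path
check, and the scan sequence number is passed in explicitly rather than
read from the mm.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Width of the mask in bits; stands in for the kernel's BITS_PER_LONG. */
#define MODEL_MASK_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for the per-vma NUMA balancing state. */
struct vma_model {
	unsigned long accessing_pids;
};

/* Step 1: on a hinting fault, record the faulting PID in the mask. */
static void model_mark_access(struct vma_model *vma, int pid)
{
	assert(pid >= 0);
	vma->accessing_pids |= 1UL << ((unsigned int)pid % MODEL_MASK_BITS);
}

/* Steps 2 and 3: decide whether this PID should scan the vma. */
static bool model_should_scan(const struct vma_model *vma, int pid,
			      int scan_seq)
{
	assert(pid >= 0);

	/* The first two scan passes are unconditional. */
	if (scan_seq < 2)
		return true;

	/* Afterwards, scan only if this PID's bit was set by a fault. */
	return (vma->accessing_pids >>
		((unsigned int)pid % MODEL_MASK_BITS)) & 1UL;
}
```

In this model a PID that never faulted on the vma is still allowed to
scan it while scan_seq is 0 or 1, and is skipped from the third pass on
unless another PID that maps to the same bit has faulted there.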

Acknowledgement to Bharata B Rao <bharata@amd.com> for initial patch
to store pid information.

Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
---
 include/linux/mm.h       | 14 ++++++++++++++
 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 15 +++++++++++++++
 mm/huge_memory.c         |  1 +
 mm/memory.c              |  1 +
 5 files changed, 32 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74d9df1d8982..489422942482 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1381,6 +1381,16 @@ static inline int xchg_page_access_time(struct page *page, int time)
 	last_time = page_cpupid_xchg_last(page, time >> PAGE_ACCESS_TIME_BUCKETS);
 	return last_time << PAGE_ACCESS_TIME_BUCKETS;
 }
+
+static inline void vma_set_active_pid_bit(struct vm_area_struct *vma)
+{
+	unsigned int active_pid_bit;
+
+	if (vma->numab) {
+		active_pid_bit = current->pid % BITS_PER_LONG;
+		vma->numab->accessing_pids |= 1UL << active_pid_bit;
+	}
+}
 #else /* !CONFIG_NUMA_BALANCING */
 static inline int page_cpupid_xchg_last(struct page *page, int cpupid)
 {
@@ -1430,6 +1440,10 @@ static inline bool cpupid_match_pid(struct task_struct *task, int cpupid)
 {
 	return false;
 }
+
+static inline void vma_set_active_pid_bit(struct vm_area_struct *vma)
+{
+}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e84f95a77321..980a6a4308b6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -437,6 +437,7 @@ struct anon_vma_name {
 
 struct vma_numab {
 	unsigned long next_scan;
+	unsigned long accessing_pids;
 };
 
 /*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 060b241ce3c5..3505ae57c07c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2916,6 +2916,18 @@ static void reset_ptenuma_scan(struct task_struct *p)
 	p->mm->numa_scan_offset = 0;
 }
 
+static bool vma_is_accessed(struct vm_area_struct *vma)
+{
+	unsigned int active_pid_bit;
+
+	if (READ_ONCE(current->mm->numa_scan_seq) < 2)
+		return true;
+
+	active_pid_bit = current->pid % BITS_PER_LONG;
+
+	return vma->numab->accessing_pids & (1UL << active_pid_bit);
+}
+
 /*
  * The expensive part of numa migration is done from task_work context.
  * Triggered from task_tick_numa().
@@ -3032,6 +3044,9 @@ static void task_numa_work(struct callback_head *work)
 		if (mm->numa_scan_seq && time_before(jiffies, vma->numab->next_scan))
 			continue;
 
+		if (!vma_is_accessed(vma))
+			continue;
+
 		do {
 			start = max(start, vma->vm_start);
 			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 811d19b5c4f6..d908aa95f3c3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1485,6 +1485,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	bool was_writable = pmd_savedwrite(oldpmd);
 	int flags = 0;
 
+	vma_set_active_pid_bit(vma);
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
 	if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
 		spin_unlock(vmf->ptl);
diff --git a/mm/memory.c b/mm/memory.c
index 8c8420934d60..2ec3045cb8b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4718,6 +4718,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	bool was_writable = pte_savedwrite(vmf->orig_pte);
 	int flags = 0;
 
+	vma_set_active_pid_bit(vma);
 	/*
 	 * The "pte" at this point cannot be used safely without
 	 * validation through pte_unmap_same(). It's of NUMA type but
-- 
2.34.1



Thread overview: 17+ messages
2023-02-01  8:02 [PATCH V2 0/3] sched/numa: Enhance vma scanning Raghavendra K T
2023-02-01  8:02 ` [PATCH V2 1/3] sched/numa: Apply the scan delay to every vma instead of tasks Raghavendra K T
2023-02-03 10:24   ` Peter Zijlstra
2023-02-04 17:19     ` Raghavendra K T
2023-02-01  8:02 ` Raghavendra K T [this message]
2023-02-03 11:15   ` [PATCH V2 2/3] sched/numa: Enhance vma scanning logic Peter Zijlstra
2023-02-03 11:27     ` Peter Zijlstra
2023-02-04 18:18       ` Raghavendra K T
2023-02-04 18:14     ` Raghavendra K T
2023-02-07  6:41       ` Raghavendra K T
2023-02-27  6:40         ` Raghavendra K T
2023-02-27 10:06           ` Peter Zijlstra
2023-02-27 10:12             ` Raghavendra K T
2023-02-28  4:59       ` Raghavendra K T
2023-02-01  8:02 ` [PATCH V2 3/3] sched/numa: Reset the accessing PID information periodically Raghavendra K T
2023-02-03 11:35   ` Peter Zijlstra
2023-02-04 18:32     ` Raghavendra K T
