* [PATCH v7 1/6] mm/vmscan: make active/inactive ratio as 1:1 for anon lru
2020-07-23 7:49 [PATCH v7 0/6] workingset protection/detection on the anonymous LRU list js1304
From: js1304 @ 2020-07-23 7:49 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Johannes Weiner, Michal Hocko,
Hugh Dickins, Minchan Kim, Vlastimil Babka, Mel Gorman,
Matthew Wilcox, kernel-team, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
The current implementation of LRU management for anonymous pages has some
problems. The most important one is that it doesn't protect the workingset,
that is, the pages on the active LRU list. Although this problem will be
fixed in the following patches, some preparation is required first, and
that is what this patch provides.

What the following patch does is implement workingset protection: after it,
newly created or swapped-in pages start their lifetime on the inactive
list. If the inactive list is too small, a page has too little chance of
being referenced again and cannot become part of the workingset. To give
newly created and swapped-in anonymous pages enough chance to be referenced
again, this patch makes the active/inactive anonymous LRU ratio 1:1.

This is just a temporary measure. A later patch in the series introduces
workingset detection for the anonymous LRU, which will be used to better
decide whether pages should start on the active or the inactive list.
Afterwards, this patch is effectively reverted.
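For illustration, after this patch the core of inactive_is_low() behaves
roughly as follows (a sketch of the resulting logic, not the full function):

  gb = (inactive + active) >> (30 - PAGE_SHIFT);
  if (gb && is_file_lru(inactive_lru))
          inactive_ratio = int_sqrt(10 * gb);   /* file LRUs keep the sqrt-based target */
  else
          inactive_ratio = 1;                   /* anon LRUs now always target 1:1 */

  return inactive * inactive_ratio < active;

That is, for the anonymous LRUs deactivation now only kicks in once the
active list grows larger than the inactive list.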
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6acc956..d5a19c7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2208,7 +2208,7 @@ static bool inactive_is_low(struct lruvec *lruvec, enum lru_list inactive_lru)
active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru);
gb = (inactive + active) >> (30 - PAGE_SHIFT);
- if (gb)
+ if (gb && is_file_lru(inactive_lru))
inactive_ratio = int_sqrt(10 * gb);
else
inactive_ratio = 1;
--
2.7.4
* [PATCH v7 2/6] mm/vmscan: protect the workingset on anonymous LRU
From: js1304 @ 2020-07-23 7:49 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Johannes Weiner, Michal Hocko,
Hugh Dickins, Minchan Kim, Vlastimil Babka, Mel Gorman,
Matthew Wilcox, kernel-team, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
In the current implementation, a newly created or swapped-in anonymous page
starts out on the active list. A growing active list triggers active/inactive
rebalancing, so old pages on the active list are demoted to the inactive
list. Hence, pages on the active list aren't protected at all.

The following is an example of this situation. Assume there are 50 hot pages
on the active list. The numbers denote the number of pages on the
active/inactive list (active | inactive).
1. 50 hot pages on active list
50(h) | 0
2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)
3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)
This patch fixes this issue.

As with the file LRU, newly created or swapped-in anonymous pages are
inserted on the inactive list. They are promoted to the active list if
enough references happen. This simple modification changes the above
example as follows.
1. 50 hot pages on active list
50(h) | 0
2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)
3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)
As you can see, the hot pages on the active list are now protected.

Note that this implementation has a drawback: a page cannot be promoted and
will be swapped out if its re-access interval is greater than the size of
the inactive list but less than the size of the total (active + inactive)
list. To solve this potential issue, a following patch applies workingset
detection similar to the one already used for the file LRU.
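For reference, the resulting helper in mm/swap.c looks roughly like this
after the patch (a sketch of the end state; the mlock accounting in the
branch is unchanged from the old helper):

  void lru_cache_add_inactive_or_unevictable(struct page *page,
                                             struct vm_area_struct *vma)
  {
          bool unevictable;

          VM_BUG_ON_PAGE(PageLRU(page), page);

          unevictable = (vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED;
          if (unlikely(unevictable) && !TestSetPageMlocked(page)) {
                  /* newly mlocked page: account it as unevictable, as before */
                  __mod_zone_page_state(page_zone(page), NR_MLOCK,
                                        thp_nr_pages(page));
                  count_vm_event(UNEVICTABLE_PGMLOCKED);
          }
          /* no SetPageActive() any more: evictable pages start on the inactive list */
          lru_cache_add(page);
  }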
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/swap.h | 2 +-
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 2 +-
mm/memory.c | 9 ++++-----
mm/migrate.c | 2 +-
mm/swap.c | 13 +++++++------
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 2 +-
mm/vmscan.c | 4 +---
10 files changed, 19 insertions(+), 21 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7eb59bc..51ec9cd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -352,7 +352,7 @@ extern void deactivate_page(struct page *page);
extern void mark_page_lazyfree(struct page *page);
extern void swap_setup(void);
-extern void lru_cache_add_active_or_unevictable(struct page *page,
+extern void lru_cache_add_inactive_or_unevictable(struct page *page,
struct vm_area_struct *vma);
/* linux/mm/vmscan.c */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f500204..02791f8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -184,7 +184,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
get_page(new_page);
page_add_new_anon_rmap(new_page, vma, addr, false);
- lru_cache_add_active_or_unevictable(new_page, vma);
+ lru_cache_add_inactive_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
dec_mm_counter(mm, MM_ANONPAGES);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 15c9690..2068518 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -619,7 +619,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr, true);
- lru_cache_add_active_or_unevictable(page, vma);
+ lru_cache_add_inactive_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b043c40..02fb51f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1173,7 +1173,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address, true);
- lru_cache_add_active_or_unevictable(new_page, vma);
+ lru_cache_add_inactive_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 45e1dc0..25769b6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2717,7 +2717,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
- lru_cache_add_active_or_unevictable(new_page, vma);
+ lru_cache_add_inactive_or_unevictable(new_page, vma);
/*
* We call the notify macro here because, when using secondary
* mmu page tables (such as kvm shadow page tables), we want the
@@ -3268,10 +3268,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
page_add_new_anon_rmap(page, vma, vmf->address, false);
- lru_cache_add_active_or_unevictable(page, vma);
+ lru_cache_add_inactive_or_unevictable(page, vma);
} else {
do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
- activate_page(page);
}
swap_free(entry);
@@ -3416,7 +3415,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- lru_cache_add_active_or_unevictable(page, vma);
+ lru_cache_add_inactive_or_unevictable(page, vma);
setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3674,7 +3673,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- lru_cache_add_active_or_unevictable(page, vma);
+ lru_cache_add_inactive_or_unevictable(page, vma);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
page_add_file_rmap(page, false);
diff --git a/mm/migrate.c b/mm/migrate.c
index c233781..4fef341 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2912,7 +2912,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
inc_mm_counter(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
- lru_cache_add_active_or_unevictable(page, vma);
+ lru_cache_add_inactive_or_unevictable(page, vma);
get_page(page);
if (flush) {
diff --git a/mm/swap.c b/mm/swap.c
index 587be74..d16d65d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -476,23 +476,24 @@ void lru_cache_add(struct page *page)
EXPORT_SYMBOL(lru_cache_add);
/**
- * lru_cache_add_active_or_unevictable
+ * lru_cache_add_inactive_or_unevictable
* @page: the page to be added to LRU
* @vma: vma in which page is mapped for determining reclaimability
*
- * Place @page on the active or unevictable LRU list, depending on its
+ * Place @page on the inactive or unevictable LRU list, depending on its
* evictability. Note that if the page is not evictable, it goes
* directly back onto it's zone's unevictable list, it does NOT use a
* per cpu pagevec.
*/
-void lru_cache_add_active_or_unevictable(struct page *page,
+void lru_cache_add_inactive_or_unevictable(struct page *page,
struct vm_area_struct *vma)
{
+ bool unevictable;
+
VM_BUG_ON_PAGE(PageLRU(page), page);
- if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
- SetPageActive(page);
- else if (!TestSetPageMlocked(page)) {
+ unevictable = (vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED;
+ if (unlikely(unevictable) && !TestSetPageMlocked(page)) {
/*
* We use the irq-unsafe __mod_zone_page_stat because this
* counter is not modified from interrupt context, and the pte
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7b0974f..6bd9b4c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1921,7 +1921,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
page_add_anon_rmap(page, vma, addr, false);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr, false);
- lru_cache_add_active_or_unevictable(page, vma);
+ lru_cache_add_inactive_or_unevictable(page, vma);
}
swap_free(entry);
/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b804193..9a3d451 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -123,7 +123,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
inc_mm_counter(dst_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
- lru_cache_add_active_or_unevictable(page, dst_vma);
+ lru_cache_add_inactive_or_unevictable(page, dst_vma);
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d5a19c7..9406948 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -998,8 +998,6 @@ static enum page_references page_check_references(struct page *page,
return PAGEREF_RECLAIM;
if (referenced_ptes) {
- if (PageSwapBacked(page))
- return PAGEREF_ACTIVATE;
/*
* All mapped pages start out with page table
* references from the instantiating fault, so we need
@@ -1022,7 +1020,7 @@ static enum page_references page_check_references(struct page *page,
/*
* Activate file-backed executable pages after first usage.
*/
- if (vm_flags & VM_EXEC)
+ if ((vm_flags & VM_EXEC) && !PageSwapBacked(page))
return PAGEREF_ACTIVATE;
return PAGEREF_KEEP;
--
2.7.4
* [PATCH v7 3/6] mm/workingset: prepare the workingset detection infrastructure for anon LRU
From: js1304 @ 2020-07-23 7:49 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Johannes Weiner, Michal Hocko,
Hugh Dickins, Minchan Kim, Vlastimil Babka, Mel Gorman,
Matthew Wilcox, kernel-team, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To prepare for workingset detection on the anon LRU, this patch splits the
workingset event counters for refault, activate and restore into anon and
file variants, and likewise splits the refaults counter in struct lruvec.
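The _BASE/_ANON/_FILE layout is chosen so that callers can select the right
counter arithmetically: page_is_file_lru() returns 0 for anon and 1 for
file, so "_BASE + file" resolves to the _ANON or _FILE variant. A minimal
sketch of the intended usage (the helper name here is hypothetical, only
the enum arithmetic matters):

  /* hypothetical helper: bump the per-type refault counter for a page */
  static void note_refault(struct lruvec *lruvec, struct page *page)
  {
          bool file = page_is_file_lru(page);   /* false = anon, true = file */

          inc_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file);
  }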
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/mmzone.h | 16 +++++++++++-----
mm/memcontrol.c | 16 +++++++++++-----
mm/vmscan.c | 15 ++++++++++-----
mm/vmstat.c | 9 ++++++---
mm/workingset.c | 8 +++++---
5 files changed, 43 insertions(+), 21 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 635a96c..efbd95d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -173,9 +173,15 @@ enum node_stat_item {
NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
WORKINGSET_NODES,
- WORKINGSET_REFAULT,
- WORKINGSET_ACTIVATE,
- WORKINGSET_RESTORE,
+ WORKINGSET_REFAULT_BASE,
+ WORKINGSET_REFAULT_ANON = WORKINGSET_REFAULT_BASE,
+ WORKINGSET_REFAULT_FILE,
+ WORKINGSET_ACTIVATE_BASE,
+ WORKINGSET_ACTIVATE_ANON = WORKINGSET_ACTIVATE_BASE,
+ WORKINGSET_ACTIVATE_FILE,
+ WORKINGSET_RESTORE_BASE,
+ WORKINGSET_RESTORE_ANON = WORKINGSET_RESTORE_BASE,
+ WORKINGSET_RESTORE_FILE,
WORKINGSET_NODERECLAIM,
NR_ANON_MAPPED, /* Mapped anonymous pages */
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
@@ -277,8 +283,8 @@ struct lruvec {
unsigned long file_cost;
/* Non-resident age, driven by LRU movement */
atomic_long_t nonresident_age;
- /* Refaults at the time of last reclaim cycle */
- unsigned long refaults;
+ /* Refaults at the time of last reclaim cycle, anon=0, file=1 */
+ unsigned long refaults[2];
/* Various lruvec state flags (enum lruvec_flags) */
unsigned long flags;
#ifdef CONFIG_MEMCG
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 14dd98d..e84c2b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1530,12 +1530,18 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT),
memcg_events(memcg, PGMAJFAULT));
- seq_buf_printf(&s, "workingset_refault %lu\n",
- memcg_page_state(memcg, WORKINGSET_REFAULT));
- seq_buf_printf(&s, "workingset_activate %lu\n",
- memcg_page_state(memcg, WORKINGSET_ACTIVATE));
+ seq_buf_printf(&s, "workingset_refault_anon %lu\n",
+ memcg_page_state(memcg, WORKINGSET_REFAULT_ANON));
+ seq_buf_printf(&s, "workingset_refault_file %lu\n",
+ memcg_page_state(memcg, WORKINGSET_REFAULT_FILE));
+ seq_buf_printf(&s, "workingset_activate_anon %lu\n",
+ memcg_page_state(memcg, WORKINGSET_ACTIVATE_ANON));
+ seq_buf_printf(&s, "workingset_activate_file %lu\n",
+ memcg_page_state(memcg, WORKINGSET_ACTIVATE_FILE));
- seq_buf_printf(&s, "workingset_restore %lu\n",
- memcg_page_state(memcg, WORKINGSET_RESTORE));
+ seq_buf_printf(&s, "workingset_restore_anon %lu\n",
+ memcg_page_state(memcg, WORKINGSET_RESTORE_ANON));
+ seq_buf_printf(&s, "workingset_restore_file %lu\n",
+ memcg_page_state(memcg, WORKINGSET_RESTORE_FILE));
seq_buf_printf(&s, "workingset_nodereclaim %lu\n",
memcg_page_state(memcg, WORKINGSET_NODERECLAIM));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9406948..6dda5b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2683,7 +2683,10 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
if (!sc->force_deactivate) {
unsigned long refaults;
- if (inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
+ refaults = lruvec_page_state(target_lruvec,
+ WORKINGSET_ACTIVATE_ANON);
+ if (refaults != target_lruvec->refaults[0] ||
+ inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
sc->may_deactivate |= DEACTIVATE_ANON;
else
sc->may_deactivate &= ~DEACTIVATE_ANON;
@@ -2694,8 +2697,8 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
* rid of any stale active pages quickly.
*/
refaults = lruvec_page_state(target_lruvec,
- WORKINGSET_ACTIVATE);
- if (refaults != target_lruvec->refaults ||
+ WORKINGSET_ACTIVATE_FILE);
+ if (refaults != target_lruvec->refaults[1] ||
inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
sc->may_deactivate |= DEACTIVATE_FILE;
else
@@ -2972,8 +2975,10 @@ static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat)
unsigned long refaults;
target_lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
- refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE);
- target_lruvec->refaults = refaults;
+ refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE_ANON);
+ target_lruvec->refaults[0] = refaults;
+ refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE_FILE);
+ target_lruvec->refaults[1] = refaults;
}
/*
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5b35c0e..6eecfcb 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1190,9 +1190,12 @@ const char * const vmstat_text[] = {
"nr_isolated_anon",
"nr_isolated_file",
"workingset_nodes",
- "workingset_refault",
- "workingset_activate",
- "workingset_restore",
+ "workingset_refault_anon",
+ "workingset_refault_file",
+ "workingset_activate_anon",
+ "workingset_activate_file",
+ "workingset_restore_anon",
+ "workingset_restore_file",
"workingset_nodereclaim",
"nr_anon_pages",
"nr_mapped",
diff --git a/mm/workingset.c b/mm/workingset.c
index 21b2986..2d77e4d 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -6,6 +6,7 @@
*/
#include <linux/memcontrol.h>
+#include <linux/mm_inline.h>
#include <linux/writeback.h>
#include <linux/shmem_fs.h>
#include <linux/pagemap.h>
@@ -280,6 +281,7 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
*/
void workingset_refault(struct page *page, void *shadow)
{
+ bool file = page_is_file_lru(page);
struct mem_cgroup *eviction_memcg;
struct lruvec *eviction_lruvec;
unsigned long refault_distance;
@@ -346,7 +348,7 @@ void workingset_refault(struct page *page, void *shadow)
memcg = page_memcg(page);
lruvec = mem_cgroup_lruvec(memcg, pgdat);
- inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
+ inc_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file);
/*
* Compare the distance to the existing workingset size. We
@@ -366,7 +368,7 @@ void workingset_refault(struct page *page, void *shadow)
SetPageActive(page);
workingset_age_nonresident(lruvec, thp_nr_pages(page));
- inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
+ inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + file);
/* Page was active prior to eviction */
if (workingset) {
@@ -375,7 +377,7 @@ void workingset_refault(struct page *page, void *shadow)
spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
spin_unlock_irq(&page_pgdat(page)->lru_lock);
- inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
+ inc_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file);
}
out:
rcu_read_unlock();
--
2.7.4
* [PATCH v7 4/6] mm/swapcache: support to handle the shadow entries
From: js1304 @ 2020-07-23 7:49 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Johannes Weiner, Michal Hocko,
Hugh Dickins, Minchan Kim, Vlastimil Babka, Mel Gorman,
Matthew Wilcox, kernel-team, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Workingset detection for anonymous pages will be implemented in the
following patch, and it requires storing shadow entries in the swapcache.
This patch implements the infrastructure for storing and handling shadow
entries in the swapcache.
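Concretely, a shadow entry is an XArray value entry stored in the slot of
the swap address space that the evicted page used to occupy, and it is
accounted in nrexceptional. A minimal sketch of the pattern this
infrastructure enables (simplified from the hunks below):

  /* on eviction: leave the shadow behind instead of clearing the slot */
  xas_store(&xas, shadow);                /* was: xas_store(&xas, NULL) */
  address_space->nrexceptional += nr;

  /* on later insertion: a value entry in the slot means "shadow", not a page */
  old = xas_load(&xas);
  if (xa_is_value(old) && shadowp)
          *shadowp = old;                 /* hand the shadow back to the caller */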
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/swap.h | 17 ++++++++++++----
mm/shmem.c | 3 ++-
mm/swap_state.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++------
mm/swapfile.c | 2 ++
mm/vmscan.c | 2 +-
5 files changed, 69 insertions(+), 12 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 51ec9cd..8a4c592 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -414,9 +414,13 @@ extern struct address_space *swapper_spaces[];
extern unsigned long total_swapcache_pages(void);
extern void show_swap_cache_info(void);
extern int add_to_swap(struct page *page);
-extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
-extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
+extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
+ gfp_t gfp, void **shadowp);
+extern void __delete_from_swap_cache(struct page *page,
+ swp_entry_t entry, void *shadow);
extern void delete_from_swap_cache(struct page *);
+extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
+ unsigned long end);
extern void free_page_and_swap_cache(struct page *);
extern void free_pages_and_swap_cache(struct page **, int);
extern struct page *lookup_swap_cache(swp_entry_t entry,
@@ -570,13 +574,13 @@ static inline int add_to_swap(struct page *page)
}
static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
- gfp_t gfp_mask)
+ gfp_t gfp_mask, void **shadowp)
{
return -1;
}
static inline void __delete_from_swap_cache(struct page *page,
- swp_entry_t entry)
+ swp_entry_t entry, void *shadow)
{
}
@@ -584,6 +588,11 @@ static inline void delete_from_swap_cache(struct page *page)
{
}
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+ unsigned long end)
+{
+}
+
static inline int page_swapcount(struct page *page)
{
return 0;
diff --git a/mm/shmem.c b/mm/shmem.c
index 89b357a..85ed46f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1434,7 +1434,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
list_add(&info->swaplist, &shmem_swaplist);
if (add_to_swap_cache(page, swap,
- __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
+ __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
+ NULL) == 0) {
spin_lock_irq(&info->lock);
shmem_recalc_inode(inode);
info->swapped++;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 66e750f..13d8d66 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -110,12 +110,14 @@ void show_swap_cache_info(void)
* add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
* but sets SwapCache flag and private instead of mapping and index.
*/
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+ gfp_t gfp, void **shadowp)
{
struct address_space *address_space = swap_address_space(entry);
pgoff_t idx = swp_offset(entry);
XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
unsigned long i, nr = thp_nr_pages(page);
+ void *old;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
@@ -125,16 +127,25 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
SetPageSwapCache(page);
do {
+ unsigned long nr_shadows = 0;
+
xas_lock_irq(&xas);
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
for (i = 0; i < nr; i++) {
VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
+ old = xas_load(&xas);
+ if (xa_is_value(old)) {
+ nr_shadows++;
+ if (shadowp)
+ *shadowp = old;
+ }
set_page_private(page + i, entry.val + i);
xas_store(&xas, page);
xas_next(&xas);
}
+ address_space->nrexceptional -= nr_shadows;
address_space->nrpages += nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
ADD_CACHE_INFO(add_total, nr);
@@ -154,7 +165,8 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
* This must be called only on pages that have
* been verified to be in the swap cache.
*/
-void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
+void __delete_from_swap_cache(struct page *page,
+ swp_entry_t entry, void *shadow)
{
struct address_space *address_space = swap_address_space(entry);
int i, nr = thp_nr_pages(page);
@@ -166,12 +178,14 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageWriteback(page), page);
for (i = 0; i < nr; i++) {
- void *entry = xas_store(&xas, NULL);
+ void *entry = xas_store(&xas, shadow);
VM_BUG_ON_PAGE(entry != page, entry);
set_page_private(page + i, 0);
xas_next(&xas);
}
ClearPageSwapCache(page);
+ if (shadow)
+ address_space->nrexceptional += nr;
address_space->nrpages -= nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
ADD_CACHE_INFO(del_total, nr);
@@ -208,7 +222,7 @@ int add_to_swap(struct page *page)
* Add it to the swap cache.
*/
err = add_to_swap_cache(page, entry,
- __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
+ __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL);
if (err)
/*
* add_to_swap_cache() doesn't return -EEXIST, so we can safely
@@ -246,13 +260,44 @@ void delete_from_swap_cache(struct page *page)
struct address_space *address_space = swap_address_space(entry);
xa_lock_irq(&address_space->i_pages);
- __delete_from_swap_cache(page, entry);
+ __delete_from_swap_cache(page, entry, NULL);
xa_unlock_irq(&address_space->i_pages);
put_swap_page(page, entry);
page_ref_sub(page, thp_nr_pages(page));
}
+void clear_shadow_from_swap_cache(int type, unsigned long begin,
+ unsigned long end)
+{
+ unsigned long curr = begin;
+ void *old;
+
+ for (;;) {
+ unsigned long nr_shadows = 0;
+ swp_entry_t entry = swp_entry(type, curr);
+ struct address_space *address_space = swap_address_space(entry);
+ XA_STATE(xas, &address_space->i_pages, curr);
+
+ xa_lock_irq(&address_space->i_pages);
+ xas_for_each(&xas, old, end) {
+ if (!xa_is_value(old))
+ continue;
+ xas_store(&xas, NULL);
+ nr_shadows++;
+ }
+ address_space->nrexceptional -= nr_shadows;
+ xa_unlock_irq(&address_space->i_pages);
+
+ /* search the next swapcache until we meet end */
+ curr >>= SWAP_ADDRESS_SPACE_SHIFT;
+ curr++;
+ curr <<= SWAP_ADDRESS_SPACE_SHIFT;
+ if (curr > end)
+ break;
+ }
+}
+
/*
* If we are the only user, then try to free up the swap cache.
*
@@ -429,7 +474,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
__SetPageSwapBacked(page);
/* May fail (-ENOMEM) if XArray node allocation failed. */
- if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK)) {
+ if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, NULL)) {
put_swap_page(page, entry);
goto fail_unlock;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6bd9b4c..3dba9be 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -696,6 +696,7 @@ static void add_to_avail_list(struct swap_info_struct *p)
static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
unsigned int nr_entries)
{
+ unsigned long begin = offset;
unsigned long end = offset + nr_entries - 1;
void (*swap_slot_free_notify)(struct block_device *, unsigned long);
@@ -722,6 +723,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
swap_slot_free_notify(si->bdev, offset);
offset++;
}
+ clear_shadow_from_swap_cache(si->type, begin, end);
}
static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6dda5b2..b9b543e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -896,7 +896,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
mem_cgroup_swapout(page, swap);
- __delete_from_swap_cache(page, swap);
+ __delete_from_swap_cache(page, swap, NULL);
xa_unlock_irqrestore(&mapping->i_pages, flags);
put_swap_page(page, swap);
workingset_eviction(page, target_memcg);
--
2.7.4
* [PATCH v7 5/6] mm/swap: implement workingset detection for anonymous LRU
From: js1304 @ 2020-07-23 7:49 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Johannes Weiner, Michal Hocko,
Hugh Dickins, Minchan Kim, Vlastimil Babka, Mel Gorman,
Matthew Wilcox, kernel-team, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
This patch implements workingset detection for the anonymous LRU.
All the infrastructure was implemented by the previous patches, so this
patch just activates workingset detection by installing/retrieving the
shadow entry and adding the refault calculation.
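The resulting swap-in flow is, roughly (a sketch assembled from the hunks
below): look up the shadow entry left behind at eviction time and, if one
exists, feed it to workingset_refault() before the page is put on the LRU:

  shadow = get_shadow_from_swap_cache(entry);
  if (shadow)
          workingset_refault(page, shadow);   /* may SetPageActive() based on refault distance */
  lru_cache_add(page);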
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/swap.h | 6 ++++++
mm/memory.c | 11 ++++-------
mm/swap_state.c | 23 ++++++++++++++++++-----
mm/vmscan.c | 7 ++++---
mm/workingset.c | 15 +++++++++++----
5 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8a4c592..6610469 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -414,6 +414,7 @@ extern struct address_space *swapper_spaces[];
extern unsigned long total_swapcache_pages(void);
extern void show_swap_cache_info(void);
extern int add_to_swap(struct page *page);
+extern void *get_shadow_from_swap_cache(swp_entry_t entry);
extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
gfp_t gfp, void **shadowp);
extern void __delete_from_swap_cache(struct page *page,
@@ -573,6 +574,11 @@ static inline int add_to_swap(struct page *page)
return 0;
}
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+ return NULL;
+}
+
static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
gfp_t gfp_mask, void **shadowp)
{
diff --git a/mm/memory.c b/mm/memory.c
index 25769b6..4934dbc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3100,6 +3100,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
int locked;
int exclusive = 0;
vm_fault_t ret = 0;
+ void *shadow = NULL;
if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
goto out;
@@ -3151,13 +3152,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- /*
- * XXX: Move to lru_cache_add() when it
- * supports new vs putback
- */
- spin_lock_irq(&page_pgdat(page)->lru_lock);
- lru_note_cost_page(page);
- spin_unlock_irq(&page_pgdat(page)->lru_lock);
+ shadow = get_shadow_from_swap_cache(entry);
+ if (shadow)
+ workingset_refault(page, shadow);
lru_cache_add(page);
swap_readpage(page, true);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 13d8d66..146a86d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -106,6 +106,20 @@ void show_swap_cache_info(void)
printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
}
+void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+ struct address_space *address_space = swap_address_space(entry);
+ pgoff_t idx = swp_offset(entry);
+ struct page *page;
+
+ page = find_get_entry(address_space, idx);
+ if (xa_is_value(page))
+ return page;
+ if (page)
+ put_page(page);
+ return NULL;
+}
+
/*
* add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
* but sets SwapCache flag and private instead of mapping and index.
@@ -406,6 +420,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
{
struct swap_info_struct *si;
struct page *page;
+ void *shadow = NULL;
*new_page_allocated = false;
@@ -474,7 +489,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
__SetPageSwapBacked(page);
/* May fail (-ENOMEM) if XArray node allocation failed. */
- if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, NULL)) {
+ if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
put_swap_page(page, entry);
goto fail_unlock;
}
@@ -484,10 +499,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
goto fail_unlock;
}
- /* XXX: Move to lru_cache_add() when it supports new vs putback */
- spin_lock_irq(&page_pgdat(page)->lru_lock);
- lru_note_cost_page(page);
- spin_unlock_irq(&page_pgdat(page)->lru_lock);
+ if (shadow)
+ workingset_refault(page, shadow);
/* Caller will initiate read into locked page */
SetPageWorkingset(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b9b543e..9d4e28c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -854,6 +854,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
{
unsigned long flags;
int refcount;
+ void *shadow = NULL;
BUG_ON(!PageLocked(page));
BUG_ON(mapping != page_mapping(page));
@@ -896,13 +897,13 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
mem_cgroup_swapout(page, swap);
- __delete_from_swap_cache(page, swap, NULL);
+ if (reclaimed && !mapping_exiting(mapping))
+ shadow = workingset_eviction(page, target_memcg);
+ __delete_from_swap_cache(page, swap, shadow);
xa_unlock_irqrestore(&mapping->i_pages, flags);
put_swap_page(page, swap);
- workingset_eviction(page, target_memcg);
} else {
void (*freepage)(struct page *);
- void *shadow = NULL;
freepage = mapping->a_ops->freepage;
/*
diff --git a/mm/workingset.c b/mm/workingset.c
index 2d77e4d..92e6611 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -353,15 +353,22 @@ void workingset_refault(struct page *page, void *shadow)
/*
* Compare the distance to the existing workingset size. We
* don't activate pages that couldn't stay resident even if
- * all the memory was available to the page cache. Whether
- * cache can compete with anon or not depends on having swap.
+ * all the memory was available to the workingset. Whether
+ * workingset competition needs to consider anon or not depends
+ * on having swap.
*/
workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
- if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+ if (!file) {
workingset_size += lruvec_page_state(eviction_lruvec,
- NR_INACTIVE_ANON);
+ NR_INACTIVE_FILE);
+ }
+ if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
workingset_size += lruvec_page_state(eviction_lruvec,
NR_ACTIVE_ANON);
+ if (file) {
+ workingset_size += lruvec_page_state(eviction_lruvec,
+ NR_INACTIVE_ANON);
+ }
}
if (refault_distance > workingset_size)
goto out;
--
2.7.4
* [PATCH v7 6/6] mm/vmscan: restore active/inactive ratio for anonymous LRU
From: js1304 @ 2020-07-23 7:49 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Johannes Weiner, Michal Hocko,
Hugh Dickins, Minchan Kim, Vlastimil Babka, Mel Gorman,
Matthew Wilcox, kernel-team, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Now that workingset detection is implemented for the anonymous LRU, we no
longer need a large inactive list to detect frequently accessed pages
before they are reclaimed. This effectively reverts the temporary measure
put in by the commit "mm/vmscan: make active/inactive ratio as 1:1 for
anon lru".
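With this change, inactive_is_low() once again targets an inactive:active
ratio of roughly 1:int_sqrt(10 * gb) for the anonymous LRUs as well. A
sketch of the restored heuristic, with a few example sizes (assuming the
whole anon LRU amounts to gb gigabytes):

  gb = (inactive + active) >> (30 - PAGE_SHIFT);
  if (gb)
          inactive_ratio = int_sqrt(10 * gb);   /* e.g. 1 GB -> 3, 10 GB -> 10, 100 GB -> 31 */
  else
          inactive_ratio = 1;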
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9d4e28c..b0de23d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2207,7 +2207,7 @@ static bool inactive_is_low(struct lruvec *lruvec, enum lru_list inactive_lru)
active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru);
gb = (inactive + active) >> (30 - PAGE_SHIFT);
- if (gb && is_file_lru(inactive_lru))
+ if (gb)
inactive_ratio = int_sqrt(10 * gb);
else
inactive_ratio = 1;
--
2.7.4