* Re: [RFC] mm:do recheck for freeable page in reclaim path
@ 2015-03-19 0:42 Minchan Kim
0 siblings, 0 replies; 2+ messages in thread
From: Minchan Kim @ 2015-03-19 0:42 UTC (permalink / raw)
To: Wang, Yalin
Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
Johannes Weiner, Mel Gorman, Rik van Riel, Shaohua Li
Do not send your patch as a reply to this thread. This is the second time.
Your patch is entirely unrelated to this patchset.
Please send it as a separate thread.
On Wed, Mar 11, 2015 at 05:47:28PM +0800, Wang, Yalin wrote:
> In the reclaim path, if we encounter a freeable page, try_to_unmap
> may fail because the page's pte is dirty. In that case we can recheck
> the page as a normal, non-freeable page, which means we can swap it
> out to the swap partition.
Please write the description in more detail.
Do you mean that page_check_references in shrink_page_list decided the
page was freeable, but try_to_unmap failed because someone touched the
page during the race window between page_check_references and
try_to_unmap in shrink_page_list?
If so, it is surely a recently referenced page, so it should be promoted
to the active list.
If I have missed something, please describe it in more detail and send
the patch as a new thread, not as a reply to this patchset's thread.
--
Kind regards,
Minchan Kim
* [PATCH 1/4] mm: free swp_entry in madvise_free
@ 2015-03-11 1:20 Minchan Kim
2015-03-11 1:20 ` [PATCH 3/4] mm: move lazy free pages to inactive list Minchan Kim
0 siblings, 1 reply; 2+ messages in thread
From: Minchan Kim @ 2015-03-11 1:20 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Michal Hocko, Johannes Weiner,
Mel Gorman, Rik van Riel, Shaohua Li, Yalin.Wang, Minchan Kim
When I test the below piece of code with 12 processes (i.e., 512M * 12 = 6G consumed)
on my machine (3G RAM + 12 CPUs + 8G swap), madvise_free is significantly slower
(i.e., about 2x) than madvise_dontneed.
loop = 5;
mmap(512M);
while (loop--) {
        memset(512M);
        madvise(MADV_FREE or MADV_DONTNEED);
}
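For reference, a minimal runnable sketch of the test above (a sketch only: the
mapping size and loop count match the pseudocode, but the original harness
details such as how the 12 processes were forked and timed are not shown, and
MADV_FREE may need to be defined by hand if the libc headers predate it):
#include <string.h>
#include <sys/mman.h>
#define SIZE (512UL << 20)              /* 512M, as in the test above */
int main(void)
{
        int loop = 5;
        /* anonymous private mapping, 512M */
        char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return 1;
        while (loop--) {
                memset(p, 1, SIZE);             /* dirty every page */
                madvise(p, SIZE, MADV_FREE);    /* or MADV_DONTNEED */
        }
        return 0;
}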
The reason is the large amount of swap-in:
1) dontneed: 1,612 swapins
2) madvfree: 879,585 swapins
If we find that hinted pages were already swapped out by the time the
syscall is called, it is pointless to keep the swapped-out pages in the pte.
Instead, let's free the cold pages, because swap-in is more expensive
than (page allocation + zeroing).
With this patch, swap-in was reduced from 879,585 to 1,878, so the elapsed time improved:
1) dontneed: 6.10user 233.50system 0:50.44elapsed
2) madvfree: 6.03user 401.17system 1:30.67elapsed
3) madvfree + below patch: 6.70user 339.14system 1:04.45elapsed
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/madvise.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..ebe692e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -274,7 +274,9 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
spinlock_t *ptl;
pte_t *pte, ptent;
struct page *page;
+ swp_entry_t entry;
unsigned long next;
+ int nr_swap = 0;
next = pmd_addr_end(addr, end);
if (pmd_trans_huge(*pmd)) {
@@ -293,8 +295,22 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
for (; addr != end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
- if (!pte_present(ptent))
+ if (pte_none(ptent))
continue;
+ /*
+ * If the pte has a swp_entry, just clear the page table entry to
+ * prevent swap-in, which is more expensive than
+ * (page allocation + zeroing).
+ */
+ if (!pte_present(ptent)) {
+ entry = pte_to_swp_entry(ptent);
+ if (non_swap_entry(entry))
+ continue;
+ nr_swap--;
+ free_swap_and_cache(entry);
+ pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+ continue;
+ }
page = vm_normal_page(vma, addr, ptent);
if (!page)
@@ -326,6 +342,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
set_pte_at(mm, addr, pte, ptent);
tlb_remove_tlb_entry(tlb, pte, addr);
}
+
+ if (nr_swap) {
+ if (current->mm == mm)
+ sync_mm_rss(mm);
+
+ add_mm_counter(mm, MM_SWAPENTS, nr_swap);
+ }
+
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
next:
--
1.9.3
* [PATCH 3/4] mm: move lazy free pages to inactive list
2015-03-11 1:20 [PATCH 1/4] mm: free swp_entry in madvise_free Minchan Kim
@ 2015-03-11 1:20 ` Minchan Kim
2015-03-11 9:47 ` [RFC] mm:do recheck for freeable page in reclaim path Wang, Yalin
0 siblings, 1 reply; 2+ messages in thread
From: Minchan Kim @ 2015-03-11 1:20 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Michal Hocko, Johannes Weiner,
Mel Gorman, Rik van Riel, Shaohua Li, Yalin.Wang, Minchan Kim
MADV_FREE is a hint that it is okay to discard the pages if there is
memory pressure, and we use the reclaimers (i.e., kswapd and direct reclaim)
to free them, so there is no point in keeping them on the active anonymous
LRU; this patch therefore moves them to the head of the inactive LRU list.
This means that MADV_FREE-ed pages living on the inactive list are
reclaimed first, because they are more likely to be cold than recently
active pages.
An arguable issue with this approach is whether we should put the pages at
the head or the tail of the inactive list. I selected the *head* because
the kernel cannot be sure whether they are really cold or warm for every
MADV_FREE use case, but at least we know they are not *hot*, so landing at
the inactive head is a compromise for various use cases.
This fixes a suboptimal behavior of MADV_FREE where pages living on the
active list would sit there for a long time even under memory pressure
while the inactive list is reclaimed heavily. That basically defeats the
whole purpose of using MADV_FREE: helping the system free memory which
might not be used.
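For context, a hedged user-space sketch of how an allocator might use the
hint (illustrative only; the helper name and allocator structure below are
assumptions, not part of this patchset): a chunk returned to the free list
is marked MADV_FREE, so under memory pressure the kernel can discard its
pages from the inactive list, while reuse before reclaim avoids the
fault-and-zero cost that MADV_DONTNEED would impose.
#include <stddef.h>
#include <sys/mman.h>
/*
 * Illustrative helper, not from this patchset: an allocator returning a
 * large chunk to its free list hints that the contents are disposable.
 * Under memory pressure the kernel may discard the pages; if the chunk
 * is reused before that, no page fault + zeroing cost is paid.
 */
static void chunk_release(void *addr, size_t len)
{
        madvise(addr, len, MADV_FREE);
}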
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/swap.h | 1 +
mm/madvise.c | 2 ++
mm/swap.c | 35 +++++++++++++++++++++++++++++++++++
3 files changed, 38 insertions(+)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index cee108c..0428e4c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -308,6 +308,7 @@ extern void lru_add_drain_cpu(int cpu);
extern void lru_add_drain_all(void);
extern void rotate_reclaimable_page(struct page *page);
extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
extern void swap_setup(void);
extern void add_page_to_unevictable_list(struct page *page);
diff --git a/mm/madvise.c b/mm/madvise.c
index ebe692e..22e8f0c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -340,6 +340,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
ptent = pte_mkold(ptent);
ptent = pte_mkclean(ptent);
set_pte_at(mm, addr, pte, ptent);
+ if (PageActive(page))
+ deactivate_page(page);
tlb_remove_tlb_entry(tlb, pte, addr);
}
diff --git a/mm/swap.c b/mm/swap.c
index 5b2a605..393968c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -43,6 +43,7 @@ int page_cluster;
static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
/*
* This path almost never happens for VM activity - pages are normally
@@ -789,6 +790,23 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
update_page_reclaim_stat(lruvec, file, 0);
}
+
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+ void *arg)
+{
+ if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+ int file = page_is_file_cache(page);
+ int lru = page_lru_base_type(page);
+
+ del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+ ClearPageActive(page);
+ add_page_to_lru_list(page, lruvec, lru);
+
+ __count_vm_event(PGDEACTIVATE);
+ update_page_reclaim_stat(lruvec, file, 0);
+ }
+}
+
/*
* Drain pages out of the cpu's pagevecs.
* Either "cpu" is the current CPU, and preemption has already been
@@ -815,6 +833,10 @@ void lru_add_drain_cpu(int cpu)
if (pagevec_count(pvec))
pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
+ pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+ if (pagevec_count(pvec))
+ pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
activate_page_drain(cpu);
}
@@ -844,6 +866,18 @@ void deactivate_file_page(struct page *page)
}
}
+void deactivate_page(struct page *page)
+{
+ if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+ struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+ put_cpu_var(lru_deactivate_pvecs);
+ }
+}
+
void lru_add_drain(void)
{
lru_add_drain_cpu(get_cpu());
@@ -873,6 +907,7 @@ void lru_add_drain_all(void)
if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+ pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
need_activate_page_drain(cpu)) {
INIT_WORK(work, lru_add_drain_per_cpu);
schedule_work_on(cpu, work);
--
1.9.3
* [RFC] mm:do recheck for freeable page in reclaim path
2015-03-11 1:20 ` [PATCH 3/4] mm: move lazy free pages to inactive list Minchan Kim
@ 2015-03-11 9:47 ` Wang, Yalin
0 siblings, 0 replies; 2+ messages in thread
From: Wang, Yalin @ 2015-03-11 9:47 UTC (permalink / raw)
To: 'Minchan Kim',
Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
Johannes Weiner, Mel Gorman, Rik van Riel, Shaohua Li
In the reclaim path, if we encounter a freeable page, try_to_unmap may
fail because the page's pte is dirty. In that case we can recheck the
page as a normal, non-freeable page, which means we can swap it out to
the swap partition.
Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
---
mm/vmscan.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 260c413..9930850 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1000,6 +1000,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
}
}
+recheck:
if (!force_reclaim)
references = page_check_references(page, sc,
&freeable);
@@ -1045,6 +1046,10 @@ unmap:
switch (try_to_unmap(page,
freeable ? TTU_FREE : ttu_flags)) {
case SWAP_FAIL:
+ if (freeable) {
+ freeable = false;
+ goto recheck;
+ }
goto activate_locked;
case SWAP_AGAIN:
goto keep_locked;
--
2.2.2