linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC V3] mm: change mm_advise_free to clear page dirty
@ 2015-03-03  3:25 Minchan Kim
  2015-03-03  3:59 ` Wang, Yalin
  2015-03-05 15:35 ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Minchan Kim @ 2015-03-03  3:25 UTC (permalink / raw)
  To: Wang, Yalin
  Cc: 'Michal Hocko', 'Andrew Morton',
	'linux-kernel@vger.kernel.org',
	'linux-mm@kvack.org', 'Rik van Riel',
	'Johannes Weiner', 'Mel Gorman',
	'Shaohua Li',
	Hugh Dickins, Cyrill Gorcunov

Could you separte this patch in this patchset thread?
It's tackling differnt problem.

As well, I had a question to previous thread about why shared page
has a problem now but you didn't answer and send a new patchset.
It makes reviewers/maintainer time waste/confuse. Please, don't
hurry to send a code. Before that, resolve reviewers's comments.

On Tue, Mar 03, 2015 at 10:06:40AM +0800, Wang, Yalin wrote:
> This patch add ClearPageDirty() to clear AnonPage dirty flag,
> if not clear page dirty for this anon page, the page will never be
> treated as freeable. We also make sure the shared AnonPage is not
> freeable, we implement it by dirty all copyed AnonPage pte,
> so that make sure the Anonpage will not become freeable, unless
> all process which shared this page call madvise_free syscall.

Please, spend more time to make description clear. I really doubt
who understand this description without code inspection. :(
Of course, I'm not a person to write description clear like native
, either but just I'm sure I spend a more time to write description
rather than coding, at least. :)

> 
> Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
> ---
>  mm/madvise.c | 16 +++++++++-------
>  mm/memory.c  | 12 ++++++++++--
>  2 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6d0fcb8..b61070d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -297,23 +297,25 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			continue;
>  
>  		page = vm_normal_page(vma, addr, ptent);
> -		if (!page)
> +		if (!page || !trylock_page(page))
>  			continue;
>  
>  		if (PageSwapCache(page)) {
> -			if (!trylock_page(page))
> -				continue;
> -
>  			if (!try_to_free_swap(page)) {
>  				unlock_page(page);
>  				continue;
>  			}
> -
> -			ClearPageDirty(page);
> -			unlock_page(page);
>  		}
>  
>  		/*
> +		 * we clear page dirty flag for AnonPage, no matter if this
> +		 * page is in swapcahce or not, AnonPage not in swapcache also set
> +		 * dirty flag sometimes, this happened when a AnonPage is removed
> +		 * from swapcahce by try_to_free_swap()
> +		 */
> +		ClearPageDirty(page);
> +		unlock_page(page);
> +		/*

Parent:

ptrP = malloc();
*ptrP = 'a';
fork(); -> child process pte has dirty by your patch
..
memory pressure -> So, swapped out the page.
..
..
Child: var = *ptrP; assert(var =='a') -> So, swapin happens and child has pte_clean
parent: var = *ptrP; aasert(var == 'a') -> So, swapin happens and parent has pte_clean
..
..
Parent:
madvise_free -> remove PageDirty
So, both parent and child has pte_clean and !PageDirty, which
is target for VM to discard a page.
..
VM discard the page by memory pressure.
..
Child: var = *ptrP: assert(var == 'a'); <---- oops.

And blindly ClearPageDirty makes duplicates swap out.

>  		 * Some of architecture(ex, PPC) don't update TLB
>  		 * with set_pte_at and tlb_remove_tlb_entry so for
>  		 * the portability, remap the pte with old|clean
> diff --git a/mm/memory.c b/mm/memory.c
> index 8068893..3d949b3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -874,10 +874,18 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	if (page) {
>  		get_page(page);
>  		page_dup_rmap(page);
> -		if (PageAnon(page))
> +		if (PageAnon(page)) {
> +			/*
> +			 * we dirty the copyed pte for anon page,
> +			 * this is useful for madvise_free_pte_range(),
> +			 * this can prevent shared anon page freed by madvise_free
> +			 * syscall
> +			 */
> +			pte = pte_mkdirty(pte);

It made every MADV_FREE hinted page void. IOW, if a process called MADV_FREE
calls fork, VM cannot discard pages if child doesn't free pages or calls madvise_free.
Then, if parent calls madvise_free before fork, we couldn't free those pages.
IOW, you are ignoring below example.

parent:
ptr1 = malloc(len);
        -> allocator calls mmap(len);
memset(ptr1, 'a', len);
free(ptr1);
        -> allocator calls madvise_free(ptr1, len);
fork();
..
..
        -> VM discard hinted pages
child:

ptr2 = malloc(len)
        -> allocator reuses the chunk allocated from parent.
so, child will see zero pages from ptr2 but he doesn't write
anything so garbage|zero page anything is okay to him.

As well, you are adding new instructions in fork which is very frequent syscall
so I'd like to find another way to avoid adding instructions in such hot path.

I will send different patch. Please review it.

So, my suggestion is below. It always makes pte dirty so let's Cc
Cyrill to take care of softdirty and Hugh who is Mr.Swap.

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [PATCH RFC 1/4] mm: throttle MADV_FREE
@ 2015-02-24  8:18 Minchan Kim
  2015-02-24 15:43 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Minchan Kim @ 2015-02-24  8:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Rik van Riel, Michal Hocko,
	Johannes Weiner, Mel Gorman, Shaohua Li, Yalin.Wang, Minchan Kim

Recently, Shaohua reported that MADV_FREE is much slower than
MADV_DONTNEED in his MADV_FREE bomb test. The reason is many of
applications went to stall with direct reclaim since kswapd's
reclaim speed isn't fast than applications's allocation speed
so that it causes lots of stall and lock contention.

This patch throttles MADV_FREEing so it works only if there
are enough pages in the system which will not trigger backgroud/
direct reclaim. Otherwise, MADV_FREE falls back to MADV_DONTNEED
because there is no point to delay freeing if we know system
is under memory pressure.

When I test the patch on my 3G machine + 12 CPU + 8G swap,
test: 12 processes

loop = 5;
mmap(512M);
while (loop--) {
	memset(512M);
	madvise(MADV_FREE or MADV_DONTNEED);
}

1) dontneed: 6.78user 234.09system 0:48.89elapsed
2) madvfree: 6.03user 401.17system 1:30.67elapsed
3) madvfree + this ptach: 5.68user 113.42system 0:36.52elapsed

It's clearly win.

Reported-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/madvise.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8921c2..81bb26ecf064 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -523,8 +523,17 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		 * XXX: In this implementation, MADV_FREE works like
 		 * MADV_DONTNEED on swapless system or full swap.
 		 */
-		if (get_nr_swap_pages() > 0)
-			return madvise_free(vma, prev, start, end);
+		if (get_nr_swap_pages() > 0) {
+			unsigned long threshold;
+			/*
+			 * If we have trobule with memory pressure(ie,
+			 * under high watermark), free pages instantly.
+			 */
+			threshold = min_free_kbytes >> (PAGE_SHIFT - 10);
+			threshold = threshold + (threshold >> 1);
+			if (nr_free_pages() > threshold)
+				return madvise_free(vma, prev, start, end);
+		}
 		/* passthrough */
 	case MADV_DONTNEED:
 		return madvise_dontneed(vma, prev, start, end);
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-03-09  0:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-03  3:25 [RFC V3] mm: change mm_advise_free to clear page dirty Minchan Kim
2015-03-03  3:59 ` Wang, Yalin
2015-03-03  4:14   ` Minchan Kim
2015-03-03  6:46     ` Wang, Yalin
2015-03-03 13:40       ` Minchan Kim
2015-03-05 15:35 ` Michal Hocko
2015-03-09  0:57   ` Minchan Kim
  -- strict thread matches above, loose matches on Subject: below --
2015-02-24  8:18 [PATCH RFC 1/4] mm: throttle MADV_FREE Minchan Kim
2015-02-24 15:43 ` Michal Hocko
2015-02-25  0:08   ` Minchan Kim
2015-02-27  3:37     ` [RFC] mm: change mm_advise_free to clear page dirty Wang, Yalin
2015-02-27 21:02       ` Michal Hocko
2015-02-28  2:11         ` Wang, Yalin
2015-02-28  6:01           ` [RFC V2] " Wang, Yalin
2015-03-02 12:38             ` Michal Hocko
2015-03-03  2:06               ` [RFC V3] " Wang, Yalin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox