linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Ulrich Drepper <drepper@redhat.com>
Cc: Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Jakub Jelinek <jakub@redhat.com>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: missing madvise functionality
Date: Wed, 04 Apr 2007 17:46:12 +1000	[thread overview]
Message-ID: <461357C4.4010403@yahoo.com.au> (raw)
In-Reply-To: <46128051.9000609@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2527 bytes --]

Ulrich Drepper wrote:
> People might remember the thread about mysql not scaling and pointing
> the finger quite happily at glibc.  Well, the situation is not like that.
> 
> The problem is glibc has to work around kernel limitations.  If the
> malloc implementation detects that a large chunk of previously allocated
> memory is now free and unused it wants to return the memory to the
> system.  What we currently have to do is this:
> 
>   to free:      mmap(PROT_NONE) over the area
>   to reuse:     mprotect(PROT_READ|PROT_WRITE)
> 
> Yep, that's expensive, both operations need to get locks preventing
> other threads from doing the same.
> 
> Some people were quick to suggest that we simply avoid the freeing in
> many situations (that's what the patch submitted by Yanmin Zhang
> basically does).  That's no solution.  One of the very good properties
> of the current allocator is that it does not use much memory.

Does mmap(PROT_NONE) actually free the memory?


> A solution for this problem is a madvise() operation with the following
> property:
> 
>   - the content of the address range can be discarded
> 
>   - if an access to a page in the range happens in the future it must
>     succeed.  The old page content can be provided or a new, empty page
>     can be provided
> 
> That's it.  The current MADV_DONTNEED doesn't cut it because it zaps the
> pages, causing *all* future reuses to create page faults.  This is what
> I guess happens in the mysql test case where the pages where unused and
> freed but then almost immediately reused.  The page faults erased all
> the benefits of using one mprotect() call vs a pair of mmap()/mprotect()
> calls.

Two questions.

In the case of pages being unused then almost immediately reused, why is
it a bad solution to avoid freeing? Is it that you want to avoid
heuristics because in some cases they could fail and end up using memory?

Secondly, why is MADV_DONTNEED bad? How much more expensive is a pagefault
than a syscall? (including the cost of the TLB fill for the memory access
after the syscall, of course).

zapping the pages puts them on a nice LIFO cache hot list of pages that
can be quickly used when the next fault comes in, or used for any other
allocation in the kernel. Putting them on some sort of reclaim list seems
a bit pointless.

Oh, also: something like this patch would help out MADV_DONTNEED, as it
means it can run concurrently with page faults. I think the locking will
work (but needs forward porting).

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: madv-mmap_sem.patch --]
[-- Type: text/plain, Size: 1305 bytes --]

Index: linux-2.6/mm/madvise.c
===================================================================
--- linux-2.6.orig/mm/madvise.c
+++ linux-2.6/mm/madvise.c
@@ -12,6 +12,25 @@
 #include <linux/hugetlb.h>
 
 /*
+ * Any behaviour which results in changes to the vma->vm_flags needs to
+ * take mmap_sem for writing. Others, which simply traverse vmas, need
+ * to only take it for reading.
+ */
+static int madvise_need_mmap_write(int behavior)
+{
+	switch (behavior) {
+	case MADV_DOFORK:
+	case MADV_DONTFORK:
+	case MADV_NORMAL:
+	case MADV_SEQUENTIAL:
+	case MADV_RANDOM:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
+/*
  * We can potentially split a vm area into separate
  * areas, each area with its own behavior.
  */
@@ -264,7 +283,10 @@ asmlinkage long sys_madvise(unsigned lon
 	int error = -EINVAL;
 	size_t len;
 
-	down_write(&current->mm->mmap_sem);
+	if (madvise_need_mmap_write(behavior))
+		down_write(&current->mm->mmap_sem);
+	else
+		down_read(&current->mm->mmap_sem);
 
 	if (start & ~PAGE_MASK)
 		goto out;
@@ -323,6 +345,10 @@ asmlinkage long sys_madvise(unsigned lon
 		vma = prev->vm_next;
 	}
 out:
-	up_write(&current->mm->mmap_sem);
+	if (madvise_need_mmap_write(behavior))
+		up_write(&current->mm->mmap_sem);
+	else
+		up_read(&current->mm->mmap_sem);
+
 	return error;
 }

  parent reply	other threads:[~2007-04-04  7:46 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <46128051.9000609@redhat.com>
     [not found] ` <p73648dz5oa.fsf@bingen.suse.de>
     [not found]   ` <46128CC2.9090809@redhat.com>
     [not found]     ` <20070403172841.GB23689@one.firstfloor.org>
2007-04-03 19:59       ` Andrew Morton
2007-04-03 20:09         ` Andi Kleen
2007-04-03 20:17         ` Ulrich Drepper
2007-04-03 20:29           ` Jakub Jelinek
2007-04-03 20:38             ` Rik van Riel
2007-04-03 21:49             ` Andrew Morton
2007-04-03 23:01               ` Eric Dumazet
2007-04-04  2:22                 ` Nick Piggin
2007-04-04  5:41                   ` Eric Dumazet
2007-04-04  6:09                     ` [patches] threaded vma patches (was Re: missing madvise functionality) Nick Piggin
2007-04-04  6:26                       ` Andrew Morton
2007-04-04  6:38                         ` Nick Piggin
2007-04-04  6:42                       ` Ulrich Drepper
2007-04-04  6:44                         ` Nick Piggin
2007-04-04  6:50                         ` Eric Dumazet
2007-04-04  6:54                           ` Ulrich Drepper
2007-04-04  7:33                             ` Eric Dumazet
2007-04-04  8:25                   ` missing madvise functionality Peter Zijlstra
2007-04-04  8:55                     ` Nick Piggin
2007-04-04  9:12                       ` William Lee Irwin III
2007-04-04  9:23                         ` Nick Piggin
2007-04-04  9:34                       ` Eric Dumazet
2007-04-04  9:45                         ` Nick Piggin
2007-04-04 10:05                         ` Nick Piggin
2007-04-04 11:54                           ` Eric Dumazet
2007-04-05  2:01                             ` Nick Piggin
2007-04-05  6:09                               ` Eric Dumazet
2007-04-05  6:19                                 ` Ulrich Drepper
2007-04-05  6:54                                   ` Eric Dumazet
2007-04-03 23:02               ` Andrew Morton
2007-04-04  9:15                 ` Hugh Dickins
2007-04-04 14:55                   ` Rik van Riel
2007-04-04 15:25                     ` Hugh Dickins
2007-04-05  1:44                       ` Nick Piggin
2007-04-04 18:04                   ` Andrew Morton
2007-04-04 18:08                     ` Rik van Riel
2007-04-04 20:56                       ` Andrew Morton
2007-04-04 18:39                     ` Hugh Dickins
2007-04-03 23:44               ` Andrew Morton
2007-04-04 13:09             ` William Lee Irwin III
2007-04-04 13:38               ` William Lee Irwin III
2007-04-04 18:51               ` Andrew Morton
2007-04-05  4:14                 ` William Lee Irwin III
2007-04-04 23:00             ` preemption and rwsems (was: Re: missing madvise functionality) Andrew Morton
2007-04-05  7:31             ` missing madvise functionality Rik van Riel
2007-04-05  7:39               ` Rik van Riel
2007-04-05  8:32                 ` Andrew Morton
2007-04-05 15:47                   ` Rik van Riel
2007-04-05  8:08               ` Eric Dumazet
2007-04-05  8:31                 ` Rik van Riel
2007-04-05  9:06                   ` Eric Dumazet
2007-04-05  9:45               ` Jakub Jelinek
2007-04-05 16:15                 ` Rik van Riel
2007-04-05 16:10               ` Ulrich Drepper
2007-04-06  2:28                 ` Nick Piggin
2007-04-06  2:52                   ` Ulrich Drepper
2007-04-06  2:59                     ` Nick Piggin
2007-04-05 12:48             ` preemption and rwsems (was: Re: missing madvise functionality) David Howells
2007-04-05 19:11               ` Ingo Molnar
2007-04-05 20:37                 ` Andrew Morton
2007-04-06  9:08                   ` Ingo Molnar
2007-04-06 19:30                     ` Andrew Morton
2007-04-06 19:40                       ` Ingo Molnar
2007-04-05 19:27               ` Andrew Morton
2007-04-03 20:51           ` missing madvise functionality Andrew Morton
2007-04-03 20:57             ` Ulrich Drepper
2007-04-03 21:00             ` Rik van Riel
2007-04-03 21:10               ` Eric Dumazet
2007-04-03 21:12                 ` Jörn Engel
2007-04-03 21:15                 ` Rik van Riel
2007-04-03 21:30                   ` Eric Dumazet
2007-04-03 21:22                 ` Jeremy Fitzhardinge
2007-04-03 21:29                   ` Rik van Riel
2007-04-03 21:46                 ` Ulrich Drepper
2007-04-03 22:51                   ` Andi Kleen
2007-04-03 23:07                     ` Ulrich Drepper
2007-04-03 21:16               ` Andrew Morton
2007-04-04 18:49             ` Anton Blanchard
2007-04-04  7:46 ` Nick Piggin [this message]
2007-04-04  8:04   ` Nick Piggin
2007-04-04  8:20   ` Jakub Jelinek
2007-04-04  8:47     ` Nick Piggin
2007-04-05  4:23       ` Nick Piggin
2007-04-05 18:38   ` Rik van Riel
2007-04-05 21:07     ` Andrew Morton
2007-04-05 21:39       ` Rik van Riel
2007-04-06  1:28     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=461357C4.4010403@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@linux-foundation.org \
    --cc=drepper@redhat.com \
    --cc=jakub@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox