From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 08/11] Add /proc trigger for memory compaction
Date: Fri, 26 Mar 2010 10:46:18 +0000
Message-ID: <20100326104618.GZ2024@csn.ul.ie>
In-Reply-To: <20100324133351.c7730969.akpm@linux-foundation.org>

On Wed, Mar 24, 2010 at 01:33:51PM -0700, Andrew Morton wrote:
> On Tue, 23 Mar 2010 12:25:43 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > This patch adds a proc file /proc/sys/vm/compact_memory. When an arbitrary
> > value is written to the file, all zones are compacted. The expected user
> > of such a trigger is a job scheduler that prepares the system before the
> > target application runs.
> > 
> >
> > ...
> >
> > +/* This is the entry point for compacting all nodes via /proc/sys/vm */
> > +int sysctl_compaction_handler(struct ctl_table *table, int write,
> > +			void __user *buffer, size_t *length, loff_t *ppos)
> > +{
> > +	if (write)
> > +		return compact_nodes();
> > +
> > +	return 0;
> > +}
> 
> Neato.  When I saw the overall description I was afraid that this stuff
> would be fiddling with kernel threads.
> 

Not yet anyway. It has been floated in the past to have a kcompactd
similar to kswapd but right now there is no justification for it. Like
other suggestions made in the past, it has potential but needs data to
justify it.

> The underlying compaction code can at times cause rather large amounts
> of memory to be put onto private lists, so it's lost to the rest of the
> kernel.  What happens if 10000 processes simultaneously write to this
> thing?  It's root-only so I guess the answer is "root becomes unemployed".
> 

Well, root becomes unemployed but I shouldn't be supplying the rope.
Let's keep min_free_kbytes as the "fall off the cliff" tunable. I added
too_many_isolated()-like logic and also handling of fatal signals.

> I fear that the overall effect of this feature is that people will come
> up with ghastly hacks which keep on poking this tunable as a workaround
> for some VM shortcoming.  This will lead to more shortcomings, and
> longer-lived ones.
> 

That would be very unfortunate and also a self-defeating measure in the short
run, let alone the long run.  I consider the tunable to be more like the
"drop_caches" tunable. It can be used for good or bad and all the bad uses
kick you in the ass because it does not resolve the underlying problem and
is expensive to use.

I had three legit uses in mind for it:

1. Batch systems that compact memory before a job is scheduled to reduce
   start-up time of applications using huge pages. Depending on their
   setup, sysfs might be a better fit for them.

2. Illustrating a bug in direct compaction, i.e. I'd get a report on some
   allocation failure that was consistent but when the tunable is poked,
   it works perfectly.

3. Development uses. Measuring worst-case scenarios for compaction (rare
   obviously), stress testing compaction to try to catch bugs in migration
   and measuring how effective compaction currently is.
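
As a rough illustration of use 1, a batch scheduler's pre-job hook could
poke the trigger from user space along the lines of the sketch below. The
helper name trigger_compaction() is hypothetical and not part of the patch;
the only thing taken from the patch is that writing an arbitrary value to
/proc/sys/vm/compact_memory compacts all zones and that the file is
root-only.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical pre-job hook helper: ask the kernel to compact all zones by
 * writing an arbitrary value to the trigger file at `path` (normally
 * "/proc/sys/vm/compact_memory"). Returns 0 on success or a negative errno.
 */
int trigger_compaction(const char *path)
{
	static const char value[] = "1\n";
	ssize_t written;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* The value itself is ignored; the write is what starts compaction. */
	written = write(fd, value, sizeof(value) - 1);
	if (written < 0)
		err = -errno;

	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

A hook would call trigger_compaction("/proc/sys/vm/compact_memory") once
before launching the job and treat a negative return as "compaction not
available" rather than as a fatal error. Calling it in a loop is exactly
the kind of abuse discussed above.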

Do these justify the existence of the tunable or is the risk of abuse
too high?

This is what the isolate logic looks like:


diff --git a/mm/compaction.c b/mm/compaction.c
index e0e8100..a6a6958 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -13,6 +13,7 @@
 #include <linux/mm_inline.h>
 #include <linux/sysctl.h>
 #include <linux/sysfs.h>
+#include <linux/backing-dev.h>
 #include "internal.h"
 
 /*
@@ -197,6 +198,20 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
 }
 
+/* Similar to reclaim, but different enough that they don't share logic */
+static int too_many_isolated(struct zone *zone)
+{
+
+	unsigned long inactive, isolated;
+
+	inactive = zone_page_state(zone, NR_INACTIVE_FILE) +
+					zone_page_state(zone, NR_INACTIVE_ANON);
+	isolated = zone_page_state(zone, NR_ISOLATED_FILE) +
+					zone_page_state(zone, NR_ISOLATED_ANON);
+
+	return isolated > inactive;
+}
+
 /*
  * Isolate all pages that can be migrated from the block pointed to by
  * the migrate scanner within compact_control.
@@ -223,6 +238,14 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		return 0;
 	}
 
+	/* Do not isolate the world */
+	while (unlikely(too_many_isolated(zone))) {
+		congestion_wait(BLK_RW_ASYNC, HZ/10);
+
+		if (fatal_signal_pending(current))
+			return 0;
+	}
+
 	/* Time to isolate some pages for migration */
 	spin_lock_irq(&zone->lru_lock);
 	for (; low_pfn < end_pfn; low_pfn++) {
@@ -309,6 +332,9 @@ static int compact_finished(struct zone *zone,
 	unsigned int order;
 	unsigned long watermark = low_wmark_pages(zone) + (1 << cc->order);
 
+	if (fatal_signal_pending(current))
+		return COMPACT_PARTIAL;
+
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (cc->free_pfn <= cc->migrate_pfn)
 		return COMPACT_COMPLETE;
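
For readers who want to poke at the throttle condition outside the kernel,
the comparison in too_many_isolated() can be modelled in user space roughly
as follows. This is only a sketch: struct zone_counts and
too_many_isolated_model() are hypothetical names standing in for the four
zone_page_state() reads, and the congestion_wait() loop and signal check
are not modelled.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the four per-zone counters the check reads. */
struct zone_counts {
	unsigned long inactive_file;
	unsigned long inactive_anon;
	unsigned long isolated_file;
	unsigned long isolated_anon;
};

/*
 * Same comparison as too_many_isolated() in the patch: throttle once more
 * pages are isolated than remain on the inactive lists.
 */
bool too_many_isolated_model(const struct zone_counts *z)
{
	unsigned long inactive = z->inactive_file + z->inactive_anon;
	unsigned long isolated = z->isolated_file + z->isolated_anon;

	return isolated > inactive;
}
```

With 1000 inactive pages in total, a caller is throttled at 1001 isolated
pages but not at exactly 1000, since the comparison is strict.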
