From: Mel Gorman <mel@csn.ul.ie>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	NeilBrown <neilb@suse.de>,
	babydr@baby-dragons.com, cl@linux-foundation.org,
	lee.schermerhorn@hp.com, apw@shadowen.org
Subject: Re: [problem] raid performance loss with 2.6.26-rc8 on 32-bit x86 (bisected)
Date: Tue, 1 Jul 2008 09:09:11 +0100
Message-ID: <20080701080910.GA10865@csn.ul.ie>
In-Reply-To: <1214877439.7885.40.camel@dwillia2-linux.ch.intel.com>

(Christoph's address corrected and Andy's added to cc)

On (30/06/08 18:57), Dan Williams didst pronounce:
> Hello,
> 
> Prompted by a report from a user I have bisected a performance loss
> apparently introduced by commit 54a6eb5c (mm: use two zonelist that are
> filtered by GFP mask).  The test is simple sequential writes to a 4 disk
> raid5 array.  Performance should be about 20% greater than 2.6.25 due to
> commit 8b3e6cdc (md: introduce get_priority_stripe() to improve raid456
> write performance).  The sample data below shows sporadic performance
> starting at 54a6eb5c.  The '+' indicates where I hand applied 8b3e6cdc.
> 
> revision   2.6.25.8-fc8 2.6.25.9+ dac1d27b+ 18ea7e71+ 54a6eb5c+ 2.6.26-rc1 2.6.26-rc8
>            138          168       169       167       177       149        144
>            140          168       172       170       109       138        142
>            142          165       169       164       119       138        129
>            144          168       169       171       120       139        135
>            142          165       174       166       165       122        154
> MB/s (avg) 141          167       171       168       138       137        141
> % change   0%           18%       21%       19%       -2%       -3%        0%
> result     base         good      good      good      [bad]     bad        bad
> 

That is not good at all. This patch is not straightforward to revert, and
this is the second time it has come under suspicion.

> Notable observations:
> 1/ This problem does not reproduce when ARCH=x86_64, i.e. 2.6.26-rc8 and
> 54a6eb5c show consistent performance at 170MB/s.

I'm very curious as to why this doesn't affect x86_64. HIGHMEM is one
possibility if GFP_KERNEL is a major factor and it has to scan over the
unusable zone a lot. However, another remote possibility is that many function
calls are more expensive on x86 than on x86_64 (this is a wild guess based
on the registers available). A speculative patch is below.

If 8b3e6cdc is reverted from 2.6.26-rc8, what do the figures look like?
i.e. is the zonelist filtering looking like a performance regression or is
it just somehow negating the benefits of the raid patch?

> 2/ Single drive performance appears to be unaffected
> 3/ A quick test shows that raid0 performance is also sporadic:
>    2147483648 bytes (2.1 GB) copied, 7.72408 s, 278 MB/s
>    2147483648 bytes (2.1 GB) copied, 7.78478 s, 276 MB/s
>    2147483648 bytes (2.1 GB) copied, 11.0323 s, 195 MB/s
>    2147483648 bytes (2.1 GB) copied, 8.41244 s, 255 MB/s
>    2147483648 bytes (2.1 GB) copied, 30.7649 s, 69.8 MB/s
> 

Are these synced writes? i.e. is it possible that performance drops at
the end because memory has filled with dirty pages by that point?

> System/Test configuration:
> (2) Intel(R) Xeon(R) CPU 5150
> mem=1024M
> CONFIG_HIGHMEM4G=y (full config attached)
> mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean
> for i in `seq 1 5`; do dd if=/dev/zero of=/dev/md0 bs=1024k count=2048; done
> 
> Neil suggested CONFIG_NOHIGHMEM=y, I will give that a shot tomorrow.
> Other suggestions / experiments?
> 

There was a deporkify patch which replaced inline functions with normal
functions. I worried at the time that all the function calls in an
iterator might cause a performance problem, but I couldn't measure it, so
I assumed the reduction in text size was a plus. This is a partial
reporkify patch that should reduce the number of function calls that
take place at the cost of larger text. Can you try it out applied
against 2.6.26-rc8 please?

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc8-clean/include/linux/mmzone.h linux-2.6.26-rc8-repork/include/linux/mmzone.h
--- linux-2.6.26-rc8-clean/include/linux/mmzone.h	2008-06-24 18:58:20.000000000 -0700
+++ linux-2.6.26-rc8-repork/include/linux/mmzone.h	2008-07-01 00:49:17.000000000 -0700
@@ -742,6 +742,15 @@ static inline int zonelist_node_idx(stru
 #endif /* CONFIG_NUMA */
 }
 
+static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
+{
+#ifdef CONFIG_NUMA
+	return node_isset(zonelist_node_idx(zref), *nodes);
+#else
+	return 1;
+#endif /* CONFIG_NUMA */
+}
+
 /**
  * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
  * @z - The cursor used as a starting point for the search
@@ -754,10 +763,26 @@ static inline int zonelist_node_idx(stru
  * search. The zoneref returned is a cursor that is used as the next starting
  * point for future calls to next_zones_zonelist().
  */
-struct zoneref *next_zones_zonelist(struct zoneref *z,
+static inline struct zoneref *next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
 					nodemask_t *nodes,
-					struct zone **zone);
+					struct zone **zone)
+{
+	/*
+	 * Find the next suitable zone to use for the allocation.
+	 * Only filter based on nodemask if it's set
+	 */
+	if (likely(nodes == NULL))
+		while (zonelist_zone_idx(z) > highest_zoneidx)
+			z++;
+	else
+		while (zonelist_zone_idx(z) > highest_zoneidx ||
+				(z->zone && !zref_in_nodemask(z, nodes)))
+			z++;
+
+	*zone = zonelist_zone(z++);
+	return z;
+}
 
 /**
  * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc8-clean/mm/mmzone.c linux-2.6.26-rc8-repork/mm/mmzone.c
--- linux-2.6.26-rc8-clean/mm/mmzone.c	2008-06-24 18:58:20.000000000 -0700
+++ linux-2.6.26-rc8-repork/mm/mmzone.c	2008-07-01 00:48:19.000000000 -0700
@@ -42,33 +42,3 @@ struct zone *next_zone(struct zone *zone
 	return zone;
 }
 
-static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
-{
-#ifdef CONFIG_NUMA
-	return node_isset(zonelist_node_idx(zref), *nodes);
-#else
-	return 1;
-#endif /* CONFIG_NUMA */
-}
-
-/* Returns the next zone at or below highest_zoneidx in a zonelist */
-struct zoneref *next_zones_zonelist(struct zoneref *z,
-					enum zone_type highest_zoneidx,
-					nodemask_t *nodes,
-					struct zone **zone)
-{
-	/*
-	 * Find the next suitable zone to use for the allocation.
-	 * Only filter based on nodemask if it's set
-	 */
-	if (likely(nodes == NULL))
-		while (zonelist_zone_idx(z) > highest_zoneidx)
-			z++;
-	else
-		while (zonelist_zone_idx(z) > highest_zoneidx ||
-				(z->zone && !zref_in_nodemask(z, nodes)))
-			z++;
-
-	*zone = zonelist_zone(z++);
-	return z;
-}
