linux-mm.kvack.org archive mirror
* [PATCH] /dev/zero page fault scaling
@ 2004-07-14 19:27 Brent Casavant
  2004-07-14 20:39 ` Hugh Dickins
  0 siblings, 1 reply; 10+ messages in thread
From: Brent Casavant @ 2004-07-14 19:27 UTC (permalink / raw)
  To: linux-mm

As discussed earlier this week on the linux-mm list, there are some
scaling issues with the sbinfo stat_lock in mm/shmem.c.  In particular,
bouncing the corresponding cache-line between CPUs in a large machine
causes a dramatic slowdown in page fault performance.

However, the superblock statistics being kept for the /dev/zero use
of this code are unnecessary, and I don't even think there's a way
to obtain them.  The attached patch causes the relevant sections of
code to skip the locks and statistic updates for /dev/zero, causing
a significant speedup.

In a test program measuring page fault performance, at 256P we
see a 150x improvement in the number of page faults per CPU per
wall-clock second (and other similar measures).  Page fault performance
drops by about 50% at 512P compared to 256P; however, this is likely
a separate problem (investigation has not started), and is still
138x better than before these changes.

I'm not sure if this list is the appropriate place to submit these
changes.  If not, please direct me to the correct lists/people to
submit this to.  The patch is against 2.6.(something recent, maybe 7).

Signed-off-by: Brent Casavant <bcasavan@sgi.com>

--- linux.orig/mm/shmem.c	2004-07-13 17:20:34.000000000 -0500
+++ linux/mm/shmem.c	2004-07-13 17:09:32.000000000 -0500
@@ -60,6 +60,7 @@
 /* info->flags needs VM_flags to handle pagein/truncate races efficiently */
 #define SHMEM_PAGEIN	 VM_READ
 #define SHMEM_TRUNCATE	 VM_WRITE
+#define SHMEM_NOSBINFO	 VM_EXEC

 /* Pretend that each entry is of this size in directory's i_size */
 #define BOGO_DIRENT_SIZE 20
@@ -185,6 +186,9 @@
 static void shmem_free_block(struct inode *inode)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
+
+	if (SHMEM_I(inode)->flags & SHMEM_NOSBINFO)
+		return;
 	spin_lock(&sbinfo->stat_lock);
 	sbinfo->free_blocks++;
 	inode->i_blocks -= BLOCKS_PER_PAGE;
@@ -213,11 +217,14 @@
 	if (freed > 0) {
 		struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 		info->alloced -= freed;
+		shmem_unacct_blocks(info->flags, freed);
+
+		if (info->flags & SHMEM_NOSBINFO)
+			return;
 		spin_lock(&sbinfo->stat_lock);
 		sbinfo->free_blocks += freed;
 		inode->i_blocks -= freed*BLOCKS_PER_PAGE;
 		spin_unlock(&sbinfo->stat_lock);
-		shmem_unacct_blocks(info->flags, freed);
 	}
 }

@@ -351,14 +358,16 @@
 		 * page (and perhaps indirect index pages) yet to allocate:
 		 * a waste to allocate index if we cannot allocate data.
 		 */
-		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks <= 1) {
+		if (!(info->flags & SHMEM_NOSBINFO)) {
+			spin_lock(&sbinfo->stat_lock);
+			if (sbinfo->free_blocks <= 1) {
+				spin_unlock(&sbinfo->stat_lock);
+				return ERR_PTR(-ENOSPC);
+			}
+			sbinfo->free_blocks--;
+			inode->i_blocks += BLOCKS_PER_PAGE;
 			spin_unlock(&sbinfo->stat_lock);
-			return ERR_PTR(-ENOSPC);
 		}
-		sbinfo->free_blocks--;
-		inode->i_blocks += BLOCKS_PER_PAGE;
-		spin_unlock(&sbinfo->stat_lock);

 		spin_unlock(&info->lock);
 		page = shmem_dir_alloc(mapping_gfp_mask(inode->i_mapping));
@@ -1002,16 +1005,24 @@
 	} else {
 		shmem_swp_unmap(entry);
 		sbinfo = SHMEM_SB(inode->i_sb);
-		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks == 0 || shmem_acct_block(info->flags)) {
+		if (!(info->flags & SHMEM_NOSBINFO)) {
+			spin_lock(&sbinfo->stat_lock);
+			if (sbinfo->free_blocks == 0 || shmem_acct_block(info->flags)) {
+				spin_unlock(&sbinfo->stat_lock);
+				spin_unlock(&info->lock);
+				error = -ENOSPC;
+				goto failed;
+			}
+			sbinfo->free_blocks--;
+			inode->i_blocks += BLOCKS_PER_PAGE;
 			spin_unlock(&sbinfo->stat_lock);
-			spin_unlock(&info->lock);
-			error = -ENOSPC;
-			goto failed;
+		} else {
+			if (shmem_acct_block(info->flags)) {
+				spin_unlock(&info->lock);
+				error = -ENOSPC;
+				goto failed;
+			}
 		}
-		sbinfo->free_blocks--;
-		inode->i_blocks += BLOCKS_PER_PAGE;
-		spin_unlock(&sbinfo->stat_lock);

 		if (!filepage) {
 			spin_unlock(&info->lock);
@@ -2032,6 +2049,7 @@
 	struct inode *inode;
 	struct dentry *dentry, *root;
 	struct qstr this;
+	struct shmem_inode_info *info;

 	if (IS_ERR(shm_mnt))
 		return (void *)shm_mnt;
@@ -2061,7 +2079,11 @@
 	if (!inode)
 		goto close_file;

-	SHMEM_I(inode)->flags = flags & VM_ACCOUNT;
+	info = SHMEM_I(inode);
+	info->flags = flags & VM_ACCOUNT;
+	if (0 == strcmp("dev/zero", name)) {
+		info->flags |= SHMEM_NOSBINFO;
+	}
 	d_instantiate(dentry, inode);
 	inode->i_size = size;
 	inode->i_nlink = 0;	/* It is unlinked */

-- 
Brent Casavant             bcasavan@sgi.com        Forget bright-eyed and
Operating System Engineer  http://www.sgi.com/     bushy-tailed; I'm red-
Silicon Graphics, Inc.     44.8562N 93.1355W 860F  eyed and bushy-haired.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-14 19:27 [PATCH] /dev/zero page fault scaling Brent Casavant
@ 2004-07-14 20:39 ` Hugh Dickins
  2004-07-14 21:31   ` Brent Casavant
  2004-07-15 16:28   ` Brent Casavant
  0 siblings, 2 replies; 10+ messages in thread
From: Hugh Dickins @ 2004-07-14 20:39 UTC (permalink / raw)
  To: Brent Casavant; +Cc: linux-mm

On Wed, 14 Jul 2004, Brent Casavant wrote:
> 
> In a test program measuring page fault performance, at 256P we
> see a 150x improvement in the number of page faults per CPU per
> wall-clock second (and other similar measures).  Page fault performance
> drops by about 50% at 512P compared to 256P; however, this is likely
> a separate problem (investigation has not started), and is still
> 138x better than before these changes.

Wow.  Good work.

> I'm not sure if this list is the appropriate place to submit these
> changes.  If not, please direct me to the correct lists/people to
> submit this to.  The patch is against 2.6.(something recent, maybe 7).

This list'll do fine.  I'm the (unlisted) tmpfs maintainer, I'll give
your patch a go tomorrow, and try to convert it to NULL sbinfo as I
mentioned.  I'll send you back the result, but won't send it on to
Andrew thence Linus for a couple of weeks, until after the Ottawa
Linux Symposium.

Thanks a lot!
Hugh



* Re: [PATCH] /dev/zero page fault scaling
  2004-07-14 20:39 ` Hugh Dickins
@ 2004-07-14 21:31   ` Brent Casavant
  2004-07-15 16:28   ` Brent Casavant
  1 sibling, 0 replies; 10+ messages in thread
From: Brent Casavant @ 2004-07-14 21:31 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm

On Wed, 14 Jul 2004, Hugh Dickins wrote:

> On Wed, 14 Jul 2004, Brent Casavant wrote:

> Wow.  Good work.

*blush*  Thanks.  Though of course credit for the observation about
not needing to track this info for /dev/zero goes to Jack Steiner.

> > I'm not sure if this list is the appropriate place to submit these
> > changes.
>
> This list'll do fine.  I'm the (unlisted) tmpfs maintainer, I'll give
> your patch a go tomorrow, and try to convert it to NULL sbinfo as I
> mentioned.  I'll send you back the result, but won't send it on to
> Andrew thence Linus for a couple of weeks, until after the Ottawa
> Linux Symposium.

Thank you.  Actually I'm going to be *very* interested to see how the
NULL sbinfo works.  There's so much I don't understand yet.

Oh, one thing you might want to double-check in my patch: I avoided
updating not only free_blocks but also i_blocks, since the latter
was under the same lock and always updated at the same time as
free_blocks.  I didn't see any problems with this in my testing,
but I also wasn't 100% sure that was the correct thing to do.  If
it's not correct, we still have a problem, as the i_blocks cacheline
would then need to ping-pong around the machine.

Thanks,
Brent

-- 
Brent Casavant             bcasavan@sgi.com        Forget bright-eyed and
Operating System Engineer  http://www.sgi.com/     bushy-tailed; I'm red-
Silicon Graphics, Inc.     44.8562N 93.1355W 860F  eyed and bushy-haired.


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-14 20:39 ` Hugh Dickins
  2004-07-14 21:31   ` Brent Casavant
@ 2004-07-15 16:28   ` Brent Casavant
  2004-07-15 20:28     ` Hugh Dickins
  1 sibling, 1 reply; 10+ messages in thread
From: Brent Casavant @ 2004-07-15 16:28 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm

On Wed, 14 Jul 2004, Hugh Dickins wrote:

> This list'll do fine.  I'm the (unlisted) tmpfs maintainer, I'll give
> your patch a go tomorrow, and try to convert it to NULL sbinfo as I
> mentioned.  I'll send you back the result, but won't send it on to
> Andrew thence Linus for a couple of weeks, until after the Ottawa
> Linux Symposium.

Hmm.  There's more of the same lurking in here.  I moved on to the next
page fault scaling problem on the list, namely with SysV shared memory
segments.  I'll give you one guess which cacheline is the culprit in that
case.

Unless I'm mistaken, we don't need to track sbinfo for SysV segments
either.  So my next task is to figure out how to turn on the SHMEM_NOSBINFO
bit for that case as well.

So there may be a new patch coming in the next day or two.

Brent

-- 
Brent Casavant             bcasavan@sgi.com        Forget bright-eyed and
Operating System Engineer  http://www.sgi.com/     bushy-tailed; I'm red-
Silicon Graphics, Inc.     44.8562N 93.1355W 860F  eyed and bushy-haired.


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-15 16:28   ` Brent Casavant
@ 2004-07-15 20:28     ` Hugh Dickins
  2004-07-15 21:36       ` Brent Casavant
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hugh Dickins @ 2004-07-15 20:28 UTC (permalink / raw)
  To: Brent Casavant; +Cc: linux-mm

On Thu, 15 Jul 2004, Brent Casavant wrote:
> 
> Hmm.  There's more of the same lurking in here.  I moved on to the next
> page fault scaling problem on the list, namely with SysV shared memory
> segments.  I'll give you one guess which cacheline is the culprit in that
> case.
> 
> Unless I'm mistaken, we don't need to track sbinfo for SysV segments
> either.  So my next task is to figure out how to turn on the SHMEM_NOSBINFO
> bit for that case as well.

You should find that you've already fixed that one with your first patch:
the shared writable /dev/zero mappings and the SysV shared memory live in
the same internal mount.  But if you find that the SysV shm is still a
problem with your patch, then either I'm confused or your patch is wrong.

By the way, just curious, but can you tell us what it is that is making
so much use of the shared writable /dev/zero mappings?  I thought they
were just an odd corner not used very much; ordinary anonymous memory
doesn't involve tmpfs objects.

And in earlier mail you wrote:
> 
> Oh, one thing you might want to double-check in my patch: I avoided
> updating not only free_blocks but also i_blocks, since the latter
> was under the same lock and always updated at the same time as
> free_blocks.  I didn't see any problems with this in my testing,
> but I also wasn't 100% sure that was the correct thing to do.  If
> it's not correct, we still have a problem, as the i_blocks cacheline
> would then need to ping-pong around the machine.

You're okay to avoid updating i_blocks too: if you look at the simple
fs functions in fs/libfs.c (used by ramfs for one), you'll find that
it's acceptable for simple filesystems to leave i_blocks, like f_bfree
etc, unsupported at 0.  Though the i_blocks cacheline itself shouldn't
have been much of a problem, since it's per-inode not per-sb: your
problem would again be with the sbinfo cacheline we use to lock it
(and, if we did want to update i_blocks, that could easily be changed
to the shmem info lock instead: it was just convenient for me to update
it at the same time and under the same protection as free_blocks).

Your patch, by the way, seemed to be against 2.6.8-rc1 or 2.6.8-rc1-mm1
or recent bk: applied cleanly to those rather than 2.6.6 or 2.6.7.
So I've done my NULL sbinfo version below against that too.

This is really an agglomeration of several patches: NULL sbinfo based
on (but eliminating) your SHMEM_NOSBINFO; a holey file panic fix which
I sent Linus and lkml earlier on (which I wouldn't have found for weeks
if you hadn't prompted me to look again here: thank you!); and replacing
the shmem_inodes list of all by shmem_swaplist list of those which might
have pages on swap, a less significant scalability enhancement.  I'll
break it up into smaller patches when I come to submit it in a couple
of weeks.  I've done basic testing, but it will need more later on
(I'm unfamiliar with MS_NOUSER, not sure if my use of it is correct).
I'm as likely to find a 512P machine as a basilisk, so scalability
testing I leave to you.

I felt a little vulnerable, in making this scalability improvement
for the invisible internal mount, that next someone (you?) would
make the same complaint of the visible tmpfs mounts.  So now, if
you "mount -t tmpfs -o nr_blocks=0 -o nr_inodes=0 tmpfs /wherever",
the 0s will be interpreted to give a NULL-sbinfo unlimited mount.
Generally inadvisable (unless /proc/sys/vm/overcommit_memory 2 is
independently enforcing strict memory accounting), but useful to
have as a more scalable option.

Hugh

--- 2.6.8-rc1/mm/shmem.c	2004-07-11 21:59:42.000000000 +0100
+++ linux/mm/shmem.c	2004-07-15 17:08:56.529384032 +0100
@@ -179,16 +179,19 @@ static struct backing_dev_info shmem_bac
 	.unplug_io_fn = default_unplug_io_fn,
 };
 
-LIST_HEAD(shmem_inodes);
-static spinlock_t shmem_ilock = SPIN_LOCK_UNLOCKED;
+LIST_HEAD(shmem_swaplist);
+static spinlock_t shmem_swaplist_lock = SPIN_LOCK_UNLOCKED;
 
 static void shmem_free_block(struct inode *inode)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-	spin_lock(&sbinfo->stat_lock);
-	sbinfo->free_blocks++;
-	inode->i_blocks -= BLOCKS_PER_PAGE;
-	spin_unlock(&sbinfo->stat_lock);
+
+	if (sbinfo) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_blocks++;
+		inode->i_blocks -= BLOCKS_PER_PAGE;
+		spin_unlock(&sbinfo->stat_lock);
+	}
 }
 
 /*
@@ -213,11 +216,13 @@ static void shmem_recalc_inode(struct in
 	if (freed > 0) {
 		struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 		info->alloced -= freed;
-		spin_lock(&sbinfo->stat_lock);
-		sbinfo->free_blocks += freed;
-		inode->i_blocks -= freed*BLOCKS_PER_PAGE;
-		spin_unlock(&sbinfo->stat_lock);
 		shmem_unacct_blocks(info->flags, freed);
+		if (sbinfo) {
+			spin_lock(&sbinfo->stat_lock);
+			sbinfo->free_blocks += freed;
+			inode->i_blocks -= freed*BLOCKS_PER_PAGE;
+			spin_unlock(&sbinfo->stat_lock);
+		}
 	}
 }
 
@@ -337,7 +342,6 @@ static swp_entry_t *shmem_swp_alloc(stru
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	struct page *page = NULL;
 	swp_entry_t *entry;
-	static const swp_entry_t unswapped = { 0 };
 
 	if (sgp != SGP_WRITE &&
 	    ((loff_t) index << PAGE_CACHE_SHIFT) >= i_size_read(inode))
@@ -345,20 +349,22 @@ static swp_entry_t *shmem_swp_alloc(stru
 
 	while (!(entry = shmem_swp_entry(info, index, &page))) {
 		if (sgp == SGP_READ)
-			return (swp_entry_t *) &unswapped;
+			return shmem_swp_map(ZERO_PAGE(0));
 		/*
 		 * Test free_blocks against 1 not 0, since we have 1 data
 		 * page (and perhaps indirect index pages) yet to allocate:
 		 * a waste to allocate index if we cannot allocate data.
 		 */
-		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks <= 1) {
+		if (sbinfo) {
+			spin_lock(&sbinfo->stat_lock);
+			if (sbinfo->free_blocks <= 1) {
+				spin_unlock(&sbinfo->stat_lock);
+				return ERR_PTR(-ENOSPC);
+			}
+			sbinfo->free_blocks--;
+			inode->i_blocks += BLOCKS_PER_PAGE;
 			spin_unlock(&sbinfo->stat_lock);
-			return ERR_PTR(-ENOSPC);
 		}
-		sbinfo->free_blocks--;
-		inode->i_blocks += BLOCKS_PER_PAGE;
-		spin_unlock(&sbinfo->stat_lock);
 
 		spin_unlock(&info->lock);
 		page = shmem_dir_alloc(mapping_gfp_mask(inode->i_mapping));
@@ -599,17 +605,21 @@ static void shmem_delete_inode(struct in
 	struct shmem_inode_info *info = SHMEM_I(inode);
 
 	if (inode->i_op->truncate == shmem_truncate) {
-		spin_lock(&shmem_ilock);
-		list_del(&info->list);
-		spin_unlock(&shmem_ilock);
 		shmem_unacct_size(info->flags, inode->i_size);
 		inode->i_size = 0;
 		shmem_truncate(inode);
+		if (!list_empty(&info->list)) {
+			spin_lock(&shmem_swaplist_lock);
+			list_del_init(&info->list);
+			spin_unlock(&shmem_swaplist_lock);
+		}
+	}
+	if (sbinfo) {
+		BUG_ON(inode->i_blocks);
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_inodes++;
+		spin_unlock(&sbinfo->stat_lock);
 	}
-	BUG_ON(inode->i_blocks);
-	spin_lock(&sbinfo->stat_lock);
-	sbinfo->free_inodes++;
-	spin_unlock(&sbinfo->stat_lock);
 	clear_inode(inode);
 }
 
@@ -714,22 +724,23 @@ found:
  */
 int shmem_unuse(swp_entry_t entry, struct page *page)
 {
-	struct list_head *p;
+	struct list_head *p, *next;
 	struct shmem_inode_info *info;
 	int found = 0;
 
-	spin_lock(&shmem_ilock);
-	list_for_each(p, &shmem_inodes) {
+	spin_lock(&shmem_swaplist_lock);
+	list_for_each_safe(p, next, &shmem_swaplist) {
 		info = list_entry(p, struct shmem_inode_info, list);
-
-		if (info->swapped && shmem_unuse_inode(info, entry, page)) {
+		if (!info->swapped)
+			list_del_init(&info->list);
+		else if (shmem_unuse_inode(info, entry, page)) {
 			/* move head to start search for next from here */
-			list_move_tail(&shmem_inodes, &info->list);
+			list_move_tail(&shmem_swaplist, &info->list);
 			found = 1;
 			break;
 		}
 	}
-	spin_unlock(&shmem_ilock);
+	spin_unlock(&shmem_swaplist_lock);
 	return found;
 }
 
@@ -771,6 +782,12 @@ static int shmem_writepage(struct page *
 		shmem_swp_set(info, entry, swap.val);
 		shmem_swp_unmap(entry);
 		spin_unlock(&info->lock);
+		if (list_empty(&info->list)) {
+			spin_lock(&shmem_swaplist_lock);
+			/* move instead of add in case we're racing */
+			list_move_tail(&info->list, &shmem_swaplist);
+			spin_unlock(&shmem_swaplist_lock);
+		}
 		unlock_page(page);
 		return 0;
 	}
@@ -1002,16 +1019,23 @@ repeat:
 	} else {
 		shmem_swp_unmap(entry);
 		sbinfo = SHMEM_SB(inode->i_sb);
-		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks == 0 || shmem_acct_block(info->flags)) {
+		if (sbinfo) {
+			spin_lock(&sbinfo->stat_lock);
+			if (sbinfo->free_blocks == 0 ||
+			    shmem_acct_block(info->flags)) {
+				spin_unlock(&sbinfo->stat_lock);
+				spin_unlock(&info->lock);
+				error = -ENOSPC;
+				goto failed;
+			}
+			sbinfo->free_blocks--;
+			inode->i_blocks += BLOCKS_PER_PAGE;
 			spin_unlock(&sbinfo->stat_lock);
+		} else if (shmem_acct_block(info->flags)) {
 			spin_unlock(&info->lock);
 			error = -ENOSPC;
 			goto failed;
 		}
-		sbinfo->free_blocks--;
-		inode->i_blocks += BLOCKS_PER_PAGE;
-		spin_unlock(&sbinfo->stat_lock);
 
 		if (!filepage) {
 			spin_unlock(&info->lock);
@@ -1185,13 +1209,15 @@ shmem_get_inode(struct super_block *sb, 
 	struct shmem_inode_info *info;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
-	spin_lock(&sbinfo->stat_lock);
-	if (!sbinfo->free_inodes) {
+	if (sbinfo) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return NULL;
+		}
+		sbinfo->free_inodes--;
 		spin_unlock(&sbinfo->stat_lock);
-		return NULL;
 	}
-	sbinfo->free_inodes--;
-	spin_unlock(&sbinfo->stat_lock);
 
 	inode = new_inode(sb);
 	if (inode) {
@@ -1206,6 +1232,7 @@ shmem_get_inode(struct super_block *sb, 
 		info = SHMEM_I(inode);
 		memset(info, 0, (char *)inode - (char *)info);
 		spin_lock_init(&info->lock);
+		INIT_LIST_HEAD(&info->list);
  		mpol_shared_policy_init(&info->policy);
 		switch (mode & S_IFMT) {
 		default:
@@ -1214,9 +1241,6 @@ shmem_get_inode(struct super_block *sb, 
 		case S_IFREG:
 			inode->i_op = &shmem_inode_operations;
 			inode->i_fop = &shmem_file_operations;
-			spin_lock(&shmem_ilock);
-			list_add_tail(&info->list, &shmem_inodes);
-			spin_unlock(&shmem_ilock);
 			break;
 		case S_IFDIR:
 			inode->i_nlink++;
@@ -1232,27 +1256,27 @@ shmem_get_inode(struct super_block *sb, 
 	return inode;
 }
 
-static int shmem_set_size(struct shmem_sb_info *info,
+static int shmem_set_size(struct shmem_sb_info *sbinfo,
 			  unsigned long max_blocks, unsigned long max_inodes)
 {
 	int error;
 	unsigned long blocks, inodes;
 
-	spin_lock(&info->stat_lock);
-	blocks = info->max_blocks - info->free_blocks;
-	inodes = info->max_inodes - info->free_inodes;
+	spin_lock(&sbinfo->stat_lock);
+	blocks = sbinfo->max_blocks - sbinfo->free_blocks;
+	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
 	error = -EINVAL;
 	if (max_blocks < blocks)
 		goto out;
 	if (max_inodes < inodes)
 		goto out;
 	error = 0;
-	info->max_blocks  = max_blocks;
-	info->free_blocks = max_blocks - blocks;
-	info->max_inodes  = max_inodes;
-	info->free_inodes = max_inodes - inodes;
+	sbinfo->max_blocks  = max_blocks;
+	sbinfo->free_blocks = max_blocks - blocks;
+	sbinfo->max_inodes  = max_inodes;
+	sbinfo->free_inodes = max_inodes - inodes;
 out:
-	spin_unlock(&info->stat_lock);
+	spin_unlock(&sbinfo->stat_lock);
 	return error;
 }
 
@@ -1508,13 +1532,16 @@ static int shmem_statfs(struct super_blo
 
 	buf->f_type = TMPFS_MAGIC;
 	buf->f_bsize = PAGE_CACHE_SIZE;
-	spin_lock(&sbinfo->stat_lock);
-	buf->f_blocks = sbinfo->max_blocks;
-	buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
-	buf->f_files = sbinfo->max_inodes;
-	buf->f_ffree = sbinfo->free_inodes;
-	spin_unlock(&sbinfo->stat_lock);
 	buf->f_namelen = NAME_MAX;
+	if (sbinfo) {
+		spin_lock(&sbinfo->stat_lock);
+		buf->f_blocks = sbinfo->max_blocks;
+		buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
+		buf->f_files = sbinfo->max_inodes;
+		buf->f_ffree = sbinfo->free_inodes;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+	/* else leave those fields 0 like simple_statfs */
 	return 0;
 }
 
@@ -1655,9 +1682,6 @@ static int shmem_symlink(struct inode *d
 			return error;
 		}
 		inode->i_op = &shmem_symlink_inode_operations;
-		spin_lock(&shmem_ilock);
-		list_add_tail(&info->list, &shmem_inodes);
-		spin_unlock(&shmem_ilock);
 		kaddr = kmap_atomic(page, KM_USER0);
 		memcpy(kaddr, symname, len);
 		kunmap_atomic(kaddr, KM_USER0);
@@ -1786,11 +1810,21 @@ bad_val:
 static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
-	unsigned long max_blocks = sbinfo->max_blocks;
-	unsigned long max_inodes = sbinfo->max_inodes;
+	unsigned long max_blocks = 0;
+	unsigned long max_inodes = 0;
 
+	if (sbinfo) {
+		max_blocks = sbinfo->max_blocks;
+		max_inodes = sbinfo->max_inodes;
+	}
 	if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes))
 		return -EINVAL;
+	/* Keep it simple: disallow limited <-> unlimited remount */
+	if ((max_blocks || max_inodes) == !sbinfo)
+		return -EINVAL;
+	/* But allow the pointless unlimited -> unlimited remount */
+	if (!sbinfo)
+		return 0;
 	return shmem_set_size(sbinfo, max_blocks, max_inodes);
 }
 #endif
@@ -1800,39 +1834,38 @@ static int shmem_fill_super(struct super
 {
 	struct inode *inode;
 	struct dentry *root;
-	unsigned long blocks, inodes;
+	unsigned long blocks = 0;
+	unsigned long inodes = 0;
 	int mode   = S_IRWXUGO | S_ISVTX;
 	uid_t uid = current->fsuid;
 	gid_t gid = current->fsgid;
-	struct shmem_sb_info *sbinfo;
+	struct shmem_sb_info *sbinfo = NULL;
 	int err = -ENOMEM;
 
-	sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
-	if (!sbinfo)
-		return -ENOMEM;
-	sb->s_fs_info = sbinfo;
-	memset(sbinfo, 0, sizeof(struct shmem_sb_info));
-
+#ifdef CONFIG_TMPFS
 	/*
 	 * Per default we only allow half of the physical ram per
-	 * tmpfs instance
+	 * tmpfs instance; but the internal instance is left unlimited.
 	 */
-	blocks = inodes = totalram_pages / 2;
+	if (!(sb->s_flags & MS_NOUSER))
+		blocks = inodes = totalram_pages / 2;
 
-#ifdef CONFIG_TMPFS
-	if (shmem_parse_options(data, &mode, &uid, &gid, &blocks, &inodes)) {
-		err = -EINVAL;
-		goto failed;
+	if (shmem_parse_options(data, &mode, &uid, &gid, &blocks, &inodes))
+		return -EINVAL;
+
+	if (blocks || inodes) {
+		sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
+		if (!sbinfo)
+			return -ENOMEM;
+		sb->s_fs_info = sbinfo;
+		spin_lock_init(&sbinfo->stat_lock);
+		sbinfo->max_blocks = blocks;
+		sbinfo->free_blocks = blocks;
+		sbinfo->max_inodes = inodes;
+		sbinfo->free_inodes = inodes;
 	}
-#else
-	sb->s_flags |= MS_NOUSER;
 #endif
 
-	spin_lock_init(&sbinfo->stat_lock);
-	sbinfo->max_blocks = blocks;
-	sbinfo->free_blocks = blocks;
-	sbinfo->max_inodes = inodes;
-	sbinfo->free_inodes = inodes;
 	sb->s_maxbytes = SHMEM_MAX_BYTES;
 	sb->s_blocksize = PAGE_CACHE_SIZE;
 	sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
@@ -1997,15 +2030,13 @@ static int __init init_tmpfs(void)
 #ifdef CONFIG_TMPFS
 	devfs_mk_dir("shm");
 #endif
-	shm_mnt = kern_mount(&tmpfs_fs_type);
+	shm_mnt = do_kern_mount(tmpfs_fs_type.name, MS_NOUSER,
+				tmpfs_fs_type.name, NULL);
 	if (IS_ERR(shm_mnt)) {
 		error = PTR_ERR(shm_mnt);
 		printk(KERN_ERR "Could not kern_mount tmpfs\n");
 		goto out1;
 	}
-
-	/* The internal instance should not do size checking */
-	shmem_set_size(SHMEM_SB(shm_mnt->mnt_sb), ULONG_MAX, ULONG_MAX);
 	return 0;
 
 out1:



* Re: [PATCH] /dev/zero page fault scaling
  2004-07-15 20:28     ` Hugh Dickins
@ 2004-07-15 21:36       ` Brent Casavant
  2004-07-15 21:52       ` Brent Casavant
  2004-07-16 22:35       ` Brent Casavant
  2 siblings, 0 replies; 10+ messages in thread
From: Brent Casavant @ 2004-07-15 21:36 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm

On Thu, 15 Jul 2004, Hugh Dickins wrote:

> On Thu, 15 Jul 2004, Brent Casavant wrote:
> >
> > Hmm.  There's more of the same lurking in here.  I moved on to the next
> > page fault scaling problem on the list, namely with SysV shared memory
> > segments.  I'll give you one guess which cacheline is the culprit in that
> > case.
> >
> > Unless I'm mistaken, we don't need to track sbinfo for SysV segments
> > either.  So my next task is to figure out how to turn on the SHMEM_NOSBINFO
> > bit for that case as well.
>
> You should find that you've already fixed that one with your first patch:
> the shared writable /dev/zero mappings and the SysV shared memory live in
> the same internal mount.  But if you find that the SysV shm is still a
> problem with your patch, then either I'm confused or your patch is wrong.

My patch is slightly wrong.  Note that it special cases the "dev/zero"
inode name to perform the detection.  This doesn't catch the SYSV%08x
path from newseg() in ipc/shm.c.

Given that I imagine we don't want a bug in the tmpfs case when a file
is named "SYSV*", I've reworked the patch to let the caller of
shmem_file_setup() pass the desired behavior in the flags argument.

But with your new patch that problem has probably disappeared anyway.
Let me give it a spin and let you know.  I won't be able to do that
until Monday as our 512P machine is booked up the rest of today.
Maybe I'll come in to work to look at it this weekend, but probably not.
Gotta have a life too. :)

> By the way, just curious, but can you tell us what it is that is making
> so much use of the shared writable /dev/zero mappings?  I thought they
> were just an odd corner not used very much; ordinary anonymous memory
> doesn't involve tmpfs objects.

It's actually an artificial test program that nails this particular
code path.  However, it was written to simulate a situation that
can happen in MPI codes.  My guess (the original investigation was
over a year ago) is that SGI's MPI folks first noticed this with SysV
segments, and then after the test program was written we found that
the same thing applied to /dev/zero.

> Your patch, by the way, seemed to be against 2.6.8-rc1 or 2.6.8-rc1-mm1
> or recent bk: applied cleanly to those rather than 2.6.6 or 2.6.7.
> So I've done my NULL sbinfo version below against that too.

Yeah, figures.  I wasn't sure exactly what the contents of the tree
I patched against were.  Usually doesn't matter to me.  Thanks for
being forgiving on it. :)

> This is really an agglomeration of several patches: NULL sbinfo based
> on (but eliminating) your SHMEM_NOSBINFO; a holey file panic fix which
> I sent Linus and lkml earlier on (which I wouldn't have found for weeks
> if you hadn't prompted me to look again here: thank you!); and replacing
> the shmem_inodes list of all by shmem_swaplist list of those which might
> have pages on swap, a less significant scalability enhancement.  I'll
> break it up into smaller patches when I come to submit it in a couple
> of weeks.  I've done basic testing, but it will need more later on
> (I'm unfamiliar with MS_NOUSER, not sure if my use of it is correct).
> I'm as likely to find a 512P machine as a basilisk, so scalability
> testing I leave to you.

Yeah, I saw the holey file thing fly by earlier today.  Good eye.

Will most definitely do.  I suspect that the NULL sbinfo will make my
SysV shared memory scaling bug go away too.

> I felt a little vulnerable, in making this scalability improvement
> for the invisible internal mount, that next someone (you?) would
> make the same complaint of the visible tmpfs mounts.  So now, if
> you "mount -t tmpfs -o nr_blocks=0 -o nr_inodes=0 tmpfs /wherever",
> the 0s will be interpreted to give a NULL-sbinfo unlimited mount.
> Generally inadvisable (unless /proc/sys/vm/overcommit_memory 2 is
> independently enforcing strict memory accounting), but useful to
> have as a more scalable option.

I don't think anyone at SGI will complain about a problem with tmpfs
mounts.  Undoubtedly the same problem exists there, but somehow I don't
see people doing heavy parallel page faulting on mmap()ed regular files.

But, as always, I reserve the right to be wrong. :)

But if someone does complain, we (you, wli, and I) have already
thought about some possible solutions -- so we're a step ahead!

Thanks for all your help with this.  I greatly appreciate it.  I'll
run your patch through some paces and let you know what it turns up.

Brent

-- 
Brent Casavant             bcasavan@sgi.com        Forget bright-eyed and
Operating System Engineer  http://www.sgi.com/     bushy-tailed; I'm red-
Silicon Graphics, Inc.     44.8562N 93.1355W 860F  eyed and bushy-haired.


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-15 20:28     ` Hugh Dickins
  2004-07-15 21:36       ` Brent Casavant
@ 2004-07-15 21:52       ` Brent Casavant
  2004-07-15 23:21         ` Hugh Dickins
  2004-07-16 22:35       ` Brent Casavant
  2 siblings, 1 reply; 10+ messages in thread
From: Brent Casavant @ 2004-07-15 21:52 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm

On Thu, 15 Jul 2004, Hugh Dickins wrote:

> +	/* Keep it simple: disallow limited <-> unlimited remount */
> +	if ((max_blocks || max_inodes) == !sbinfo)
> +		return -EINVAL;

Just caught this one.

Shouldn't this be:

	if ((max_blocks || max_inodes) && !sbinfo)
		return -EINVAL;

Otherwise I think it looks good, though I don't understand some parts
of it of course.  I'm pretty sure it solves the SysV shared memory
scaling problem as well, just by visual inspection.

I'll give the patch a whirl when I can next schedule time on our 512P.

Thanks,
Brent


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-15 21:52       ` Brent Casavant
@ 2004-07-15 23:21         ` Hugh Dickins
  0 siblings, 0 replies; 10+ messages in thread
From: Hugh Dickins @ 2004-07-15 23:21 UTC (permalink / raw)
  To: Brent Casavant; +Cc: linux-mm

On Thu, 15 Jul 2004, Brent Casavant wrote:
> On Thu, 15 Jul 2004, Hugh Dickins wrote:
> 
> > +	/* Keep it simple: disallow limited <-> unlimited remount */
> > +	if ((max_blocks || max_inodes) == !sbinfo)
> > +		return -EINVAL;
> 
> Just caught this one.
> 
> Shouldn't this be:
> 
> 	if ((max_blocks || max_inodes) && !sbinfo)
> 		return -EINVAL;

That's only one half of what I'm trying to disable there, certainly
the more justifiable half: unlimited -> limited.  At the same time
I'm trying to say

	if (!(max_blocks || max_inodes) && sbinfo)
		return -EINVAL;

that is, also disable limited -> unlimited.  Why?  To save bloating
the code, really.  If that's allowed then (a) we need to add in
kfreeing the old sbinfo and (b) we ought really to go through the
existing inodes changing i_blocks (maintained while sbinfo) to 0
(as always while !sbinfo).  Not worth the bother, I thought.

Hugh


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-15 20:28     ` Hugh Dickins
  2004-07-15 21:36       ` Brent Casavant
  2004-07-15 21:52       ` Brent Casavant
@ 2004-07-16 22:35       ` Brent Casavant
  2004-08-02 14:37         ` Brent Casavant
  2 siblings, 1 reply; 10+ messages in thread
From: Brent Casavant @ 2004-07-16 22:35 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2753 bytes --]

On Thu, 15 Jul 2004, Hugh Dickins wrote:

> I'm as likely to find a 512P machine as a basilisk, so scalability
> testing I leave to you.

OK, I managed to grab some time on the machine today.  Parallel
page faulting for /dev/zero and SysV shared memory has definitely
improved in the first few test cases I have.

The test we have is a program which specifically targets page faulting.
This test program was written after observing some of these issues on
MPI and OpenMP applications.

The test program does this:

	1. Forks N child processes, or creates N Pthreads.
	2. Each child/thread creates a memory object via malloc,
	   mmap of /dev/zero, or shmget.
	3. Each child/thread touches each page of the memory object
	   by writing a single byte to the page.
	4. Time to perform step 3 is measured.
	5. The results are aggregated by the main process/thread
	   and a report generated, including statistics such as
	   pagefaults per CPU per wallclock second.

Another variant has the main thread/process create the memory object
and assign each child/thread a range to touch; the children/threads
then omit the object creation stage and skip straight to step 3.  We
call this the "preallocate" option.

In our case we typically run with 100MB per child/thread, stepping
the number of CPUs through powers of 2 up to 512.  All of the work
to this point has been on the fork-without-preallocation variants.

I'm now looking at the fork-with-preallocation variants, and find
that we're hammering *VERY* hard on the shmem_inode_info i_lock,
mostly in the shmem_getpage code.  In fact, performance drops off
significantly even at 4P, and gets positively horrible by 32P
(you don't even want to know about >32P -- but things get 2-4x
worse with each doubling of CPUs).

Just so you can see it, I've attached the most recent run output
from the program.  Take a look at the next-to-last column of numbers
(PF/WSEC/CPU).  In the past few days the last few rows of the second
and third test cases have gone from 2-3 digit numbers to 5-digit
numbers -- that's what we've been concentrating on.

Note that due to hardware failures the machine is only running 510 CPUs
in the attached output, and that things got so miserably slow that I
didn't even let the runs finish.  The last column is meaningless and
always 0.  Also, the label "shared" means "preallocate" in the
discussion above.  Oh, and this is a 2.6.7-based kernel -- I'll change
to 2.6.8 sometime soon.

Anyway, the i_lock is my next vic^H^H^Hsubject of investigation.

Cheers, and have a great weekend,
Brent


[-- Attachment #2: shmem scaling test case results --]
[-- Type: TEXT/PLAIN, Size: 6448 bytes --]

                                                                                                                     PF/
                                MAX        MIN                                  TOTCPU/      TOT_PF/   TOT_PF/     WSEC/
TYPE:               CPUS       WALL       WALL        SYS     USER     TOTCPU       CPU     WALL_SEC   SYS_SEC       CPU   NODES
fork                   1      0.059      0.059      0.059    0.000      0.059     0.059       104174    104174    104174       0
fork                   2      0.091      0.090      0.178    0.003      0.181     0.090       134419     68686     67209       0
fork                   4      0.091      0.089      0.354    0.007      0.360     0.090       268838     69066     67209       0
fork                   8      0.091      0.089      0.710    0.013      0.723     0.090       537677     68781     67209       0
fork                  16      0.092      0.089      1.420    0.021      1.440     0.090      1063914     68781     66494       0
fork                  32      0.092      0.089      2.835    0.048      2.883     0.090      2127828     68899     66494       0
fork                  64      0.092      0.088      5.697    0.073      5.771     0.090      4255657     68569     66494       0
fork                 128      0.092      0.089     11.381    0.163     11.544     0.090      8511314     68651     66494       0
fork                 256      0.117      0.058     22.773    0.314     23.088     0.090     13334391     68616     52087       0
fork                 510      0.094      0.057     45.409    0.657     46.066     0.090     33205760     68555     65109       0


                                                                                                                     PF/
                                MAX        MIN                                  TOTCPU/      TOT_PF/   TOT_PF/     WSEC/
TYPE:               CPUS       WALL       WALL        SYS     USER     TOTCPU       CPU     WALL_SEC   SYS_SEC       CPU   NODES
fork:zero              1      0.064      0.064      0.064    0.000      0.064     0.064        94704     94704     94704       0
fork:zero              2      0.094      0.094      0.186    0.001      0.187     0.093       130218     65794     65109       0
fork:zero              4      0.094      0.092      0.368    0.003      0.371     0.093       260437     66318     65109       0
fork:zero              8      0.095      0.092      0.729    0.012      0.741     0.093       515504     66939     64438       0
fork:zero             16      0.094      0.091      1.450    0.025      1.476     0.092      1041749     67345     65109       0
fork:zero             32      0.095      0.091      2.923    0.037      2.960     0.092      2062019     66827     64438       0
fork:zero             64      0.095      0.091      5.814    0.092      5.906     0.092      4124038     67187     64438       0
fork:zero            128      0.097      0.090     11.831    0.181     12.012     0.094      8081449     66039     63136       0
fork:zero            256      0.107      0.068     24.208    0.326     24.534     0.096     14546609     64549     56822       0
fork:zero            510      0.475      0.054    173.469    0.683    174.151     0.341      6559162     17945     12861       0


                                                                                                                     PF/
                                MAX        MIN                                  TOTCPU/      TOT_PF/   TOT_PF/     WSEC/
TYPE:               CPUS       WALL       WALL        SYS     USER     TOTCPU       CPU     WALL_SEC   SYS_SEC       CPU   NODES
fork:shmem             1      0.063      0.063      0.062    0.002      0.063     0.063        96161     99214     96161       0
fork:shmem             2      0.094      0.093      0.185    0.002      0.187     0.093       130218     66142     65109       0
fork:shmem             4      0.094      0.091      0.363    0.007      0.370     0.093       260437     67209     65109       0
fork:shmem             8      0.094      0.092      0.726    0.013      0.738     0.092       520874     67300     65109       0
fork:shmem            16      0.094      0.092      1.452    0.022      1.475     0.092      1041749     67254     65109       0
fork:shmem            32      0.094      0.091      2.906    0.045      2.951     0.092      2083498     67209     65109       0
fork:shmem            64      0.096      0.092      5.823    0.090      5.913     0.092      4081956     67085     63780       0
fork:shmem           128      0.096      0.091     11.659    0.179     11.838     0.092      8163913     67012     63780       0
fork:shmem           256      0.098      0.063     23.380    0.348     23.728     0.093     16001270     66836     62504       0
fork:shmem           510      0.489      0.048    177.804    0.656    178.460     0.350      6362780     17508     12476       0


                                                                                                                     PF/
                                MAX        MIN                                  TOTCPU/      TOT_PF/   TOT_PF/     WSEC/
TYPE:               CPUS       WALL       WALL        SYS     USER     TOTCPU       CPU     WALL_SEC   SYS_SEC       CPU   NODES
fork:zero:shared       1      0.064      0.064      0.062    0.002      0.064     0.064        94704     97664     94704       0
fork:zero:shared       2      0.096      0.095      0.189    0.001      0.190     0.095       127561     64438     63780       0
fork:zero:shared       4      0.213      0.210      0.845    0.003      0.848     0.212       114688     28904     28672       0
fork:zero:shared       8      0.985      0.935      5.220    0.009      5.229     0.654        49557      9355      6194       0
fork:zero:shared      16      3.213      2.811     17.494    0.021     17.516     1.095        30397      5582      1899       0
fork:zero:shared      32      6.832      5.795     44.188    0.052     44.240     1.383        28590      4420       893       0
fork:zero:shared      64     14.677     11.181    128.041    0.082    128.123     2.002        26617      3051       415       0
fork:zero:shared     128     29.026      3.561    282.180    0.172    282.352     2.206        26917      2768       210       0


* Re: [PATCH] /dev/zero page fault scaling
  2004-07-16 22:35       ` Brent Casavant
@ 2004-08-02 14:37         ` Brent Casavant
  0 siblings, 0 replies; 10+ messages in thread
From: Brent Casavant @ 2004-08-02 14:37 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm

On Fri, 16 Jul 2004, Brent Casavant wrote:

> On Thu, 15 Jul 2004, Hugh Dickins wrote:
>
> > I'm as likely to find a 512P machine as a basilisk, so scalability
> > testing I leave to you.
>
> OK, I managed to grab some time on the machine today.  Parallel
> page faulting for /dev/zero and SysV shared memory has definitely
> improved in the first few test cases I have.

Hmm... This message must have come unwedged from a mail server somewhere.
You can mostly ignore it, unless you find it interesting.

Brent


end of thread, other threads:[~2004-08-02 14:37 UTC | newest]

Thread overview: 10+ messages
2004-07-14 19:27 [PATCH] /dev/zero page fault scaling Brent Casavant
2004-07-14 20:39 ` Hugh Dickins
2004-07-14 21:31   ` Brent Casavant
2004-07-15 16:28   ` Brent Casavant
2004-07-15 20:28     ` Hugh Dickins
2004-07-15 21:36       ` Brent Casavant
2004-07-15 21:52       ` Brent Casavant
2004-07-15 23:21         ` Hugh Dickins
2004-07-16 22:35       ` Brent Casavant
2004-08-02 14:37         ` Brent Casavant
