From: Hugh Dickins <hugh@veritas.com>
To: Brent Casavant <bcasavan@sgi.com>
Cc: William Lee Irwin III <wli@holomorphy.com>,
Andrew Morton <akpm@osdl.org>,
linux-mm@kvack.org
Subject: Re: Scaling problem with shmem_sb_info->stat_lock
Date: Thu, 29 Jul 2004 20:58:54 +0100 (BST)
Message-ID: <Pine.LNX.4.44.0407292006290.1096-100000@localhost.localdomain>
In-Reply-To: <Pine.SGI.4.58.0407281707370.33392@kzerza.americas.sgi.com>

On Wed, 28 Jul 2004, Brent Casavant wrote:
>
> With Hugh's fix, the problem has now moved to other areas -- I consider
> the stat_lock issue solved.

Me too, though I haven't passed those changes up the chain yet:
waiting to see what happens in this next round.

I didn't look into Andrew's percpu_counters in any depth:
once I'd come across PERCPU_ENOUGH_ROOM 32768 I concluded that
percpu space is a precious resource that we should resist depleting
per mountpoint; but if ext2/3 use it, I guess tmpfs could as well.
Revisit another time if NULL sbinfo found wanting.
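
(For concreteness, a rough sketch of what that might look like, not a
tested patch: convert shmem_sb_info's free_blocks to a percpu_counter,
much as ext2/3 do for their free-blocks counts.  This assumes the 2.6-era
percpu_counter_init/percpu_counter_mod/percpu_counter_read_positive
interface, whose details have varied between versions, and the helper
names below are made up for illustration.

	#include <linux/percpu_counter.h>

	struct shmem_sb_info {
		unsigned long max_blocks;
		struct percpu_counter free_blocks; /* was: ulong under stat_lock */
		unsigned long max_inodes;
		unsigned long free_inodes;
		spinlock_t stat_lock;		/* still covers the inode counts */
	};

	/* at mount time */
	static void shmem_init_free_blocks(struct shmem_sb_info *sbinfo,
					   unsigned long blocks)
	{
		percpu_counter_init(&sbinfo->free_blocks);
		percpu_counter_mod(&sbinfo->free_blocks, blocks);
	}

	/* allocation path: charge one block without touching stat_lock */
	static int shmem_charge_block(struct shmem_sb_info *sbinfo)
	{
		if (percpu_counter_read_positive(&sbinfo->free_blocks) <= 0)
			return -ENOSPC;	/* may be slightly stale near the limit */
		percpu_counter_mod(&sbinfo->free_blocks, -1);
		return 0;
	}

The imprecision near the limit is the usual percpu_counter trade-off;
the bigger objection, as above, is the percpu space it eats per mountpoint.)
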

> Now I'm running up against the shmem_inode_info
> lock field. A per-CPU structure isn't appropriate here because what it's
> mostly protecting is the inode swap entries, and that isn't at all amenable
> to a per-CPU breakdown (i.e. this is real data, not statistics).

Jack Steiner's question was, why is this an issue on 2.6 when it
wasn't on 2.4? Perhaps better parallelism elsewhere in 2.6 has
shifted contention to here? Or was it an issue in 2.4 after all?

I keep wondering: why is contention on shmem_inode_info->lock a big
deal for you, but not contention on inode->i_mapping->tree_lock?
Once the shm segment or /dev/zero mapping pages are allocated, info->lock
shouldn't be used at all until you get to swapping - and I hope it's safe
to assume that someone with 512 cpus isn't optimizing for swapping.

It's true that when shmem_getpage is allocating index and data pages,
it dips into and out of info->lock several times: I expect that does
exacerbate the bouncing. Earlier in the day I was trying to rewrite
it a little to avoid that, for you to investigate if it makes any
difference; but abandoned that once I realized it would mean
memclearing pages inside the lock, something I'd much rather avoid.
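
To make the bouncing concrete, the allocation path goes roughly like this
(a schematic of the pattern only, not the actual shmem_getpage source):

	spin_lock(&info->lock);
	entry = shmem_swp_alloc(info, idx, sgp);   /* look up/extend the swap index */
	/* no swap entry: need a fresh data page, but can't allocate
	 * one while holding a spinlock, so drop it... */
	spin_unlock(&info->lock);

	filepage = shmem_alloc_page(gfp_mask, info, idx);

	spin_lock(&info->lock);
	entry = shmem_swp_alloc(info, idx, sgp);   /* revalidate: may have changed */
	/* ... add filepage to the page cache ... */
	spin_unlock(&info->lock);

	clear_highpage(filepage);   /* the memclear I'd rather keep outside the lock */

So each new page costs at least two acquire/release cycles on info->lock,
and with many cpus faulting into the same inode that cacheline ping-pongs.
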
> The "obvious" fix is to morph the code so that the swap entries can be
> updated in parallel to eachother and in parallel to the other miscellaneous
> fields in the shmem_inode_info structure.

Why are all these threads allocating to the inode at the same time?
Are they all trying to lock down the same pages? Or is each trying
to fault in a different page (as your "parallel" above suggests)?
Why doesn't the creator of the shm segment or /dev/zero mapping just
fault in all the pages before handing over to the other threads?

But I may well have entirely the wrong model of what's going on.
Could you provide a small .c testcase to show what it's actually
trying to do when the problem manifests? I don't have many cpus
to reproduce it on, but it should help to provoke a solution.
And/or profiles.
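
To be clear about what I mean, something along these lines would do
(a guess at the access pattern, not a claim about what your application
actually does): N threads first-touching disjoint ranges of one shared
anonymous mapping, so every fault allocates into the same shmem inode
concurrently.

	/* parallel-fault.c: gcc -O2 -pthread parallel-fault.c -o parallel-fault */
	#include <pthread.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define NTHREADS		64
	#define PAGES_PER_THREAD	2048

	static char *map;
	static long pagesize;

	static void *toucher(void *arg)
	{
		char *p = map + (long)arg * PAGES_PER_THREAD * pagesize;
		long i;

		/* each store faults in one page of the shared shmem object */
		for (i = 0; i < PAGES_PER_THREAD; i++)
			p[i * pagesize] = 1;
		return NULL;
	}

	int main(void)
	{
		pthread_t threads[NTHREADS];
		size_t len;
		long i;

		pagesize = sysconf(_SC_PAGESIZE);
		len = (size_t)NTHREADS * PAGES_PER_THREAD * pagesize;

		/* MAP_SHARED|MAP_ANONYMOUS is backed by a single shmem
		 * inode, just like a /dev/zero or SysV shm mapping */
		map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		if (map == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* memset(map, 1, len); here would be the "creator faults
		 * in everything first" variant suggested above */

		for (i = 0; i < NTHREADS; i++)
			pthread_create(&threads[i], NULL, toucher, (void *)i);
		for (i = 0; i < NTHREADS; i++)
			pthread_join(threads[i], NULL);
		return 0;
	}

If your threads are instead all hammering the same pages, or locking them
down with mlock or SHM_LOCK, that's a different pattern and probably a
different fix.
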

(Once we've shifted the contention from info->lock to mapping->tree_lock,
it'll be interesting but not conclusive to hear how 2.6.8 compares with
2.6.8-mm: since mm is currently using read/write_lock_irq on tree_lock.)

Thanks,
Hugh