From: Andrew Morton <akpm@osdl.org>
To: Ravikiran G Thirumalai <kiran@scalex86.org>
Cc: nickpiggin@yahoo.com.au, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, ak@suse.de, shai@scalex86.org,
	pravin.shelar@calsoftinc.com
Subject: Re: High lock spin time for zone->lru_lock under extreme conditions
Date: Sat, 13 Jan 2007 13:20:23 -0800
Message-ID: <20070113132023.0f8d2da8.akpm@osdl.org>
In-Reply-To: <20070113195334.GC4234@localhost.localdomain>

> On Sat, 13 Jan 2007 11:53:34 -0800 Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> On Sat, Jan 13, 2007 at 12:00:17AM -0800, Andrew Morton wrote:
> > > On Fri, 12 Jan 2007 23:36:43 -0800 Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> > > > >void __lockfunc _spin_lock_irq(spinlock_t *lock)
> > > > >{
> > > > >        local_irq_disable();
> > > > >        ------------------------> rdtsc(t1);
> > > > >        preempt_disable();
> > > > >        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> > > > >        _raw_spin_lock(lock);
> > > > >        ------------------------> rdtsc(t2);
> > > > >        if (lock->spin_time < (t2 - t1))
> > > > >                lock->spin_time = t2 - t1;
> > > > >}
> > > > >
> > > > >On some runs, we found that the zone->lru_lock spun for 33 seconds or more
> > > > >while the maximal CS time was 3 seconds or so.
> > > > 
> > > > What is the "CS time"?
> > > 
> > > Critical Section :).  This is the maximal time interval I measured  from 
> > > t2 above to the time point we release the spin lock.  This is the hold 
> > > time I guess.
> > 
> > By no means.  The theory here is that CPUA is taking and releasing the
> > lock at high frequency, but CPUB never manages to get in and take it.  In
> > which case the maximum-acquisition-time is much larger than the
> > maximum-hold-time.
> > 
> > I'd suggest that you use a similar trick to measure the maximum hold time:
> > start the timer after we got the lock, stop it just before we release the
> > lock (assuming that the additional rdtsc delay doesn't "fix" things, of
> > course...)
> 
> Well, that is exactly what I described above  as CS time.

Seeing the code helps.

>  The
> instrumentation goes like this:
> 
> void __lockfunc _spin_lock_irq(spinlock_t *lock)
> {
>         unsigned long long t1,t2;
>         local_irq_disable();
>         t1 = get_cycles_sync();
>         preempt_disable();
>         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>         _raw_spin_lock(lock);
>         t2 = get_cycles_sync();
>         lock->raw_lock.htsc = t2;
>         if (lock->spin_time < (t2 - t1))
>                 lock->spin_time = t2 - t1;
> }
> ...
> 
> void __lockfunc _spin_unlock_irq(spinlock_t *lock)
> {
>         unsigned long long t1 ;
>         spin_release(&lock->dep_map, 1, _RET_IP_);
>         t1 = get_cycles_sync();
>         if (lock->cs_time < (t1 -  lock->raw_lock.htsc))
>                 lock->cs_time = t1 -  lock->raw_lock.htsc;
>         _raw_spin_unlock(lock);
>         local_irq_enable();
>         preempt_enable();
> }
> 
> Am I missing something?  Is this not what you just described? (The
> synchronizing rdtsc might not really be required at all locations, but I
> doubt it would contribute a significant fraction of the 33s spin time or
> even the 3s hold time on a 2.6 GHz Opteron).

OK, now we need to do a dump_stack() each time we discover a new max hold
time.  That might be a bit tricky: the printk code does spinlocking too, so
things could go recursively deadlocky.  Maybe make spin_unlock_irq() return
the hold time, then do:

void lru_spin_unlock_irq(struct zone *zone)
{
	long this_time;

	this_time = spin_unlock_irq(&zone->lru_lock);
	if (this_time > zone->max_time) {
		zone->max_time = this_time;
		dump_stack();
	}
}

or similar.
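
For anyone who wants to play with the idea outside the kernel, here is a
minimal userspace sketch of the pattern above: the unlock helper returns the
hold time, and the caller reports a new maximum only after the lock has been
dropped.  It is not kernel code and makes several assumptions: a C11
atomic_flag stands in for the spinlock, clock_gettime(CLOCK_MONOTONIC)
stands in for get_cycles_sync(), fprintf() stands in for dump_stack(), and
every name in it (timed_lock, zone_sketch, zone_lru_unlock and so on) is
hypothetical.

```c
/* Userspace sketch only; all names here are hypothetical, not kernel API. */
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct timed_lock {
	atomic_flag flag;       /* stand-in for the raw spinlock */
	uint64_t acquired_ns;   /* timestamp taken right after acquisition */
};

struct zone_sketch {
	struct timed_lock lru_lock;
	uint64_t max_hold_ns;   /* largest hold time seen so far */
	unsigned long reports;  /* how many times a new maximum was reported */
};

/* Stand-in for get_cycles_sync(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void timed_lock_init(struct timed_lock *l)
{
	atomic_flag_clear(&l->flag);
	l->acquired_ns = 0;
}

static void timed_lock_acquire(struct timed_lock *l)
{
	/* Spin until the flag was previously clear, i.e. we now own the lock. */
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		;
	l->acquired_ns = now_ns();
}

/* Release the lock and return how long it was held, in nanoseconds. */
static uint64_t timed_lock_release(struct timed_lock *l)
{
	/* Read the clock before releasing, while acquired_ns is still ours. */
	uint64_t held = now_ns() - l->acquired_ns;

	atomic_flag_clear_explicit(&l->flag, memory_order_release);
	return held;
}

/* Counterpart of lru_spin_unlock_irq(): report only after the lock is dropped. */
static uint64_t zone_lru_unlock(struct zone_sketch *z)
{
	uint64_t held = timed_lock_release(&z->lru_lock);

	/* Unlocked update, so racy across threads; tolerable for a debug aid. */
	if (held > z->max_hold_ns) {
		z->max_hold_ns = held;
		z->reports++;
		/* Stand-in for dump_stack(). */
		fprintf(stderr, "new max hold time: %llu ns\n",
			(unsigned long long)held);
	}
	return held;
}
```

The point of the split is the same as in the kernel version: the reporting
path runs with the lock already released, so whatever locks the reporting
code takes cannot deadlock against the lock being measured.  The update of
the maximum is done without the lock, as in the snippet above, so two
threads can race on it; for a debugging aid that seems an acceptable trade.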