From: Liam Howlett <liam.howlett@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Michel Lespinasse <walken.cr@gmail.com>
Subject: Re: Strange mmap_lock tracepoint
Date: Wed, 1 Sep 2021 17:05:00 +0000 [thread overview]
Message-ID: <20210901170447.fcqdnw43q7cmamzu@revolver> (raw)
In-Reply-To: <1585872a-1562-c74a-b686-e0051fa75cda@suse.cz>
* Vlastimil Babka <vbabka@suse.cz> [210901 06:52]:
> On 8/31/21 22:40, Liam Howlett wrote:
> >
> > Hello,
> >
> > I've been trying to trace the mmap_lock calls using tracepoints and
> > captured this behaviour which I cannot explain. This was with the maple
> > tree v2 patches running ebizzy with multiple threads in an x86_64 KVM
> > using 8 virtual CPUs.
> >
> > AFAICT, there are zero callers that use the mmap_lock directly besides a
> > prefetchw(&mm->mmap_lock);
> >
> >
> > ebizzy-803 [000] .... 5376.655366: lock_release: 00000000de8cf25e lock
> > ebizzy-803 [000] .... 5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
> > ebizzy-803 [000] .... 5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
> > ebizzy-803 [000] .... 5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
> >
> > ebizzy-803 [000] .... 5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> >
> > ebizzy-803 [000] .... 5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
> >
> > ebizzy-803 [000] .... 5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
>
> I was thinking khugepaged interference, but this seems to be a system-wide
> tracing (judging from <idle>-0) so you would see khugepaged too, right?
Correct. All calls will go through Michel's tracepoints from what I can
tell.
>
> On the other hand, it seems strange to have a long list of just cpu 0 here.
> Are other CPUs missing, or is the interleaving just imperfect because
> timestamps are not perfectly in sync between CPUs, and in fact another
> CPU took the lock?
Yes, it could be the clock.  I had used the default trace clock, which is
local to each CPU.  I was searching the documentation for 'time stamp' but
should have looked for 'clock'.  I've re-run the test with the counter
clock, which should rule out incorrect ordering.  I've also imported the
trace into a spreadsheet and sorted by the counter to ensure interleaved
printing isn't causing issues.  This is why the output below has slightly
different whitespace than the original.
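For completeness, the setup was roughly the following (a sketch, assuming
tracefs is mounted at /sys/kernel/tracing; the sort key assumes the column
layout shown in the trace below):

```shell
# Sketch of the tracing setup; adjust the path if tracefs is mounted
# under /sys/kernel/debug/tracing on your system.
TRACEFS=/sys/kernel/tracing

# Switch from the default per-CPU "local" clock to the globally ordered
# "counter" clock so cross-CPU event ordering is reliable.
echo counter > "$TRACEFS/trace_clock"

# Enable the two tracepoint groups quoted below.
echo 1 > "$TRACEFS/events/mmap_lock/enable"
echo 1 > "$TRACEFS/events/lock/enable"

# Capture the run, then sort on the counter column (field 4 in this
# layout) -- the command-line equivalent of the spreadsheet sort.
cat "$TRACEFS/trace" > trace.txt
sort -n -k4,4 trace.txt > trace.sorted.txt
```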
ebizzy-1437 [003] .... 83650150 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83650157 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83650158 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437 [003] .... 83650159 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83650160 : lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437 [003] d... 83650161 : lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437 [003] d... 83650163 : lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437 [003] d... 83650164 : lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437 [003] d... 83650168 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437 [003] d... 83650169 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437 [003] d... 83650171 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437 [003] d... 83650173 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437 [003] d... 83650174 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437 [003] d... 83650175 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437 [003] d... 83650178 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0 [003] d... 83650181 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0 [003] d... 83650182 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0 [003] d... 83650183 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] d... 83650184 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] d... 83650185 : lock_acquire: 00000000f5623f3e read jiffies_seq.seqcount
<idle>-0 [003] d... 83650186 : lock_release: 00000000f5623f3e jiffies_seq.seqcount
<idle>-0 [003] d... 83650187 : lock_acquire: 000000006705bb4e &base->lock
<idle>-0 [003] d... 83650188 : lock_release: 000000006705bb4e &base->lock
<idle>-0 [003] d... 83650189 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] d... 83650190 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] d... 83650191 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] d... 83650192 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] d... 83650193 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] d... 83650194 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] d... 83650195 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] d... 83650196 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] d... 83650197 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] d... 83650198 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] d.h. 83651382 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] d.h. 83651386 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] d.h. 83651391 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0 [003] dNh. 83651403 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0 [003] dN.. 83651406 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] dN.. 83651409 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] dN.. 83651411 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] dN.. 83651415 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] dN.. 83651417 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] dN.. 83651427 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] dN.. 83651429 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] dN.. 83651431 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0 [003] dN.. 83651433 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0 [003] dN.. 83651441 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0 [003] dN.. 83651443 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0 [003] d... 83651448 : lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437 [003] d... 83651453 : lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437 [003] d... 83651455 : lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437 [003] .... 83651458 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437 [003] .... 83651462 : lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437 [003] .... 83651464 : lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437 [003] .... 83651466 : lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437 [003] .... 83651467 : lock_release: 000000005ada5d35 fs_reclaim
ebizzy-1437 [003] d... 83651470 : lock_acquire: 0000000002c996e9 lock
ebizzy-1437 [003] d... 83651472 : lock_release: 0000000002c996e9 lock
ebizzy-1437 [003] .... 83651475 : lock_acquire: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437 [003] .... 83651476 : lock_acquire: 0000000064812dba lock
ebizzy-1437 [003] .... 83651477 : lock_release: 0000000064812dba lock
ebizzy-1437 [003] .... 83651478 : lock_release: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437 [003] .... 83651479 : lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1437 [003] .... 83651480 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83651489 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83651491 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437 [003] .... 83651492 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83651494 : lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437 [003] d... 83651496 : lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437 [003] d... 83651497 : lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437 [003] d... 83651499 : lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437 [003] d... 83651501 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437 [003] d... 83651504 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437 [003] d... 83651505 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437 [003] d... 83651507 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437 [003] d... 83651510 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437 [003] d... 83651512 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437 [003] d... 83651515 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0 [003] d... 83651517 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0 [003] d... 83651520 : lock_release: 00000000b5963f30 &rq->__lock
Please note that this run is plain next-20210811, with no maple tree
patches applied.
As for the grouping of events by thread, I don't really have an answer.
There are places in the trace where different CPUs and threads do run
interleaved:
ebizzy-1445 [000] .... 83693711 : lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1445 [000] .... 83693715 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83693727 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83693730 : lock_acquire: 0000000055bc357b try read &mm->mmap_lock
ebizzy-1437 [003] .... 83693733 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437 [003] .... 83693738 : lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437 [003] .... 83693742 : lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1445 [000] .... 83693743 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437 [003] .... 83693747 : lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
> <removed old log>
> >
> > The following tracepoints were enabled:
> > events/mmap_lock/enable
> > events/lock/enable
> >
> > My reading of the above trace is that the ebizzy thread dropped the lock
> > and immediately attempted to reacquire and detected it was in contention
> > so the thread added itself to the list and went to sleep only to wake up
> > and get the lock?
>
> Which path does this mmap_read_trylock() followed by immediate
> mmap_read_lock() anyway? I mean down_read() is implemented like this
> internally, but that wouldn't generate these mmap_lock_ events.
I think it's do_user_addr_fault()?
>
> > Doesn't that mean the mmap_lock is _not_ contended? The lack of
> > contention makes sense since there is no tracepoint of an attempt to
> > acquire the lock between the dropping of the lock and the same MM waking
> > up to get the lock.
> >
> > What is even stranger is if I change the value of _Q_PENDING_LOOPS from
> > (1 << 9) to (1 << 10), the benchmark performs better.
> >
> > From the above trace and the effects of the _Q_PENDING_LOOPS change, it
> > looks like the lock is somehow remaining in a state that causes a
> > failure to acquire the lock even when it is not held or contended?
So the new log is from next-20210811 with trace_clock set to counter,
running ./ebizzy -m (mmap only) in a KVM guest with 8 virtual CPUs.
Thanks,
Liam