linux-mm.kvack.org archive mirror
From: Bharata B Rao <bharata@amd.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, nikunj@amd.com,
	"Upadhyay, Neeraj" <Neeraj.Upadhyay@amd.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	willy@infradead.org, yuzhao@google.com, kinseyho@google.com,
	Mel Gorman <mgorman@suse.de>
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
Date: Fri, 19 Jul 2024 11:46:09 +0530	[thread overview]
Message-ID: <584ecb5e-b1fc-4b43-ba36-ad396d379fad@amd.com> (raw)
In-Reply-To: <CAGudoHFDZi=79GgtWHWw52kyu81ZK2O4=30YrKhPerDxXdxbKg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4821 bytes --]

On 18-Jul-24 5:41 PM, Mateusz Guzik wrote:
> On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao <bharata@amd.com> wrote:
>>
>> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
>>> As for clear_shadow_entry mentioned in the opening mail, the content is:
>>>           spin_lock(&mapping->host->i_lock);
>>>           xa_lock_irq(&mapping->i_pages);
>>>           __clear_shadow_entry(mapping, index, entry);
>>>           xa_unlock_irq(&mapping->i_pages);
>>>           if (mapping_shrinkable(mapping))
>>>                   inode_add_lru(mapping->host);
>>>           spin_unlock(&mapping->host->i_lock);
>>>
>>> so for all I know it's all about the xarray thing, not the i_lock per se.
>>
>> The soft lockup signature has _raw_spin_lock and not _raw_spin_lock_irq,
>> and hence I concluded it to be i_lock.
> 
> I'm not disputing it was i_lock. I am claiming that the i_pages lock
> is taken immediately after, and it may be that in your workload this is
> the thing with the actual contention problem, making i_lock a red
> herring.
> 
> I tried to match up offsets to my own kernel binary, but things went haywire.
> 
> Can you please resolve a bunch of symbols, like this:
> ./scripts/faddr2line vmlinux clear_shadow_entry+92
> 
> and then paste the source code from reported lines? (I presume you are
> running with some local patches, so opening relevant files in my repo
> may still give bogus results)
> 
> Addresses are: clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332

clear_shadow_entry+92

$ ./scripts/faddr2line vmlinux clear_shadow_entry+92
clear_shadow_entry+92/0x180:
spin_lock_irq at include/linux/spinlock.h:376
(inlined by) clear_shadow_entry at mm/truncate.c:51

42 static void clear_shadow_entry(struct address_space *mapping,
43                                struct folio_batch *fbatch, pgoff_t *indices)
44 {
45         int i;
46
47         if (shmem_mapping(mapping) || dax_mapping(mapping))
48                 return;
49
50         spin_lock(&mapping->host->i_lock);
51         xa_lock_irq(&mapping->i_pages);
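
(For context: the spin_lock_irq frame that faddr2line reports comes from
the xa_lock_irq() macro, which in include/linux/xarray.h expands,
roughly, to:

/* xa_lock is the spinlock_t embedded in struct xarray */
#define xa_lock_irq(xa)         spin_lock_irq(&(xa)->xa_lock)

so the lock actually taken at truncate.c:51 is mapping->i_pages.xa_lock,
not i_lock.)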


__remove_mapping+98

$ ./scripts/faddr2line vmlinux __remove_mapping+98
__remove_mapping+98/0x230:
spin_lock_irq at include/linux/spinlock.h:376
(inlined by) __remove_mapping at mm/vmscan.c:695

684 static int __remove_mapping(struct address_space *mapping, struct folio *folio,
685                             bool reclaimed, struct mem_cgroup *target_memcg)
686 {
687         int refcount;
688         void *shadow = NULL;
689
690         BUG_ON(!folio_test_locked(folio));
691         BUG_ON(mapping != folio_mapping(folio));
692
693         if (!folio_test_swapcache(folio))
694                 spin_lock(&mapping->host->i_lock);
695         xa_lock_irq(&mapping->i_pages);


__filemap_add_folio+332

$ ./scripts/faddr2line vmlinux __filemap_add_folio+332
__filemap_add_folio+332/0x480:
spin_lock_irq at include/linux/spinlock.h:377
(inlined by) __filemap_add_folio at mm/filemap.c:878

851 noinline int __filemap_add_folio(struct address_space *mapping,
852                 struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
853 {
854         XA_STATE(xas, &mapping->i_pages, index);
             ...
874         for (;;) {
875                 int order = -1, split_order = 0;
876                 void *entry, *old = NULL;
877
878                 xas_lock_irq(&xas);
879                 xas_for_each_conflict(&xas, entry) {
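
(Similarly, xas_lock_irq() is a thin wrapper over the same xarray lock;
in include/linux/xarray.h it is, roughly:

/* an xa_state keeps a pointer back to the xarray it walks */
#define xas_lock_irq(xas)       xa_lock_irq((xas)->xa)

so this path too ends up in spin_lock_irq on mapping->i_pages.xa_lock.)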

> 
> Most notably in __remove_mapping i_lock is conditional:
>          if (!folio_test_swapcache(folio))
>                  spin_lock(&mapping->host->i_lock);
>          xa_lock_irq(&mapping->i_pages);
> 
> and the disasm of the offset in my case does not match either acquire.
> For all I know i_lock in this routine is *not* taken and all the
> queued up __remove_mapping callers increase i_lock -> i_pages wait
> times in clear_shadow_entry.

So the first two are on the i_pages lock (taken via xa_lock_irq()) and
the last one takes the same lock via xas_lock_irq(); in all three cases
the contended lock is the i_pages xarray lock, not i_lock.

> 
> To my cursory reading i_lock in clear_shadow_entry can be hacked away
> with some effort, but should this happen the contention is going to
> shift to i_pages presumably with more soft lockups (except on that
> lock). I am not convinced messing with it is justified. From looking
> around, the i_lock does not appear to be a problem in other spots, fwiw.
> 
> All that said, even if it is i_lock in both cases *and* someone whacks
> it, the mm folk should look into what happens when (maybe i_lock ->)
> i_pages lock is held. To that end perhaps you could provide a
> flamegraph or output of perf record -a -g, I don't know what's
> preferred.

I have attached the flamegraph, but note that it is for the kernel that
has been running with all the accumulated fixes so far. The original one
(w/o fixes) did show considerable time spent in
native_queued_spin_lock_slowpath, but unfortunately I am unable to
locate it now.
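
(For anyone wanting to reproduce such a graph: one common way, assuming
Brendan Gregg's FlameGraph scripts stackcollapse-perf.pl and
flamegraph.pl are in $PATH, is roughly:

# sample all CPUs with call graphs for 30s while the workload runs
$ perf record -a -g -- sleep 30

# fold the sampled stacks and render the interactive SVG
$ perf script | stackcollapse-perf.pl | flamegraph.pl > perf.svg

where perf.svg is just an illustrative output name.)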

Regards,
Bharata.

[-- Attachment #2: perf 1.svg --]
[-- Type: image/svg+xml, Size: 1215900 bytes --]


Thread overview: 37+ messages
2024-07-03 15:11 Bharata B Rao
2024-07-06 22:42 ` Yu Zhao
2024-07-08 14:34   ` Bharata B Rao
2024-07-08 16:17     ` Yu Zhao
2024-07-09  4:30       ` Bharata B Rao
2024-07-09  5:58         ` Yu Zhao
2024-07-11  5:43           ` Bharata B Rao
2024-07-15  5:19             ` Bharata B Rao
2024-07-19 20:21               ` Yu Zhao
2024-07-20  7:57                 ` Mateusz Guzik
2024-07-22  4:17                   ` Bharata B Rao
2024-07-22  4:12                 ` Bharata B Rao
2024-07-25  9:59               ` zhaoyang.huang
2024-07-26  3:26                 ` Zhaoyang Huang
2024-07-29  4:49                   ` Bharata B Rao
2024-08-13 11:04           ` Usama Arif
2024-08-13 17:43             ` Yu Zhao
2024-07-17  9:37         ` Vlastimil Babka
2024-07-17 10:50           ` Bharata B Rao
2024-07-17 11:15             ` Hillf Danton
2024-07-18  9:02               ` Bharata B Rao
2024-07-10 12:03   ` Bharata B Rao
2024-07-10 12:24     ` Mateusz Guzik
2024-07-10 13:04       ` Mateusz Guzik
2024-07-15  5:22         ` Bharata B Rao
2024-07-15  6:48           ` Mateusz Guzik
2024-07-10 18:04     ` Yu Zhao
2024-07-17  9:42 ` Vlastimil Babka
2024-07-17 10:31   ` Bharata B Rao
2024-07-17 16:44     ` Karim Manaouil
2024-07-17 11:29   ` Mateusz Guzik
2024-07-18  9:00     ` Bharata B Rao
2024-07-18 12:11       ` Mateusz Guzik
2024-07-19  6:16         ` Bharata B Rao [this message]
2024-07-19  7:06           ` Yu Zhao
2024-07-19 14:26           ` Mateusz Guzik
2024-07-17 16:34   ` Karim Manaouil
