linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: Bharata B Rao <bharata@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 nikunj@amd.com, "Upadhyay, Neeraj" <Neeraj.Upadhyay@amd.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	willy@infradead.org,  yuzhao@google.com, kinseyho@google.com,
	Mel Gorman <mgorman@suse.de>
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
Date: Fri, 19 Jul 2024 16:26:56 +0200	[thread overview]
Message-ID: <CAGudoHEC6WD3NkssVz1gXzsBFa4WCSKKx2f6eDDC9q_Qh9nSdg@mail.gmail.com> (raw)
In-Reply-To: <584ecb5e-b1fc-4b43-ba36-ad396d379fad@amd.com>

On Fri, Jul 19, 2024 at 8:16 AM Bharata B Rao <bharata@amd.com> wrote:
>
> On 18-Jul-24 5:41 PM, Mateusz Guzik wrote:
> > On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao <bharata@amd.com> wrote:
> >>
> >> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
> >>> As for clear_shadow_entry mentioned in the opening mail, the content is:
> >>>           spin_lock(&mapping->host->i_lock);
> >>>           xa_lock_irq(&mapping->i_pages);
> >>>           __clear_shadow_entry(mapping, index, entry);
> >>>           xa_unlock_irq(&mapping->i_pages);
> >>>           if (mapping_shrinkable(mapping))
> >>>                   inode_add_lru(mapping->host);
> >>>           spin_unlock(&mapping->host->i_lock);
> >>>
> >>> so for all I know it's all about the xarray thing, not the i_lock per se.
> >>
> >> The soft lockup signature has _raw_spin_lock and not _raw_spin_lock_irq
> >> and hence concluded it to be i_lock.
> >
> > I'm not disputing it was i_lock. I am claiming that the i_pages is
> > taken immediately after and it may be that in your workload this is
> > the thing with the actual contention problem, making i_lock a red
> > herring.
> >
> > I tried to match up offsets to my own kernel binary, but things went haywire.
> >
> > Can you please resolve a bunch of symbols, like this:
> > ./scripts/faddr2line vmlinux clear_shadow_entry+92
> >
> > and then paste the source code from reported lines? (I presume you are
> > running with some local patches, so opening relevant files in my repo
> > may still give bogus resutls)
> >
> > Addresses are: clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332
>
> clear_shadow_entry+92
>
> $ ./scripts/faddr2line vmlinux clear_shadow_entry+92
> clear_shadow_entry+92/0x180:
> spin_lock_irq at include/linux/spinlock.h:376
> (inlined by) clear_shadow_entry at mm/truncate.c:51
>
> 42 static void clear_shadow_entry(struct address_space *mapping,
> 43                                struct folio_batch *fbatch, pgoff_t
> *indices)
> 44 {
> 45         int i;
> 46
> 47         if (shmem_mapping(mapping) || dax_mapping(mapping))
> 48                 return;
> 49
> 50         spin_lock(&mapping->host->i_lock);
> 51         xa_lock_irq(&mapping->i_pages);
>
>
> __remove_mapping+98
>
> $ ./scripts/faddr2line vmlinux __remove_mapping+98
> __remove_mapping+98/0x230:
> spin_lock_irq at include/linux/spinlock.h:376
> (inlined by) __remove_mapping at mm/vmscan.c:695
>
> 684 static int __remove_mapping(struct address_space *mapping, struct
> folio *folio,
> 685                             bool reclaimed, struct mem_cgroup
> *target_memcg)
> 686 {
> 687         int refcount;
> 688         void *shadow = NULL;
> 689
> 690         BUG_ON(!folio_test_locked(folio));
> 691         BUG_ON(mapping != folio_mapping(folio));
> 692
> 693         if (!folio_test_swapcache(folio))
> 694                 spin_lock(&mapping->host->i_lock);
> 695         xa_lock_irq(&mapping->i_pages);
>
>
> __filemap_add_folio+332
>
> $ ./scripts/faddr2line vmlinux __filemap_add_folio+332
> __filemap_add_folio+332/0x480:
> spin_lock_irq at include/linux/spinlock.h:377
> (inlined by) __filemap_add_folio at mm/filemap.c:878
>
> 851 noinline int __filemap_add_folio(struct address_space *mapping,
> 852                 struct folio *folio, pgoff_t index, gfp_t gfp, void
> **shadowp)
> 853 {
> 854         XA_STATE(xas, &mapping->i_pages, index);
>              ...
> 874         for (;;) {
> 875                 int order = -1, split_order = 0;
> 876                 void *entry, *old = NULL;
> 877
> 878                 xas_lock_irq(&xas);
> 879                 xas_for_each_conflict(&xas, entry) {
>
> >
> > Most notably in __remove_mapping i_lock is conditional:
> >          if (!folio_test_swapcache(folio))
> >                  spin_lock(&mapping->host->i_lock);
> >          xa_lock_irq(&mapping->i_pages);
> >
> > and the disasm of the offset in my case does not match either acquire.
> > For all I know i_lock in this routine is *not* taken and all the
> > queued up __remove_mapping callers increase i_lock -> i_pages wait
> > times in clear_shadow_entry.
>
> So the first two are on i_pages lock and the last one is xa_lock.
>

bottom line though messing with i_lock removal is not justified afaics

> >
> > To my cursory reading i_lock in clear_shadow_entry can be hacked away
> > with some effort, but should this happen the contention is going to
> > shift to i_pages presumably with more soft lockups (except on that
> > lock). I am not convinced messing with it is justified. From looking
> > at other places the i_lock is not a problem in other spots fwiw.
> >
> > All that said even if it is i_lock in both cases *and* someone whacks
> > it, the mm folk should look into what happens when (maybe i_lock ->)
> > i_pages lock is held. To that end perhaps you could provide a
> > flamegraph or output of perf record -a -g, I don't know what's
> > preferred.
>
> I have attached the flamegraph but this is for the kernel that has been
> running with all the accumulated fixes so far. The original one (w/o
> fixes) did show considerable time spent on
> native_queued_spin_lock_slowpath but unfortunately unable to locate it now.
>

So I think the problems at this point are all mm, so I'm kicking the
ball to that side.

-- 
Mateusz Guzik <mjguzik gmail.com>


  parent reply	other threads:[~2024-07-19 14:27 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-03 15:11 Bharata B Rao
2024-07-06 22:42 ` Yu Zhao
2024-07-08 14:34   ` Bharata B Rao
2024-07-08 16:17     ` Yu Zhao
2024-07-09  4:30       ` Bharata B Rao
2024-07-09  5:58         ` Yu Zhao
2024-07-11  5:43           ` Bharata B Rao
2024-07-15  5:19             ` Bharata B Rao
2024-07-19 20:21               ` Yu Zhao
2024-07-20  7:57                 ` Mateusz Guzik
2024-07-22  4:17                   ` Bharata B Rao
2024-07-22  4:12                 ` Bharata B Rao
2024-07-25  9:59               ` zhaoyang.huang
2024-07-26  3:26                 ` Zhaoyang Huang
2024-07-29  4:49                   ` Bharata B Rao
2024-08-13 11:04           ` Usama Arif
2024-08-13 17:43             ` Yu Zhao
2024-07-17  9:37         ` Vlastimil Babka
2024-07-17 10:50           ` Bharata B Rao
2024-07-17 11:15             ` Hillf Danton
2024-07-18  9:02               ` Bharata B Rao
2024-07-10 12:03   ` Bharata B Rao
2024-07-10 12:24     ` Mateusz Guzik
2024-07-10 13:04       ` Mateusz Guzik
2024-07-15  5:22         ` Bharata B Rao
2024-07-15  6:48           ` Mateusz Guzik
2024-07-10 18:04     ` Yu Zhao
2024-07-17  9:42 ` Vlastimil Babka
2024-07-17 10:31   ` Bharata B Rao
2024-07-17 16:44     ` Karim Manaouil
2024-07-17 11:29   ` Mateusz Guzik
2024-07-18  9:00     ` Bharata B Rao
2024-07-18 12:11       ` Mateusz Guzik
2024-07-19  6:16         ` Bharata B Rao
2024-07-19  7:06           ` Yu Zhao
2024-07-19 14:26           ` Mateusz Guzik [this message]
2024-07-17 16:34   ` Karim Manaouil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGudoHEC6WD3NkssVz1gXzsBFa4WCSKKx2f6eDDC9q_Qh9nSdg@mail.gmail.com \
    --to=mjguzik@gmail.com \
    --cc=Neeraj.Upadhyay@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@amd.com \
    --cc=david@redhat.com \
    --cc=kinseyho@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=nikunj@amd.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox