From: Bharata B Rao <bharata@amd.com>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, nikunj@amd.com,
	"Upadhyay, Neeraj" <Neeraj.Upadhyay@amd.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	willy@infradead.org, vbabka@suse.cz, kinseyho@google.com,
	Mel Gorman <mgorman@suse.de>
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
Date: Thu, 11 Jul 2024 11:13:18 +0530	[thread overview]
Message-ID: <b68e43d4-91f2-4481-80a9-d166c0a43584@amd.com> (raw)
In-Reply-To: <CAOUHufa7OBtNHKMhfu8wOOE4f0w3b0_2KzzV7-hrc9rVL8e=iw@mail.gmail.com>

On 09-Jul-24 11:28 AM, Yu Zhao wrote:
> On Mon, Jul 8, 2024 at 10:31 PM Bharata B Rao <bharata@amd.com> wrote:
>>
>> On 08-Jul-24 9:47 PM, Yu Zhao wrote:
>>> On Mon, Jul 8, 2024 at 8:34 AM Bharata B Rao <bharata@amd.com> wrote:
>>>>
>>>> Hi Yu Zhao,
>>>>
>>>> Thanks for your patches. See below...
>>>>
>>>> On 07-Jul-24 4:12 AM, Yu Zhao wrote:
>>>>> Hi Bharata,
>>>>>
>>>>> On Wed, Jul 3, 2024 at 9:11 AM Bharata B Rao <bharata@amd.com> wrote:
>>>>>>
>>>> <snip>
>>>>>>
>>>>>> Some experiments tried
>>>>>> ======================
>>>>>> 1) When MGLRU was enabled, many soft lockups were observed; no hard
>>>>>> lockups were seen during a 48-hour run. Below is one such soft lockup.
>>>>>
>>>>> This is not really an MGLRU issue -- can you please try one of the
>>>>> attached patches? It (truncate.patch) should help with or without
>>>>> MGLRU.
>>>>
>>>> With truncate.patch and the default LRU scheme, a few hard lockups are still seen.
>>>
>>> Thanks.
>>>
>>> In your original report, you said:
>>>
>>>     Most of the times the two contended locks are lruvec and
>>>     inode->i_lock spinlocks.
>>>     ...
>>>     Often times, the perf output at the time of the problem shows
>>>     heavy contention on lruvec spin lock. Similar contention is
>>>     also observed with inode i_lock (in clear_shadow_entry path)
>>>
>>> Based on this new report, does it mean the i_lock is not as contended
>>> for the same path (truncation) you tested? If so, I'll post
>>> truncate.patch with Reported-by and Tested-by tags from you, unless
>>> you have objections.
>>
>> truncate.patch has been tested on two systems with the default LRU scheme,
>> and the lockup due to inode->i_lock hasn't been seen after a 24-hour run.
> 
> Thanks.
> 
>>>
>>> The two paths below were contended on the LRU lock, but they already
>>> batch their operations. So I don't know what else we can do surgically
>>> to improve them.
>>
>> What has been seen with this workload is that the lruvec spinlock is
>> held for a long time in the shrink_[active/inactive]_list path. In this
>> path, there is a case in isolate_lru_folios() where scanning of the LRU
>> lists can become unbounded: to isolate a page from ZONE_DMA, more than
>> 150 million folios were sometimes scanned and skipped. There is already
>> a comment there which explains why nr_skipped shouldn't be counted, but
>> is there any possibility of re-examining this condition?
> 
> For this specific case, probably this can help:
> 
> @@ -1659,8 +1659,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>                 if (folio_zonenum(folio) > sc->reclaim_idx ||
>                                 skip_cma(folio, sc)) {
>                         nr_skipped[folio_zonenum(folio)] += nr_pages;
> -                       move_to = &folios_skipped;
> -                       goto move;
> +                       list_move(&folio->lru, &folios_skipped);
> +                       if (spin_is_contended(&lruvec->lru_lock)) {
> +                               if (!list_empty(dst))
> +                                       break;
> +                               spin_unlock_irq(&lruvec->lru_lock);
> +                               cond_resched();
> +                               spin_lock_irq(&lruvec->lru_lock);
> +                       }
> +                       continue;
>                 }

Thanks, this helped. With this fix, the test ran for 24 hours without any
lockups attributable to the lruvec spinlock. As noted earlier in this
thread, isolate_lru_folios() used to scan millions of folios and spend a
long time with the spinlock held, but after this fix such a scenario is
no longer seen.
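
For reference, the core of the fix is a yield-on-contention pattern: take
the folio off the list first, then drop the contended lock and reschedule
before reacquiring it. A minimal standalone sketch of that pattern (the
function and list names below are made up for illustration; only the
locking calls mirror the actual patch):

/*
 * Hypothetical helper illustrating the yield-on-contention pattern
 * from the patch above; this is not mm/vmscan.c code.
 */
static void drain_list_yielding(struct list_head *src, struct list_head *moved,
				spinlock_t *lock)
{
	spin_lock_irq(lock);
	while (!list_empty(src)) {
		struct folio *folio = list_first_entry(src, struct folio, lru);

		/* Take the entry off @src before dropping the lock so no
		 * other CPU can observe it in a half-processed state. */
		list_move(&folio->lru, moved);

		if (spin_is_contended(lock)) {
			spin_unlock_irq(lock);
			cond_resched();	/* bounds the lock hold time */
			spin_lock_irq(lock);
		}
	}
	spin_unlock_irq(lock);
}

The !list_empty(dst) check in the actual patch additionally breaks out
early once at least one folio has been isolated, so the caller makes
forward progress instead of continuing to scan under contention.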

However, the contention seems to have shifted to other areas. These are
the two MM-related soft and hard lockups that were observed during this
run:

Soft lockup
===========
watchdog: BUG: soft lockup - CPU#425 stuck for 12s!
CPU: 425 PID: 145707 Comm: fio Kdump: loaded Tainted: G        W  6.10.0-rc3-trkwtrs_trnct_nvme_lruvecresched #21
RIP: 0010:handle_softirqs+0x70/0x2f0

   __rmqueue_pcplist+0x4ce/0x9a0
   get_page_from_freelist+0x2e1/0x1650
   __alloc_pages_noprof+0x1b4/0x12c0
   alloc_pages_mpol_noprof+0xdd/0x200
   folio_alloc_noprof+0x67/0xe0

Hard lockup
===========
watchdog: Watchdog detected hard LOCKUP on cpu 296
CPU: 296 PID: 150155 Comm: fio Kdump: loaded Tainted: G        W    L  6.10.0-rc3-trkwtrs_trnct_nvme_lruvecresched #21
RIP: 0010:native_queued_spin_lock_slowpath+0x347/0x430

  Call Trace:
   <NMI>
   ? watchdog_hardlockup_check+0x1a2/0x370
   ? watchdog_overflow_callback+0x6d/0x80
   <SNIP>
   native_queued_spin_lock_slowpath+0x347/0x430
   </NMI>
   <IRQ>
   _raw_spin_lock_irqsave+0x46/0x60
   free_unref_page+0x19f/0x540
   ? __slab_free+0x2ab/0x2b0
   __free_pages+0x9d/0xb0
   __free_slab+0xa7/0xf0
   free_slab+0x31/0x100
   discard_slab+0x32/0x40
   __put_partials+0xb8/0xe0
   put_cpu_partial+0x5a/0x90
   __slab_free+0x1d9/0x2b0
   kfree+0x244/0x280
   mempool_kfree+0x12/0x20
   mempool_free+0x30/0x90
   nvme_unmap_data+0xd0/0x150 [nvme]
   nvme_pci_complete_batch+0xaf/0xd0 [nvme]
   nvme_irq+0x96/0xe0 [nvme]
   __handle_irq_event_percpu+0x50/0x1b0
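
Both traces point at the page allocator: the soft lockup is on the
allocation side (__rmqueue_pcplist() refilling the per-CPU lists from the
buddy free lists) and the hard lockup is on the free side
(free_unref_page() called from NVMe IRQ completion). My working
assumption, not yet confirmed from the disassembly, is that both sides
are serializing on the same zone->lock. A rough sketch of that assumption
(zone_lock_contention_sketch() is hypothetical; the pcp fast paths and
all real details of mm/page_alloc.c are elided):

/*
 * Hypothetical sketch of where the two paths are assumed to meet;
 * an assumption drawn from the traces, not a verified diagnosis.
 */
static void zone_lock_contention_sketch(struct zone *zone)
{
	unsigned long flags;

	/* Allocation side: __rmqueue_pcplist() refills the per-CPU
	 * list from the buddy free lists under zone->lock. */
	spin_lock_irqsave(&zone->lock, flags);
	/* ... pull a batch of pages off the free lists ... */
	spin_unlock_irqrestore(&zone->lock, flags);

	/* Free side: free_unref_page(), here from NVMe IRQ completion,
	 * takes the same lock when the per-CPU fast path is bypassed. */
	spin_lock_irqsave(&zone->lock, flags);	/* hard lockup spins here */
	/* ... return pages to the buddy free lists ... */
	spin_unlock_irqrestore(&zone->lock, flags);
}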

Regards,
Bharata.

