From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>, Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Issue fixed by commit 53a59fc67f97 is surfacing again..
Date: Fri, 29 Jun 2018 17:32:00 +0200
Message-ID: <11416e51-08b5-11ec-a2c8-9078c386d895@linux.vnet.ibm.com>

Hi,

Commit 53a59fc67f97 ("mm: limit mmu_gather batching to fix soft lockups on
!CONFIG_PREEMPT") fixed soft lockups triggered when large processes exited.
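
As a reminder, the workaround (paraphrasing the generic mmu_gather code from
that era, asm-generic/tlb.h and mm/memory.c; details may differ between trees)
caps the number of page batches a single mmu_gather may accumulate, forcing
the zap path to flush and free at least every ~10000 pages:

#define MAX_GATHER_BATCH_COUNT	(10000UL / MAX_GATHER_BATCH)

static int tlb_next_batch(struct mmu_gather *tlb)
{
	struct mmu_gather_batch *batch;

	batch = tlb->active;
	if (batch->next) {
		tlb->active = batch->next;
		return 1;
	}

	/*
	 * The workaround: refuse to grow past the cap, so
	 * __tlb_remove_page() reports the gather as full and
	 * zap_pte_range() flushes and frees before continuing.
	 */
	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
		return 0;

	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
	if (!batch)
		return 0;

	tlb->batch_count++;
	batch->next = NULL;
	batch->nr   = 0;
	batch->max  = MAX_GATHER_BATCH;

	tlb->active->next = batch;
	tlb->active = batch;

	return 1;
}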

Today, on a large system, we are seeing it again:

NMI watchdog: BUG: soft lockup - CPU#1015 stuck for 21s! [forkoff:182534]
Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
nfs lockd grace sunrpc fscache af_packet ip_set nfnetlink bridge stp llc
libcrc32c x_tables dm_mod ghash_generic gf128mul vmx_crypto rtc_generic tg3 ses
enclosure scsi_transport_sas ptp pps_core libphy btrfs xor raid6_pq sd_mod
crc32c_vpmsum ipr(X) libata sg scsi_mod autofs4 [last unloaded: ip_tables]
Supported: Yes, External
CPU: 1015 PID: 182534 Comm: forkoff Tainted: G
4.12.14-23-default #1 SLE15
task: c00001f262efcb00 task.stack: c00001f264688000
NIP: c0000000000164c4 LR: c0000000000164c4 CTR: 000000000000aa18
REGS: c00001f26468b570 TRAP: 0901   Tainted: G
(4.12.14-23-default)
MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
  CR: 42042824  XER: 00000000
CFAR: c00000000099829c SOFTE: 1
GPR00: c0000000002d43b8 c00001f26468b7f0 c00000000116a900 0000000000000900
GPR04: c00014fb0fff6410 f0000005075aa860 0000000000000008 0000000000000000
GPR08: c000000007d39d00 00000000800003d8 00000000800003f7 000014fa8e880000
GPR12: 0000000000002200 c000000007d39d00
NIP [c0000000000164c4] arch_local_irq_restore+0x74/0x90
LR [c0000000000164c4] arch_local_irq_restore+0x74/0x90
Call Trace:
[c00001f26468b7f0] [f0000005075a9500] 0xf0000005075a9500 (unreliable)
[c00001f26468b810] [c0000000002d43b8] free_unref_page_list+0x198/0x280
[c00001f26468b870] [c0000000002e1064] release_pages+0x3d4/0x510
[c00001f26468b950] [c000000000343acc] free_pages_and_swap_cache+0x12c/0x160
[c00001f26468b9a0] [c000000000318a88] tlb_flush_mmu_free+0x68/0xa0
[c00001f26468b9e0] [c00000000031c7ac] zap_pte_range+0x30c/0xa40
[c00001f26468bae0] [c00000000031d344] unmap_page_range+0x334/0x6d0
[c00001f26468bbc0] [c00000000031dc84] unmap_vmas+0x94/0x140
[c00001f26468bc10] [c00000000032b478] exit_mmap+0xe8/0x1f0
[c00001f26468bcd0] [c0000000000ff460] mmput+0x80/0x1c0
[c00001f26468bd00] [c000000000109430] do_exit+0x370/0xc70
[c00001f26468bdd0] [c000000000109e00] do_group_exit+0x60/0x100
[c00001f26468be10] [c000000000109ec4] SyS_exit_group+0x24/0x30
[c00001f26468be30] [c00000000000b088] system_call+0x3c/0x12c
Instruction dump:
994d02ba 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
7c0803a6 4e800020 60000000 4bff4165 <60000000> 4bffffe4 60000000 e92d0020

This was triggered on a 32TB node where ~1500 processes, each allocating
10GB, are spawned and exited in a stress loop.
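
For what it is worth, the pattern is roughly the following (a minimal sketch
only, not the actual forkoff tool; NPROC and ALLOC_SIZE are illustrative):

/*
 * Minimal sketch of the stress pattern described above: N children
 * each touch a large anonymous mapping and then exit, in a loop, so
 * exit_mmap()/zap_pte_range() constantly tear down large address
 * spaces.
 */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC		1500
#define ALLOC_SIZE	(10UL << 30)	/* 10GB per child */

static void child(void)
{
	char *p = mmap(NULL, ALLOC_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		_exit(1);
	/* Touch every (64K) page so the PTEs exist and must be zapped. */
	for (size_t off = 0; off < ALLOC_SIZE; off += 64 * 1024)
		p[off] = 1;
	_exit(0);	/* exit_mmap() then does the interesting work */
}

int main(void)
{
	for (;;) {
		for (int i = 0; i < NPROC; i++) {
			pid_t pid = fork();
			if (pid == 0)
				child();
			else if (pid < 0)
				break;
		}
		while (wait(NULL) > 0)
			;
	}
	return 0;
}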

Since Power uses a 64K page size, MAX_GATHER_BATCH = 8189, so
MAX_GATHER_BATCH_COUNT is only 1.
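
For reference, the arithmetic is easy to check with a quick userspace sketch
(the struct below only mimics the kernel's struct mmu_gather_batch layout, so
the exact figure may differ slightly from the 8189 above, but the division by
10000 still gives 1):

#include <stdio.h>

/* Rough stand-in for the kernel's struct mmu_gather_batch. */
struct mmu_gather_batch_like {
	struct mmu_gather_batch_like *next;
	unsigned int nr;
	unsigned int max;
	void *pages[];			/* struct page * in the kernel */
};

int main(void)
{
	unsigned long page_size = 64UL * 1024;	/* ppc64 with 64K pages */
	unsigned long max_gather_batch =
		(page_size - sizeof(struct mmu_gather_batch_like)) /
		sizeof(void *);

	printf("MAX_GATHER_BATCH       ~ %lu\n", max_gather_batch);
	printf("MAX_GATHER_BATCH_COUNT = %lu\n", 10000UL / max_gather_batch);
	return 0;
}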

So the batch limit can never make zap_pte_range() loop; I guess we never hit
the workaround introduced by commit 53a59fc67f97. By the way, shouldn't
cond_resched() be called in zap_pte_range() when the flush is due to the
batch limit? Something like this:

@@ -1338,7 +1345,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                        if (unlikely(page_mapcount(page) < 0))
                                print_bad_pte(vma, addr, ptent, page);
                        if (unlikely(__tlb_remove_page(tlb, page))) {
-                               force_flush = 1;
+                               force_flush = 2;
                                addr += PAGE_SIZE;
                                break;
                        }
@@ -1398,12 +1405,19 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
         * batch buffers or because we needed to flush dirty TLB
         * entries before releasing the ptl), free the batched
         * memory too. Restart if we didn't do everything.
+        * In case the flush was due to the batch buffer's limit,
+        * give other tasks a chance to run, to avoid a soft lockup
+        * when dealing with a large amount of memory.
         */
        if (force_flush) {
+               bool force_sched = (force_flush == 2);
                force_flush = 0;
                tlb_flush_mmu_free(tlb);
-               if (addr != end)
+               if (addr != end) {
+                       if (force_sched)
+                               cond_resched();
                        goto again;
+               }
        }

Anyway, this would not fix the soft lockup I'm facing, because
MAX_GATHER_BATCH_COUNT=1 on ppc64.

Indeed, I'm wondering whether the 10K-page limit is too large in some cases,
especially when the node is loaded and contention on the PTE lock is likely.
Here, soft lockups are surfacing with fewer than 8K pages processed.

Should the MAX_GATHER_BATCH limit be forced to a lower value on ppc64, or
should more code be introduced to work around this?
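
Purely to illustrate the first option (hypothetical names, untested, nothing
like this exists upstream), the cap could be made arch-overridable along
these lines:

/*
 * Hypothetical sketch only: let an architecture lower the per-batch
 * page count so the flush (and the MAX_GATHER_BATCH_COUNT workaround)
 * triggers more often on large-page-size configurations like
 * ppc64/64K.
 */
#ifndef ARCH_MMU_GATHER_BATCH_LIMIT
#define ARCH_MMU_GATHER_BATCH_LIMIT	MAX_GATHER_BATCH
#endif

#define MAX_GATHER_BATCH_EFFECTIVE \
	min_t(unsigned long, MAX_GATHER_BATCH, ARCH_MMU_GATHER_BATCH_LIMIT)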

Cheers,
Laurent.
