Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab


From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: jstancek@redhat.com
Cc: mhocko@suse.com, tj@kernel.org, clameter@sgi.com,
	js1304@gmail.com, arekm@maven.pl, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
Date: Fri, 29 Jan 2016 21:35:08 +0900
Message-ID: <201601292135.DHG60988.SOQFJFOHFVMLOt@I-love.SAKURA.ne.jp>
In-Reply-To: <443846857.13955817.1454052773098.JavaMail.zimbra@redhat.com>

Jan Stancek wrote:
> > Jan, can you reproduce your problem with the patch below applied?
> 
> I took v4.5-rc1, applied your memalloc patch and then the patch below.
> 
> I have mixed results so far. The first attempt hung after ~15 minutes;
> the second is still running (for 12+ hours).
> 
> The way it hung is different from previous ones; I don't recall seeing
> messages like these before:
>   SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
>   NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
> 
> Full log from one that hanged:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.4-v4.5-rc1_and_wait_iff_congested_patch.txt
> 

The first attempt's failure is not an OOM bug. It is a hard lockup caused
by a flood of memory allocation failure messages, which lasted for about
10 seconds with IRQs disabled. The caller that requested these atomic
allocations did not expect such a situation. I think __GFP_NOWARN could be
added to the gfp mask of dma_active_cacheline, declared below. Please
consult the lib/dma-debug.c maintainers about this.

  static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
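
To illustrate the suggestion, here is a minimal userspace sketch, not kernel
code: an always-failing allocator prints a warning unless the caller sets a
"no warn" bit, which is the effect __GFP_NOWARN has on allocation failure
reports. MOCK_GFP_NOWAIT, MOCK_GFP_NOWARN, mock_alloc() and try_allocs() are
hypothetical names, and the flag values are not the kernel's. In the kernel
the change would presumably be just
"static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT | __GFP_NOWARN);", but
that is an untested assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits for illustration only; these are NOT the
 * kernel's real gfp_t values. */
#define MOCK_GFP_NOWAIT 0x1u
#define MOCK_GFP_NOWARN 0x2u

static unsigned int warnings_printed;

/* Stand-in for an atomic allocation that always fails, as in the log
 * below. It prints a SLUB-style warning unless MOCK_GFP_NOWARN is set. */
static void *mock_alloc(unsigned int gfp)
{
	if (!(gfp & MOCK_GFP_NOWARN)) {
		fprintf(stderr, "mock: unable to allocate memory (gfp=%#x)\n", gfp);
		warnings_printed++;
	}
	return NULL;
}

/* Attempt n allocations with the given flags and return how many
 * warnings were printed by this batch. */
static unsigned int try_allocs(unsigned int gfp, int n)
{
	assert(n >= 0);
	unsigned int before = warnings_printed;

	for (int i = 0; i < n; i++)
		(void)mock_alloc(gfp);
	return warnings_printed - before;
}
```

try_allocs(MOCK_GFP_NOWAIT, 17) prints 17 warnings, while adding
MOCK_GFP_NOWARN to the same mask prints none; the allocation still fails and
the caller still sees NULL, only the console output goes away.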

----------
int ata_scsi_queuecmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd) {
  spin_lock_irqsave(ap->lock, irq_flags); /* Disable IRQ. */
  __ata_scsi_queuecmd(cmd, dev) {
    ata_scsi_translate(dev, scmd, xlat_func) {
      ata_qc_issue(qc) {
        ata_sg_setup(qc) {
          dma_map_sg(ap->dev, qc->sg, qc->n_elem, qc->dma_dir) {
            debug_dma_map_sg(dev, sg, nents, ents, dir) {
              add_dma_entry(entry) { /* Called once per mapped entry, i.e. "ents" times. */
                rc = active_cacheline_insert(entry); /* "SLUB: Unable to allocate memory" message */
                if (rc == -ENOMEM) {
                        pr_err("DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n");
                        global_disable = true;
                }
              }
            }
          }
        }
      }
    }
  }
  spin_unlock_irqrestore(ap->lock, irq_flags); /* Enable IRQ */
}
----------
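
The point of the trace above is that every call to add_dma_entry() happens
between spin_lock_irqsave() and spin_unlock_irqrestore(), and setting
global_disable does not end the per-entry loop. The following is a
hypothetical userspace model of that shape only; the model_* names are
illustrative, and it counts console lines instead of printing them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the call chain above; the model_*
 * names are illustrative, not kernel APIs. */
static bool model_irqs_disabled;     /* true between lock and unlock        */
static bool model_global_disable;    /* mirrors dma-debug's global_disable  */
static unsigned int msgs_irqs_off;   /* console lines emitted with IRQs off */

/* Stand-in for active_cacheline_insert() under memory pressure: the
 * allocation always fails and the allocator emits one warning line. */
static int model_cacheline_insert(void)
{
	if (model_irqs_disabled)
		msgs_irqs_off++;             /* "SLUB: Unable to allocate ..." */
	return -1;                           /* stands in for -ENOMEM */
}

/* Mirrors the error path of add_dma_entry(): print, then disable.
 * Setting the flag does not stop the caller's loop. */
static void model_add_dma_entry(void)
{
	if (model_cacheline_insert() == -1) {
		if (model_irqs_disabled)
			msgs_irqs_off++;     /* "DMA-API: ... dma-debug disabled" */
		model_global_disable = true;
	}
}

/* Mirrors one ata_scsi_queuecmd() call that maps "ents" entries. */
static void model_queuecmd(int ents)
{
	assert(ents >= 0);
	model_irqs_disabled = true;          /* spin_lock_irqsave() */
	for (int i = 0; i < ents; i++)
		model_add_dma_entry();
	model_irqs_disabled = false;         /* spin_unlock_irqrestore() */
}
```

Calling model_queuecmd(17) leaves msgs_irqs_off at 34: every entry produces
both messages with IRQs off, even though the flag was already set by the
first failure.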

By the way, I think there is no need to print these error messages
again once global_disable has become true.
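
One way to express "print only once" is to report only the transition of the
flag from false to true. The sketch below is a hypothetical userspace version
of that idea, not the real lib/dma-debug.c code; disabled_sketch,
failing_insert(), add_entry_print_once() and map_entries() are illustrative
names. It uses C11 atomic_exchange() so that only one thread wins the
transition; in the kernel, pr_err_once() might be another option.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch, not the real lib/dma-debug.c code. Static storage
 * zero-initialises the flag, i.e. it starts out false. */
static atomic_bool disabled_sketch;
static unsigned int err_lines_printed;

/* Stand-in for active_cacheline_insert() failing with -ENOMEM. */
static int failing_insert(void)
{
	return -1;
}

/* Error path that reports only the transition into the disabled state.
 * atomic_exchange() returns the previous value, so exactly one caller
 * sees false and prints, even if several threads fail concurrently. */
static void add_entry_print_once(void)
{
	if (failing_insert() == -1 && !atomic_exchange(&disabled_sketch, true)) {
		fprintf(stderr, "DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n");
		err_lines_printed++;
	}
}

/* Process "ents" entries, each of which hits the failing insert. */
static void map_entries(int ents)
{
	assert(ents >= 0);
	for (int i = 0; i < ents; i++)
		add_entry_print_once();
}
```

However many entries or calls follow, err_lines_printed stays at 1 after
the first failure.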

----------
[ 1053.123934] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1053.147529] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1053.796970] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1053.820563] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1054.469776] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1054.493371] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1055.142562] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1055.166156] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1055.815330] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1055.838924] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1056.495796] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1056.519400] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1057.168741] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1057.192333] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1057.841671] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1057.865264] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1058.514604] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1058.538200] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1059.187551] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1059.211142] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1059.860486] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1059.884080] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1060.533430] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1060.557023] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1061.206393] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1061.229984] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1061.879330] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1061.902924] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1062.552266] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1062.575857] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1063.219374] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1063.242967] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1063.892314] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1063.915908] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
----------
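
As a rough check of the "about 10 seconds" figure, the span can be computed
from the excerpt's own timestamps. The sketch below only does that arithmetic;
slub_ts, burst_span() and mean_gap() are illustrative names. The result, about
10.77 s (about 10.79 s to the last DMA-API line), is in the same range as the
default hard-lockup threshold (watchdog_thresh defaults to 10 seconds), though
the excerpt may not cover the whole IRQs-off period.

```c
#include <assert.h>

/* Timestamps (seconds) of the 17 "SLUB: Unable to allocate memory"
 * lines in the excerpt above. */
static const double slub_ts[] = {
	1053.123934, 1053.796970, 1054.469776, 1055.142562, 1055.815330,
	1056.495796, 1057.168741, 1057.841671, 1058.514604, 1059.187551,
	1059.860486, 1060.533430, 1061.206393, 1061.879330, 1062.552266,
	1063.219374, 1063.892314,
};

#define N_SLUB (sizeof(slub_ts) / sizeof(slub_ts[0]))

/* Time from the first to the last SLUB warning. */
static double burst_span(void)
{
	return slub_ts[N_SLUB - 1] - slub_ts[0];
}

/* Mean gap between consecutive SLUB warnings. */
static double mean_gap(void)
{
	assert(N_SLUB >= 2);
	return burst_span() / (double)(N_SLUB - 1);
}
```

burst_span() is 10.768380 s and mean_gap() is roughly 0.673 s, so the
warnings arrive at a steady pace for the whole excerpt.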

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
