From: Shakeel Butt <shakeelb@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <guro@fb.com>,
LKML <linux-kernel@vger.kernel.org>,
Greg Thelen <gthelen@google.com>
Subject: Re: Machine lockups on extreme memory pressure
Date: Tue, 22 Sep 2020 06:37:02 -0700
Message-ID: <CALvZod6=VwQduoG3GiW-=csAQja4vCsXAhKH_tSuA4JYx0dEiA@mail.gmail.com>
In-Reply-To: <20200922111202.GY12990@dhcp22.suse.cz>

On Tue, Sep 22, 2020 at 4:12 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-09-20 11:35:35, Shakeel Butt wrote:
> > Hi all,
> >
> > We are seeing machine lockups due to extreme memory pressure where the
> > free pages on all the zones are way below the min watermarks. The stack
> > of the stuck CPU looks like the following (I had to crash the machine to
> > get the info).
>
> sysrq+l didn't report anything?
>
Sorry, I misspoke earlier: I did not personally crash the machine. I
got the state of the machine from the crash dump. We have a crash
timer on our machines which needs to be reset from user space every
couple of hours. If the user space daemon responsible for resetting
it does not get a chance to run, the machine is crashed. So these
crashes are cases where the user space resetter daemon could not run
for a couple of hours.
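
For anyone not familiar with our setup, the reset path is conceptually
just a userspace loop petting a watchdog. A minimal sketch, assuming
the standard /dev/watchdog interface (the real daemon is internal and
more involved; the timeout below is made up to mirror the "couple of
hours" behavior):

/* Hypothetical userspace watchdog "resetter"; if this loop is starved
 * for long enough, the watchdog fires and the machine is crashed. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void)
{
        int timeout = 7200;     /* ~2 hours; many watchdogs cap this lower */
        int fd = open("/dev/watchdog", O_WRONLY);

        if (fd < 0)
                return 1;
        ioctl(fd, WDIOC_SETTIMEOUT, &timeout);

        for (;;) {
                /* Under the lockups described here, this write never
                 * happens because the daemon cannot get scheduled. */
                write(fd, "\0", 1);
                sleep(60);
        }
}
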
> > #0 [ ] crash_nmi_callback
> > #1 [ ] nmi_handle
> > #2 [ ] default_do_nmi
> > #3 [ ] do_nmi
> > #4 [ ] end_repeat_nmi
> > --- <NMI exception stack> ---
> > #5 [ ] queued_spin_lock_slowpath
> > #6 [ ] _raw_spin_lock
> > #7 [ ] ____cache_alloc_node
> > #8 [ ] fallback_alloc
> > #9 [ ] __kmalloc_node_track_caller
> > #10 [ ] __alloc_skb
> > #11 [ ] tcp_send_ack
> > #12 [ ] tcp_delack_timer
> > #13 [ ] run_timer_softirq
> > #14 [ ] irq_exit
> > #15 [ ] smp_apic_timer_interrupt
> > #16 [ ] apic_timer_interrupt
> > --- <IRQ stack> ---
> > #17 [ ] apic_timer_interrupt
> > #18 [ ] _raw_spin_lock
> > #19 [ ] vmpressure
> > #20 [ ] shrink_node
> > #21 [ ] do_try_to_free_pages
> > #22 [ ] try_to_free_pages
> > #23 [ ] __alloc_pages_direct_reclaim
> > #24 [ ] __alloc_pages_nodemask
> > #25 [ ] cache_grow_begin
> > #26 [ ] fallback_alloc
> > #27 [ ] __kmalloc_node_track_caller
> > #28 [ ] __alloc_skb
> > #29 [ ] tcp_sendmsg_locked
> > #30 [ ] tcp_sendmsg
> > #31 [ ] inet6_sendmsg
> > #32 [ ] ___sys_sendmsg
> > #33 [ ] sys_sendmsg
> > #34 [ ] do_syscall_64
> >
> > These are high traffic machines. Almost all the CPUs are stuck on the
> > root memcg's vmpressure sr_lock and almost half of the CPUs are stuck
> > on the kmem cache node's list_lock in IRQ context.
>
> Are you able to track down the lock holder?
>
> > Note that the vmpressure sr_lock is irq-unsafe.
>
> Which is ok because this is only triggered from the memory reclaim and
> that cannot ever happen from the interrupt context for obvious reasons.
>
> > A couple of months back, we observed a similar
> > situation with swap locks which forced us to disable swap on global
> > pressure. Since we do proactive reclaim, disabling swap on global
> > reclaim was not an issue. However, now we have started seeing the same
> > situation with other irq-unsafe locks like the vmpressure sr_lock, and
> > almost all the slab shrinkers have irq-unsafe spinlocks. One way to
> > mitigate this is to convert all such locks (which can be taken in the
> > reclaim path) to be irq-safe, but that does not seem like a
> > maintainable solution.
>
> This doesn't make much sense to be honest. We are not disabling IRQs
> unless it is absolutely necessary.
>
> > Please note that we are running a user space oom-killer which is more
> > aggressive than oomd/PSI, but even that got stuck under this much
> > memory pressure.
> >
> > I am wondering if anyone else has seen a similar situation in production
> > and if there is a recommended way to resolve this situation.
>
> I would recommend focusing on tracking down who is blocking
> further progress.
I was able to find the CPU next in line for the list_lock from the
dump. I don't think anyone is blocking progress as such; it is more
that the spinlock taken in IRQ context is starving the spinlock taken
in process context. This is a high traffic machine and there are tens
of thousands of potential network ACKs on the queue.
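
To make the starvation concrete, the two contending sides look roughly
like this. This is not the real mm/vmpressure.c or slab code, just the
locking pattern as I understand it, with stand-in names:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(sr_lock);        /* vmpressure accounting lock */
static DEFINE_SPINLOCK(list_lock);      /* kmem cache per-node list lock */

/* Process context: direct reclaim -> vmpressure accounting. The lock is
 * irq-unsafe, so interrupts stay enabled while we spin on or hold it. */
static void reclaim_side(void)
{
        spin_lock(&sr_lock);
        /* ... fold scanned/reclaimed counters ... */
        spin_unlock(&sr_lock);
}

/* Timer softirq on the same CPUs: tcp_delack_timer -> tcp_send_ack ->
 * __alloc_skb -> slab, which contends on the per-node list_lock. */
static void irq_side(void)
{
        spin_lock(&list_lock);
        /* ... allocation fails under pressure and falls back further ... */
        spin_unlock(&list_lock);
}

With tens of thousands of delack timers pending, the CPUs keep
re-entering the IRQ side from the timer softirq, so the process-context
side barely gets any cycles to acquire and release sr_lock, and
everything queued behind both locks piles up.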

I talked about this problem with Johannes at LPC 2019 and I think we
discussed two potential solutions. The first was to somehow give
memory reserves to oomd and the second was an in-kernel PSI-based
oom-killer. I am not sure the first one would work in this situation
but the second one might help.
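
For reference, the userspace half of a PSI-driven killer is just a
trigger registered on /proc/pressure/memory and a poll() loop; a
minimal sketch (the threshold and window are made up, and the victim
selection is left out):

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* Wake up if "full" memory stall exceeds 100ms within a 1s window. */
        const char trig[] = "full 100000 1000000";
        struct pollfd pfd;
        int fd;

        fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
        if (fd < 0)
                return 1;
        if (write(fd, trig, strlen(trig) + 1) < 0)
                return 1;

        pfd.fd = fd;
        pfd.events = POLLPRI;
        for (;;) {
                if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                        break;
                if (pfd.revents & POLLPRI)
                        printf("memory pressure threshold breached\n");
                /* ... pick and kill a victim here ... */
        }
        close(fd);
        return 0;
}

The catch is that this loop itself needs CPU time (and possibly memory)
to make progress, which is exactly what it does not get on these
machines; hence the appeal of memory reserves for oomd or doing the
killing in the kernel.
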
Shakeel