linux-mm.kvack.org archive mirror
From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, Linux MM <linux-mm@kvack.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,
	 LKML <linux-kernel@vger.kernel.org>,
	Greg Thelen <gthelen@google.com>
Subject: Machine lockups on extreme memory pressure
Date: Mon, 21 Sep 2020 11:35:35 -0700
Message-ID: <CALvZod4FWLsV9byrKQojeus7tMDhHjQHFF5J_JpNsyB0HkaERA@mail.gmail.com>

Hi all,

We are seeing machine lockups due to extreme memory pressure, where the
free pages in all the zones are way below the min watermarks. The stack
of the stuck CPU looks like the following (I had to crash the machine to
get the info).

 #0 [ ] crash_nmi_callback
 #1 [ ] nmi_handle
 #2 [ ] default_do_nmi
 #3 [ ] do_nmi
 #4 [ ] end_repeat_nmi
--- <NMI exception stack> ---
 #5 [ ] queued_spin_lock_slowpath
 #6 [ ] _raw_spin_lock
 #7 [ ] ____cache_alloc_node
 #8 [ ] fallback_alloc
 #9 [ ] __kmalloc_node_track_caller
#10 [ ] __alloc_skb
#11 [ ] tcp_send_ack
#12 [ ] tcp_delack_timer
#13 [ ] run_timer_softirq
#14 [ ] irq_exit
#15 [ ] smp_apic_timer_interrupt
#16 [ ] apic_timer_interrupt
--- <IRQ stack> ---
#17 [ ] apic_timer_interrupt
#18 [ ] _raw_spin_lock
#19 [ ] vmpressure
#20 [ ] shrink_node
#21 [ ] do_try_to_free_pages
#22 [ ] try_to_free_pages
#23 [ ] __alloc_pages_direct_reclaim
#24 [ ] __alloc_pages_nodemask
#25 [ ] cache_grow_begin
#26 [ ] fallback_alloc
#27 [ ] __kmalloc_node_track_caller
#28 [ ] __alloc_skb
#29 [ ] tcp_sendmsg_locked
#30 [ ] tcp_sendmsg
#31 [ ] inet6_sendmsg
#32 [ ] ___sys_sendmsg
#33 [ ] sys_sendmsg
#34 [ ] do_syscall_64
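
The sendmsg half of the trace reaches __alloc_pages_direct_reclaim roughly because no zone can satisfy the request above its min watermark. That check can be sketched as a simplified user-space model. The struct and function names are hypothetical, and lowmem_reserve, the ALLOC_HIGH/ALLOC_HARDER adjustments (which let atomic allocations such as the softirq skb allocation dip somewhat below min), and the per-order free-list checks are all left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified zone: only what the min-watermark
 * check needs. The kernel's struct zone carries much more state. */
struct model_zone {
    const char *name;
    long free_pages;     /* pages currently free in the zone */
    long min_watermark;  /* the zone's min watermark, in pages */
};

/* Simplified form of the test in __zone_watermark_ok(): after the
 * (1 << order) - 1 adjustment, free pages must stay strictly above the
 * min watermark. Reserves and ALLOC_* boosts are ignored on purpose. */
static bool model_watermark_ok(const struct model_zone *z, unsigned int order)
{
    long usable = z->free_pages - ((1L << order) - 1);

    return usable > z->min_watermark;
}

/* Walk the zones in zonelist order and return the first one that passes,
 * or NULL when every zone is below its min watermark. In the real slow
 * path, the NULL case is roughly where a blocking allocation moves on to
 * direct reclaim, as the sendmsg half of the trace does. */
static const struct model_zone *model_pick_zone(const struct model_zone *zones,
                                                size_t nr_zones,
                                                unsigned int order)
{
    for (size_t i = 0; i < nr_zones; i++) {
        if (model_watermark_ok(&zones[i], order))
            return &zones[i];
    }
    return NULL;
}
```

With every zone "way below the min watermarks", model_pick_zone() returns NULL for any order, so in this model every CPU that can block would head into reclaim at the same time.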

These are high-traffic machines. Almost all the CPUs are stuck on the
root memcg's vmpressure sr_lock, and almost half of the CPUs are stuck
on the kmem cache node's list_lock in IRQ context. Note that the
vmpressure sr_lock is irq-unsafe, i.e. it is taken without disabling
interrupts, so a CPU holding or spinning on it can be interrupted. A
couple of months back, we observed a similar situation with swap locks,
which forced us to disable swap on global pressure. Since we do
proactive reclaim, disabling swap on global reclaim was not an issue.
However, we have now started seeing the same situation with other
irq-unsafe locks like the vmpressure sr_lock, and almost all the slab
shrinkers have irq-unsafe spinlocks. One way to mitigate this is to
convert all such locks (those that can be taken in the reclaim path) to
be irq-safe, but that does not seem like a maintainable solution.

Please note that we are running a user space oom-killer which is more
aggressive than oomd/PSI, but even that got stuck under this much memory
pressure.

I am wondering if anyone else has seen a similar situation in production
and if there is a recommended way to resolve this situation.

thanks,
Shakeel

