linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <kernel@kyup.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Linux MM <linux-mm@kvack.org>
Subject: Re: Softlockup during memory allocation
Date: Thu, 24 Nov 2016 15:09:38 +0200	[thread overview]
Message-ID: <a655e607-91c5-173c-ec3a-e211df598f92@kyup.com> (raw)
In-Reply-To: <20161124121209.GE20668@dhcp22.suse.cz>



On 11/24/2016 02:12 PM, Michal Hocko wrote:
> On Thu 24-11-16 13:45:03, Nikolay Borisov wrote:
> [...]
>> Ok, I think I know what has happened. Inspecting the data structures of
>> the respective cgroup here is what the mem_cgroup_per_zone looks like:
>>
>>   zoneinfo[2] =   {
>>     lruvec = {{
>>         lists = {
>>           {
>>             next = 0xffffea004f98c660,
>>             prev = 0xffffea0063f6b1a0
>>           },
>>           {
>>             next = 0xffffea0004123120,
>>             prev = 0xffffea002c2e2260
>>           },
>>           {
>>             next = 0xffff8818c37bb360,
>>             prev = 0xffff8818c37bb360
>>           },
>>           {
>>             next = 0xffff8818c37bb370,
>>             prev = 0xffff8818c37bb370
>>           },
>>           {
>>             next = 0xffff8818c37bb380,
>>             prev = 0xffff8818c37bb380
>>           }
>>         },
>>         reclaim_stat = {
>>           recent_rotated = {172969085, 43319509},
>>           recent_scanned = {173112994, 185446658}
>>         },
>>         zone = 0xffff88207fffcf00
>>     }},
>>     lru_size = {159722, 158714, 0, 0, 0},
>>     }
>>
>> So this means that there are inactive_anon and active_annon only -
>> correct?
> 
> yes. at least in this particular zone.
> 
>> Since the machine doesn't have any swap this means anon memory
>> has nowhere to go. If I'm interpreting the data correctly then this
>> explains why reclaim makes no progress. If that's the case then I have
>> the following questions:
>>
>> 1. Shouldn't reclaim exit at some point rather than being stuck in
>> reclaim without making further progress.
> 
> Reclaim (try_to_free_mem_cgroup_pages) has to go down all priorities
> without to get out. We are not doing any pro-active checks whether there
> is anything reclaimable but that alone shouldn't be such a big deal
> because shrink_node_memcg should simply do nothing because
> get_scan_count will find no pages to scan. So it shouldn't take much
> time to realize there is nothing to reclaim and get back to try_charge
> which retries few more times and eventually goes OOM. I do not see how
> we could trigger rcu stalls here. There shouldn't be any long RCU
> critical section on the way and preemption points on the way.
> 
>> 2. It seems rather strange that there are no (INACTIVE|ACTIVE)_FILE
>> pages - is this possible?
> 
> All of them might be reclaimed already as a result of the memory
> pressure in the memcg. So not all that surprising. But the fact that
> you are hitting the limit means that the anonymous pages saturate your
> hard limit so your memcg seems underprovisioned.
> 
>> 3. Why hasn't OOM been activated in order to free up some anonymous memory ?
> 
> It should eventually. Maybe there still were some reclaimable pages in
> other zones for this memcg.

I just checked all the zones for both nodes (the machines have 2 NUMA
nodes) so essentially there are no reclaimable pages - all are
anonymous. So the pertinent question is why process are sleeping in
reclamation path when there are no pages to free. I also observed the
same behavior on a different node, this time the priority was 0 and the
code hasn't resorted to OOM. This seems all too strange..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-11-24 13:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-01  8:12 Nikolay Borisov
2016-11-01  8:16 ` Nikolay Borisov
2016-11-02 19:00 ` Vlastimil Babka
2016-11-04  3:46   ` Hugh Dickins
2016-11-04 12:18 ` Nikolay Borisov
2016-11-13 22:02   ` Nikolay Borisov
2016-11-21  5:31     ` Michal Hocko
2016-11-22  8:56       ` Nikolay Borisov
2016-11-22 14:30         ` Michal Hocko
2016-11-22 14:32           ` Michal Hocko
2016-11-22 14:46             ` Nikolay Borisov
2016-11-22 14:35           ` Nikolay Borisov
2016-11-22 17:02             ` Michal Hocko
2016-11-23  7:44               ` Nikolay Borisov
2016-11-23  7:49                 ` Michal Hocko
2016-11-23  7:50                   ` Michal Hocko
2016-11-24 11:45                   ` Nikolay Borisov
2016-11-24 12:12                     ` Michal Hocko
2016-11-24 13:09                       ` Nikolay Borisov [this message]
2016-11-25  9:00                         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a655e607-91c5-173c-ec3a-e211df598f92@kyup.com \
    --to=kernel@kyup.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox