From: Marinko Catovic <marinko.catovic@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, Christopher Lameter <cl@linux.com>
Subject: Re: Caching/buffers become useless after some time
Date: Tue, 30 Oct 2018 19:26:32 +0100
Message-ID: <CADF2uSry7SNQE0NPazAtra-4OELPonnWzzhbrBcqGRiVKWRg5Q@mail.gmail.com>
In-Reply-To: <98305976-612f-cf6d-1377-2f9f045710a9@suse.cz>
On Tue, 30 Oct 2018 at 18:03, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 10/30/18 5:08 PM, Marinko Catovic wrote:
> >> One notable thing here is that there shouldn't be any reason to do
> >> direct reclaim when kswapd itself doesn't do anything. Either it is
> >> blocked on something (though I find it quite surprising to see it in
> >> that state for the whole 1500s period), or we are simply not low on
> >> free memory at all. That would point towards compaction-triggered
> >> memory reclaim, which accounts as direct reclaim as well. Direct
> >> compaction triggered more than once a second on average. We shouldn't
> >> really reclaim unless we are low on memory, but repeatedly failing
> >> compaction could just add up and reclaim a lot in the end. There seem
> >> to be quite a lot of low-order requests as per your trace buffer.
>
> I realized that the fact that the slabs grew so large might be very
> relevant. It means a lot of unmovable pages, and while they are slowly
> being freed, the remaining ones are scattered all over memory, making it
> impossible to compact successfully until the slabs are almost
> *completely* freed. It's in fact the theoretical worst-case scenario for
> compaction and fragmentation avoidance. Next time it would be nice to
> also gather /proc/pagetypeinfo and /proc/slabinfo to see what grew so
> much there (probably dentries and inodes).
How would you like the results? As a job collecting those every 5
seconds, from the 3 > drop_caches until the worst case occurs (which may
take up to 24 hours), or at some specific point in time?
Please note that I already provided them (see my earlier response) as a
one-time snapshot while in the worst-case state:
cat /proc/pagetypeinfo https://pastebin.com/W1sJscsZ
cat /proc/slabinfo https://pastebin.com/9ZPU3q7X
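If a periodic job is what you want, I could run a minimal collection
loop like the sketch below (output directory and interval are just
placeholders, to be adjusted):

  #!/bin/sh
  # collect pagetypeinfo/slabinfo snapshots until stopped
  OUT=/root/mm-logs            # hypothetical output directory
  mkdir -p "$OUT"
  while true; do
      ts=$(date +%Y%m%d-%H%M%S)
      cat /proc/pagetypeinfo > "$OUT/pagetypeinfo-$ts"
      cat /proc/slabinfo     > "$OUT/slabinfo-$ts"
      sleep 5                  # or a longer interval if 5s is too much data
  done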
> The question is why the problems happened some time after the
> unmovable pollution. The trace showed me that the structure of
> allocations wrt order+flags, as Michal breaks them down below, is not
> significantly different in the last phase than in the whole trace.
> Possibly the state of memory gradually changed so that the various
> heuristics (fragindex, pageblock skip bits etc.) resulted in compaction
> being tried more than initially, eventually hitting a very bad corner case.
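If it helps for the next run, I could also sample the fragmentation
index directly; a minimal sketch (assuming debugfs is mounted at
/sys/kernel/debug) would be:

  # per-zone, per-order fragmentation index and buddy allocator state
  cat /sys/kernel/debug/extfrag/extfrag_index
  cat /proc/buddyinfo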
>
> >> $ grep order trace-last-phase | sed 's@.*\(order=[0-9]*\).*gfp_flags=\(.*\)@\1 \2@' | sort | uniq -c
> >> 1238 order=1 __GFP_HIGH|__GFP_ATOMIC|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
> >> 5812 order=1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
> >> 121 order=1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_THISNODE
> >> 22 order=1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE
> >> 395910 order=1 GFP_KERNEL_ACCOUNT|__GFP_ZERO
> >> 783055 order=1 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >> 1060 order=1 __GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_THISNODE
> >> 3278 order=2 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
> >> 797255 order=2 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >> 93524 order=3 GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >> 498148 order=3 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >> 243563 order=3 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >> 10 order=4 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE
> >> 114 order=7 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE
> >> 67621 order=9 GFP_TRANSHUGE|__GFP_THISNODE
> >>
> >> We can safely rule out NOWAIT and ATOMIC because those do not reclaim.
> >> That leaves us with
> >> 5812 order=1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
> >> 121 order=1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_THISNODE
> >> 22 order=1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE
> >> 395910 order=1 GFP_KERNEL_ACCOUNT|__GFP_ZERO
>
> I suspect there are lots of short-lived processes, so these are probably
> rapidly recycled and not causing compaction.
Well yes, since this is shared hosting there are lots of users running
lots of scripts, perhaps 5-50 new forks and kills every second depending
on load; hard to tell exactly.
> It also seems to be pgd allocation (2 pages due to PTI), not kernel stack?
Plain English, please? :)
> >> 1060 order=1 __GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_THISNODE
> >> 3278 order=2 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
> >> 10 order=4 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE
> >> 114 order=7 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE
> >> 67621 order=9 GFP_TRANSHUGE|__GFP_THISNODE
>
> I would again suspect those. IIRC we already confirmed earlier that the
> THP defrag setting is madvise or madvise+defer, and there are processes
> using madvise(MADV_HUGEPAGE)? Did you ever try changing defrag
> to plain 'defer'?
Yes, I think I mentioned this before. AFAIK it did not make an
(immediate) difference; madvise is the current setting.
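For reference, the setting can be checked and switched at runtime via
sysfs, e.g.:

  cat /sys/kernel/mm/transparent_hugepage/defrag
  # switch to plain 'defer' for testing
  echo defer > /sys/kernel/mm/transparent_hugepage/defrag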
> and there are madvise(MADV_HUGEPAGE) using processes?
I can't tell you that.
> >>
> >> By and large the kernel stack allocations are in the lead. You can get
> >> some relief by enabling CONFIG_VMAP_STACK. There is also a notable
> >> number of THP page allocations. Just curious: are you running on a NUMA
> >> machine? If yes, [1] might be relevant. Other than that nothing really
> >> jumped out at me.
>
>
> > thanks a lot Vlastimil!
>
> And Michal :)
>
> > I would not really know whether this is a NUMA machine; it is a regular
> > server running with an i7-8700
> > and ECC RAM. How would I find out?
>
> Please provide /proc/zoneinfo and we'll see.
There you go: cat /proc/zoneinfo https://pastebin.com/RMTwtXGr
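In case a quick check is useful in the meantime, counting the memory
nodes should already tell (just a sketch, using nothing beyond /sys and
coreutils):

  ls -d /sys/devices/system/node/node* | wc -l   # 1 means a single node, no NUMA
  # or, if the numactl package is installed:
  numactl --hardware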
> > So I should do CONFIG_VMAP_STACK=y and try that..?
>
> I suspect you already have it.
Yes, true; the currently running kernel has CONFIG_VMAP_STACK=y.
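For completeness, this is how it can be verified on the running kernel
(assuming the distro ships the config under /boot, or exposes it via
/proc/config.gz when CONFIG_IKCONFIG_PROC is enabled):

  grep VMAP_STACK /boot/config-$(uname -r)
  # or:
  zgrep VMAP_STACK /proc/config.gz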