> Hmm it's actually interesting to see GFP_TRANSHUGE there and not
> GFP_TRANSHUGE_LIGHT. What's your thp defrag setting? (cat
> /sys/kernel/mm/transparent_hugepage/enabled). Maybe it's set to
> "always", or there's a heavily faulting process that's using
> madvise(MADV_HUGEPAGE). If that's the case, setting it to "defer" or
> even "never" could be a workaround.
cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
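For completeness, and in case we are looking at the wrong knob: you ask about the
defrag setting but the path above is the "enabled" file, and as far as I understand
"defer" is only a valid value for the sibling "defrag" file, so I can read both:

# "enabled" selects which regions are eligible for THP; "defrag" selects
# how hard allocation tries (direct reclaim/compaction vs. deferred)
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag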
According to the docs, this is the default:
> "madvise" will enter direct reclaim like "always" but only for regions
> that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.
Would any change there take effect immediately, even in the 100M/10G state?
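If so, I assume I could just flip it at runtime and watch for a reaction,
something like this (untested on this host; needs root):

# switch the defrag policy on the fly; as I understand it, writing the
# sysfs file takes effect for subsequent faults, no reboot required
echo defer > /sys/kernel/mm/transparent_hugepage/defrag
# ...observe the buffers value for a while, then revert:
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag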
> or there's a heavily faulting process that's using madvise(MADV_HUGEPAGE)
Are you suggesting that a single process can cause this? How would one
identify it? Should killing it allow the cache to be populated again
instantly? If so, I could start killing processes on the host one by one
until there is an improvement to observe.
So far I can tell that it is not the database server, since restarting it did not help at all.
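If it helps, I could sweep /proc for candidates instead of blindly killing
things; a rough sketch, assuming the "hg" bit in VmFlags marks
madvise(MADV_HUGEPAGE) regions and AnonHugePages in smaps shows actual THP
usage (needs root to see all processes):

for p in /proc/[0-9]*; do
    # kB of anonymous THP currently mapped by this process
    thp=$(awk '/^AnonHugePages:/ {s+=$2} END {print s+0}' "$p/smaps" 2>/dev/null)
    # number of VMAs flagged with MADV_HUGEPAGE
    hg=$(grep -c '^VmFlags:.* hg' "$p/smaps" 2>/dev/null)
    if [ "${thp:-0}" -gt 0 ] || [ "${hg:-0}" -gt 0 ]; then
        echo "$thp kB thp, $hg hg-vmas: ${p#/proc/} $(cat "$p/comm" 2>/dev/null)"
    fi
done | sort -rn | head -20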
Please bear in mind, when suggesting this, that I can watch the buffers value
(the ~100MB figure) "oscillating". In the cache-useless state it jumps around
literally every second, e.g. from 100 to 102, then 99, 104, 85, 101, 105, 98,
and so on, after having drifted down over the days from the well-populated
several GB at the beginning to those ~100MB.
So anything that actually has an effect should be measurable instantly;
to date only dropping the caches achieves that.
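Concretely, I watch it with a one-second loop like this, so any effect (or
the reset after dropping caches) shows up immediately:

while sleep 1; do
    # timestamp plus the Buffers and Cached lines from /proc/meminfo
    printf '%s  ' "$(date +%T)"
    awk '/^(Buffers|Cached):/ {printf "%s %s kB  ", $1, $2} END {print ""}' /proc/meminfo
done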
Please tell me if you need any further measurements, and when or in what
state to take them; command snippets to fit your needs are welcome.