On Mon 06-08-18 15:37:14, Christoph Lameter wrote:
> > On Mon, 6 Aug 2018, Michal Hocko wrote:
> >
> > > Because a lot of FS metadata is fragmenting the memory and a large
> > > number of high order allocations which want to be served reclaim a lot
> > > of memory to achieve their goal. Considering a large part of memory is
> > > fragmented by unmovable objects there is no other way than to use
> > > reclaim to release that memory.
> >
> > Well it looks like the fragmentation issue gets worse. Is that enough to
> > consider merging the slab defrag patchset and get some work done on inodes
> > and dentries to make them movable (or use targeted reclaim)?
>
> Is there anything to test?
> --
> Michal Hocko
> SUSE Labs
>

> [Please do not top-post]

like this?

> The only way how kmemcg limit could help I can think of would be to
> enforce metadata reclaim much more often. But that is rather a bad
> workaround.

Would that have a significant performance impact?
I would be willing to try it if you think the idea is not thaaat bad.
If so, could you please explain what to do?

> > > Because a lot of FS metadata is fragmenting the memory and a large
> > > number of high order allocations which want to be served reclaim a lot
> > > of memory to achieve their goal. Considering a large part of memory is
> > > fragmented by unmovable objects there is no other way than to use
> > > reclaim to release that memory.
> >
> > Well it looks like the fragmentation issue gets worse. Is that enough to
> > consider merging the slab defrag patchset and get some work done on inodes
> > and dentries to make them movable (or use targeted reclaim)?

> Is there anything to test?

Are you referring to some known issue there, possibly directly related to
mine?
If so, I would be willing to test that patchset, whether it makes it into
the kernel.org sources or I have to apply it manually.


> Well, there are some drivers (mostly out-of-tree) which are high order
> hungry. You can try to trace all allocations with order > 0 and
> see who that might be.
> # mount -t tracefs none /debug/trace/
> # echo stacktrace > /debug/trace/trace_options
> # echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter
> # echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable
> # cat /debug/trace/trace_pipe
>
> And later this to disable tracing.
> # echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable

I just hit another situation where the cache was practically unused (only
about 100M of 8G) and performance was horrible. Here is the trace:

https://nofile.io/f/mmwVedaTFsd

I think mysql shows up most often in the trace. Despite the binary name,
this is actually MariaDB 10.1.
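
To check that impression instead of eyeballing it, a saved trace could be
tallied per task name. This is only a rough sketch: the function name
summarize_allocs and the file name in the usage comment are made up, and it
assumes the usual ftrace text layout where each event line starts with
<comm>-<pid> and carries an order=N field. Task names containing spaces
would be miscounted.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): count mm_page_alloc events
# per task name and allocation order from ftrace text output.
# Assumed line layout:
#   <comm>-<pid> [cpu] flags timestamp: mm_page_alloc: page=... order=N ...
summarize_allocs() {
    awk '
        /mm_page_alloc:/ {
            task = $1
            sub(/-[0-9]+$/, "", task)      # drop the trailing -<pid>
            for (i = 2; i <= NF; i++)
                if ($i ~ /^order=[0-9]+$/) {
                    split($i, kv, "=")
                    count[task " order=" kv[2]]++
                }
        }
        END {
            for (k in count)
                print count[k], k
        }
    ' "$@" | sort -k1,1nr -k2              # most frequent first
}

# Usage (file name is an assumption):
#   cat /debug/trace/trace_pipe > saved_trace.txt    # stop with Ctrl-C
#   summarize_allocs saved_trace.txt
```

Stack trace lines do not contain "mm_page_alloc:", so they are skipped and
only the event lines themselves are counted.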

> You do not have to drop all caches. echo 2 > /proc/sys/vm/drop_caches
> should be sufficient to drop metadata only.

That is exactly what I am doing. As I already mentioned, writing 1 does
not make any difference at all; writing 2 is the only thing that helps.
Just 5 minutes after doing that, the usage had grown to 2GB/10GB and it is
steadily going up, as usual.
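
To put numbers on how quickly it grows back, the relevant /proc/meminfo
fields could be logged at a fixed interval. A minimal sketch, assuming that
Cached and SReclaimable are the fields of interest here; the helper names
meminfo_kb and log_cache_growth are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value (in kB) of one /proc/meminfo field.
# The optional second argument is an alternative file, useful for testing.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Hypothetical logger: one line per interval with the page cache size and
# the reclaimable slab size (dentries and inodes are accounted there).
log_cache_growth() {
    interval="${1:-60}"                    # seconds between samples
    while :; do
        printf '%s Cached=%skB SReclaimable=%skB\n' \
            "$(date +%s)" \
            "$(meminfo_kb Cached)" \
            "$(meminfo_kb SReclaimable)"
        sleep "$interval"
    done
}

# Usage: start right after "echo 2 > /proc/sys/vm/drop_caches", e.g.
#   log_cache_growth 30 > cache_growth.log
```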