> On Mon 06-08-18 15:37:14, Cristopher Lameter wrote:
> > On Mon, 6 Aug 2018, Michal Hocko wrote:
> >
> > > Because a lot of FS metadata is fragmenting the memory and a large
> > > number of high order allocations which want to be served reclaim a lot
> > > of memory to achieve their goal. Considering a large part of memory is
> > > fragmented by unmovable objects there is no other way than to use
> > > reclaim to release that memory.
> >
> > Well it looks like the fragmentation issue gets worse. Is that enough to
> > consider merging the slab defrag patchset and get some work done on inodes
> > and dentries to make them movable (or use targeted reclaim)?
>
> Is there anything to test?
> --
> Michal Hocko
> SUSE Labs
>
> [Please do not top-post]

Like this?

> The only way how kmemcg limit could help I can think of would be to
> enforce metadata reclaim much more often. But that is rather a bad
> workaround.

Would that have a significant performance impact? I would be willing to try it if you think the idea is not thaaat bad. If so, could you please explain what to do?

> > > Because a lot of FS metadata is fragmenting the memory and a large
> > > number of high order allocations which want to be served reclaim a lot
> > > of memory to achieve their goal. Considering a large part of memory is
> > > fragmented by unmovable objects there is no other way than to use
> > > reclaim to release that memory.
> >
> > Well it looks like the fragmentation issue gets worse. Is that enough to
> > consider merging the slab defrag patchset and get some work done on inodes
> > and dentries to make them movable (or use targeted reclaim)?
>
> Is there anything to test?

Are you referring to some known issue there, possibly directly related to mine? If so, I would be willing to test that patchset, whether it makes it into the kernel.org sources or I have to apply it manually.

> Well, there are some drivers (mostly out-of-tree) which are high order
> hungry.
> You can try to trace all allocations with order > 0 and
> see who that might be.
> # mount -t tracefs none /debug/trace/
> # echo stacktrace > /debug/trace/trace_options
> # echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter
> # echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable
> # cat /debug/trace/trace_pipe
>
> And later this to disable tracing.
> # echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable

I just had a major situation where the cache was practically useless, with only about 100M/8G in use and horrible performance. There you go: https://nofile.io/f/mmwVedaTFsd

I think mysql shows up most often; regardless of the binary name, this is actually MariaDB 10.1.

> You do not have to drop all caches. echo 2 > /proc/sys/vm/drop_caches
> should be sufficient to drop metadata only.

That is exactly what I am doing. As I already mentioned, echo 1 makes no difference at all; echo 2 is the only thing that helps. Just 5 minutes after doing that, usage grew to 2GB/10GB and is steadily going up, as usual.
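A capture from the `order>0` tracing recipe quoted above can get large, so a small summarizer may help spot which tasks issue most of the high-order requests. This is a minimal POSIX sh/awk sketch, not something from the thread: `summarize_high_order` is a hypothetical helper name, and it assumes the default ftrace line layout (`<comm>-<pid> [cpu] ... mm_page_alloc: ... order=N ...`) and task names without whitespace. The `<stack trace>` and `=> ...` lines produced by the `stacktrace` option are simply skipped.

```shell
# Hypothetical helper: count mm_page_alloc events per task name and order.
# Reads captured trace_pipe output from the given files, or from stdin if none.
# Assumes the default ftrace layout "<comm>-<pid> [cpu] ... mm_page_alloc: ... order=N ..."
# and a comm without whitespace; stack-trace lines do not match and are ignored.
summarize_high_order() {
    awk '
        /mm_page_alloc:/ {
            task = $1
            sub(/-[0-9]+$/, "", task)      # strip the "-<pid>" suffix
            if (match($0, /order=[0-9]+/)) {
                # RSTART/RLENGTH locate "order=N"; skip the 6 chars of "order="
                order = substr($0, RSTART + 6, RLENGTH - 6)
                count[task " order=" order]++
            }
        }
        END {
            for (key in count)
                printf "%8d %s\n", count[key], key
        }
    ' "$@" | sort -rn
}
```

For example, capture for a minute with `timeout 60 cat /debug/trace/trace_pipe > /tmp/high-order.txt` and then run `summarize_high_order /tmp/high-order.txt | head`. It only aggregates by task name and order; to attribute the allocations to a particular driver you still have to read the stack traces themselves.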