> > one host is at a healthy state right now, I'd run that over there
> > immediately.
>
> Let's see what we can get from here.

Oh well, that went fast. Even with low values for buffers (around 100MB) and caches at around 20G or so, performance was nevertheless extremely low, and I really had to drop the caches right away. This is the first time I have seen it happen with caches >10G, but hopefully this also provides a clue for you.

Just after starting the stats collection I switched /sys/kernel/mm/transparent_hugepage/defrag from defer back to madvise. I suspect this somehow caused the rapid reaction, since a few minutes later I saw free RAM jump from 5GB to 10GB. After that I went afk, and only returned to the PC because my monitoring systems went crazy telling me about downtime.

If you think changing /sys/kernel/mm/transparent_hugepage/defrag back to its default, after it had been on defer for days, was a mistake, then please tell me.

Here you go:
https://nofile.io/f/VqRg644AT01/vmstat.tar.gz
trace_pipe: https://nofile.io/f/wFShvZScpvn/trace_pipe.gz