From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70])
    by kanga.kvack.org (Postfix) with ESMTP id 99D896B000A
    for ; Thu, 9 Aug 2018 04:29:35 -0400 (EDT)
Received: by mail-wr1-f70.google.com with SMTP id a9-v6so4025130wrw.20
    for ; Thu, 09 Aug 2018 01:29:35 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
    by mx.google.com with SMTPS id e3-v6sor2607793wru.81.2018.08.09.01.29.33
    for (Google Transport Security);
    Thu, 09 Aug 2018 01:29:33 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20180806181638.GE10003@dhcp22.suse.cz>
References: <20180730144048.GW24267@dhcp22.suse.cz>
    <1f862d41-1e9f-5324-fb90-b43f598c3955@suse.cz>
    <30f7ec9a-e090-06f1-1851-b18b3214f5e3@suse.cz>
    <20180806120042.GL19540@dhcp22.suse.cz>
    <010001650fe29e66-359ffa28-9290-4e83-a7e2-b6d1d8d2ee1d-000000@email.amazonses.com>
    <20180806181638.GE10003@dhcp22.suse.cz>
From: Marinko Catovic
Date: Thu, 9 Aug 2018 10:29:33 +0200
Message-ID:
Subject: Re: Caching/buffers become useless after some time
Content-Type: multipart/alternative; boundary="0000000000003edf8c0572fc6dd1"
Sender: owner-linux-mm@kvack.org
List-ID:
Cc: Christopher Lameter, Vlastimil Babka, linux-mm@kvack.org

--0000000000003edf8c0572fc6dd1
Content-Type: text/plain; charset="UTF-8"

On Mon 06-08-18 15:37:14, Christopher Lameter wrote:
> > On Mon, 6 Aug 2018, Michal Hocko wrote:
> >
> > > Because a lot of FS metadata is fragmenting the memory and a large
> > > number of high order allocations which want to be served reclaim a lot
> > > of memory to achieve their goal. Considering a large part of memory is
> > > fragmented by unmovable objects there is no other way than to use
> > > reclaim to release that memory.
> >
> > Well it looks like the fragmentation issue gets worse. Is that enough to
> > consider merging the slab defrag patchset and get some work done on inodes
> > and dentries to make them movable (or use targeted reclaim)?
>
> Is there anything to test?
> --
> Michal Hocko
> SUSE Labs
>
> [Please do not top-post]

Like this?

> The only way how kmemcg limit could help I can think of would be to
> enforce metadata reclaim much more often. But that is rather a bad
> workaround.

Would that have a significant performance impact?
I would be willing to try it if you think the idea is not that bad.
If so, could you please explain what to do?

> > > Because a lot of FS metadata is fragmenting the memory and a large
> > > number of high order allocations which want to be served reclaim a lot
> > > of memory to achieve their goal. Considering a large part of memory is
> > > fragmented by unmovable objects there is no other way than to use
> > > reclaim to release that memory.
> >
> > Well it looks like the fragmentation issue gets worse. Is that enough to
> > consider merging the slab defrag patchset and get some work done on inodes
> > and dentries to make them movable (or use targeted reclaim)?

> Is there anything to test?

Are you referring to some known issue there, possibly directly related to mine?
If so, I would be willing to test that patchset, whether it makes it into the
kernel.org sources or I have to apply it manually.

> Well, there are some drivers (mostly out-of-tree) which are high order
> hungry. You can try to trace all allocations with order > 0 and
> see who that might be.
> # mount -t tracefs none /debug/trace/
> # echo stacktrace > /debug/trace/trace_options
> # echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter
> # echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable
> # cat /debug/trace/trace_pipe
>
> And later this to disable tracing.
> # echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable

I just hit a major cache-useless situation, with only about 100M/8G in use
and horrible performance. Here is the trace output:
https://nofile.io/f/mmwVedaTFsd

mysql seems to show up the most; despite the binary name, this is actually
MariaDB 10.1.
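To back up the "mysql occurs mostly" impression with numbers, the captured trace can be tallied per task. The following is only a rough sketch: it assumes the default ftrace line layout ("comm-pid [cpu] flags timestamp: mm_page_alloc: ... order=N ..."), the function name summarize_high_order is made up for illustration, and task names containing spaces would be cut short.

```shell
#!/bin/sh
# Count mm_page_alloc events per (task name, order) from trace text on stdin.
# Assumption: default ftrace layout, where the first field is "comm-pid".
# Stack trace lines (" => ...") do not contain "mm_page_alloc:" and are skipped.
summarize_high_order() {
    awk '
        /mm_page_alloc:/ {
            task = $1
            sub(/-[0-9]+$/, "", task)      # strip the trailing -PID
            order = "?"
            for (i = 2; i <= NF; i++)
                if ($i ~ /^order=[0-9]+$/) order = substr($i, 7)
            count[task " order=" order]++
        }
        END {
            for (k in count) print count[k], k
        }
    ' | sort -k1,1nr -k2                   # most frequent first
}
```

Since trace_pipe blocks while waiting for new events, it is probably easier to capture to a file first (the file name here is just an example) and summarize afterwards: timeout 60 cat /debug/trace/trace_pipe > /tmp/trace.txt; summarize_high_order < /tmp/trace.txt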
> You do not have to drop all caches. echo 2 > /proc/sys/vm/drop_caches
> should be sufficient to drop metadata only.

That is exactly what I am doing. As I already mentioned, echo 1 makes no
difference at all; echo 2 is the only thing that helps.
Just 5 minutes after doing that, usage grew to 2GB/10GB and is steadily
going up, as usual.

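For anyone who wants to log how the cache recovers after a drop, here is a minimal sketch. It assumes the usage figures above mean Buffers plus Cached against MemTotal from /proc/meminfo, which the message does not actually state, and the function name cache_usage is hypothetical.

```shell
#!/bin/sh
# Print "<Buffers+Cached> MiB / <MemTotal> MiB" from meminfo-formatted text
# on stdin. /proc/meminfo reports these values in kB, hence the division.
# Exact field matching keeps "SwapCached:" from being counted as "Cached:".
cache_usage() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "Buffers:"  { cache += $2 }
        $1 == "Cached:"   { cache += $2 }
        END { printf "%d MiB / %d MiB\n", cache / 1024, total / 1024 }
    '
}
```

Something like this would log it once a minute: while sleep 60; do date; cache_usage < /proc/meminfo; done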
--0000000000003edf8c0572fc6dd1--