From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-vk0-f46.google.com (mail-vk0-f46.google.com [209.85.213.46])
        by kanga.kvack.org (Postfix) with ESMTP id AB8446B0255
        for ; Wed, 2 Mar 2016 21:23:04 -0500 (EST)
Received: by mail-vk0-f46.google.com with SMTP id c3so7946090vkb.3
        for ; Wed, 02 Mar 2016 18:23:04 -0800 (PST)
Received: from mail-vk0-x232.google.com (mail-vk0-x232.google.com. [2607:f8b0:400c:c05::232])
        by mx.google.com with ESMTPS id b135si23921087vke.26.2016.03.02.18.23.03
        for
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Wed, 02 Mar 2016 18:23:04 -0800 (PST)
Received: by mail-vk0-x232.google.com with SMTP id c3so7945910vkb.3
        for ; Wed, 02 Mar 2016 18:23:03 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20160302173639.GD26701@dhcp22.suse.cz>
References: <20160302173639.GD26701@dhcp22.suse.cz>
Date: Thu, 3 Mar 2016 10:23:03 +0800
Message-ID:
Subject: Re: kswapd consumes 100% CPU when highest zone is small
From: Jerry Lee
Content-Type: multipart/alternative; boundary=001a114314dc9f274b052d1bad63
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: linux-mm@kvack.org

--001a114314dc9f274b052d1bad63
Content-Type: text/plain; charset=UTF-8

On 3 March 2016 at 01:36, Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 02-03-16 14:20:38, Jerry Lee wrote:
> > Hi,
> >
> > I have an x86_64 system with 2G RAM using linux-3.12.x.  During copying
> > large files (e.g. 100GB), kswapd easily consumes 100% CPU until the file
> > is deleted or the page cache is dropped.  With setting the
> > min_free_kbytes from 16384 to 65536, the symptom is mitigated but I
> > can't totally get rid of the problem.
> >
> > After some trial and error, I found that the highest zone is always
> > unbalanced with order-0 page requests so that pgdat_balanced()
> > continuously returns false and kswapd can't sleep.
> >
> > Here's the watermarks (min_free_kbytes = 65536) in my system:
> > Node 0, zone      DMA
> >   pages free     2167
> >         min      138
> >         low      172
> >         high     207
> >         scanned  0
> >         spanned  4095
> >         present  3996
> >         managed  3974
> >
> > Node 0, zone    DMA32
> >   pages free     215375
> >         min      16226
> >         low      20282
> >         high     24339
> >         scanned  0
> >         spanned  1044480
> >         present  490971
> >         managed  464223
> >
> > Node 0, zone   Normal
> >   pages free     7
> >         min      18
> >         low      22
> >         high     27
> >         scanned  0
> >         spanned  1536
> >         present  1536
> >         managed  523
>
> The zone Normal is just too small and that confuses the reclaim path.
>
> >
> > Besides, when the kswapd crazily spins, the value of the following
> > entries in vmstat increases quickly even when I stop copying file:
> >
> > pgalloc_dma 17719
> > pgalloc_dma32 3262823
> > slabs_scanned 937728
> > kswapd_high_wmark_hit_quickly 54333233
> > pageoutrun 54333235
> >
> > Is there anything I could do to totally get rid of the problem?
>
> I would try to sacrifice those few megs and get rid of zone normal
> completely. AFAIR mem=4G should limit the max_pfn to 4G so DMA32 should
> cover the whole memory.
>

I came up with a patch that seems to work well on my system.  However, I
am afraid that it breaks the rule that all zones must be balanced for
order-0 requests, and that it may cause other side effects.  I think the
patch is just a workaround (a bad one) and not a cure-all.

BTW, if I upgrade the RAM from 2G to 4G, the problem is gone because the
Normal zone no longer confuses the reclaim path, as you said before.

Thanks

--- a/linux-3.12.6/mm/vmscan.c
+++ b/linux-3.12.6/mm/vmscan.c
@@ -2755,6 +2755,7 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
         unsigned long managed_pages = 0;
         unsigned long balanced_pages = 0;
         int i;
+#define HWMARK_THRESHOLD 128
 
         /* Check the watermark levels */
         for (i = 0; i <= classzone_idx; i++) {
@@ -2779,7 +2780,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
 
                 if (zone_balanced(zone, order, 0, i))
                         balanced_pages += zone->managed_pages;
-                else if (!order)
+                else if (!order &&
+                         (high_wmark_pages(zone) > HWMARK_THRESHOLD))
                         return false;
         }
 

> --
> Michal Hocko
> SUSE Labs
>

--001a114314dc9f274b052d1bad63
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--001a114314dc9f274b052d1bad63--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org