From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx127.postini.com [74.125.245.127])
	by kanga.kvack.org (Postfix) with SMTP id 746306B0032
	for ; Tue, 23 Jul 2013 00:58:24 -0400 (EDT)
From: Lisa Du
Date: Mon, 22 Jul 2013 21:58:17 -0700
Subject: Possible deadloop in direct reclaim?
Message-ID: <89813612683626448B837EE5A0B6A7CB3B62F8F272@SC-VEXCH4.marvell.com>
Content-Language: en-US
Content-Type: multipart/alternative; boundary="_000_89813612683626448B837EE5A0B6A7CB3B62F8F272SCVEXCH4marve_"
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org
List-ID:
To: "linux-mm@kvack.org"

--_000_89813612683626448B837EE5A0B6A7CB3B62F8F272SCVEXCH4marve_
Content-Type: text/plain; charset="us-ascii"

Dear Sir:

I have hit a possible dead loop in direct reclaim. After running a large
number of applications, the system reaches a state where memory is heavily
fragmented, with only order-0 and order-1 pages left.

One process then requested an order-2 buffer and entered an endless direct
reclaim loop. My trace log shows the loop has already run more than 200,000
times. Kswapd was woken up first and then went back to sleep because it
could not rebalance memory at this order. But zone->all_unreclaimable
remains 1.

Although direct reclaim returns no pages each time, it loops again and again
because zone->all_unreclaimable = 1, even after zone->pages_scanned has
become very large. This blocks the process for a long time, until a watchdog
thread detects it and kills the process. I understand this is
__alloc_pages_slowpath, but this is too slow, isn't it? It can take over
50 seconds or even more.

I don't think this is the expected behavior. Can we add the check below to
all_unreclaimable() to terminate this loop?

@@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
 			continue;
 		if (!zone->all_unreclaimable)
 			return false;
+		if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
+			return true;
 	}

BTW: I'm using kernel 3.4. I also searched kernel 3.9 and didn't see a fix
for this issue. Has anyone else hit this issue before? Any comments are
welcome; I look forward to your reply!

Thanks!

Best Regards
Lisa Du
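
To make the effect of the proposed check easier to reason about, here is a minimal userspace sketch of all_unreclaimable() with and without the added lines. It is a model under stated assumptions, not the kernel code: the struct layouts, the all_unreclaimable_model() name, and the with_check switch are hypothetical, and the "scanned < 6 * reclaimable pages" rule in zone_reclaimable() is an assumption about the 3.4 heuristic. Because the new test sits after the !zone->all_unreclaimable test, in this model it can only fire for a zone that is already flagged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct zone {
	bool populated;                  /* models populated_zone() */
	bool all_unreclaimable;          /* models zone->all_unreclaimable */
	unsigned long pages_scanned;     /* models zone->pages_scanned */
	unsigned long reclaimable_pages; /* models zone_reclaimable_pages() */
};

struct scan_control {
	unsigned long nr_reclaimed;      /* pages reclaimed by this attempt */
};

/* Assumed heuristic: a zone counts as reclaimable while it has been
 * scanned fewer than six times its number of reclaimable pages. */
static bool zone_reclaimable(const struct zone *zone)
{
	return zone->pages_scanned < zone->reclaimable_pages * 6;
}

/* Model of all_unreclaimable(). with_check enables the two proposed lines. */
static bool all_unreclaimable_model(const struct zone *zones, size_t nr_zones,
				    const struct scan_control *sc,
				    bool with_check)
{
	for (size_t i = 0; i < nr_zones; i++) {
		const struct zone *zone = &zones[i];

		if (!zone->populated)
			continue;
		if (!zone->all_unreclaimable)
			return false;
		/* Proposed addition: give up early when nothing was reclaimed
		 * and this zone has already been scanned past the threshold. */
		if (with_check && sc->nr_reclaimed == 0 &&
		    !zone_reclaimable(zone))
			return true;
	}

	return true;
}
```

In this model, a zonelist made of one flagged zone scanned well past the threshold followed by one unflagged zone returns false without the check and true with it when nr_reclaimed is 0. A zonelist whose first populated zone is unflagged returns false either way.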

--_000_89813612683626448B837EE5A0B6A7CB3B62F8F272SCVEXCH4marve_--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org