From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200])
    by kanga.kvack.org (Postfix) with ESMTP id 0B3E66B0387
    for ; Mon, 27 Feb 2017 20:53:33 -0500 (EST)
Received: by mail-pf0-f200.google.com with SMTP id n89so105695688pfa.7
    for ; Mon, 27 Feb 2017 17:53:33 -0800 (PST)
Received: from mail-pg0-x244.google.com (mail-pg0-x244.google.com. [2607:f8b0:400e:c05::244])
    by mx.google.com with ESMTPS id h186si204381pfc.283.2017.02.27.17.53.31
    for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
    Mon, 27 Feb 2017 17:53:31 -0800 (PST)
Received: by mail-pg0-x244.google.com with SMTP id x17so2861221pgi.0
    for ; Mon, 27 Feb 2017 17:53:31 -0800 (PST)
Subject: Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
References: <1487918992-7515-1-git-send-email-hejianet@gmail.com>
    <20170224084949.GA19161@dhcp22.suse.cz>
    <20170224165105.GB20092@cmpxchg.org>
    <20170227085024.GD14029@dhcp22.suse.cz>
    <20170227170634.GA20423@cmpxchg.org>
From: hejianet
Message-ID: <37863671-bc0b-3f70-1158-685f5b379789@gmail.com>
Date: Tue, 28 Feb 2017 09:53:20 +0800
MIME-Version: 1.0
In-Reply-To: <20170227170634.GA20423@cmpxchg.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Johannes Weiner, Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
    Mel Gorman, Vlastimil Babka, Minchan Kim, Rik van Riel

Hi Johannes

I have another concern:
kswapd -> balance_pgdat -> age_active_anon
This code path does some background work to age the anon list. Will this
patch have any impact on it if the retry count is > 16 and kswapd is not
woken up?

B.R.
Jia

On 28/02/2017 1:06 AM, Johannes Weiner wrote:
> On Mon, Feb 27, 2017 at 09:50:24AM +0100, Michal Hocko wrote:
>> On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
>> [...]
>>> >From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
>>> From: Johannes Weiner
>>> Date: Fri, 24 Feb 2017 10:56:32 -0500
>>> Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
>>>
>>> Jia He reports a problem with kswapd spinning at 100% CPU when
>>> requesting more hugepages than memory available in the system:
>>>
>>> $ echo 4000 >/proc/sys/vm/nr_hugepages
>>>
>>> top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
>>> Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
>>> %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
>>> KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM     TIME+ COMMAND
>>>    76 root  20   0     0    0    0 R 100.0 0.000 217:17.29 kswapd3
>>>
>>> At that time, there are no reclaimable pages left in the node, but as
>>> kswapd fails to restore the high watermarks it refuses to go to sleep.
>>>
>>> Kswapd needs to back away from nodes that fail to balance. Up until
>>> 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
>>> kswapd had such a mechanism. It considered zones whose theoretically
>>> reclaimable pages it had reclaimed six times over as unreclaimable and
>>> backed away from them. This guard was erroneously removed as the patch
>>> changed the definition of a balanced node.
>>>
>>> However, simply restoring this code wouldn't help in the case reported
>>> here: there *are* no reclaimable pages that could be scanned until the
>>> threshold is met. Kswapd would stay awake anyway.
>>>
>>> Introduce a new and much simpler way of backing off. If kswapd runs
>>> through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
>>> page, make it back off from the node. This is the same number of shots
>>> direct reclaim takes before declaring OOM.
>>> Kswapd will go to sleep on that node until a direct reclaimer manages
>>> to reclaim some pages, thus proving the node reclaimable again.
>>
>> Yes this looks, nice&simple. I would just be worried about [1] a bit.
>> Maybe that is worth a separate patch though.
>>
>> [1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3quxwz@dhcp22.suse.cz
>
> I think I'd prefer the simplicity of keeping this contained inside
> vmscan.c, as an interaction between direct reclaimers and kswapd, as
> well as leaving the wakeup tied to actually seeing reclaimable pages
> rather than merely producing free pages (e.g. should we also add a
> kick to a large munmap() for example?).
>
> OOM kills come with such high latencies that I cannot imagine a
> slightly quicker kswapd restart would matter in practice.
>
>>> Reported-by: Jia He
>>> Signed-off-by: Johannes Weiner
>>
>> Acked-by: Michal Hocko
>
> Thanks!
>
>> I would have just one more suggestion. Please move MAX_RECLAIM_RETRIES
>> to mm/internal.h. This is MM internal thing and there is no need to make
>> it visible.
>
> Good point, I'll move it.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
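
For readers following the thread, the back-off described in the quoted
changelog can be sketched roughly as below. This is a minimal user-space
illustration, not the code from the patch: only MAX_RECLAIM_RETRIES and
its value of 16 come from the changelog. The struct, its fields, and the
function names are hypothetical, and counting *consecutive* failed cycles
is an assumption about how "16 cycles without reclaiming a single page"
is tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* From the changelog: cycles without progress before kswapd backs off. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for per-node reclaim state (not a kernel struct). */
struct node_state {
	unsigned long reclaimable_pages;	/* pages reclaim could still free */
	int kswapd_failures;			/* assumed: consecutive empty cycles */
};

/* True once kswapd should stop trying to balance this node. */
static bool node_is_hopeless(const struct node_state *node)
{
	return node->kswapd_failures >= MAX_RECLAIM_RETRIES;
}

/*
 * One simulated kswapd balancing cycle; returns the pages reclaimed.
 * A cycle that reclaims nothing counts as a failure, any progress
 * clears the failure count.
 */
static unsigned long kswapd_cycle(struct node_state *node, unsigned long want)
{
	unsigned long got;

	assert(node != NULL);

	if (node_is_hopeless(node))
		return 0;	/* backed off: kswapd stays asleep on this node */

	got = want < node->reclaimable_pages ? want : node->reclaimable_pages;
	node->reclaimable_pages -= got;

	if (got == 0)
		node->kswapd_failures++;
	else
		node->kswapd_failures = 0;

	return got;
}

/*
 * A direct reclaimer that frees pages proves the node reclaimable
 * again, so the failure count is reset and kswapd may run again.
 */
static void direct_reclaim_progress(struct node_state *node, unsigned long freed)
{
	if (freed > 0)
		node->kswapd_failures = 0;
}
```

In this sketch a node with no reclaimable pages is given up on after 16
calls to kswapd_cycle(), and further calls return immediately instead of
spinning. Only direct_reclaim_progress() with a non-zero count makes the
node eligible again, which mirrors the "proving the node reclaimable
again" wording above.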