Date: Tue, 22 Aug 2017 13:57:14 -0700 (PDT)
From: David Rientjes
To: Johannes Weiner
Cc: Vlastimil Babka, linux-mm@kvack.org, Joonsoo Kim, Mel Gorman,
	Michal Hocko, Andrea Arcangeli, Rik van Riel
Subject: Re: [RFC PATCH 0/6] proactive kcompactd
In-Reply-To: <20170821141014.GC1371@cmpxchg.org>
References: <20170727160701.9245-1-vbabka@suse.cz> <20170821141014.GC1371@cmpxchg.org>

On Mon, 21 Aug 2017, Johannes Weiner wrote:

> > I think I would have liked to have seen "less proactive" :)
> >
> > Kcompactd currently has the problem that it is MIGRATE_SYNC_LIGHT so it
> > continues until it can defragment memory.  On a host with 128GB of
> > memory and 100GB of it sitting in a hugetlb pool, we constantly get
> > kcompactd wakeups for order-2 memory allocation.  The stats are pretty
> > bad:
> >
> > compact_migrate_scanned 2931254031294
> > compact_free_scanned    102707804816705
> > compact_isolated        1309145254
> >
> > 0.0012% of memory scanned is ever actually isolated.  We constantly see
> > very high cpu for compaction_alloc() because kcompactd is almost always
> > running in the background and iterating most memory completely
> > needlessly (define needless as 0.0012% of memory scanned being
> > isolated).
>
> The free page scanner will inevitably wade through mostly used memory,
> but 0.0012% is lower than what systems usually have free. I'm guessing
> this is because of concurrent allocation & free cycles racing with the
> scanner? There could also be an issue with how we do partial scans.
>

More than 90% of this system's memory is in the hugetlbfs pool, so the
freeing scanner needlessly scans over it.  Because kcompactd does
MIGRATE_SYNC_LIGHT compaction, it doesn't stop iterating until the
allocation is successful at pgdat->kcompactd_max_order or the migration
and freeing scanners meet.  This is normally all memory.

Because of MIGRATE_SYNC_LIGHT, kcompactd does respect deferred compaction
and will avoid doing compaction at all for the next
1 << COMPACT_MAX_DEFER_SHIFT wakeups, but while the rest of userspace not
mapping hugetlbfs memory tries to fault thp, this happens almost nonstop
at 100% of cpu.

Although this might not be a typical configuration, it can easily be used
to demonstrate how inefficiently kcompactd behaves under load when a
small amount of memory is free or cannot be isolated because it's pinned.
vm.extfrag_threshold isn't an adequate solution.

> Anyway, we've also noticed scalability issues with the current scanner
> on 128G and 256G machines. Even with a better efficiency - finding the
> 1% of free memory, that's still a ton of linear search space.
>

Agreed.
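
P.S. For anyone who wants to reproduce that ratio on their own system, it
falls straight out of /proc/vmstat.  A quick userspace sketch (nothing
from this patchset, just for illustration) that computes
compact_isolated / (compact_migrate_scanned + compact_free_scanned):

/* Sketch only: print compaction isolation efficiency from /proc/vmstat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char name[64];
	unsigned long long val, migrate_scanned = 0, free_scanned = 0,
			   isolated = 0;

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}

	/* Each line of /proc/vmstat is "<counter> <value>". */
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "compact_migrate_scanned"))
			migrate_scanned = val;
		else if (!strcmp(name, "compact_free_scanned"))
			free_scanned = val;
		else if (!strcmp(name, "compact_isolated"))
			isolated = val;
	}
	fclose(f);

	if (migrate_scanned + free_scanned)
		printf("isolated/scanned = %.4f%%\n",
		       100.0 * isolated / (migrate_scanned + free_scanned));
	return 0;
}

With the counters quoted above that is
1309145254 / (2931254031294 + 102707804816705), i.e. roughly 0.0012%.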
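
And on the deferral behavior: what I mean by the
1 << COMPACT_MAX_DEFER_SHIFT window is roughly the following.  This is a
simplified model for illustration, not the exact code in mm/compaction.c:
each failure resets the counter and grows the backoff exponent, and
compaction is skipped until enough wakeups have been considered.

#include <stdbool.h>

#define COMPACT_MAX_DEFER_SHIFT 6	/* 6 in the kernel, so at most 64 skips */

struct zone_defer {
	unsigned int considered;	/* attempts seen since the last failure */
	unsigned int defer_shift;	/* backoff exponent */
};

/* Called when compaction fails for the zone. */
void defer_compaction_model(struct zone_defer *z)
{
	z->considered = 0;
	if (++z->defer_shift > COMPACT_MAX_DEFER_SHIFT)
		z->defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if this wakeup should skip compaction entirely. */
bool compaction_deferred_model(struct zone_defer *z)
{
	unsigned int limit = 1U << z->defer_shift;

	if (++z->considered >= limit) {
		z->considered = limit;
		return false;	/* window expired, try compacting again */
	}
	return true;
}

So the longest window is 64 skipped wakeups, which is why the deferral
alone doesn't keep the scanners from walking the hugetlb pool for long.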