From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200])
        by kanga.kvack.org (Postfix) with ESMTP id 1B6046B000A
        for ; Mon, 15 Oct 2018 18:30:20 -0400 (EDT)
Received: by mail-pf1-f200.google.com with SMTP id g63-v6so5681460pfc.9
        for ; Mon, 15 Oct 2018 15:30:20 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
        by mx.google.com with SMTPS id c12-v6sor3090132pgn.27.2018.10.15.15.30.18
        for (Google Transport Security);
        Mon, 15 Oct 2018 15:30:19 -0700 (PDT)
Date: Mon, 15 Oct 2018 15:30:17 -0700 (PDT)
From: David Rientjes
Subject: Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings
In-Reply-To:
Message-ID:
References: <20180925120326.24392-2-mhocko@kernel.org>
        <20181005073854.GB6931@suse.de>
        <20181005232155.GA2298@redhat.com>
        <20181009094825.GC6931@suse.de>
        <20181009122745.GN8528@dhcp22.suse.cz>
        <20181009130034.GD6931@suse.de>
        <20181009142510.GU8528@dhcp22.suse.cz>
        <20181009230352.GE9307@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrea Arcangeli
Cc: Michal Hocko , Mel Gorman , Andrew Morton , Vlastimil Babka ,
        Andrea Argangeli , Zi Yan , Stefan Priebe - Profihost AG ,
        "Kirill A. Shutemov" , linux-mm@kvack.org, LKML , Stable tree

On Wed, 10 Oct 2018, David Rientjes wrote:

> > I think "madvise vs mbind" is more an issue of "no-permission vs
> > permission" required. And if the processes ends up swapping out all
> > other process with their memory already allocated in the node, I think
> > some permission is correct to be required, in which case an mbind
> > looks a better fit. MPOL_PREFERRED also looks a first candidate for
> > investigation as it's already not black and white and allows spillover
> > and may already do the right thing in fact if set on top of
> > MADV_HUGEPAGE.
> >
>
> We would never want to thrash the local node for hugepages because there
> is no guarantee that any swapping is useful. On COMPACT_SKIPPED due to
> low memory, we have very clear evidence that pageblocks are already
> sufficiently fragmented by unmovable pages such that compaction itself,
> even with abundant free memory, fails to free an entire pageblock due to
> the allocator's preference to fragment pageblocks of fallback migratetypes
> over returning remote free memory.
>
> As I've stated, we do not want to reclaim pointlessly when compaction is
> unable to access the freed memory or there is no guarantee it can free an
> entire pageblock. Doing so allows thrashing of the local node, or remote
> nodes if __GFP_THISNODE is removed, and the hugepage still cannot be
> allocated. If this proposed mbind() that requires permissions is geared
> to me as the user, I'm afraid the details of what leads to the thrashing
> are not well understood because I certainly would never use this.
>

At the risk of beating a dead horse that has already been beaten, what are
the plans for this patch when the merge window opens? It would be rather
unfortunate for us to start incurring a 14% increase in access latency and
40% increase in fault latency. Would it be possible to test with my
patch[*] that does not try reclaim to address the thrashing issue? If
that is satisfactory, I don't have a strong preference if it is done with
a hardcoded pageblock_order and __GFP_NORETRY check or a new
__GFP_COMPACT_ONLY flag.

I think the second issue of faulting remote thp by removing __GFP_THISNODE
needs supporting evidence that shows some platforms benefit from this (and
not with numa=fake on the command line :).

 [*] https://marc.info/?l=linux-kernel&m=153903127717471