Message-ID: <1431589879.4160.50.camel@kernel.crashing.org>
Subject: Re: Interacting with coherent memory on external devices
From: Benjamin Herrenschmidt
Date: Thu, 14 May 2015 17:51:19 +1000
In-Reply-To: <55545124.7090804@suse.cz>
References: <20150424150829.GA3840@gmail.com>
	<20150424164325.GD3840@gmail.com>
	<20150424171957.GE3840@gmail.com>
	<20150424192859.GF3840@gmail.com>
	<20150425114633.GI5561@linux.vnet.ibm.com>
	<20150427154728.GA26980@gmail.com>
	<553E6405.1060007@redhat.com>
	<1430178843.16571.134.camel@kernel.crashing.org>
	<55535B6E.5090700@suse.cz>
	<1431560326.20218.94.camel@kernel.crashing.org>
	<55545124.7090804@suse.cz>
To: Vlastimil Babka
Cc: Christoph Lameter, Rik van Riel, Jerome Glisse, "Paul E. McKenney",
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, jglisse@redhat.com,
	mgorman@suse.de, aarcange@redhat.com, airlied@redhat.com,
	aneesh.kumar@linux.vnet.ibm.com, Cameron Buschardt, Mark Hairgrove,
	Geoffrey Gerfin, John McKenna, akpm@linux-foundation.org

On Thu, 2015-05-14 at 09:39 +0200, Vlastimil Babka wrote:
> On 05/14/2015 01:38 AM, Benjamin Herrenschmidt wrote:
> > On Wed, 2015-05-13 at 16:10 +0200, Vlastimil Babka wrote:
> >> Sorry for reviving oldish thread...
> >
> > Well, that's actually appreciated since this is constructive discussion
> > of the kind I was hoping to trigger initially :-) I'll look at
>
> I hoped so :)
>
> > ZONE_MOVABLE, I wasn't aware of its existence.
> >
> > Don't we still have the problem that ZONEs must be somewhat contiguous
> > chunks ? Ie, my "CAPI memory" will be interleaved in the physical
> > address space somewhat.. This is due to the address space on some of
> > those systems where you'll basically have something along the lines of:
> >
> > [ node 0 mem ] [ node 0 CAPI dev ] .... [ node 1 mem] [ node 1 CAPI dev] ...
>
> Oh, I see. The VM code should cope with that, but some operations would
> be inefficiently looping over the holes in the CAPI zone by 2MB
> pageblock per iteration. This would include compaction scanning, which
> would suck if you need those large contiguous allocations as you said.
> Interleaving works better if it's done with a smaller granularity.
>
> But I guess you could just represent the CAPI as multiple NUMA nodes,
> each with single ZONE_MOVABLE zone. Especially if "node 0 CAPI dev" and
> "node 1 CAPI dev" differs in other characteristics than just using a
> different range of PFNs... otherwise what's the point of this split anyway?

Correct, I think we want the CAPI devs to look like CPU-less NUMA nodes
anyway. This is the right way to target an allocation at one of them, and
it conveys the distance properly, so it makes sense.

I'll add ZONE_MOVABLE to the list of things to investigate on our
side, thanks for the pointer!

Cheers,
Ben.
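
As a rough illustration of the pageblock-scanning cost discussed above, the
sketch below counts how many pageblock-sized steps a linear scanner would take
over one zone that spans the interleaved holes, compared with per-node zones
that each cover only their own range. The address layout, the sizes, and the
function names are all hypothetical, chosen for illustration only; this is
plain arithmetic, not kernel code or a measurement.

```python
# Back-of-the-envelope model of a linear, pageblock-at-a-time scan.
# All addresses and sizes below are assumed for illustration.

PAGEBLOCK_BYTES = 2 * 1024 * 1024  # 2MB pageblock, as mentioned in the thread
GIB = 1024 ** 3


def pageblock_steps(start, end):
    """Steps needed to walk [start, end) one pageblock at a time."""
    span = end - start
    return (span + PAGEBLOCK_BYTES - 1) // PAGEBLOCK_BYTES


def spanning_zone_steps(ranges):
    """One zone from the lowest start to the highest end, holes included."""
    lowest = min(start for start, _ in ranges)
    highest = max(end for _, end in ranges)
    return pageblock_steps(lowest, highest)


def per_node_steps(ranges):
    """One zone per range (e.g. one CPU-less node per device), no holes."""
    return sum(pageblock_steps(start, end) for start, end in ranges)


# Hypothetical layout:
# [ node 0 mem ][ node 0 CAPI dev ][ node 1 mem ][ node 1 CAPI dev ]
#   0..64G        64G..80G           80G..144G     144G..160G
capi_ranges = [(64 * GIB, 80 * GIB), (144 * GIB, 160 * GIB)]

if __name__ == "__main__":
    print("single spanning zone:", spanning_zone_steps(capi_ranges))
    print("one zone per node:   ", per_node_steps(capi_ranges))
```

With this assumed layout, the single spanning zone covers 96GB of address
space for 32GB of actual device memory, so two thirds of the scanner's steps
land in holes; splitting into one node per device avoids them.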