From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with SMTP id B85376B0089 for ; Tue, 23 Nov 2010 10:46:06 -0500 (EST) Received: from eu_spt1 (mailout1.w1.samsung.com [210.118.77.11]) by mailout1.w1.samsung.com (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTP id <0LCC002URH4RYM@mailout1.w1.samsung.com> for linux-mm@kvack.org; Tue, 23 Nov 2010 15:46:03 +0000 (GMT) Received: from linux.samsung.com ([106.116.38.10]) by spt1.w1.samsung.com (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTPA id <0LCC0075RH4RWW@spt1.w1.samsung.com> for linux-mm@kvack.org; Tue, 23 Nov 2010 15:46:03 +0000 (GMT) Date: Tue, 23 Nov 2010 16:46:03 +0100 From: =?utf-8?B?TWljaGHFgiBOYXphcmV3aWN6?= Subject: Re: [PATCH 0/4] big chunk memory allocator v4 In-reply-to: <20101122090431.4ff9c941.kamezawa.hiroyu@jp.fujitsu.com> Message-id: MIME-version: 1.0 Content-type: text/plain; charset=utf-8; format=flowed; delsp=yes Content-transfer-encoding: Quoted-Printable References: <20101119171033.a8d9dc8f.kamezawa.hiroyu@jp.fujitsu.com> <20101119125653.16dd5452.akpm@linux-foundation.org> <20101122090431.4ff9c941.kamezawa.hiroyu@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org To: Andrew Morton , KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , minchan.kim@gmail.com, Bob Liu , fujita.tomonori@lab.ntt.co.jp, pawel@osciak.com, andi.kleen@intel.com, felipe.contreras@gmail.com, "kosaki.motohiro@jp.fujitsu.com" , Marek Szyprowski List-ID: On Mon, 22 Nov 2010 01:04:31 +0100, KAMEZAWA Hiroyuki wrote: > On Fri, 19 Nov 2010 12:56:53 -0800 > Andrew Morton wrote: > >> On Fri, 19 Nov 2010 17:10:33 +0900 >> KAMEZAWA Hiroyuki wrote: >> >> > Hi, this is an updated version. >> > >> > No major changes from the last one except for page allocation funct= ion. >> > removed RFC. >> > >> > Order of patches is >> > >> > [1/4] move some functions from memory_hotplug.c to page_isolation.c= >> > [2/4] search physically contiguous range suitable for big chunk all= oc. >> > [3/4] allocate big chunk memory based on memory hotplug(migration) = technique >> > [4/4] modify page allocation function. >> > >> > For what: >> > >> > I hear there is requirements to allocate a chunk of page which is= larger than >> > MAX_ORDER. Now, some (embeded) device use a big memory chunk. To = use memory, >> > they hide some memory range by boot option (mem=3D) and use hidde= n memory >> > for its own purpose. But this seems a lack of feature in memory m= anagement. >> > >> > This patch adds >> > alloc_contig_pages(start, end, nr_pages, gfp_mask) >> > to allocate a chunk of page whose length is nr_pages from [start,= end) >> > phys address. This uses similar logic of memory-unplug, which tri= es to >> > offline [start, end) pages. By this, drivers can allocate 30M or = 128M or >> > much bigger memory chunk on demand. (I allocated 1G chunk in my t= est). >> > >> > But yes, because of fragmentation, this cannot guarantee 100% all= oc. >> > If alloc_contig_pages() is called in system boot up or movable_zo= ne is used, >> > this allocation succeeds at high rate. >> >> So this is an alternatve implementation for the functionality offered= >> by Michal's "The Contiguous Memory Allocator framework". >> > > Yes, this will be a backends for that kind of works. As a matter of fact CMA's v6 tries to use code "borrowed" from the alloc= _contig_pages() patches. The most important difference is that alloc_contig_pages() would look fo= r a chunk of memory that can be allocated and then perform migration whereas CMA a= ssumes that regions it controls are always "migratable". Also, I've tried to remove the requirement for MAX_ORDER alignment. > I think there are two ways to allocate contiguous pages larger than MA= X_ORDER. > > 1) hide some memory at boot and add an another memory allocator. > 2) support a range allocator as [start, end) > > This is an trial from 2). I used memory-hotplug technique because I kn= ow some. > This patch itself has no "map" and "management" function, so it should= be > developped in another patch (but maybe it will be not my work.) Yes, this is also a valid point. From my use cases, the alloc_contig_pa= ges() would probably not be enough and require some management code to be adde= d. >> > I tested this on x86-64, and it seems to work as expected. But fe= edback from >> > embeded guys are appreciated because I think they are main user o= f this >> > function. >> >> From where I sit, feedback from the embedded guys is *vital*, because= >> they are indeed the main users. >> >> Michal, I haven't made a note of all the people who are interested in= >> and who are potential users of this code. Your patch series has a >> billion cc's and is up to version 6. Ah, yes... I was thinking about shrinking the cc list but didn't want t= o seem rude or anything removing ppl who have shown interest in the previo= us posted version. >> Could I ask that you review and >> test this code, and also hunt down other people (probably at other >> organisations) who can do likewise for us? Because until we hear fro= m >> those people that this work satisfies their needs, we can't really >> proceed much further. A few things than: 1. As Felipe mentioned, on ARM it is often desired to have the memory mapped as non-cacheable, which most often mean that the memory never= reaches the page allocator. This means, that alloc_contig_pages() would not be suitable for cases where one needs such memory. Or could this be overcome by adding the memory back as highmem? But= then, it would force to compile in highmem support even if platform does not really need it. 2. Device drivers should not by themselves know what ranges of memory to= allocate memory from. Moreover, some device drivers could require allocation different buffers from different ranges. As such, this would require some management code on top of alloc_contig_pages(). 3. When posting hwmem, Johan Mossberg mentioned that he'd like to see notion of "pinning" chunks (so that not-pinned chunks can be moved around when hardware does not use them to defragment memory). This would again require some management code on top of alloc_contig_pages(). 4. I might be mistaken here, but the way I understand ZONE_MOVABLE work is that it is cut of from the end of memory. Or am I talking nonsen= se? My concern is that at least one chip I'm working with requires allocations from different memory banks which would basically mean t= hat there would have to be two movable zones, ie: +-------------------+-------------------+ | Memory Bank #1 | Memory Bank #2 | +---------+---------+---------+---------+ | normal | movable | normal | movable | +---------+---------+---------+---------+ So even though I'm personally somehow drawn by alloc_contig_pages()'s simplicity (compared to CMA at least), those quick thoughts make me thin= k that alloc_contig_pages() would work rather as a backend (as Kamezawa mentioned) for some, maybe even tiny but still present, management code which would handle "marking memory fragments as ZONE_MOVABLE" (whatever that would involve) and deciding which memory ranges drivers can allocat= e from. I'm also wondering whether alloc_contig_pages()'s first-fit is suitable = but that probably cannot be judged without some benchmarks. -- = Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=3D./ `o | Computer Science, Micha=C5=82 "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: email@kvack.org