From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 05 Dec 2003 00:44:06 +0900
From: IWAMOTO Toshihiro
Subject: Re: memory hotremove prototype, take 3
In-Reply-To: <152440000.1070516333@[10.10.2.4]>
References: <20031201034155.11B387007A@sv1.valinux.co.jp>
	<187360000.1070480461@flay>
	<20031204035842.72C9A7007A@sv1.valinux.co.jp>
	<152440000.1070516333@10.10.2.4>
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Message-Id: <20031204154406.7FC587007A@sv1.valinux.co.jp>
Sender: owner-linux-mm@kvack.org
Return-Path:
To: "Martin J. Bligh"
Cc: IWAMOTO Toshihiro ,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
List-ID:

At Wed, 03 Dec 2003 21:38:54 -0800,
Martin J. Bligh wrote:
>
> > My target is somewhat NUMA-ish and fairly large.  So I'm not sure if
> > CONFIG_NONLINEAR fits, but CONFIG_NUMA isn't perfect either.
>
> If your target is NUMA, then you really, really need CONFIG_NONLINEAR.
> We don't support multiple pgdats per node, nor do I wish to, as it'll
> make an unholy mess ;-).  With CONFIG_NONLINEAR, the discontiguities
> within a node are buried down further, so we have much less complexity
> to deal with from the main VM.  The abstraction also keeps the poor
> VM engineers trying to read / write the code saner via simplicity ;-)

IIRC, memory is contiguous within a NUMA node.  I think Goto-san will
clarify this issue when his code gets ready. :-)

> WRT generic discontigmem support (not NUMA), doing that via pgdats
> should really go away, as there's no real difference between the
> chunks of physical memory as far as the page allocator is concerned.
> The plan is to use Daniel's nonlinear stuff to replace that, and keep
> the pgdats strictly for NUMA.  Same would apply to hotpluggable zones -
> I'd hate to end up with 512 pgdats of stuff that are really all the
> same memory types underneath.

Yes.  Unnecessary zone rebalancing would suck.
> The real issue you have is the mapping of the struct pages - if we can
> achieve a non-contig mapping of the mem_map / lmem_map array, we should
> be able to take memory on and offline reasonably easily.  If you're
> willing for a first implementation to pre-allocate the struct page array
> for every possible virtual address, it makes life a lot easier.

Preallocating the struct page array isn't feasible for the target
system because its max memory / min memory ratio is large.
Our plan is to use the beginning (or the end) of the memory block
being hotplugged.  If a 2GB memory block is added, the first ~20MB of
it is used for the struct page array covering the rest of the block.

> >> PS. What's this bit of the patch for?
> >>
> >>  void *vmalloc(unsigned long size)
> >>  {
> >> +#ifdef CONFIG_MEMHOTPLUGTEST
> >> +       return __vmalloc(size, GFP_KERNEL, PAGE_KERNEL);
> >> +#else
> >>         return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
> >> +#endif
> >>  }
> >
> > This is necessary because kernel memory cannot be swapped out.
> > Only highmem can be hot removed, though it doesn't need to be highmem.
> > We can define another zone attribute such as GFP_HOTPLUGGABLE.
>
> You could just lock the pages, I'd think?  I don't see at a glance
> exactly what you were using this for, but would that work?

I haven't seriously considered implementing hot removal of vmalloc'd
memory, but I guess that would be too complicated, if not impossible.
Making kernel threads or interrupt handlers block on memory access
sounds very difficult to me.

--
IWAMOTO Toshihiro

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org