Message-ID: <539BA32A.8090104@lougher.demon.co.uk>
Date: Sat, 14 Jun 2014 02:19:38 +0100
From: Phillip Lougher
To: Christoph Lameter
Cc: ksummit-discuss@lists.linuxfoundation.org
References: <53994FED.1080106@lougher.demon.co.uk>
Subject: Re: [Ksummit-discuss] [CORE TOPIC] Redesign Memory Management layer and more core subsystem

On 13/06/14 18:02, Christoph Lameter wrote:
> On Thu, 12 Jun 2014, Phillip Lougher wrote:
>
>>> 1. The need to use larger order pages, and the resulting problems
>>> with fragmentation. Memory sizes grow and therefore the number of
>>> page structs where state has to be maintained. Maybe there is
>>> something different? If we use hugepages then we have 511 useless
>>> page structs. Some apps need linear memory, where we have trouble
>>> and are creating numerous memory allocators (recently the new
>>> bootmem allocator and CMA, plus lots of specialized allocators in
>>> various subsystems).
>>
>> This was never solved to my knowledge; there is no panacea here.
>> Even in the 90s we had video subsystems wanting to allocate in units
>> of 1 Mbyte, and others in units of 4K. The "solution" was so-called
>> split-level allocators, each specialised to deal with a particular
>> "first class media", with them giving back memory to the underlying
>> allocator when memory got tight in another specialised allocator.
>> Not much different to the ad-hoc solutions being adopted in Linux,
>> except the general idea was that each specialised allocator had the
>> same API.
>
> It is solvable if the objects are inherently movable. If every
> allocated object provides a function that can move it, then
> defragmentation is possible, and therefore large contiguous areas of
> memory can be created at any time.
>
>>> Can we develop the notion that subsystems own certain cores, so that
>>> their execution is restricted to a subset of the system, avoiding
>>> data replication and keeping subsystem data hot? I.e. have a device
>>> driver and the subsystems driving those devices run only on the NUMA
>>> node to which the PCI-E root complex is attached. Restricting to a
>>> NUMA node reduces data locality complexity and increases performance
>>> due to cache-hot data.
>>
>> Lots of academic hot air was expended here when designing distributed
>> systems which could scale seamlessly across heterogeneous CPUs
>> connected via different levels of interconnect (bus, ATM, Ethernet,
>> etc.), with zoning, migration, replication and so on. The "solution"
>> is probably out there somewhere, forgotten about.
>
> We now have the issue with homogeneous CPUs, due to the proliferation
> of cores on processors. Maybe that is solvable?
>
>> Case in point: many years ago I was the lead Linux guy for a company
>> designing a SoC for digital TV. Just before I left I had an
>> interesting "conversation" with the chief hardware guy of the team
>> who designed the SoC. It turned out they'd budgeted for the RAM
>> bandwidth needed to decode a typical MPEG stream, but they'd not
>> reckoned on all the memcopies Linux needs to do between its "separate
>> address space" processes. He'd been used to embedded OSes which run
>> in a single address space.
>
> Well, maybe that is appropriate for some processes? And we could carve
> out subsections of the hardware where single-address-space stuff is
> possible?
Apologies, maybe what I was trying to say wasn't clear :) I wasn't
arguing against it, but rather questioning whether we should be trying
to do this at the Linux kernel level. Embedded systems have long needed
to carve out (mainly heterogeneous) processors from Linux. Media systems
have VLIW media processors (e.g. Philips Trimedia), and mobile phones
typically have separate baseband processors. This is done without any
core support from the kernel: just write a device driver that presents a
programming and I/O channel to the carved-out hardware.

Additionally, where the Linux kernel has been too heavyweight, with its
slow real-time response and/or expensive paged multi-address-space
model, the solution is often a nano-kernel such as ADEOS or RTLinux,
running Linux as a separate OS and leaving scope to run lighter-weight
real-time single-address-space operating systems in parallel. In other
words, if we need more efficiency, do it outside of Linux rather than
trying to rewrite the strong protection model within Linux. That way
leads to pain.

My point about the hardware engineer is that people can't have their
cake and eat it. Unix/Linux has been successful partly because of its
strong protection/paged model. It is difficult to be both secure and
efficient; if you want both, you need to design it into the operating
system from the outset, and Linux isn't a good place to start.

Phillip