Date: Sun, 10 Oct 1999 10:21:24 -0400 (EDT)
From: James Simmons
Subject: Re: MMIO regions
In-Reply-To: <199910101124.HAA32129@light.alephnull.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: Rik Faith
Cc: Linux MM

> If I understand what you are saying, there are serious performance
> implications for direct-rendering clients (in addition to the added
> scheduler overhead, which will negatively impact overall system
> performance).
>
> I believe you are saying:
> 1) There are n processes, each of which has the MMIO region mmap'd.
> 2) The scheduler will only schedule one of these processes at a time,
>    even on an SMP system. [I'm assuming this is what you mean by "in
>    use", since the scheduler can't know about actual MMIO writes -- it
>    has to assume that a mapped region is a region that is "in use",
>    even if it isn't (e.g., a threaded program may have the MMIO region
>    mapped in n-1 threads, but may only direct render in 1 thread).]
>
> On MMIO-based graphics cards (i.e., those that do not use traditional DMA),
> a direct-rendering client will intersperse relatively long periods of
> computation with relatively short periods of MMIO writes. In your scheme,
> one of these clients will run for a whole time slice before the other one
> runs (i.e., they will run in alternate time slices, even on an SMP system
> with sufficient processors to run both simultaneously). Because actual
> MMIO writes take up a relatively small fraction of that time slice,
> rendering performance will potentially decrease by a factor of 2 (or more,
> if more CPUs are available). This is significant, especially since many
> high-end OpenGL applications are threaded and expect to be able to run
> simultaneously on SMP systems.

I noticed this when I was playing with my code.
Also, I realized that regular kernel semaphores are not going to give
you the hard realtime guarantees that are needed. Even regular interrupt
handling is just not good enough. A good example is VBL (vertical
blank). With ordinary interrupt handling it takes an enormous amount of
time to get to the interrupt handler. The effect gets worse on a very
highly loaded machine, and so does the tearing. It's not unusual for a
graphics program to create a high load either.

So I'm actually designing a hard realtime scheduler that handles this.
The regular scheduler is not going to cut the mustard. Plus, this gives
an enormous performance boost no matter what the load. Someone familiar
with IRIX told me that's what SGI does to optimize their systems.

Also, you can get the following sequence: data -> accel engine, context
switch, other data -> accel engine. This would confuse most cards. With
a realtime handler you can make sure that an accel command has finished
before allowing a context switch.

> The cooperative locking system used by the DRI (see
> http://precisioninsight.com/dr/locking.html) allows direct-rendering
> clients to perform fine-grain locking only when the MMIO region is actually
> being written. The overhead for this system is extremely low (about 2
> instructions to lock, and 1 instruction to unlock). Cooperative locking
> like this allows several threads that all map the same MMIO region to run
> simultaneously on an SMP system.

I'm familiar with the system.
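
For anyone following along, here is a rough sketch of what a cooperative
lock of the kind quoted above could look like, assuming the fast path is
a compare-and-swap on a lock word shared by all clients. The names
(hw_lock, LOCK_HELD, try_lock, unlock) and the bit layout are made up
for illustration and are not taken from the DRI sources. A failed
try_lock() is where a real client would fall back to a slower path,
for example asking the kernel to arbitrate.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: set while some context holds the lock. */
#define LOCK_HELD 0x80000000u

/*
 * Hypothetical lock word. In a real system it would live in memory
 * shared by all clients; here it is a plain static for illustration.
 * The low bits hold the id of the context that last owned the lock.
 */
static _Atomic uint32_t hw_lock = 1;

/*
 * Fast-path lock: succeeds only if the lock is free AND was last held
 * by ctx, so no hardware state needs to be restored. One
 * compare-and-swap plus a branch on the result.
 */
static bool try_lock(uint32_t ctx)
{
    uint32_t expected = ctx;
    return atomic_compare_exchange_strong(&hw_lock, &expected,
                                          ctx | LOCK_HELD);
}

/*
 * Unlock: clear the held bit but leave the owner id in place, so the
 * same context can take the fast path again next time.
 */
static bool unlock(uint32_t ctx)
{
    uint32_t expected = ctx | LOCK_HELD;
    return atomic_compare_exchange_strong(&hw_lock, &expected, ctx);
}
```

A client would wrap only the MMIO writes themselves in try_lock(ctx) /
unlock(ctx) and do its computation outside the lock, which is why
several threads mapping the same region can still run at once on SMP.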