Date: Mon, 6 Mar 2006 17:39:41 -0800
From: Andrew Morton
To: Benjamin LaHaise
Cc: linux-mm@kvack.org, netdev@vger.kernel.org
Subject: Re: [PATCH] avoid atomic op on page free
Message-Id: <20060306173941.4b5e0fc7.akpm@osdl.org>
In-Reply-To: <20060307011107.GI32565@linux.intel.com>
References: <20060307001015.GG32565@linux.intel.com>
	<20060306165039.1c3b66d8.akpm@osdl.org>
	<20060307011107.GI32565@linux.intel.com>

Benjamin LaHaise wrote:
>
> On Mon, Mar 06, 2006 at 04:50:39PM -0800, Andrew Morton wrote:
> > Am a bit surprised at those numbers.
>
> > Because userspace has to do peculiar things to get its pages taken off the
> > LRU.  What exactly was that application doing?
>
> It's just a simple send() and recv() pair of processes.  Networking uses
> pages for the buffer on user transmits.

You mean non-zero-copy transmits?  If they were zero-copy then those pages
would still be on the LRU.

> Those pages tend to be freed
> in irq context on transmit or in the receiver if the traffic is local.

If it was a non-zero-copy Tx then networking owns that page and can just do
free_hot_page() on it and avoid all that stuff in put_page().

> > The patch adds slight overhead to the common case while providing
> > improvement to what I suspect is a very uncommon case?
>
> At least on any modern CPU with branch prediction, the test is essentially
> free (2 memory reads that pipeline well, iow 1 cycle, maybe 2).  The
> upside is that you get to avoid the atomic (~17 cycles on a P4 with a
> simple test program; the penalty doubles if there is one other instruction
> that operates on memory in the loop), disabling interrupts (~20 cycles?, I
> don't remember), another atomic for the spinlock, another atomic for
> TestClearPageLRU() and the pushf/popf (expensive as they rely on whatever
> instruction that might still be in flight to complete, and add the penalty
> for changing irq state).  That's at least 70 cycles without including the
> memory barrier side effects, which can cost 100+ cycles.  Add in the costs
> for the cacheline bouncing of the lru_lock and we're talking *expensive*.
>
> So, a 1-2 cycle cost for a case that normally takes from 17 to 100+ cycles?
> I think that's worth it given the benefits.

Thing is, that case would represent about one millionth of the put_page()s
which get done in the world.  IOW: a net loss.

> Also, I think the common case (page cache read / map) is something that
> should be done differently, as those atomics really do add up to major
> pain.  Using rcu for page cache reads would be truly wonderful, but that
> will take some time.

We'd need to consider the interaction with those pages which get temporarily
removed from the LRU in reclaim.
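
For readers following the cycle count Ben enumerates above, here is a rough
sketch of the release path being argued about, paraphrased from the
2.6.16-era mm/swap.c.  It is approximate and from memory, not a verbatim
copy; the point is only to show where the refcount atomic, the irq-disabling
spinlock, and the TestClearPageLRU() atomic sit relative to each other.

/*
 * Approximate sketch of the page-release path under discussion
 * (paraphrased from the 2.6.16-era sources, not verbatim).
 * The refcount drop is one locked op; if the page is still flagged
 * PageLRU, the slow path then pays an irq-disabling spinlock, another
 * locked op for TestClearPageLRU(), and the lru_lock cacheline bounce.
 */
#include <linux/mm.h>
#include <linux/mm_inline.h>
#include <linux/page-flags.h>
#include <linux/gfp.h>

static void example_page_release(struct page *page)
{
	if (put_page_testzero(page)) {		/* atomic_dec_and_test() */
		struct zone *zone = page_zone(page);
		unsigned long flags;

		spin_lock_irqsave(&zone->lru_lock, flags);	/* cli + locked op */
		if (TestClearPageLRU(page))			/* another locked op */
			del_page_from_lru(zone, page);
		if (page_count(page) != 0)			/* re-gotten via the LRU? */
			page = NULL;
		spin_unlock_irqrestore(&zone->lru_lock, flags);	/* irq-state restore */
		if (page)
			free_hot_page(page);
	}
}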
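
Ben's "~17 cycles on a P4 with a simple test program" figure can be
reproduced with a measurement along the following lines.  This is my own
userspace sketch, not Ben's program: it times a lock-prefixed decrement
against a plain decrement guarded by a well-predicted branch, which is the
trade-off the patch is making.  x86-only; build with gcc -O2.

/*
 * Hedged userspace sketch (not Ben's actual test program): compare the
 * per-iteration cost of a locked decrement with that of a plain
 * decrement behind a branch that always predicts correctly.
 */
#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

#define ITERS 100000000UL

int main(void)
{
	volatile int counter = 0;
	volatile int on_lru = 0;	/* stands in for the PageLRU test */
	uint64_t t0, t1, t2;
	unsigned long i;

	t0 = rdtsc();
	for (i = 0; i < ITERS; i++)		/* atomic path */
		__sync_fetch_and_sub(&counter, 1);
	t1 = rdtsc();
	for (i = 0; i < ITERS; i++) {		/* predicted branch + plain decrement */
		if (!on_lru)
			counter = counter - 1;
	}
	t2 = rdtsc();

	printf("locked dec:  %.1f cycles/iter\n", (double)(t1 - t0) / ITERS);
	printf("branch+dec:  %.1f cycles/iter\n", (double)(t2 - t1) / ITERS);
	return 0;
}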
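
The free_hot_page() shortcut suggested above might look something like the
sketch below.  free_nonlru_page() is a hypothetical helper, not an existing
kernel function: the assumption is that the caller (e.g. the networking
stack freeing a non-zero-copy Tx buffer it allocated itself) can guarantee
the page was never put on the LRU, so the lru_lock/TestClearPageLRU() work
in the normal release path is unnecessary.

/*
 * Hypothetical helper (my sketch, not existing kernel code) for the
 * direct-free case: the caller guarantees the page was never on the LRU,
 * so on the final reference it can go straight to the per-cpu free lists.
 */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/gfp.h>

static inline void free_nonlru_page(struct page *page)
{
	BUG_ON(PageLRU(page));		/* caller guarantees: never on the LRU */
	if (put_page_testzero(page))	/* still one atomic for the refcount */
		free_hot_page(page);	/* skip the lru_lock dance entirely */
}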