From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 22 Apr 2003 09:58:42 -0700
From: "Martin J. Bligh"
Subject: Re: objrmap and vmtruncate
Message-ID: <1040000.1051030721@[10.10.2.4]>
In-Reply-To: <20030422162055.GJ8978@holomorphy.com>
References: <20030422145644.GG8978@holomorphy.com> <20030422162055.GJ8978@holomorphy.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Return-Path:
To: William Lee Irwin III, Ingo Molnar
Cc: Andrew Morton, Andrea Arcangeli, mingo@elte.hu, hugh@veritas.com, dmccr@us.ibm.com, Linus Torvalds, linux-kernel@vger.kernel.org, linux-mm@kvack.org
List-ID:

>> using nonlinear mappings adds the overhead of pte chains, which roughly
>> doubles the pagetable overhead. (or companion pagetables, which triple
>> the pagetable overhead) Purely RAM-wise the break-even point is at
>> around 8 pages, 8 pte chain entries make up for 64 bytes of vma overhead.
>> the biggest problem i can see is that we (well, the kernel) has to make a
>> judgement of RAM footprint vs. algorithmic overhead, which is apples to
>> oranges. Nonlinear vmas [or just linear vmas with pte chains installed],
>> while being only O(N), double/triple the pagetable overhead. objrmap
>> linear vmas, while having only the pagetable overhead, are O(N^2). [well,
>> it's O(N*M)]
>> RAM-footprint wise the boundary is clear: above 8 pages of granularity,
>> vmas with objrmap cost less RAM than nonlinear mappings.
>> CPU-time-wise the nonlinear mappings with pte chains always beat objrmap.
>
> There's definitely an argument brewing here. Large 32-bit is very space
> conscious; the rest of the world is largely oblivious to these specific
> forms of space consumption aside from those tight on space in general.

However, the time consumption affects everybody. The overhead of
pte-chains is very significant ... people seem to be conveniently
forgetting that for some reason.
Ingo's rmap_pages thing solves the lowmem space problem, but the time
problem is still there, if not worse. Please don't create the impression
that rmap methodologies are only an issue for large 32-bit machines -
that's not true at all.

People seem to be focused on one corner case of performance for
objrmap ... If you want a countercase for pte-chain based rmap, try
creating 1000 processes in a machine with a decent amount of RAM. Make
them share libraries (libc, etc), and then fork and exit in a FIFO
rolling fashion. Just forking off a bunch of stuff (regular programs /
shell scripts) that do similar amounts of work will presumably
approximate this. Kernel compiles see large benefits here, for instance.
Things that were less dominated by userspace calculations would see even
bigger changes.

I've not seen anything but a focused microbenchmark deliberately written
for the job do better on pte-chain based rmap than partial objrmap yet.
If we had something more realistic, it would become rather more
interesting.

M.
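A minimal sketch of the rolling fork/exit workload described above, not
the benchmark anyone on this thread actually ran. Assumptions: Python is
used only for brevity; the name rolling_fork_exit and the parameters
window and total are hypothetical; children just sleep until the parent
retires them with SIGKILL, oldest first. Every forked child shares the
parent's libc and interpreter mappings, which is roughly the sharing
pattern in question.

```python
import os
import signal
import time
from collections import deque


def _retire(pid: int) -> None:
    """Kill one child and reap it so no zombie is left behind."""
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)


def rolling_fork_exit(window: int, total: int) -> int:
    """Fork `total` children while keeping at most `window` alive.

    Once the window is full, the oldest child is retired before the next
    fork (FIFO). Returns the number of children retired. Both parameter
    names are hypothetical, chosen for this sketch.
    """
    if window < 1 or total < 0:
        raise ValueError("window must be >= 1 and total >= 0")

    live = deque()  # pids of live children, oldest on the left
    retired = 0
    try:
        for _ in range(total):
            if len(live) == window:
                _retire(live.popleft())
                retired += 1
            pid = os.fork()
            if pid == 0:
                # Child: keep the shared mappings alive until killed.
                # os._exit in the finally clause guarantees the child can
                # never fall through into the parent's cleanup code.
                try:
                    while True:
                        signal.pause()
                finally:
                    os._exit(0)
            live.append(pid)
    finally:
        # Parent only: drain whatever is still running, oldest first.
        while live:
            _retire(live.popleft())
            retired += 1
    return retired


if __name__ == "__main__":
    # 1000 live processes as suggested in the mail; the total of 5000
    # forks is an arbitrary choice for this sketch.
    start = time.perf_counter()
    count = rolling_fork_exit(window=1000, total=5000)
    elapsed = time.perf_counter() - start
    print(f"retired {count} children in {elapsed:.2f}s")
```

Running the same script on kernels with different rmap schemes would
give only a rough comparison; a window of 1000 may also need a raised
per-user process limit.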