From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 21 Mar 2000 02:20:53 +0100
From: Jamie Lokier
Subject: madvise (MADV_FREE)
Message-ID: <20000321022053.A4271@pcep-jamie.cern.ch>
References: <20000320135939.A3390@pcep-jamie.cern.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: ; from Chuck Lever on Mon, Mar 20, 2000 at 02:09:26PM -0500
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Chuck Lever
Cc: linux-mm@kvack.org
List-ID:

Hi Chuck

About MADV_FREE
---------------

> > The principle here is very simple: MADV_FREE marks all the pages in
> > the region as "discardable", and clears the accessed and dirty bits
> > of those pages.
> >
> > Later when the kernel needs to free some memory, it is permitted to
> > free "discardable" pages immediately provided they are still not
> > accessed or dirty. When vmscan is clearing the accessed and dirty
> > bits on pages, if they were set it must clear the "discardable" bit.
> >
> > This allows malloc() and other user space allocators to free pages
> > back to the system. Unlike DU's MADV_DONTNEED, or mmapping
> > /dev/zero, if the system does not need the page there is no
> > inefficient zero-copy. If there was, malloc() would be better off
> > not bothering to return the pages.
>
> unless i've completely misunderstood what you are proposing, this is what
> MADV_DONTNEED does today,

No, your MADV_DONTNEED _always_ discards the data in those pages. That
makes it too inefficient for application memory allocators, because they
will often want to reuse some of the pages soon after. You don't want
redundant page zeroing, and you don't want to give up memory which is
still nice and warm in the CPU's cache. Unless the kernel has a better
use for it than you.

MADV_FREE on the other hand simply permits the kernel to reclaim those
pages, if it is under memory pressure. If there is no pressure, the
pages are reused by the application unchanged.
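
As an illustration only, here is a minimal C sketch of how an allocator
might hand a hole in its heap to the kernel with such a hint. It assumes
a system whose <sys/mman.h> defines MADV_FREE (Linux gained a flag of
that name in 4.5; its exact semantics may differ from what is proposed
in this mail). The helper name mark_hole_reclaimable() is hypothetical
and does not come from any existing allocator.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical allocator helper: tell the kernel that the whole pages
 * inside [start, start + len) hold no live data and may be reclaimed.
 *
 * Returns the number of bytes marked, 0 if the range contains no whole
 * page, or -1 with errno set if the hint is unavailable or rejected.
 */
long mark_hole_reclaimable(void *start, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);       /* assumption: page size is a power of two */

    uintptr_t page = (uintptr_t)pagesize;
    /* Only whole pages can be given up: round the start up, the end down. */
    uintptr_t lo = ((uintptr_t)start + page - 1) & ~(page - 1);
    uintptr_t hi = ((uintptr_t)start + len) & ~(page - 1);

    if (hi <= lo)
        return 0;               /* hole is smaller than one page */

#ifdef MADV_FREE
    /* Lazy hint: the kernel may take these pages later, or never. */
    if (madvise((void *)lo, hi - lo, MADV_FREE) != 0)
        return -1;
    return (long)(hi - lo);
#else
    errno = ENOSYS;             /* no lazy-free hint on this system */
    return -1;
#endif
}
```

A free(3) implementation could call this on a freed chunk once it spans
at least one whole page, and simply write to the memory again when the
chunk is reallocated. No call is needed to take the pages back.
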

In this way different subsystems competing for memory get to share it
out -- essentially the fairness mechanisms in the kernel are extended
to application page management. And the application hardly knows a
thing about it.

Here's why MADV_FREE works, and the other things don't:

A typical memory allocator creates holes in its heap, which the kernel
has to swap out if it needs memory. I guess about 1/4 of all data in
swap is this kind of junk (but it's just a guess).

But it's quite inefficient for an allocator to unconditionally give
pages back to the kernel. The cost-benefit is "cost of giving page to
kernel" vs. "cost of maybe paging out". The cost of giving up pages is
significant: each one implies a COW fault, clear_page when you reuse the
page, and loss of cache-warm memory. You assume a page is not likely to
swap, because there's a reasonable chance the application will
reallocate it before that happens. So on balance, giving pages
unconditionally to the kernel is a loss.

--> No sane free(3) would call MADV_DONTNEED or msync(MS_INVALIDATE).

A better application allocator would base decisions about when to return
pages to the kernel on the likelihood of swapping and measured cost of
swapping vs. retaining pages. Of course that's very difficult and
system specific. And really only the kernel has access to all the
information on memory pressure. So the best arrangement is to let the
kernel make page reclamation decisions. And if a page is not reclaimed
before it is reused, let the application reuse the page unchanged and
cache-warm.

MADV_FREE is the mechanism for doing that. And it's a very nice, simple
one to use. Paging decisions stay in the kernel where they belong.
Applications run fast if they have enough memory. Everything is happy.

> ... except it doesn't schedule the "freed" pages for
> disposal ahead of other pages in the system. but that should be easy
> enough to add once the semantics are nailed down and the bugs have been
> eliminated.

It's not clear you'd want to do that. There is a cost for every "freed"
page disposed of, so you don't want to dispose of them ahead of other
pages.

> ok, i don't understand why you think this. and besides, free(3) doesn't
> shrink the heap currently, i believe. this would work if free(3) used
> sbrk() to shrink the heap in an intelligent fashion, freeing kernel VM
> resources along the way. if you want something to help free(3), i would
> favor this design instead.

free(3) already uses sbrk() to shrink the heap at the end. It's not
usable for the typical 1/3 of memory which becomes holes in the heap.

Yes, the idea is to modify free(3) to permit the kernel to reclaim
memory that is free in the application. However, none of sbrk() _or_
MADV_DONTNEED _or_ MADV_ZERO _or_ mmap(/dev/zero) has the desired
effect. It has to be a win for the application to call this function --
and it's a loss to zero pages as soon as you free them. But it's
relatively cheap to just mark the pages as "reclaimable" without losing
them.

enjoy,
-- Jamie

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/