From mboxrd@z Thu Jan 1 00:00:00 1970 Subject: vDSO vs. mm : problems with ppc vdso From: Benjamin Herrenschmidt Content-Type: text/plain Date: Tue, 28 Feb 2006 16:39:14 +1100 Message-Id: <1141105154.3767.27.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: linux-mm@kvack.org Cc: Hugh Dickins , akpm@osdl.org, Paul Mackerras , Nick Piggin List-ID: is for 2.6.16 and see if we want to do something about it). I have discovered some issues with my vDSO implementation that went unnoticed so far but might cause problems with the VM. The problems are related to the way the powerpc vDSO is implemented in order to support COW (for breakpoints) and randomisation. It's not implemented as a gate_area() hack. Instead, I create a vma at process exec (see arch_setup_additional_pages() in arch/powerpc/kernel/vdso.c, which is called from binfmt_elf.c). This vma has custom vm_ops with a nopage() function that maps in pages from the vdso on demand. Those pages are kernel pages shared by all processes at first, though if a COW happens, they will be replaced by normal anonymous pages by the normal COW code. A first problem happens here (though it's not my main concern right now. It's a bug I need to fix but at least I have a good handle on it). The nopage function decides wether to map the pages from the 32 or the 64 bits vdso based on test_thread_flag(). This is broken if those pages end up being faulted in as the result of a get_user_pages() done by another process. Typically, that means that a 64 bits gdb tracing a 32 bits program will fault the wrong pages in. So I need a way to "know" what vdso to fault it based on the vma ... that will require me to either hack something in the vma (stuff a flag somewhere ?) or find a way to identify a 32 bits vma from a 64 bits vma... The second problem is more subtle and that's where I really need a VM guru to help me assess how bad the situation is and what should be done to fix it. Since when not-COWed, those vDSO pages are actually kernel pages mapped into every process, they aren't per-se anonymous pages, nor file pages... in fact, they don't quite fit in anything rmap knows about. However, I can't mark the VMA as VM_RESERVED or anything like that since that would prevent COW from working. Thus we hit some "interesting" code path in rmap of that sort: - page_address_in_vma() will always fail for those pages afaik. Not sure of the consequences at this point. (Neither PageAnon() nor page->mapping) - page_referenced() will not get into any of the code path under "if (page_mapped(page) && page->mapping) {" thanks to page->mapping being NULL afaik. I think that's a good thing in this case. We rely solely on the PTE information for these pages - try_to_unmap() gets more funny... It will call try_to_unmap_file(). Maybe we shouldn't ... maybe I should set the kernel pages of the vdso's PageLocked(), though I would have to dig through the possible side effects of that (notably vs. COW). If that works though, it may be a good workaround to avoid nasty code path in the VM. - If we hit try_to_unmap_one(), we'll probably do dec_mm_counter(mm, file_rss). But file_rss has never been incremented when the page was faulted in in the first place, was it ? Those shared kernel pages shouldn't be accounted there anyway - There may be other problematic code path outside of rmap.c that I missed. I'd really like to assess the situation and maybe get a few band aids in 2.6.16 if proper fixes are too complicated... Thanks ! Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org