Date: Wed, 5 Apr 2006 11:17:21 -0700 (PDT)
From: Christoph Lameter
Subject: Some ideas on lazy migration with swapless migration
In-Reply-To: <1144256328.5203.36.camel@localhost.localdomain>
References: <20060404065739.24532.95451.sendpatchset@schroedinger.engr.sgi.com> <1144248362.5203.22.camel@localhost.localdomain> <1144256328.5203.36.camel@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: Lee Schermerhorn
Cc: linux-mm@kvack.org, lhms-devel@lists.sourceforge.net, Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

I think it is possible to do lazy migration without having to resort to a
migration cache by either

A. Forbidding writes to the page. The corresponding invocation of
   do_wp_page() on a write attempt can then be used to migrate the page.
   However, this would only work for write attempts.

B. Clearing the present bit. The corresponding invocation of do_swap_page()
   can check the type of the pte, do the lazy migration and then set the
   present bit again.

Hmm... B. would be an even better way to replace SWP_TYPE_MIGRATION and not
use the swap code at all (the fault handler would simply take the page lock
and, after releasing it, redo the fault), but it would require some work to
get arch support for clearing and setting the present bit. However, there
are only a few arches supporting NUMA and migration, so it should be doable.

Maybe the idea with the present bit can be used to further simplify
migration:

1. Before migration, clear all the present bits. This guarantees that the
   faults will stall in do_swap_page(), since the page is locked. There is
   no need to reduce the mapcount, since the ptes are still there and can
   be switched back to working condition by do_swap_page().

2. do_swap_page() will lock the page (and therefore stall during
   migration). After the page lock is obtained, check the present bit: if
   it is now set, redo the fault. If not, do lazy migration if needed and
   set the bit.

3. Migration will move the page and then replace the ptes that have the
   present bit cleared with ptes pointing to the new page, with the present
   bit set.

Since we do not reduce the mapcount, we can use that mapcount to verify
that it is still safe to get to the corresponding anonymous vma for
anonymous pages. Some portions of the vm would have to be fixed up to know
how to deal with valid ptes that are not present (the fork and unmap code).

For file backed pages we would not have to remove the references anymore.
We can migrate them in the same way as anonymous pages; we just need to
make sure to change the mapping first. That would be an important feature
for us because it preserves the page state in a better way. We could also
preserve the dirty and accessed bits in the pte.
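
To make steps 1-3 a little more concrete, here are two rough, untested
sketches of what this could look like. All helpers marked "hypothetical"
(clear_present_bits(), restore_present_bits(), lazy_migration_wanted(),
migrate_page_here(), pte_mkpresent()) do not exist today; pte_mkpresent()
in particular is the arch support mentioned above. This is only meant to
illustrate the idea, not to be a real patch.

First the migration side (steps 1 and 3): clear the present bits while
holding the page lock, move the data, then rewrite the ptes to point at
the new page:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>

/*
 * Migration side (steps 1 and 3).  clear_present_bits() and
 * restore_present_bits() are hypothetical: an rmap walk similar to
 * try_to_unmap(), except that it only clears or sets the present bit
 * (and, for step 3, rewrites the pfn) instead of removing the ptes.
 */
static int migrate_page_keep_ptes(struct page *page, struct page *newpage)
{
	lock_page(page);	/* faults now stall in do_swap_page() */

	/* Step 1: make every pte mapping this page !present, keep the pfn. */
	clear_present_bits(page);			/* hypothetical */

	/* Copy the data; transferring page state (dirty etc.) is omitted. */
	copy_highpage(newpage, page);

	/* Step 3: point the ptes at newpage, with the present bit set. */
	restore_present_bits(page, newpage);		/* hypothetical */

	unlock_page(page);
	return 0;
}

And the fault side (step 2), called from do_swap_page() when it finds a
pte that is !pte_present() but was not converted into a swap entry:

/*
 * Fault side (step 2).  The present bit was cleared for migration,
 * so the pfn in the pte is still valid and pte_page() still works.
 */
static int handle_migration_pte(struct mm_struct *mm,
				struct vm_area_struct *vma,
				unsigned long address,
				pmd_t *pmd, pte_t orig_pte)
{
	struct page *page = pte_page(orig_pte);	/* pfn left in the pte */
	spinlock_t *ptl;
	pte_t *ptep;
	pte_t pte;

	/* Stall here while migration holds the page lock. */
	lock_page(page);

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (pte_present(*ptep)) {
		/* Migration already fixed the pte up: just redo the fault. */
		goto out;
	}

	if (lazy_migration_wanted(page, vma)) {		/* hypothetical */
		/* Hypothetical: move the page and install a present pte. */
		migrate_page_here(page, vma, address, ptep);
	} else {
		/* No migration needed: just set the present bit again. */
		pte = pte_mkpresent(orig_pte);		/* hypothetical */
		set_pte_at(mm, address, ptep, pte);
		update_mmu_cache(vma, address, pte);
	}
out:
	pte_unmap_unlock(ptep, ptl);
	unlock_page(page);
	return VM_FAULT_MINOR;
}

The nice property is that the ptes never go away: the mapcount stays
stable, and if migration fails we only have to set the present bits again
instead of rebuilding the mappings.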