* [RFC/PATCH 00/15 v2] kvm on big iron
From: Carsten Otte @ 2008-03-22 17:02 UTC
To: virtualization, kvm-devel, Avi Kivity, Nick Piggin, Andrew Morton, hugh, Linux Memory Management List
Cc: schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao

This patch series introduces a backend for kvm to run on IBM System z machines (aka s390x) that uses the mainframe's SIE virtualization capability.

Many thanks for the review feedback we have received so far, I do greatly appreciate it!
The first submission didn't draw much attention from the elder VM magicians on linux-mm. I am adding Nick, Hugh and Andrew explicitly to the first two patches. Please do comment on our common code changes buried in there. Are they acceptable to you? Who else needs to review them?

Changes from version 1:
- include feedback from Randy Dunlap on the documentation
- include feedback from Jeremy Fitzhardinge: the prototype for dup_mm has moved to include/linux/sched.h
- rebase to current kvm.git hash g361be34. Thank you Avi for pulling in the fix we need, and for moving KVM_MAX_VCPUS to include/arch :-).

Todo list:
- I've created a patch for Christoph Hellwig's feedback about symbolic names for machine_flags. This change is independent of the kvm port, and I will submit it for review to Martin.
- Rusty Russell has provided feedback that improves patch #15. Christian is looking into that and will likely update that patch. If the series is merged before that, we can safely do an add-on patch on top of #15.
- an open comment from Dave Hansen about a possible race between s390_enable_sie() and ptrace in patch #1 exceeds my basic VM knowledge and needs to be answered by Martin or Heiko.

The patch queue consists of the following patches:
[RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
[RFC/PATCH 02/15] preparation: host memory management changes for s390 kvm
[RFC/PATCH 03/15] preparation: address of the 64bit extint parm in lowcore
[RFC/PATCH 04/15] preparation: split sysinfo definitions for kvm use
[RFC/PATCH 05/15] kvm-s390: s390 arch backend for the kvm kernel module
[RFC/PATCH 06/15] kvm-s390: sie intercept handling
[RFC/PATCH 07/15] kvm-s390: interrupt subsystem, cpu timer, waitpsw
[RFC/PATCH 08/15] kvm-s390: intercepts for privileged instructions
[RFC/PATCH 09/15] kvm-s390: interprocessor communication via sigp
[RFC/PATCH 10/15] kvm-s390: intercepts for diagnose instructions
[RFC/PATCH 11/15] kvm-s390: add kvm to kconfig on s390
[RFC/PATCH 12/15] kvm-s390: API documentation
[RFC/PATCH 13/15] kvm-s390: update maintainers
[RFC/PATCH 14/15] guest: detect when running on kvm
[RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [RFC/PATCH 00/15 v3] kvm on big iron
From: Carsten Otte @ 2008-03-25 17:47 UTC
To: virtualization, kvm-devel, Avi Kivity
Cc: Andrew Morton, Linux Memory Management List, schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao, oliver.paukstadt

Many thanks for the review feedback we have received so far, and many thanks to Andrew for reviewing our common code memory management changes. I do greatly appreciate that :-).

All important parts have been reviewed, and all review feedback has been integrated into the code. Therefore we would like to ask for inclusion of our work into kvm.git.

Changes from version 1:
- include feedback from Randy Dunlap on the documentation
- include feedback from Jeremy Fitzhardinge: the prototype for dup_mm has moved to include/linux/sched.h
- rebase to current kvm.git hash g361be34. Thank you Avi for pulling in the fix we need, and for moving KVM_MAX_VCPUS to include/arch :-).

Changes from version 2:
- include feedback from Rusty Russell on the virtio patch
- include fix for the race between s390_enable_sie() and ptrace spotted by Dave Hansen: we now take task_lock() to protect mm_users from updates while we're growing the page table. Good catch, Dave :-).
- rebase to current kvm.git hash g680615e

Todo list:
- I've created a patch for Christoph Hellwig's feedback about symbolic names for machine_flags. This change is independent of the kvm port, and I will submit it for review to Martin.
The patch queue consists of the following patches:
[RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
[RFC/PATCH 02/15] preparation: host memory management changes for s390 kvm
[RFC/PATCH 03/15] preparation: address of the 64bit extint parm in lowcore
[RFC/PATCH 04/15] preparation: split sysinfo definitions for kvm use
[RFC/PATCH 05/15] kvm-s390: s390 arch backend for the kvm kernel module
[RFC/PATCH 06/15] kvm-s390: sie intercept handling
[RFC/PATCH 07/15] kvm-s390: interrupt subsystem, cpu timer, waitpsw
[RFC/PATCH 08/15] kvm-s390: intercepts for privileged instructions
[RFC/PATCH 09/15] kvm-s390: interprocessor communication via sigp
[RFC/PATCH 10/15] kvm-s390: intercepts for diagnose instructions
[RFC/PATCH 11/15] kvm-s390: add kvm to kconfig on s390
[RFC/PATCH 12/15] kvm-s390: API documentation
[RFC/PATCH 13/15] kvm-s390: update maintainers
[RFC/PATCH 14/15] guest: detect when running on kvm
[RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls
* Re: [RFC/PATCH 00/15 v3] kvm on big iron
From: Avi Kivity @ 2008-03-27 12:02 UTC
To: Carsten Otte
Cc: virtualization, kvm-devel, Andrew Morton, Linux Memory Management List, schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao, oliver.paukstadt

Carsten Otte wrote:
> Many thanks for the review feedback we have received so far,
> and many thanks to Andrew for reviewing our common code memory
> management changes. I do greatly appreciate that :-).
>
> All important parts have been reviewed, all review feedback has been
> integrated in the code. Therefore we would like to ask for inclusion of
> our work into kvm.git.
>
> Changes from Version 1:
> - include feedback from Randy Dunlap on the documentation
> - include feedback from Jeremy Fitzhardinge, the prototype for dup_mm
> has moved to include/linux/sched.h
> - rebase to current kvm.git hash g361be34. Thank you Avi for pulling
> in the fix we need, and for moving KVM_MAX_VCPUS to include/arch :-).
>
> Changes from Version 2:
> - include feedback from Rusty Russell on the virtio patch
> - include fix for race s390_enable_sie() versus ptrace spotted by Dave
> Hansen: we now do task_lock() to protect mm_users from update while
> we're growing the page table. Good catch, Dave :-).
> - rebase to current kvm.git hash g680615e
>
>

Applied all, thanks.

--
error compiling committee.c: too many arguments to function
* [RFC/PATCH 01/15 v3] preparation: provide hook to enable pgstes in user pagetable
From: Carsten Otte, Martin Schwidefsky, Carsten Otte @ 2008-03-25 17:47 UTC
To: virtualization, kvm-devel, Avi Kivity
Cc: Andrew Morton, Linux Memory Management List, schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao, oliver.paukstadt

The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process now has a page status extended field after every page table.

Code that wants to exploit pgstes should select CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's review feedback. Now we do have the prototype for dup_mm in include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() now calls task_lock() to prevent a race against ptrace modifying mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/Kconfig              |    4 ++
 arch/s390/kernel/setup.c       |    4 ++
 arch/s390/mm/pgtable.c         |   65 +++++++++++++++++++++++++++++++++++++++--
 include/asm-s390/mmu.h         |    1
 include/asm-s390/mmu_context.h |    8 ++++-
 include/asm-s390/pgtable.h     |    1
 include/linux/sched.h          |    2 +
 kernel/fork.c                  |    2 -
 8 files changed, 82 insertions(+), 5 deletions(-)

Index: linux-host/arch/s390/Kconfig
===================================================================
--- linux-host.orig/arch/s390/Kconfig
+++ linux-host/arch/s390/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_LOCKBREAK
 	default y
 	depends on SMP && PREEMPT
 
+config PGSTE
+	bool
+	default y if KVM
+
 mainmenu "Linux Kernel Configuration"
 
 config S390
Index: linux-host/arch/s390/kernel/setup.c
===================================================================
--- linux-host.orig/arch/s390/kernel/setup.c
+++ linux-host/arch/s390/kernel/setup.c
@@ -315,7 +315,11 @@ static int __init early_parse_ipldelay(c
 early_param("ipldelay", early_parse_ipldelay);
 
 #ifdef CONFIG_S390_SWITCH_AMODE
+#ifdef CONFIG_PGSTE
+unsigned int switch_amode = 1;
+#else
 unsigned int switch_amode = 0;
+#endif
 EXPORT_SYMBOL_GPL(switch_amode);
 
 static void set_amode_and_uaccess(unsigned long user_amode,
Index: linux-host/arch/s390/mm/pgtable.c
===================================================================
--- linux-host.orig/arch/s390/mm/pgtable.c
+++ linux-host/arch/s390/mm/pgtable.c
@@ -30,11 +30,27 @@
 #define TABLES_PER_PAGE 4
 #define FRAG_MASK 15UL
 #define SECOND_HALVES 10UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+	memset(table + 256, 0, PAGE_SIZE/4);
+	clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+	memset(table + 768, 0, PAGE_SIZE/4);
+}
+
 #else
 #define ALLOC_ORDER 2
 #define TABLES_PER_PAGE 2
 #define FRAG_MASK 3UL
 #define SECOND_HALVES 2UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
+	memset(table + 256, 0, PAGE_SIZE/2);
+}
+
 #endif
 
 unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct m
 	unsigned long *table;
 	unsigned long bits;
 
-	bits = mm->context.noexec ? 3UL : 1UL;
+	bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
 	spin_lock(&mm->page_table_lock);
 	page = NULL;
 	if (!list_empty(&mm->context.pgtable_list)) {
@@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct m
 		pgtable_page_ctor(page);
 		page->flags &= ~FRAG_MASK;
 		table = (unsigned long *) page_to_phys(page);
-		clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
+		if (mm->context.pgstes)
+			clear_table_pgstes(table);
+		else
+			clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
 		spin_lock(&mm->page_table_lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
 	}
@@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *m
 	struct page *page;
 	unsigned long bits;
 
-	bits = mm->context.noexec ? 3UL : 1UL;
+	bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
 	bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
 	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
 	spin_lock(&mm->page_table_lock);
@@ -228,3 +247,43 @@ void disable_noexec(struct mm_struct *mm
 	mm->context.noexec = 0;
 	update_mm(mm, tsk);
 }
+
+/*
+ * switch on pgstes for its userspace process (for kvm)
+ */
+int s390_enable_sie(void)
+{
+	struct task_struct *tsk = current;
+	struct mm_struct *mm;
+	int rc;
+
+	task_lock(tsk);
+
+	rc = 0;
+	if (tsk->mm->context.pgstes)
+		goto unlock;
+
+	rc = -EINVAL;
+	if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
+	    tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
+		goto unlock;
+
+	tsk->mm->context.pgstes = 1;	/* dirty little tricks .. */
+	mm = dup_mm(tsk);
+	tsk->mm->context.pgstes = 0;
+
+	rc = -ENOMEM;
+	if (!mm)
+		goto unlock;
+	mmput(tsk->mm);
+	tsk->mm = tsk->active_mm = mm;
+	preempt_disable();
+	update_mm(mm, tsk);
+	cpu_set(smp_processor_id(), mm->cpu_vm_mask);
+	preempt_enable();
+	rc = 0;
+unlock:
+	task_unlock(tsk);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(s390_enable_sie);
Index: linux-host/include/asm-s390/mmu.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu.h
+++ linux-host/include/asm-s390/mmu.h
@@ -7,6 +7,7 @@ typedef struct {
 	unsigned long asce_bits;
 	unsigned long asce_limit;
 	int noexec;
+	int pgstes;
 } mm_context_t;
 
 #endif
Index: linux-host/include/asm-s390/mmu_context.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu_context.h
+++ linux-host/include/asm-s390/mmu_context.h
@@ -20,7 +20,13 @@ static inline int init_new_context(struc
 #ifdef CONFIG_64BIT
 	mm->context.asce_bits |= _ASCE_TYPE_REGION3;
 #endif
-	mm->context.noexec = s390_noexec;
+	if (current->mm->context.pgstes) {
+		mm->context.noexec = 0;
+		mm->context.pgstes = 1;
+	} else {
+		mm->context.noexec = s390_noexec;
+		mm->context.pgstes = 0;
+	}
 	mm->context.asce_limit = STACK_TOP_MAX;
 	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
 	return 0;
Index: linux-host/include/asm-s390/pgtable.h
===================================================================
--- linux-host.orig/include/asm-s390/pgtable.h
+++ linux-host/include/asm-s390/pgtable.h
@@ -966,6 +966,7 @@ static inline pte_t mk_swap_pte(unsigned
 
 extern int add_shared_memory(unsigned long start, unsigned long size);
 extern int remove_shared_memory(unsigned long start, unsigned long size);
+extern int s390_enable_sie(void);
 
 /*
  * No page table caches to initialise
Index: linux-host/kernel/fork.c
===================================================================
--- linux-host.orig/kernel/fork.c
+++ linux-host/kernel/fork.c
@@ -498,7 +498,7 @@ void mm_release(struct task_struct *tsk,
  * Allocate a new mm structure and copy contents from the
  * mm structure of the passed in task structure.
  */
-static struct mm_struct *dup_mm(struct task_struct *tsk)
+struct mm_struct *dup_mm(struct task_struct *tsk)
 {
 	struct mm_struct *mm, *oldmm = current->mm;
 	int err;
Index: linux-host/include/linux/sched.h
===================================================================
--- linux-host.orig/include/linux/sched.h
+++ linux-host/include/linux/sched.h
@@ -1758,6 +1758,8 @@ extern void mmput(struct mm_struct *);
 extern struct mm_struct *get_task_mm(struct task_struct *task);
 /* Remove the current tasks stale references to the old mm_struct */
 extern void mm_release(struct task_struct *, struct mm_struct *);
+/* Allocate a new mm structure and copy contents from tsk->mm */
+extern struct mm_struct *dup_mm(struct task_struct *tsk);
 
 extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
 extern void flush_thread(void);
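The control flow of s390_enable_sie() in the v3 patch above can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: struct mm_sim, struct task_sim, init_task_sim(), dup_mm_sim() and enable_sie_sim() are invented names, a pthread mutex stands in for task_lock()/task_unlock(), and free() stands in for mmput() dropping the last user. It shows the preconditions (an mm exists, exactly one user, mm == active_mm, no AIO contexts), the temporary pgstes flag that makes the copy inherit pgstes in the way init_new_context() reads current->mm->context.pgstes, and the final swap. The model checks for a missing mm before reading its pgstes flag.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct. */
struct mm_sim {
	int users;	/* models atomic mm_users */
	bool pgstes;	/* models mm->context.pgstes */
	bool has_aio;	/* models mm->ioctx_list != NULL */
};

/* Hypothetical stand-in for struct task_struct. */
struct task_sim {
	pthread_mutex_t lock;	/* models task_lock()/task_unlock() */
	struct mm_sim *mm;
	struct mm_sim *active_mm;
};

static int init_task_sim(struct task_sim *tsk, int users)
{
	pthread_mutex_init(&tsk->lock, NULL);
	tsk->mm = calloc(1, sizeof(*tsk->mm));
	if (!tsk->mm)
		return -ENOMEM;
	tsk->mm->users = users;
	tsk->active_mm = tsk->mm;
	return 0;
}

/* Models dup_mm(): the copy takes its pgstes setting from the current
 * mm, like init_new_context() does in the patch. */
static struct mm_sim *dup_mm_sim(const struct task_sim *tsk)
{
	struct mm_sim *mm = calloc(1, sizeof(*mm));

	if (!mm)
		return NULL;
	mm->users = 1;
	mm->pgstes = tsk->mm->pgstes;
	return mm;
}

static int enable_sie_sim(struct task_sim *tsk)
{
	struct mm_sim *mm;
	int rc = 0;

	pthread_mutex_lock(&tsk->lock);
	if (!tsk->mm) {			/* check before dereferencing */
		rc = -EINVAL;
		goto unlock;
	}
	if (tsk->mm->pgstes)		/* already converted: nothing to do */
		goto unlock;
	rc = -EINVAL;
	if (tsk->mm->users > 1 || tsk->mm != tsk->active_mm ||
	    tsk->mm->has_aio)
		goto unlock;

	tsk->mm->pgstes = true;		/* let the copy inherit pgstes */
	mm = dup_mm_sim(tsk);
	tsk->mm->pgstes = false;

	rc = -ENOMEM;
	if (!mm)
		goto unlock;
	free(tsk->mm);			/* models mmput() of the last user */
	tsk->mm = tsk->active_mm = mm;
	rc = 0;
unlock:
	pthread_mutex_unlock(&tsk->lock);
	return rc;
}
```

Holding the lock across both the mm_users check and the swap is what keeps another path that takes the same lock (ptrace, in the patch discussion) from adding a user in between. Calling enable_sie_sim() twice on the same task returns 0 both times; the second call sees pgstes already set and changes nothing.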
* [RFC/PATCH 02/15 v3] preparation: host memory management changes for s390 kvm
From: Carsten Otte, Heiko Carstens, Christian Borntraeger @ 2008-03-25 17:47 UTC
To: virtualization, kvm-devel, Avi Kivity
Cc: Andrew Morton, Linux Memory Management List, schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao, oliver.paukstadt

This patch changes the s390 memory management definitions to use the pgste field for dirty and reference bit tracking of host and guest code. Usually on s390, dirty and referenced are tracked in storage keys, which belong to the physical page. This changes with virtualization: the guest and host dirty/reference bits are defined to be the logical OR of the values for the mapping and the physical page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c: the call to page_test_and_clear_young must be moved. This is a no-op for all architectures but s390. page_referenced checks the referenced bits for the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page check is implemented, but the mapping callbacks are no-ops because dirty and referenced are not tracked in the s390 page tables. The pgstes introduce guest and host dirty and reference bits for s390 in the host mapping. These mappings must be checked before page_test_and_clear_young resets the reference bit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 include/asm-s390/pgtable.h |   92 +++++++++++++++++++++++++++++++++++++++++++--
 mm/rmap.c                  |    7 +--
 2 files changed, 93 insertions(+), 6 deletions(-)

Index: kvm/include/asm-s390/pgtable.h
===================================================================
--- kvm.orig/include/asm-s390/pgtable.h
+++ kvm/include/asm-s390/pgtable.h
@@ -30,6 +30,7 @@
  */
 #ifndef __ASSEMBLY__
 #include <linux/mm_types.h>
+#include <asm/bitops.h>
 #include <asm/bug.h>
 #include <asm/processor.h>
 
@@ -258,6 +259,13 @@ extern char empty_zero_page[PAGE_SIZE];
  * swap pte is 1011 and 0001, 0011, 0101, 0111 are invalid.
  */
 
+/* Page status table bits for virtualization */
+#define RCP_PCL_BIT	55
+#define RCP_HR_BIT	54
+#define RCP_HC_BIT	53
+#define RCP_GR_BIT	50
+#define RCP_GC_BIT	49
+
 #ifndef __s390x__
 
 /* Bits in the segment table address-space-control-element */
@@ -513,6 +521,48 @@ static inline int pte_file(pte_t pte)
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(a,b)  (pte_val(a) == pte_val(b))
 
+static inline void rcp_lock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	unsigned long *pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+	preempt_disable();
+	while (test_and_set_bit(RCP_PCL_BIT, pgste))
+		;
+#endif
+}
+
+static inline void rcp_unlock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	unsigned long *pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+	clear_bit(RCP_PCL_BIT, pgste);
+	preempt_enable();
+#endif
+}
+
+/* forward declaration for SetPageUptodate in page-flags.h*/
+static inline void page_clear_dirty(struct page *page);
+#include <linux/page-flags.h>
+
+static inline void ptep_rcp_copy(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	struct page *page = virt_to_page(pte_val(*ptep));
+	unsigned int skey;
+	unsigned long *pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+
+	skey = page_get_storage_key(page_to_phys(page));
+	if (skey & _PAGE_CHANGED)
+		set_bit(RCP_GC_BIT, pgste);
+	if (skey & _PAGE_REFERENCED)
+		set_bit(RCP_GR_BIT, pgste);
+	if (test_and_clear_bit(RCP_HC_BIT, pgste))
+		SetPageDirty(page);
+	if (test_and_clear_bit(RCP_HR_BIT, pgste))
+		SetPageReferenced(page);
+#endif
+}
+
 /*
  * query functions pte_write/pte_dirty/pte_young only work if
  * pte_present() is true. Undefined behaviour if not..
@@ -599,6 +649,8 @@ static inline void pmd_clear(pmd_t *pmd)
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
 			     pte_t *ptep)
 {
+	if (mm->context.pgstes)
+		ptep_rcp_copy(ptep);
 	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
 	if (mm->context.noexec)
 		pte_val(ptep[PTRS_PER_PTE]) = _PAGE_TYPE_EMPTY;
@@ -667,6 +719,24 @@ static inline pte_t pte_mkyoung(pte_t pt
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_PGSTE
+	unsigned long physpage;
+	int young;
+	unsigned long *pgste;
+
+	if (!vma->vm_mm->context.pgstes)
+		return 0;
+	physpage = pte_val(*ptep) & PAGE_MASK;
+	pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+
+	young = ((page_get_storage_key(physpage) & _PAGE_REFERENCED) != 0);
+	rcp_lock(ptep);
+	if (young)
+		set_bit(RCP_GR_BIT, pgste);
+	young |= test_and_clear_bit(RCP_HR_BIT, pgste);
+	rcp_unlock(ptep);
+	return young;
+#endif
 	return 0;
 }
 
@@ -674,7 +744,13 @@ static inline int ptep_test_and_clear_yo
 static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 					 unsigned long address, pte_t *ptep)
 {
-	/* No need to flush TLB; bits are in storage key */
+	/* No need to flush TLB
+	 * On s390 reference bits are in storage key and never in TLB
+	 * With virtualization we handle the reference bit, without we
+	 * we can simply return */
+#ifdef CONFIG_PGSTE
+	return ptep_test_and_clear_young(vma, address, ptep);
+#endif
 	return 0;
 }
 
@@ -693,15 +769,25 @@ static inline void __ptep_ipte(unsigned
 			: "=m" (*ptep) : "m" (*ptep),
 			  "a" (pto), "a" (address));
 	}
-	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
 }
 
 static inline void ptep_invalidate(struct mm_struct *mm,
 				   unsigned long address, pte_t *ptep)
 {
+	if (mm->context.pgstes) {
+		rcp_lock(ptep);
+		__ptep_ipte(address, ptep);
+		ptep_rcp_copy(ptep);
+		pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+		rcp_unlock(ptep);
+		return;
+	}
 	__ptep_ipte(address, ptep);
-	if (mm->context.noexec)
+	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+	if (mm->context.noexec) {
 		__ptep_ipte(address, ptep + PTRS_PER_PTE);
+		pte_val(*(ptep + PTRS_PER_PTE)) = _PAGE_TYPE_EMPTY;
+	}
 }
 
 /*
Index: kvm/mm/rmap.c
===================================================================
--- kvm.orig/mm/rmap.c
+++ kvm/mm/rmap.c
@@ -413,9 +413,6 @@ int page_referenced(struct page *page, i
 {
 	int referenced = 0;
 
-	if (page_test_and_clear_young(page))
-		referenced++;
-
 	if (TestClearPageReferenced(page))
 		referenced++;
 
@@ -433,6 +430,10 @@ int page_referenced(struct page *page, i
 			unlock_page(page);
 		}
 	}
+
+	if (page_test_and_clear_young(page))
+		referenced++;
+
 	return referenced;
 }
 
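Why patch 02 moves page_test_and_clear_young() to the end of page_referenced() can be shown with a small user-space model. Everything here is hypothetical: struct page_model and the three functions are invented for illustration, only the bit numbers RCP_HR_BIT and RCP_GR_BIT are taken from the patch, the rcp_lock() bit lock is left out, and a single guest access is represented by the storage-key referenced bit being set. The mapping check reads the storage key without resetting it and records a reference in the guest-referenced pgste bit, while the physical-page check reads and resets the key.

```c
#include <stdbool.h>
#include <stdint.h>

/* pgste bit numbers as defined in patch 02 (bit nr has value 1 << nr). */
#define RCP_HR_BIT	54	/* host referenced */
#define RCP_GR_BIT	50	/* guest referenced */
#define PGSTE_BIT(nr)	((uint64_t)1 << (nr))

/* Hypothetical model of one page: the storage-key referenced bit of the
 * physical page plus the pgste word of its single host mapping. */
struct page_model {
	bool skey_referenced;
	uint64_t pgste;
};

/* Models ptep_test_and_clear_young() with pgstes: reads, but does not
 * reset, the storage key; keeps the guest's view in the GR bit and
 * consumes the HR bit. */
static int mapping_test_and_clear_young(struct page_model *p)
{
	int young = p->skey_referenced;

	if (young)
		p->pgste |= PGSTE_BIT(RCP_GR_BIT);
	if (p->pgste & PGSTE_BIT(RCP_HR_BIT)) {
		p->pgste &= ~PGSTE_BIT(RCP_HR_BIT);
		young = 1;
	}
	return young;
}

/* Models page_test_and_clear_young(): reads and resets the storage key. */
static int physical_test_and_clear_young(struct page_model *p)
{
	int young = p->skey_referenced;

	p->skey_referenced = false;
	return young;
}

/* Models page_referenced() with the storage-key check either first (old
 * order) or after the mapping walk (order introduced by patch 02). */
static int page_referenced_model(struct page_model *p, bool physical_last)
{
	int referenced = 0;

	if (!physical_last && physical_test_and_clear_young(p))
		referenced++;
	if (mapping_test_and_clear_young(p))
		referenced++;
	if (physical_last && physical_test_and_clear_young(p))
		referenced++;
	return referenced;
}
```

With the old order the storage key has already been reset when the mapping is inspected, so the guest-referenced bit is never set and the guest loses its reference information. With the new order the mapping copies the reference into the pgste first, and the storage key is reset afterwards.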
* [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable
From: Carsten Otte, Martin Schwidefsky @ 2008-03-22 17:02 UTC
To: virtualization, kvm-devel, Avi Kivity, Nick Piggin, Andrew Morton, hugh, Linux Memory Management List
Cc: schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao

The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process now has a page status extended field after every page table.

Code that wants to exploit pgstes should select CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's review feedback. Now we do have the prototype for dup_mm in include/linux/sched.h.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/Kconfig              |    4 +++
 arch/s390/kernel/setup.c       |    4 +++
 arch/s390/mm/pgtable.c         |   53 ++++++++++++++++++++++++++++++++++++++---
 include/asm-s390/mmu.h         |    1
 include/asm-s390/mmu_context.h |    8 +++++-
 include/asm-s390/pgtable.h     |    1
 include/linux/sched.h          |    2 +
 kernel/fork.c                  |    2 -
 8 files changed, 70 insertions(+), 5 deletions(-)

Index: linux-host/arch/s390/Kconfig
===================================================================
--- linux-host.orig/arch/s390/Kconfig
+++ linux-host/arch/s390/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_LOCKBREAK
 	default y
 	depends on SMP && PREEMPT
 
+config PGSTE
+	bool
+	default y if KVM
+
 mainmenu "Linux Kernel Configuration"
 
 config S390
Index: linux-host/arch/s390/kernel/setup.c
===================================================================
--- linux-host.orig/arch/s390/kernel/setup.c
+++ linux-host/arch/s390/kernel/setup.c
@@ -315,7 +315,11 @@ static int __init early_parse_ipldelay(c
 early_param("ipldelay", early_parse_ipldelay);
 
 #ifdef CONFIG_S390_SWITCH_AMODE
+#ifdef CONFIG_PGSTE
+unsigned int switch_amode = 1;
+#else
 unsigned int switch_amode = 0;
+#endif
 EXPORT_SYMBOL_GPL(switch_amode);
 
 static void set_amode_and_uaccess(unsigned long user_amode,
Index: linux-host/arch/s390/mm/pgtable.c
===================================================================
--- linux-host.orig/arch/s390/mm/pgtable.c
+++ linux-host/arch/s390/mm/pgtable.c
@@ -30,11 +30,27 @@
 #define TABLES_PER_PAGE 4
 #define FRAG_MASK 15UL
 #define SECOND_HALVES 10UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+	memset(table + 256, 0, PAGE_SIZE/4);
+	clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+	memset(table + 768, 0, PAGE_SIZE/4);
+}
+
 #else
 #define ALLOC_ORDER 2
 #define TABLES_PER_PAGE 2
 #define FRAG_MASK 3UL
 #define SECOND_HALVES 2UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
+	memset(table + 256, 0, PAGE_SIZE/2);
+}
+
 #endif
 
 unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct m
 	unsigned long *table;
 	unsigned long bits;
 
-	bits = mm->context.noexec ? 3UL : 1UL;
+	bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
 	spin_lock(&mm->page_table_lock);
 	page = NULL;
 	if (!list_empty(&mm->context.pgtable_list)) {
@@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct m
 		pgtable_page_ctor(page);
 		page->flags &= ~FRAG_MASK;
 		table = (unsigned long *) page_to_phys(page);
-		clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
+		if (mm->context.pgstes)
+			clear_table_pgstes(table);
+		else
+			clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
 		spin_lock(&mm->page_table_lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
 	}
@@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *m
 	struct page *page;
 	unsigned long bits;
 
-	bits = mm->context.noexec ? 3UL : 1UL;
+	bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
 	bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
 	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
 	spin_lock(&mm->page_table_lock);
@@ -228,3 +247,31 @@ void disable_noexec(struct mm_struct *mm
 	mm->context.noexec = 0;
 	update_mm(mm, tsk);
 }
+
+/*
+ * switch on pgstes for its userspace process (for kvm)
+ */
+int s390_enable_sie(void)
+{
+	struct task_struct *tsk = current;
+	struct mm_struct *mm;
+
+	if (tsk->mm->context.pgstes)
+		return 0;
+	if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
+	    tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
+		return -EINVAL;
+	tsk->mm->context.pgstes = 1;	/* dirty little tricks .. */
+	mm = dup_mm(tsk);
+	tsk->mm->context.pgstes = 0;
+	if (!mm)
+		return -ENOMEM;
+	mmput(tsk->mm);
+	tsk->mm = tsk->active_mm = mm;
+	preempt_disable();
+	update_mm(mm, tsk);
+	cpu_set(smp_processor_id(), mm->cpu_vm_mask);
+	preempt_enable();
+	return 0;
+}
+EXPORT_SYMBOL_GPL(s390_enable_sie);
Index: linux-host/include/asm-s390/mmu.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu.h
+++ linux-host/include/asm-s390/mmu.h
@@ -7,6 +7,7 @@ typedef struct {
 	unsigned long asce_bits;
 	unsigned long asce_limit;
 	int noexec;
+	int pgstes;
 } mm_context_t;
 
 #endif
Index: linux-host/include/asm-s390/mmu_context.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu_context.h
+++ linux-host/include/asm-s390/mmu_context.h
@@ -20,7 +20,13 @@ static inline int init_new_context(struc
 #ifdef CONFIG_64BIT
 	mm->context.asce_bits |= _ASCE_TYPE_REGION3;
 #endif
-	mm->context.noexec = s390_noexec;
+	if (current->mm->context.pgstes) {
+		mm->context.noexec = 0;
+		mm->context.pgstes = 1;
+	} else {
+		mm->context.noexec = s390_noexec;
+		mm->context.pgstes = 0;
+	}
 	mm->context.asce_limit = STACK_TOP_MAX;
 	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
 	return 0;
Index: linux-host/include/asm-s390/pgtable.h
===================================================================
--- linux-host.orig/include/asm-s390/pgtable.h
+++ linux-host/include/asm-s390/pgtable.h
@@ -966,6 +966,7 @@ static inline pte_t mk_swap_pte(unsigned
 
 extern int add_shared_memory(unsigned long start, unsigned long size);
 extern int remove_shared_memory(unsigned long start, unsigned long size);
+extern int s390_enable_sie(void);
 
 /*
  * No page table caches to initialise
Index: linux-host/kernel/fork.c
===================================================================
--- linux-host.orig/kernel/fork.c
+++ linux-host/kernel/fork.c
@@ -498,7 +498,7 @@ void mm_release(struct task_struct *tsk,
  * Allocate a new mm structure and copy contents from the
  * mm structure of the passed in task structure.
  */
-static struct mm_struct *dup_mm(struct task_struct *tsk)
+struct mm_struct *dup_mm(struct task_struct *tsk)
 {
 	struct mm_struct *mm, *oldmm = current->mm;
 	int err;
Index: linux-host/include/linux/sched.h
===================================================================
--- linux-host.orig/include/linux/sched.h
+++ linux-host/include/linux/sched.h
@@ -1758,6 +1758,8 @@ extern void mmput(struct mm_struct *);
 extern struct mm_struct *get_task_mm(struct task_struct *task);
 /* Remove the current tasks stale references to the old mm_struct */
 extern void mm_release(struct task_struct *, struct mm_struct *);
+/* Allocate a new mm structure and copy contents from tsk->mm */
+extern struct mm_struct *dup_mm(struct task_struct *tsk);
 
 extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
 extern void flush_thread(void);
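Both versions of patch 01 change the layout of a page-table page once pgstes are enabled. In the branch with TABLES_PER_PAGE 2 (the 64-bit layout, 256 entries of 8 bytes per table), clear_table_pgstes() initializes the first 2K half as an empty pte table and zeroes the second 2K half for the pgstes, and page_table_alloc()/page_table_free() then claim both halves (bits = 3) instead of one. The sketch below models only that 64-bit arithmetic in user space. The names clear_table_pgstes_sim(), frag_bits() and pgste_offset() are hypothetical, PAGE_SIZE is assumed to be 4096, and EMPTY_PTE_SIM is a placeholder value, not the real _PAGE_TYPE_EMPTY.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 64-bit geometry: 4K pages, 256 ptes of 8 bytes per table. */
#define PAGE_SIZE_SIM	4096UL
#define PTRS_PER_PTE	256UL
#define ENTRY_SIZE	8UL
#define EMPTY_PTE_SIM	0x400ULL	/* hypothetical placeholder for _PAGE_TYPE_EMPTY */

/* Models clear_table_pgstes(): 256 empty ptes in the first half of the
 * page, 256 zeroed pgstes in the second half. */
static void clear_table_pgstes_sim(uint64_t table[2 * PTRS_PER_PTE])
{
	for (unsigned long i = 0; i < PTRS_PER_PTE; i++)
		table[i] = EMPTY_PTE_SIM;
	memset(table + PTRS_PER_PTE, 0, PAGE_SIZE_SIM / 2);
}

/* Models the fragment mask in page_table_alloc()/page_table_free():
 *   bits = (noexec || pgstes) ? 3 : 1;
 *   bits <<= offset / 256 / sizeof(unsigned long);
 * One 2K half per table normally, both halves with noexec or pgstes. */
static unsigned long frag_bits(unsigned long offset, int noexec, int pgstes)
{
	unsigned long bits = (noexec || pgstes) ? 3UL : 1UL;

	return bits << (offset / PTRS_PER_PTE / ENTRY_SIZE);
}

/* Byte offset within the page of the pgste for pte index i of a table
 * starting at table_offset, i.e. ptep + PTRS_PER_PTE in the patch. */
static unsigned long pgste_offset(unsigned long table_offset, unsigned long i)
{
	return table_offset + (i + PTRS_PER_PTE) * ENTRY_SIZE;
}
```

For example, without pgstes a table in the second half of a page (offset 2048) gets mask 2, while with pgstes a table at offset 0 takes mask 3, and the pgste of its last pte sits at byte 4088, the last entry of the page.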
* Re: [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable
  2008-03-22 17:02 ` [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky
@ 2008-03-24 21:50   ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread

From: Andrew Morton @ 2008-03-24 21:50 UTC
To: Carsten Otte
Cc: virtualization, kvm-devel, avi, npiggin, hugh, linux-mm, schwidefsky,
    heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori,
    jblunck, rvdheij, rusty, arnd, xiantao.zhang

On Sat, 22 Mar 2008 18:02:37 +0100 Carsten Otte <cotte@de.ibm.com> wrote:

> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
>
> The SIE instruction on s390 uses the 2nd half of the page table page to
> virtualize the storage keys of a guest. This patch offers the s390_enable_sie
> function, which reorganizes the page tables of a single-threaded process to
> reserve space in the page table:
> s390_enable_sie makes sure that the process is single threaded and then uses
> dup_mm to create a new mm with reorganized page tables. The old mm is freed
> and the process has now a page status extended field after every page table.
>
> Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
>
> This patch has a small common code hit, namely making dup_mm non-static.
>
> Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
> review feedback. Now we do have the prototype for dup_mm in
> include/linux/sched.h.
>
> ...
>
> --- linux-host.orig/kernel/fork.c
> +++ linux-host/kernel/fork.c
> @@ -498,7 +498,7 @@ void mm_release(struct task_struct *tsk,
>   * Allocate a new mm structure and copy contents from the
>   * mm structure of the passed in task structure.
>   */
> -static struct mm_struct *dup_mm(struct task_struct *tsk)
> +struct mm_struct *dup_mm(struct task_struct *tsk)
>  {
>  	struct mm_struct *mm, *oldmm = current->mm;
>  	int err;

ack

> --- linux-host.orig/include/linux/sched.h
> +++ linux-host/include/linux/sched.h
> @@ -1758,6 +1758,8 @@ extern void mmput(struct mm_struct *);
>  extern struct mm_struct *get_task_mm(struct task_struct *task);
>  /* Remove the current tasks stale references to the old mm_struct */
>  extern void mm_release(struct task_struct *, struct mm_struct *);
> +/* Allocate a new mm structure and copy contents from tsk->mm */
> +extern struct mm_struct *dup_mm(struct task_struct *tsk);
>
>  extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
>  extern void flush_thread(void);
>

hm, why did we put these in sched.h?  oh well - acked-by-me.
* [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm
  [not found] ` <1206203560.7177.45.camel@cotte.boeblingen.de.ibm.com>
  2008-03-22 17:02   ` [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky
@ 2008-03-22 17:02   ` Carsten Otte, Heiko Carstens, Christian Borntraeger
  2008-03-24 21:52     ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread

From: Carsten Otte, Heiko Carstens, Christian Borntraeger @ 2008-03-22 17:02 UTC
To: virtualization, kvm-devel, Avi Kivity, Nick Piggin, Andrew Morton, hugh,
    Linux Memory Management List
Cc: schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney,
    aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao

This patch changes the s390 memory management definitions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c: the call to page_test_and_clear_young
must be moved. This is a no-op for all architectures but s390. page_referenced
checks the referenced bits for the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduce
guest and host dirty and reference bits for s390 in the host mapping. These
mappings must be checked before page_test_and_clear_young resets the reference
bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 include/asm-s390/pgtable.h |  109 +++++++++++++++++++++++++++++++++++++++++++--
 mm/rmap.c                  |    7 +-
 2 files changed, 110 insertions(+), 6 deletions(-)

Index: linux-host/include/asm-s390/pgtable.h
===================================================================
--- linux-host.orig/include/asm-s390/pgtable.h
+++ linux-host/include/asm-s390/pgtable.h
@@ -30,6 +30,7 @@
  */
 #ifndef __ASSEMBLY__
 #include <linux/mm_types.h>
+#include <asm/atomic.h>
 #include <asm/bug.h>
 #include <asm/processor.h>

@@ -258,6 +259,13 @@ extern char empty_zero_page[PAGE_SIZE];
  * swap pte is 1011 and 0001, 0011, 0101, 0111 are invalid.
  */

+/* Page status extended for virtualization */
+#define _PAGE_RCP_PCL	0x0080000000000000UL
+#define _PAGE_RCP_HR	0x0040000000000000UL
+#define _PAGE_RCP_HC	0x0020000000000000UL
+#define _PAGE_RCP_GR	0x0004000000000000UL
+#define _PAGE_RCP_GC	0x0002000000000000UL
+
 #ifndef __s390x__

 /* Bits in the segment table address-space-control-element */
@@ -513,6 +521,67 @@ static inline int pte_file(pte_t pte)
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(a,b)		(pte_val(a) == pte_val(b))

+static inline void rcp_lock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+	preempt_disable();
+	atomic64_set_mask(_PAGE_RCP_PCL, rcp);
+#endif
+}
+
+static inline void rcp_unlock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+	atomic64_clear_mask(_PAGE_RCP_PCL, rcp);
+	preempt_enable();
+#endif
+}
+
+static inline void rcp_set_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+	*(unsigned long *) (ptep + PTRS_PER_PTE) |= val;
+#endif
+}
+
+static inline int rcp_test_and_clear_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+	unsigned long ret;
+
+	ret = *(unsigned long *) (ptep + PTRS_PER_PTE);
+	*(unsigned long *) (ptep + PTRS_PER_PTE) &= ~val;
+	return (ret & val) == val;
+#else
+	return 0;
+#endif
+}
+
+
+/* forward declaration for SetPageUptodate in page-flags.h*/
+static inline void page_clear_dirty(struct page *page);
+#include <linux/page-flags.h>
+
+static inline void ptep_rcp_copy(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	struct page *page = virt_to_page(pte_val(*ptep));
+	unsigned int skey;
+
+	skey = page_get_storage_key(page_to_phys(page));
+	if (skey & _PAGE_CHANGED)
+		rcp_set_bits(ptep, _PAGE_RCP_GC);
+	if (skey & _PAGE_REFERENCED)
+		rcp_set_bits(ptep, _PAGE_RCP_GR);
+	if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HC))
+		SetPageDirty(page);
+	if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HR))
+		SetPageReferenced(page);
+#endif
+}
+
 /*
  * query functions pte_write/pte_dirty/pte_young only work if
  * pte_present() is true. Undefined behaviour if not..
@@ -599,6 +668,8 @@ static inline void pmd_clear(pmd_t *pmd)

 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
+	if (mm->context.pgstes)
+		ptep_rcp_copy(ptep);
 	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
 	if (mm->context.noexec)
 		pte_val(ptep[PTRS_PER_PTE]) = _PAGE_TYPE_EMPTY;
@@ -667,6 +738,22 @@ static inline pte_t pte_mkyoung(pte_t pt
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_PGSTE
+	unsigned long physpage;
+	int young;
+
+	if (!vma->vm_mm->context.pgstes)
+		return 0;
+	physpage = pte_val(*ptep) & PAGE_MASK;
+
+	young = ((page_get_storage_key(physpage) & _PAGE_REFERENCED) != 0);
+	rcp_lock(ptep);
+	if (young)
+		rcp_set_bits(ptep, _PAGE_RCP_GR);
+	young |= rcp_test_and_clear_bits(ptep, _PAGE_RCP_HR);
+	rcp_unlock(ptep);
+	return young;
+#endif
 	return 0;
 }

@@ -674,7 +761,13 @@ static inline int ptep_test_and_clear_yo
 static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 					 unsigned long address, pte_t *ptep)
 {
-	/* No need to flush TLB; bits are in storage key */
+	/* No need to flush TLB
+	 * On s390 reference bits are in storage key and never in TLB
+	 * With virtualization we handle the reference bit, without we
+	 * we can simply return */
+#ifdef CONFIG_PGSTE
+	return ptep_test_and_clear_young(vma, address, ptep);
+#endif
 	return 0;
 }

@@ -693,15 +786,25 @@ static inline void __ptep_ipte(unsigned
 			: "=m" (*ptep) : "m" (*ptep),
 			  "a" (pto), "a" (address));
 	}
-	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
 }

 static inline void ptep_invalidate(struct mm_struct *mm,
 				   unsigned long address, pte_t *ptep)
 {
+	if (mm->context.pgstes) {
+		rcp_lock(ptep);
+		__ptep_ipte(address, ptep);
+		ptep_rcp_copy(ptep);
+		pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+		rcp_unlock(ptep);
+		return;
+	}
 	__ptep_ipte(address, ptep);
-	if (mm->context.noexec)
+	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+	if (mm->context.noexec) {
 		__ptep_ipte(address, ptep + PTRS_PER_PTE);
+		pte_val(*(ptep + PTRS_PER_PTE)) = _PAGE_TYPE_EMPTY;
+	}
 }

 /*
Index: linux-host/mm/rmap.c
===================================================================
--- linux-host.orig/mm/rmap.c
+++ linux-host/mm/rmap.c
@@ -413,9 +413,6 @@ int page_referenced(struct page *page, i
 {
 	int referenced = 0;

-	if (page_test_and_clear_young(page))
-		referenced++;
-
 	if (TestClearPageReferenced(page))
 		referenced++;

@@ -433,6 +430,10 @@ int page_referenced(struct page *page, i
 			unlock_page(page);
 		}
 	}
+
+	if (page_test_and_clear_young(page))
+		referenced++;
+
 	return referenced;
 }
* Re: [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm
  2008-03-22 17:02 ` [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
@ 2008-03-24 21:52   ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread

From: Andrew Morton @ 2008-03-24 21:52 UTC
To: Carsten Otte
Cc: virtualization, kvm-devel, avi, npiggin, hugh, linux-mm, schwidefsky,
    heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney, aliguori,
    jblunck, rvdheij, rusty, arnd, xiantao.zhang

On Sat, 22 Mar 2008 18:02:39 +0100 Carsten Otte <cotte@de.ibm.com> wrote:

> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> From: Christian Borntraeger <borntraeger@de.ibm.com>
>
> This patch changes the s390 memory management definitions to use the pgste field
> for dirty and reference bit tracking of host and guest code. Usually on s390,
> dirty and referenced are tracked in storage keys, which belong to the physical
> page. This changes with virtualization: The guest and host dirty/reference bits
> are defined to be the logical OR of the values for the mapping and the physical
> page. This patch implements the necessary changes in pgtable.h for s390.
>
>
> There is a common code change in mm/rmap.c: the call to page_test_and_clear_young
> must be moved. This is a no-op for all architectures but s390. page_referenced
> checks the referenced bits for the physical page and for all mappings:
> o The physical page is checked with page_test_and_clear_young.
> o The mappings are checked with ptep_test_and_clear_young and friends.
>
> Without pgstes (the current implementation on Linux s390) the physical page
> check is implemented but the mapping callbacks are no-ops because dirty
> and referenced are not tracked in the s390 page tables. The pgstes introduce
> guest and host dirty and reference bits for s390 in the host mapping. These
> mappings must be checked before page_test_and_clear_young resets the reference
> bit.
>
> ...
>
> --- linux-host.orig/mm/rmap.c
> +++ linux-host/mm/rmap.c
> @@ -413,9 +413,6 @@ int page_referenced(struct page *page, i
>  {
>  	int referenced = 0;
>
> -	if (page_test_and_clear_young(page))
> -		referenced++;
> -
>  	if (TestClearPageReferenced(page))
>  		referenced++;
>
> @@ -433,6 +430,10 @@ int page_referenced(struct page *page, i
>  			unlock_page(page);
>  		}
>  	}
> +
> +	if (page_test_and_clear_young(page))
> +		referenced++;
> +
>  	return referenced;
>  }

ack.
end of thread, other threads:[~2008-03-27 12:02 UTC | newest]
Thread overview: 9+ messages
[not found] <1206030270.6690.51.camel@cotte.boeblingen.de.ibm.com>
2008-03-22 17:02 ` [RFC/PATCH 00/15 v2] kvm on big iron Carsten Otte
2008-03-25 17:47 ` [RFC/PATCH 00/15 v3] " Carsten Otte
2008-03-27 12:02 ` Avi Kivity
[not found] ` <1206458154.6217.12.camel@cotte.boeblingen.de.ibm.com>
2008-03-25 17:47 ` [RFC/PATCH 01/15 v3] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky, Carsten Otte
2008-03-25 17:47 ` [RFC/PATCH 02/15 v3] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
[not found] ` <1206203560.7177.45.camel@cotte.boeblingen.de.ibm.com>
2008-03-22 17:02 ` [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky
2008-03-24 21:50 ` Andrew Morton
2008-03-22 17:02 ` [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
2008-03-24 21:52 ` Andrew Morton