* [RFC/PATCH 00/15 v2] kvm on big iron
[not found] <1206030270.6690.51.camel@cotte.boeblingen.de.ibm.com>
@ 2008-03-22 17:02 ` Carsten Otte
2008-03-25 17:47 ` [RFC/PATCH 00/15 v3] " Carsten Otte
[not found] ` <1206458154.6217.12.camel@cotte.boeblingen.de.ibm.com>
[not found] ` <1206203560.7177.45.camel@cotte.boeblingen.de.ibm.com>
1 sibling, 2 replies; 9+ messages in thread
From: Carsten Otte @ 2008-03-22 17:02 UTC (permalink / raw)
To: virtualization, kvm-devel, Avi Kivity, Nick Piggin,
Andrew Morton, hugh, Linux Memory Management List
Cc: schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT,
jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao
This patch series introduces a backend for kvm to run on IBM System z
machines (aka s390x) that uses the mainframe's sie virtualization
capability. Many thanks for the review feedback we have received so far,
I do greatly appreciate it!
The first submission didn't draw much attention from the elder vm magicians on
linux-mm. I am adding Nick, Hugh and Andrew explicitly to the first two
patches. Please do comment on our common code change buried in there. Is
this acceptable to you? Who else needs to review them?
Changes from Version 1:
- include feedback from Randy Dunlap on the documentation
- include feedback from Jeremy Fitzhardinge, the prototype for dup_mm
has moved to include/linux/sched.h
- rebase to current kvm.git hash g361be34. Thank you Avi for pulling
in the fix we need, and for moving KVM_MAX_VCPUS to include/arch :-).
Todo list:
- I've created a patch for Christoph Hellwig's feedback about symbolic
names for machine_flags. This change is independent of the kvm port, and
I will submit it for review to Martin.
- Rusty Russell has provided feedback that improves patch #15. Christian
is looking into that and will likely update that patch. If this goes in
before, we can safely do an add-on patch on top of #15.
- an open comment from Dave Hansen about a possible race between
s390_enable_sie and ptrace in patch #1 exceeds my basic vm knowledge and
needs to be answered by Martin or Heiko
The patch queue consists of the following patches:
[RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user
pagetable
[RFC/PATCH 02/15] preparation: host memory management changes for s390
kvm
[RFC/PATCH 03/15] preparation: address of the 64bit extint parm in
lowcore
[RFC/PATCH 04/15] preparation: split sysinfo definitions for kvm use
[RFC/PATCH 05/15] kvm-s390: s390 arch backend for the kvm kernel module
[RFC/PATCH 06/15] kvm-s390: sie intercept handling
[RFC/PATCH 07/15] kvm-s390: interrupt subsystem, cpu timer, waitpsw
[RFC/PATCH 08/15] kvm-s390: intercepts for privileged instructions
[RFC/PATCH 09/15] kvm-s390: interprocessor communication via sigp
[RFC/PATCH 10/15] kvm-s390: intercepts for diagnose instructions
[RFC/PATCH 11/15] kvm-s390: add kvm to kconfig on s390
[RFC/PATCH 12/15] kvm-s390: API documentation
[RFC/PATCH 13/15] kvm-s390: update maintainers
[RFC/PATCH 14/15] guest: detect when running on kvm
[RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable
[not found] ` <1206203560.7177.45.camel@cotte.boeblingen.de.ibm.com>
@ 2008-03-22 17:02 ` Carsten Otte, Martin Schwidefsky
2008-03-24 21:50 ` Andrew Morton
2008-03-22 17:02 ` [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
1 sibling, 1 reply; 9+ messages in thread
From: Carsten Otte, Martin Schwidefsky @ 2008-03-22 17:02 UTC (permalink / raw)
To: virtualization, kvm-devel, Avi Kivity, Nick Piggin,
Andrew Morton, hugh, Linux Memory Management List
Cc: schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT,
jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process now has a page status extended (pgste) field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. The prototype for dup_mm now lives in
include/linux/sched.h.
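Editorial sketch (not part of the patch): the layout described above can be modelled in user space roughly as follows, assuming the 64-bit case where one 4 KB page holds 256 eight-byte ptes followed by 256 pgstes. The names pt_page, init_table_with_pgstes and pgste_of are hypothetical, and PTE_TYPE_EMPTY is only a stand-in for _PAGE_TYPE_EMPTY.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PT_PAGE_SIZE   4096u
#define PTRS_PER_PTE   256u     /* 64-bit case: 256 eight-byte ptes = 2 KB */
#define PTE_TYPE_EMPTY 0x400u   /* assumed stand-in for _PAGE_TYPE_EMPTY */

/* One 4 KB page: ptes in the first half, pgstes in the second half. */
typedef struct {
    uint64_t slot[PT_PAGE_SIZE / sizeof(uint64_t)];   /* 512 slots */
} pt_page;

/* User-space analogue of the 64-bit clear_table_pgstes() in the patch. */
static void init_table_with_pgstes(pt_page *pg)
{
    assert(pg);
    for (unsigned i = 0; i < PTRS_PER_PTE; i++)
        pg->slot[i] = PTE_TYPE_EMPTY;                     /* empty ptes */
    memset(&pg->slot[PTRS_PER_PTE], 0, PT_PAGE_SIZE / 2); /* zeroed pgstes */
}

/* The pgste that belongs to a pte sits PTRS_PER_PTE slots further on. */
static uint64_t *pgste_of(uint64_t *ptep)
{
    return ptep + PTRS_PER_PTE;
}
```

The pointer arithmetic in pgste_of() mirrors the `ptep + PTRS_PER_PTE` expression that patch 02 uses to reach the status field of a given pte.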
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
arch/s390/Kconfig | 4 +++
arch/s390/kernel/setup.c | 4 +++
arch/s390/mm/pgtable.c | 53 ++++++++++++++++++++++++++++++++++++++---
include/asm-s390/mmu.h | 1
include/asm-s390/mmu_context.h | 8 +++++-
include/asm-s390/pgtable.h | 1
include/linux/sched.h | 2 +
kernel/fork.c | 2 -
8 files changed, 70 insertions(+), 5 deletions(-)
Index: linux-host/arch/s390/Kconfig
===================================================================
--- linux-host.orig/arch/s390/Kconfig
+++ linux-host/arch/s390/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_LOCKBREAK
default y
depends on SMP && PREEMPT
+config PGSTE
+ bool
+ default y if KVM
+
mainmenu "Linux Kernel Configuration"
config S390
Index: linux-host/arch/s390/kernel/setup.c
===================================================================
--- linux-host.orig/arch/s390/kernel/setup.c
+++ linux-host/arch/s390/kernel/setup.c
@@ -315,7 +315,11 @@ static int __init early_parse_ipldelay(c
early_param("ipldelay", early_parse_ipldelay);
#ifdef CONFIG_S390_SWITCH_AMODE
+#ifdef CONFIG_PGSTE
+unsigned int switch_amode = 1;
+#else
unsigned int switch_amode = 0;
+#endif
EXPORT_SYMBOL_GPL(switch_amode);
static void set_amode_and_uaccess(unsigned long user_amode,
Index: linux-host/arch/s390/mm/pgtable.c
===================================================================
--- linux-host.orig/arch/s390/mm/pgtable.c
+++ linux-host/arch/s390/mm/pgtable.c
@@ -30,11 +30,27 @@
#define TABLES_PER_PAGE 4
#define FRAG_MASK 15UL
#define SECOND_HALVES 10UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+ clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+ memset(table + 256, 0, PAGE_SIZE/4);
+ clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+ memset(table + 768, 0, PAGE_SIZE/4);
+}
+
#else
#define ALLOC_ORDER 2
#define TABLES_PER_PAGE 2
#define FRAG_MASK 3UL
#define SECOND_HALVES 2UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+ clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
+ memset(table + 256, 0, PAGE_SIZE/2);
+}
+
#endif
unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct m
unsigned long *table;
unsigned long bits;
- bits = mm->context.noexec ? 3UL : 1UL;
+ bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
spin_lock(&mm->page_table_lock);
page = NULL;
if (!list_empty(&mm->context.pgtable_list)) {
@@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct m
pgtable_page_ctor(page);
page->flags &= ~FRAG_MASK;
table = (unsigned long *) page_to_phys(page);
- clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
+ if (mm->context.pgstes)
+ clear_table_pgstes(table);
+ else
+ clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
spin_lock(&mm->page_table_lock);
list_add(&page->lru, &mm->context.pgtable_list);
}
@@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *m
struct page *page;
unsigned long bits;
- bits = mm->context.noexec ? 3UL : 1UL;
+ bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
spin_lock(&mm->page_table_lock);
@@ -228,3 +247,31 @@ void disable_noexec(struct mm_struct *mm
mm->context.noexec = 0;
update_mm(mm, tsk);
}
+
+/*
+ * switch on pgstes for its userspace process (for kvm)
+ */
+int s390_enable_sie(void)
+{
+ struct task_struct *tsk = current;
+ struct mm_struct *mm;
+
+ if (tsk->mm->context.pgstes)
+ return 0;
+ if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
+ tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
+ return -EINVAL;
+ tsk->mm->context.pgstes = 1; /* dirty little tricks .. */
+ mm = dup_mm(tsk);
+ tsk->mm->context.pgstes = 0;
+ if (!mm)
+ return -ENOMEM;
+ mmput(tsk->mm);
+ tsk->mm = tsk->active_mm = mm;
+ preempt_disable();
+ update_mm(mm, tsk);
+ cpu_set(smp_processor_id(), mm->cpu_vm_mask);
+ preempt_enable();
+ return 0;
+}
+EXPORT_SYMBOL_GPL(s390_enable_sie);
Index: linux-host/include/asm-s390/mmu.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu.h
+++ linux-host/include/asm-s390/mmu.h
@@ -7,6 +7,7 @@ typedef struct {
unsigned long asce_bits;
unsigned long asce_limit;
int noexec;
+ int pgstes;
} mm_context_t;
#endif
Index: linux-host/include/asm-s390/mmu_context.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu_context.h
+++ linux-host/include/asm-s390/mmu_context.h
@@ -20,7 +20,13 @@ static inline int init_new_context(struc
#ifdef CONFIG_64BIT
mm->context.asce_bits |= _ASCE_TYPE_REGION3;
#endif
- mm->context.noexec = s390_noexec;
+ if (current->mm->context.pgstes) {
+ mm->context.noexec = 0;
+ mm->context.pgstes = 1;
+ } else {
+ mm->context.noexec = s390_noexec;
+ mm->context.pgstes = 0;
+ }
mm->context.asce_limit = STACK_TOP_MAX;
crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
return 0;
Index: linux-host/include/asm-s390/pgtable.h
===================================================================
--- linux-host.orig/include/asm-s390/pgtable.h
+++ linux-host/include/asm-s390/pgtable.h
@@ -966,6 +966,7 @@ static inline pte_t mk_swap_pte(unsigned
extern int add_shared_memory(unsigned long start, unsigned long size);
extern int remove_shared_memory(unsigned long start, unsigned long size);
+extern int s390_enable_sie(void);
/*
* No page table caches to initialise
Index: linux-host/kernel/fork.c
===================================================================
--- linux-host.orig/kernel/fork.c
+++ linux-host/kernel/fork.c
@@ -498,7 +498,7 @@ void mm_release(struct task_struct *tsk,
* Allocate a new mm structure and copy contents from the
* mm structure of the passed in task structure.
*/
-static struct mm_struct *dup_mm(struct task_struct *tsk)
+struct mm_struct *dup_mm(struct task_struct *tsk)
{
struct mm_struct *mm, *oldmm = current->mm;
int err;
Index: linux-host/include/linux/sched.h
===================================================================
--- linux-host.orig/include/linux/sched.h
+++ linux-host/include/linux/sched.h
@@ -1758,6 +1758,8 @@ extern void mmput(struct mm_struct *);
extern struct mm_struct *get_task_mm(struct task_struct *task);
/* Remove the current tasks stale references to the old mm_struct */
extern void mm_release(struct task_struct *, struct mm_struct *);
+/* Allocate a new mm structure and copy contents from tsk->mm */
+extern struct mm_struct *dup_mm(struct task_struct *tsk);
extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
extern void flush_thread(void);
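Editorial sketch (not part of the patch): the fragment-bit arithmetic in page_table_alloc() and page_table_free() can be tried out in user space as below. It assumes a 64-bit build, so sizeof(unsigned long) is taken as 8 and a 4 KB page holds two 2 KB tables; frag_bits is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SIZE 4096u
#define ENTRY_SIZE   8u   /* assumed: sizeof(unsigned long) on a 64-bit build */

/* Fragment bits that one page table occupies inside its 4 KB page. */
static unsigned long frag_bits(uint64_t table_phys, int noexec, int pgstes)
{
    /* With noexec or pgstes the table also claims the following fragment. */
    unsigned long bits = (noexec || pgstes) ? 3UL : 1UL;
    unsigned shift =
        (unsigned)((table_phys & (PT_PAGE_SIZE - 1)) / 256 / ENTRY_SIZE);

    assert(shift < 2);    /* only two 2 KB fragments per page in this model */
    return bits << shift;
}
```

With pgstes enabled the mask is 3 for a table at page offset 0, which matches FRAG_MASK in the 64-bit branch: the table and its pgstes together fill the whole page.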
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm
[not found] ` <1206203560.7177.45.camel@cotte.boeblingen.de.ibm.com>
2008-03-22 17:02 ` [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky
@ 2008-03-22 17:02 ` Carsten Otte, Heiko Carstens, Christian Borntraeger
2008-03-24 21:52 ` Andrew Morton
1 sibling, 1 reply; 9+ messages in thread
From: Carsten Otte, Heiko Carstens, Christian Borntraeger @ 2008-03-22 17:02 UTC (permalink / raw)
To: virtualization, kvm-devel, Avi Kivity, Nick Piggin,
Andrew Morton, hugh, Linux Memory Management List
Cc: schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT,
jeroney, aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao
This patch changes the s390 memory management definitions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.
There is a common code change in mm/rmap.c: the call to page_test_and_clear_young
must be moved. This is a no-op for all architectures but s390. page_referenced
checks the referenced bits for the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.
Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduce
guest and host dirty and reference bits for s390 in the host mapping. These
mappings must be checked before page_test_and_clear_young resets the reference
bit.
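Editorial sketch (not part of the patch): one way to read the ordering requirement is with a small user-space model of a single physical page and a single pgste-enabled mapping. The struct and function names are hypothetical; the mapping check follows the shape of ptep_test_and_clear_young in this patch, and the page check stands in for page_test_and_clear_young.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one physical page with one pgste-enabled mapping. */
struct model {
    bool skey_referenced;   /* R bit in the storage key of the physical page */
    bool pgste_hr;          /* host reference bit kept in the pgste */
    bool pgste_gr;          /* guest reference bit kept in the pgste */
};

/* Mapping check: save the storage-key R bit for the guest, consume the host bit. */
static bool mapping_test_and_clear_young(struct model *m)
{
    bool young = m->skey_referenced;

    if (young)
        m->pgste_gr = true;
    young |= m->pgste_hr;
    m->pgste_hr = false;
    return young;
}

/* Physical page check: test and reset the storage-key R bit. */
static bool model_page_test_and_clear_young(struct model *m)
{
    bool young = m->skey_referenced;

    m->skey_referenced = false;
    return young;
}

/* Order after the patch: mappings first, physical page last. */
static int referenced_mappings_first(struct model *m)
{
    int referenced = 0;

    assert(m);
    if (mapping_test_and_clear_young(m))
        referenced++;
    if (model_page_test_and_clear_young(m))
        referenced++;
    return referenced;
}

/* Order before the patch: the storage key is reset before the mapping sees it. */
static int referenced_page_first(struct model *m)
{
    int referenced = 0;

    assert(m);
    if (model_page_test_and_clear_young(m))
        referenced++;
    if (mapping_test_and_clear_young(m))
        referenced++;
    return referenced;
}
```

In this model the page-first order resets the storage key before the mapping check can copy it into the guest reference bit, so that bit is never set; the mappings-first order preserves it.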
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
include/asm-s390/pgtable.h | 109 +++++++++++++++++++++++++++++++++++++++++++--
mm/rmap.c | 7 +-
2 files changed, 110 insertions(+), 6 deletions(-)
Index: linux-host/include/asm-s390/pgtable.h
===================================================================
--- linux-host.orig/include/asm-s390/pgtable.h
+++ linux-host/include/asm-s390/pgtable.h
@@ -30,6 +30,7 @@
*/
#ifndef __ASSEMBLY__
#include <linux/mm_types.h>
+#include <asm/atomic.h>
#include <asm/bug.h>
#include <asm/processor.h>
@@ -258,6 +259,13 @@ extern char empty_zero_page[PAGE_SIZE];
* swap pte is 1011 and 0001, 0011, 0101, 0111 are invalid.
*/
+/* Page status extended for virtualization */
+#define _PAGE_RCP_PCL 0x0080000000000000UL
+#define _PAGE_RCP_HR 0x0040000000000000UL
+#define _PAGE_RCP_HC 0x0020000000000000UL
+#define _PAGE_RCP_GR 0x0004000000000000UL
+#define _PAGE_RCP_GC 0x0002000000000000UL
+
#ifndef __s390x__
/* Bits in the segment table address-space-control-element */
@@ -513,6 +521,67 @@ static inline int pte_file(pte_t pte)
#define __HAVE_ARCH_PTE_SAME
#define pte_same(a,b) (pte_val(a) == pte_val(b))
+static inline void rcp_lock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+ atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+ preempt_disable();
+ atomic64_set_mask(_PAGE_RCP_PCL, rcp);
+#endif
+}
+
+static inline void rcp_unlock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+ atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+ atomic64_clear_mask(_PAGE_RCP_PCL, rcp);
+ preempt_enable();
+#endif
+}
+
+static inline void rcp_set_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+ *(unsigned long *) (ptep + PTRS_PER_PTE) |= val;
+#endif
+}
+
+static inline int rcp_test_and_clear_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+ unsigned long ret;
+
+ ret = *(unsigned long *) (ptep + PTRS_PER_PTE);
+ *(unsigned long *) (ptep + PTRS_PER_PTE) &= ~val;
+ return (ret & val) == val;
+#else
+ return 0;
+#endif
+}
+
+
+/* forward declaration for SetPageUptodate in page-flags.h*/
+static inline void page_clear_dirty(struct page *page);
+#include <linux/page-flags.h>
+
+static inline void ptep_rcp_copy(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+ struct page *page = virt_to_page(pte_val(*ptep));
+ unsigned int skey;
+
+ skey = page_get_storage_key(page_to_phys(page));
+ if (skey & _PAGE_CHANGED)
+ rcp_set_bits(ptep, _PAGE_RCP_GC);
+ if (skey & _PAGE_REFERENCED)
+ rcp_set_bits(ptep, _PAGE_RCP_GR);
+ if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HC))
+ SetPageDirty(page);
+ if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HR))
+ SetPageReferenced(page);
+#endif
+}
+
/*
* query functions pte_write/pte_dirty/pte_young only work if
* pte_present() is true. Undefined behaviour if not..
@@ -599,6 +668,8 @@ static inline void pmd_clear(pmd_t *pmd)
static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
+ if (mm->context.pgstes)
+ ptep_rcp_copy(ptep);
pte_val(*ptep) = _PAGE_TYPE_EMPTY;
if (mm->context.noexec)
pte_val(ptep[PTRS_PER_PTE]) = _PAGE_TYPE_EMPTY;
@@ -667,6 +738,22 @@ static inline pte_t pte_mkyoung(pte_t pt
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
+#ifdef CONFIG_PGSTE
+ unsigned long physpage;
+ int young;
+
+ if (!vma->vm_mm->context.pgstes)
+ return 0;
+ physpage = pte_val(*ptep) & PAGE_MASK;
+
+ young = ((page_get_storage_key(physpage) & _PAGE_REFERENCED) != 0);
+ rcp_lock(ptep);
+ if (young)
+ rcp_set_bits(ptep, _PAGE_RCP_GR);
+ young |= rcp_test_and_clear_bits(ptep, _PAGE_RCP_HR);
+ rcp_unlock(ptep);
+ return young;
+#endif
return 0;
}
@@ -674,7 +761,13 @@ static inline int ptep_test_and_clear_yo
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- /* No need to flush TLB; bits are in storage key */
+ /* No need to flush TLB
+ * On s390 reference bits are in storage key and never in TLB
+ * With virtualization we handle the reference bit, without it
+ * we can simply return */
+#ifdef CONFIG_PGSTE
+ return ptep_test_and_clear_young(vma, address, ptep);
+#endif
return 0;
}
@@ -693,15 +786,25 @@ static inline void __ptep_ipte(unsigned
: "=m" (*ptep) : "m" (*ptep),
"a" (pto), "a" (address));
}
- pte_val(*ptep) = _PAGE_TYPE_EMPTY;
}
static inline void ptep_invalidate(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
+ if (mm->context.pgstes) {
+ rcp_lock(ptep);
+ __ptep_ipte(address, ptep);
+ ptep_rcp_copy(ptep);
+ pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+ rcp_unlock(ptep);
+ return;
+ }
__ptep_ipte(address, ptep);
- if (mm->context.noexec)
+ pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+ if (mm->context.noexec) {
__ptep_ipte(address, ptep + PTRS_PER_PTE);
+ pte_val(*(ptep + PTRS_PER_PTE)) = _PAGE_TYPE_EMPTY;
+ }
}
/*
Index: linux-host/mm/rmap.c
===================================================================
--- linux-host.orig/mm/rmap.c
+++ linux-host/mm/rmap.c
@@ -413,9 +413,6 @@ int page_referenced(struct page *page, i
{
int referenced = 0;
- if (page_test_and_clear_young(page))
- referenced++;
-
if (TestClearPageReferenced(page))
referenced++;
@@ -433,6 +430,10 @@ int page_referenced(struct page *page, i
unlock_page(page);
}
}
+
+ if (page_test_and_clear_young(page))
+ referenced++;
+
return referenced;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable
2008-03-22 17:02 ` [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky
@ 2008-03-24 21:50 ` Andrew Morton
0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2008-03-24 21:50 UTC (permalink / raw)
To: Carsten Otte
Cc: virtualization, kvm-devel, avi, npiggin, hugh, linux-mm,
schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT,
jeroney, aliguori, jblunck, rvdheij, rusty, arnd, xiantao.zhang
On Sat, 22 Mar 2008 18:02:37 +0100
Carsten Otte <cotte@de.ibm.com> wrote:
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
>
> The SIE instruction on s390 uses the 2nd half of the page table page to
> virtualize the storage keys of a guest. This patch offers the s390_enable_sie
> function, which reorganizes the page tables of a single-threaded process to
> reserve space in the page table:
> s390_enable_sie makes sure that the process is single threaded and then uses
> dup_mm to create a new mm with reorganized page tables. The old mm is freed
> and the process now has a page status extended (pgste) field after every page table.
>
> Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
>
> This patch has a small common code hit, namely making dup_mm non-static.
>
> Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
> review feedback. Now we do have the prototype for dup_mm in
> include/linux/sched.h.
>
> ...
>
> --- linux-host.orig/kernel/fork.c
> +++ linux-host/kernel/fork.c
> @@ -498,7 +498,7 @@ void mm_release(struct task_struct *tsk,
> * Allocate a new mm structure and copy contents from the
> * mm structure of the passed in task structure.
> */
> -static struct mm_struct *dup_mm(struct task_struct *tsk)
> +struct mm_struct *dup_mm(struct task_struct *tsk)
> {
> struct mm_struct *mm, *oldmm = current->mm;
> int err;
ack
> --- linux-host.orig/include/linux/sched.h
> +++ linux-host/include/linux/sched.h
> @@ -1758,6 +1758,8 @@ extern void mmput(struct mm_struct *);
> extern struct mm_struct *get_task_mm(struct task_struct *task);
> /* Remove the current tasks stale references to the old mm_struct */
> extern void mm_release(struct task_struct *, struct mm_struct *);
> +/* Allocate a new mm structure and copy contents from tsk->mm */
> +extern struct mm_struct *dup_mm(struct task_struct *tsk);
>
> extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
> extern void flush_thread(void);
>
hm, why did we put these in sched.h?
oh well - acked-by-me.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm
2008-03-22 17:02 ` [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
@ 2008-03-24 21:52 ` Andrew Morton
0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2008-03-24 21:52 UTC (permalink / raw)
To: Carsten Otte
Cc: virtualization, kvm-devel, avi, npiggin, hugh, linux-mm,
schwidefsky, heiko.carstens, os, borntraeger, hollisb, EHRHARDT,
jeroney, aliguori, jblunck, rvdheij, rusty, arnd, xiantao.zhang
On Sat, 22 Mar 2008 18:02:39 +0100
Carsten Otte <cotte@de.ibm.com> wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> From: Christian Borntraeger <borntraeger@de.ibm.com>
>
> This patch changes the s390 memory management definitions to use the pgste field
> for dirty and reference bit tracking of host and guest code. Usually on s390,
> dirty and referenced are tracked in storage keys, which belong to the physical
> page. This changes with virtualization: The guest and host dirty/reference bits
> are defined to be the logical OR of the values for the mapping and the physical
> page. This patch implements the necessary changes in pgtable.h for s390.
>
>
> There is a common code change in mm/rmap.c: the call to page_test_and_clear_young
> must be moved. This is a no-op for all architectures but s390. page_referenced
> checks the referenced bits for the physical page and for all mappings:
> o The physical page is checked with page_test_and_clear_young.
> o The mappings are checked with ptep_test_and_clear_young and friends.
>
> Without pgstes (the current implementation on Linux s390) the physical page
> check is implemented but the mapping callbacks are no-ops because dirty
> and referenced are not tracked in the s390 page tables. The pgstes introduce
> guest and host dirty and reference bits for s390 in the host mapping. These
> mappings must be checked before page_test_and_clear_young resets the reference
> bit.
>
> ...
>
> --- linux-host.orig/mm/rmap.c
> +++ linux-host/mm/rmap.c
> @@ -413,9 +413,6 @@ int page_referenced(struct page *page, i
> {
> int referenced = 0;
>
> - if (page_test_and_clear_young(page))
> - referenced++;
> -
> if (TestClearPageReferenced(page))
> referenced++;
>
> @@ -433,6 +430,10 @@ int page_referenced(struct page *page, i
> unlock_page(page);
> }
> }
> +
> + if (page_test_and_clear_young(page))
> + referenced++;
> +
> return referenced;
> }
ack.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC/PATCH 00/15 v3] kvm on big iron
2008-03-22 17:02 ` [RFC/PATCH 00/15 v2] kvm on big iron Carsten Otte
@ 2008-03-25 17:47 ` Carsten Otte
2008-03-27 12:02 ` Avi Kivity
[not found] ` <1206458154.6217.12.camel@cotte.boeblingen.de.ibm.com>
1 sibling, 1 reply; 9+ messages in thread
From: Carsten Otte @ 2008-03-25 17:47 UTC (permalink / raw)
To: virtualization, kvm-devel, Avi Kivity
Cc: Andrew Morton, Linux Memory Management List, schwidefsky,
heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney,
aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao,
oliver.paukstadt
Many thanks for the review feedback we have received so far,
and many thanks to Andrew for reviewing our common code memory
management changes. I do greatly appreciate that :-).
All important parts have been reviewed, all review feedback has been
integrated in the code. Therefore we would like to ask for inclusion of
our work into kvm.git.
Changes from Version 1:
- include feedback from Randy Dunlap on the documentation
- include feedback from Jeremy Fitzhardinge, the prototype for dup_mm
has moved to include/linux/sched.h
- rebase to current kvm.git hash g361be34. Thank you Avi for pulling
in the fix we need, and for moving KVM_MAX_VCPUS to include/arch :-).
Changes from Version 2:
- include feedback from Rusty Russell on the virtio patch
- include fix for race s390_enable_sie() versus ptrace spotted by Dave
Hansen: we now do task_lock() to protect mm_users from update while
we're growing the page table. Good catch, Dave :-).
- rebase to current kvm.git hash g680615e
Todo list:
- I've created a patch for Christoph Hellwig's feedback about symbolic
names for machine_flags. This change is independent of the kvm port, and
I will submit it for review to Martin.
The patch queue consists of the following patches:
[RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user
pagetable
[RFC/PATCH 02/15] preparation: host memory management changes for s390
kvm
[RFC/PATCH 03/15] preparation: address of the 64bit extint parm in
lowcore
[RFC/PATCH 04/15] preparation: split sysinfo definitions for kvm use
[RFC/PATCH 05/15] kvm-s390: s390 arch backend for the kvm kernel module
[RFC/PATCH 06/15] kvm-s390: sie intercept handling
[RFC/PATCH 07/15] kvm-s390: interrupt subsystem, cpu timer, waitpsw
[RFC/PATCH 08/15] kvm-s390: intercepts for privileged instructions
[RFC/PATCH 09/15] kvm-s390: interprocessor communication via sigp
[RFC/PATCH 10/15] kvm-s390: intercepts for diagnose instructions
[RFC/PATCH 11/15] kvm-s390: add kvm to kconfig on s390
[RFC/PATCH 12/15] kvm-s390: API documentation
[RFC/PATCH 13/15] kvm-s390: update maintainers
[RFC/PATCH 14/15] guest: detect when running on kvm
[RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC/PATCH 01/15 v3] preparation: provide hook to enable pgstes in user pagetable
[not found] ` <1206458154.6217.12.camel@cotte.boeblingen.de.ibm.com>
@ 2008-03-25 17:47 ` Carsten Otte, Martin Schwidefsky, Carsten Otte
2008-03-25 17:47 ` [RFC/PATCH 02/15 v3] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
1 sibling, 0 replies; 9+ messages in thread
From: Carsten Otte, Martin Schwidefsky, Carsten Otte @ 2008-03-25 17:47 UTC (permalink / raw)
To: virtualization, kvm-devel, Avi Kivity
Cc: Andrew Morton, Linux Memory Management List, schwidefsky,
heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney,
aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao,
oliver.paukstadt
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process now has a page status extended (pgste) field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. The prototype for dup_mm now lives in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() now
calls task_lock() to prevent a race against ptrace modifying mm_users.
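Editorial sketch (not part of the patch): the lock-then-single-exit shape that v3 introduces can be illustrated in user space as below. Everything here is a hypothetical stand-in: fake_task, fake_mm, fake_task_lock and fake_dup_mm only model the kernel objects, and lock_depth is a counter used to show that every path releases the lock. The sketch tests for a missing mm before touching it, which is a choice of this sketch and differs from the order of checks in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct fake_mm   { int pgstes; int mm_users; };
struct fake_task { struct fake_mm *mm; int lock_depth; };

static void fake_task_lock(struct fake_task *t)   { t->lock_depth++; }
static void fake_task_unlock(struct fake_task *t) { t->lock_depth--; }

/* Stand-in for dup_mm(): marks a caller-supplied copy as pgste-enabled. */
static struct fake_mm *fake_dup_mm(struct fake_mm *spare)
{
    if (spare)
        spare->pgstes = 1;
    return spare;                 /* NULL models an allocation failure */
}

static int enable_sie_sketch(struct fake_task *t, struct fake_mm *spare)
{
    struct fake_mm *mm;
    int rc;

    assert(t != NULL);
    fake_task_lock(t);

    rc = -EINVAL;
    if (!t->mm || t->mm->mm_users > 1)   /* must be single-threaded */
        goto unlock;

    rc = 0;
    if (t->mm->pgstes)                   /* already enabled, nothing to do */
        goto unlock;

    rc = -ENOMEM;
    mm = fake_dup_mm(spare);
    if (!mm)
        goto unlock;

    mm->mm_users = 1;
    t->mm = mm;                          /* switch over to the new mm */
    rc = 0;
unlock:
    fake_task_unlock(t);                 /* every path drops the lock */
    return rc;
}
```

Holding the lock across the mm_users test and the switch to the new mm is what closes the window in which another party could take a reference to the old mm.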
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
arch/s390/Kconfig | 4 ++
arch/s390/kernel/setup.c | 4 ++
arch/s390/mm/pgtable.c | 65 +++++++++++++++++++++++++++++++++++++++--
include/asm-s390/mmu.h | 1
include/asm-s390/mmu_context.h | 8 ++++-
include/asm-s390/pgtable.h | 1
include/linux/sched.h | 2 +
kernel/fork.c | 2 -
8 files changed, 82 insertions(+), 5 deletions(-)
Index: linux-host/arch/s390/Kconfig
===================================================================
--- linux-host.orig/arch/s390/Kconfig
+++ linux-host/arch/s390/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_LOCKBREAK
default y
depends on SMP && PREEMPT
+config PGSTE
+ bool
+ default y if KVM
+
mainmenu "Linux Kernel Configuration"
config S390
Index: linux-host/arch/s390/kernel/setup.c
===================================================================
--- linux-host.orig/arch/s390/kernel/setup.c
+++ linux-host/arch/s390/kernel/setup.c
@@ -315,7 +315,11 @@ static int __init early_parse_ipldelay(c
early_param("ipldelay", early_parse_ipldelay);
#ifdef CONFIG_S390_SWITCH_AMODE
+#ifdef CONFIG_PGSTE
+unsigned int switch_amode = 1;
+#else
unsigned int switch_amode = 0;
+#endif
EXPORT_SYMBOL_GPL(switch_amode);
static void set_amode_and_uaccess(unsigned long user_amode,
Index: linux-host/arch/s390/mm/pgtable.c
===================================================================
--- linux-host.orig/arch/s390/mm/pgtable.c
+++ linux-host/arch/s390/mm/pgtable.c
@@ -30,11 +30,27 @@
#define TABLES_PER_PAGE 4
#define FRAG_MASK 15UL
#define SECOND_HALVES 10UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+ clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+ memset(table + 256, 0, PAGE_SIZE/4);
+ clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+ memset(table + 768, 0, PAGE_SIZE/4);
+}
+
#else
#define ALLOC_ORDER 2
#define TABLES_PER_PAGE 2
#define FRAG_MASK 3UL
#define SECOND_HALVES 2UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+ clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
+ memset(table + 256, 0, PAGE_SIZE/2);
+}
+
#endif
unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct m
unsigned long *table;
unsigned long bits;
- bits = mm->context.noexec ? 3UL : 1UL;
+ bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
spin_lock(&mm->page_table_lock);
page = NULL;
if (!list_empty(&mm->context.pgtable_list)) {
@@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct m
pgtable_page_ctor(page);
page->flags &= ~FRAG_MASK;
table = (unsigned long *) page_to_phys(page);
- clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
+ if (mm->context.pgstes)
+ clear_table_pgstes(table);
+ else
+ clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
spin_lock(&mm->page_table_lock);
list_add(&page->lru, &mm->context.pgtable_list);
}
@@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *m
struct page *page;
unsigned long bits;
- bits = mm->context.noexec ? 3UL : 1UL;
+ bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
spin_lock(&mm->page_table_lock);
@@ -228,3 +247,43 @@ void disable_noexec(struct mm_struct *mm
mm->context.noexec = 0;
update_mm(mm, tsk);
}
+
+/*
+ * switch on pgstes for the current userspace process (for kvm)
+ */
+int s390_enable_sie(void)
+{
+ struct task_struct *tsk = current;
+ struct mm_struct *mm;
+ int rc;
+
+ task_lock(tsk);
+
+ rc = 0;
+ if (tsk->mm && tsk->mm->context.pgstes)
+ goto unlock;
+
+ rc = -EINVAL;
+ if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
+ tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
+ goto unlock;
+
+ tsk->mm->context.pgstes = 1; /* dirty little tricks .. */
+ mm = dup_mm(tsk);
+ tsk->mm->context.pgstes = 0;
+
+ rc = -ENOMEM;
+ if (!mm)
+ goto unlock;
+ mmput(tsk->mm);
+ tsk->mm = tsk->active_mm = mm;
+ preempt_disable();
+ update_mm(mm, tsk);
+ cpu_set(smp_processor_id(), mm->cpu_vm_mask);
+ preempt_enable();
+ rc = 0;
+unlock:
+ task_unlock(tsk);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(s390_enable_sie);
Index: linux-host/include/asm-s390/mmu.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu.h
+++ linux-host/include/asm-s390/mmu.h
@@ -7,6 +7,7 @@ typedef struct {
unsigned long asce_bits;
unsigned long asce_limit;
int noexec;
+ int pgstes;
} mm_context_t;
#endif
Index: linux-host/include/asm-s390/mmu_context.h
===================================================================
--- linux-host.orig/include/asm-s390/mmu_context.h
+++ linux-host/include/asm-s390/mmu_context.h
@@ -20,7 +20,13 @@ static inline int init_new_context(struc
#ifdef CONFIG_64BIT
mm->context.asce_bits |= _ASCE_TYPE_REGION3;
#endif
- mm->context.noexec = s390_noexec;
+ if (current->mm->context.pgstes) {
+ mm->context.noexec = 0;
+ mm->context.pgstes = 1;
+ } else {
+ mm->context.noexec = s390_noexec;
+ mm->context.pgstes = 0;
+ }
mm->context.asce_limit = STACK_TOP_MAX;
crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
return 0;
Index: linux-host/include/asm-s390/pgtable.h
===================================================================
--- linux-host.orig/include/asm-s390/pgtable.h
+++ linux-host/include/asm-s390/pgtable.h
@@ -966,6 +966,7 @@ static inline pte_t mk_swap_pte(unsigned
extern int add_shared_memory(unsigned long start, unsigned long size);
extern int remove_shared_memory(unsigned long start, unsigned long size);
+extern int s390_enable_sie(void);
/*
* No page table caches to initialise
Index: linux-host/kernel/fork.c
===================================================================
--- linux-host.orig/kernel/fork.c
+++ linux-host/kernel/fork.c
@@ -498,7 +498,7 @@ void mm_release(struct task_struct *tsk,
* Allocate a new mm structure and copy contents from the
* mm structure of the passed in task structure.
*/
-static struct mm_struct *dup_mm(struct task_struct *tsk)
+struct mm_struct *dup_mm(struct task_struct *tsk)
{
struct mm_struct *mm, *oldmm = current->mm;
int err;
Index: linux-host/include/linux/sched.h
===================================================================
--- linux-host.orig/include/linux/sched.h
+++ linux-host/include/linux/sched.h
@@ -1758,6 +1758,8 @@ extern void mmput(struct mm_struct *);
extern struct mm_struct *get_task_mm(struct task_struct *task);
/* Remove the current tasks stale references to the old mm_struct */
extern void mm_release(struct task_struct *, struct mm_struct *);
+/* Allocate a new mm structure and copy contents from tsk->mm */
+extern struct mm_struct *dup_mm(struct task_struct *tsk);
extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
extern void flush_thread(void);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [RFC/PATCH 02/15 v3] preparation: host memory management changes for s390 kvm
[not found] ` <1206458154.6217.12.camel@cotte.boeblingen.de.ibm.com>
2008-03-25 17:47 ` [RFC/PATCH 01/15 v3] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky, Carsten Otte
@ 2008-03-25 17:47 ` Carsten Otte, Heiko Carstens, Christian Borntraeger
1 sibling, 0 replies; 9+ messages in thread
From: Carsten Otte, Heiko Carstens, Christian Borntraeger @ 2008-03-25 17:47 UTC (permalink / raw)
To: virtualization, kvm-devel, Avi Kivity
Cc: Andrew Morton, Linux Memory Management List, schwidefsky,
heiko.carstens, os, borntraeger, hollisb, EHRHARDT, jeroney,
aliguori, jblunck, rvdheij, rusty, arnd, Zhang, Xiantao,
oliver.paukstadt
This patch changes the s390 memory management definitions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.
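To make the OR rule concrete, here is a minimal user-space sketch. It is not the pgtable.h code: skey_model, pgste_model and the four helper functions are hypothetical names chosen for illustration. The only thing taken from the description above is that the effective bit is the OR of the per-mapping value and the per-page storage key value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the storage key of one physical page. */
struct skey_model {
	bool referenced;
	bool changed;
};

/* Hypothetical model of the pgste bits of one mapping. */
struct pgste_model {
	bool host_ref;   /* corresponds to RCP_HR_BIT */
	bool host_chg;   /* corresponds to RCP_HC_BIT */
	bool guest_ref;  /* corresponds to RCP_GR_BIT */
	bool guest_chg;  /* corresponds to RCP_GC_BIT */
};

/* Effective bits: mapping value ORed with physical page value. */
static bool host_referenced(const struct pgste_model *m,
			    const struct skey_model *k)
{
	assert(m != NULL && k != NULL);
	return m->host_ref || k->referenced;
}

static bool host_changed(const struct pgste_model *m,
			 const struct skey_model *k)
{
	assert(m != NULL && k != NULL);
	return m->host_chg || k->changed;
}

static bool guest_referenced(const struct pgste_model *m,
			     const struct skey_model *k)
{
	assert(m != NULL && k != NULL);
	return m->guest_ref || k->referenced;
}

static bool guest_changed(const struct pgste_model *m,
			  const struct skey_model *k)
{
	assert(m != NULL && k != NULL);
	return m->guest_chg || k->changed;
}
```

A page whose storage key says "referenced" therefore counts as referenced for both host and guest, even if neither pgste bit is set yet.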
There is a common code change in mm/rmap.c: the call to page_test_and_clear_young
must be moved. This is a no-op for all architectures but s390. page_referenced
checks the referenced bits for the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.
Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduce
guest and host dirty and reference bits for s390 in the host mapping. These
mappings must be checked before page_test_and_clear_young resets the reference
bit.
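The ordering requirement can be illustrated with a small user-space model. This is a sketch under assumptions, not the rmap.c code: page_model, map_model and the *_model functions are hypothetical names. The mapping check copies the storage key reference bit into the guest bit before the page check resets it; running the page check first would lose that information for the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: reference bit in the storage key of a page. */
struct page_model {
	bool skey_referenced;
};

/* Hypothetical model: host/guest reference bits in one pgste. */
struct map_model {
	bool host_ref;
	bool guest_ref;
};

/* Models ptep_test_and_clear_young() for an mm with pgstes. */
static bool map_test_and_clear_young_model(struct map_model *m,
					   const struct page_model *p)
{
	bool young = p->skey_referenced;

	if (young)
		m->guest_ref = true;	/* keep the bit for the guest */
	young |= m->host_ref;
	m->host_ref = false;
	return young;
}

/* Models page_test_and_clear_young(): resets the storage key bit. */
static bool page_test_and_clear_young_model(struct page_model *p)
{
	bool young = p->skey_referenced;

	p->skey_referenced = false;
	return young;
}

/* Models the new order: all mappings first, physical page last. */
static int page_referenced_model(struct page_model *p,
				 struct map_model *maps, int n)
{
	int referenced = 0;

	assert(p != NULL && n >= 0);
	for (int i = 0; i < n; i++)
		if (map_test_and_clear_young_model(&maps[i], p))
			referenced++;
	if (page_test_and_clear_young_model(p))
		referenced++;
	return referenced;
}
```

With one mapping and a set storage key bit, the model counts two references and leaves guest_ref set, which is the state the guest needs to see later.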
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
include/asm-s390/pgtable.h | 92 +++++++++++++++++++++++++++++++++++++++++++--
mm/rmap.c | 7 +--
2 files changed, 93 insertions(+), 6 deletions(-)
Index: kvm/include/asm-s390/pgtable.h
===================================================================
--- kvm.orig/include/asm-s390/pgtable.h
+++ kvm/include/asm-s390/pgtable.h
@@ -30,6 +30,7 @@
*/
#ifndef __ASSEMBLY__
#include <linux/mm_types.h>
+#include <asm/bitops.h>
#include <asm/bug.h>
#include <asm/processor.h>
@@ -258,6 +259,13 @@ extern char empty_zero_page[PAGE_SIZE];
* swap pte is 1011 and 0001, 0011, 0101, 0111 are invalid.
*/
+/* Page status table bits for virtualization */
+#define RCP_PCL_BIT 55
+#define RCP_HR_BIT 54
+#define RCP_HC_BIT 53
+#define RCP_GR_BIT 50
+#define RCP_GC_BIT 49
+
#ifndef __s390x__
/* Bits in the segment table address-space-control-element */
@@ -513,6 +521,48 @@ static inline int pte_file(pte_t pte)
#define __HAVE_ARCH_PTE_SAME
#define pte_same(a,b) (pte_val(a) == pte_val(b))
+static inline void rcp_lock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+ unsigned long *pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+ preempt_disable();
+ while (test_and_set_bit(RCP_PCL_BIT, pgste))
+ ;
+#endif
+}
+
+static inline void rcp_unlock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+ unsigned long *pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+ clear_bit(RCP_PCL_BIT, pgste);
+ preempt_enable();
+#endif
+}
+
+/* forward declaration for SetPageUptodate in page-flags.h */
+static inline void page_clear_dirty(struct page *page);
+#include <linux/page-flags.h>
+
+static inline void ptep_rcp_copy(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+ struct page *page = virt_to_page(pte_val(*ptep));
+ unsigned int skey;
+ unsigned long *pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+
+ skey = page_get_storage_key(page_to_phys(page));
+ if (skey & _PAGE_CHANGED)
+ set_bit(RCP_GC_BIT, pgste);
+ if (skey & _PAGE_REFERENCED)
+ set_bit(RCP_GR_BIT, pgste);
+ if (test_and_clear_bit(RCP_HC_BIT, pgste))
+ SetPageDirty(page);
+ if (test_and_clear_bit(RCP_HR_BIT, pgste))
+ SetPageReferenced(page);
+#endif
+}
+
/*
* query functions pte_write/pte_dirty/pte_young only work if
* pte_present() is true. Undefined behaviour if not..
@@ -599,6 +649,8 @@ static inline void pmd_clear(pmd_t *pmd)
static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
+ if (mm->context.pgstes)
+ ptep_rcp_copy(ptep);
pte_val(*ptep) = _PAGE_TYPE_EMPTY;
if (mm->context.noexec)
pte_val(ptep[PTRS_PER_PTE]) = _PAGE_TYPE_EMPTY;
@@ -667,6 +719,24 @@ static inline pte_t pte_mkyoung(pte_t pt
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
+#ifdef CONFIG_PGSTE
+ unsigned long physpage;
+ int young;
+ unsigned long *pgste;
+
+ if (!vma->vm_mm->context.pgstes)
+ return 0;
+ physpage = pte_val(*ptep) & PAGE_MASK;
+ pgste = (unsigned long *) (ptep + PTRS_PER_PTE);
+
+ young = ((page_get_storage_key(physpage) & _PAGE_REFERENCED) != 0);
+ rcp_lock(ptep);
+ if (young)
+ set_bit(RCP_GR_BIT, pgste);
+ young |= test_and_clear_bit(RCP_HR_BIT, pgste);
+ rcp_unlock(ptep);
+ return young;
+#endif
return 0;
}
@@ -674,7 +744,13 @@ static inline int ptep_test_and_clear_yo
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- /* No need to flush TLB; bits are in storage key */
+ /* No need to flush TLB
+ * On s390 reference bits are in storage key and never in TLB
+ * With virtualization we handle the reference bit, without it
+ * we can simply return */
+#ifdef CONFIG_PGSTE
+ return ptep_test_and_clear_young(vma, address, ptep);
+#endif
return 0;
}
@@ -693,15 +769,25 @@ static inline void __ptep_ipte(unsigned
: "=m" (*ptep) : "m" (*ptep),
"a" (pto), "a" (address));
}
- pte_val(*ptep) = _PAGE_TYPE_EMPTY;
}
static inline void ptep_invalidate(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
+ if (mm->context.pgstes) {
+ rcp_lock(ptep);
+ __ptep_ipte(address, ptep);
+ ptep_rcp_copy(ptep);
+ pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+ rcp_unlock(ptep);
+ return;
+ }
__ptep_ipte(address, ptep);
- if (mm->context.noexec)
+ pte_val(*ptep) = _PAGE_TYPE_EMPTY;
+ if (mm->context.noexec) {
__ptep_ipte(address, ptep + PTRS_PER_PTE);
+ pte_val(*(ptep + PTRS_PER_PTE)) = _PAGE_TYPE_EMPTY;
+ }
}
/*
Index: kvm/mm/rmap.c
===================================================================
--- kvm.orig/mm/rmap.c
+++ kvm/mm/rmap.c
@@ -413,9 +413,6 @@ int page_referenced(struct page *page, i
{
int referenced = 0;
- if (page_test_and_clear_young(page))
- referenced++;
-
if (TestClearPageReferenced(page))
referenced++;
@@ -433,6 +430,10 @@ int page_referenced(struct page *page, i
unlock_page(page);
}
}
+
+ if (page_test_and_clear_young(page))
+ referenced++;
+
return referenced;
}
* Re: [RFC/PATCH 00/15 v3] kvm on big iron
2008-03-25 17:47 ` [RFC/PATCH 00/15 v3] " Carsten Otte
@ 2008-03-27 12:02 ` Avi Kivity
0 siblings, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2008-03-27 12:02 UTC (permalink / raw)
To: Carsten Otte
Cc: virtualization, kvm-devel, Andrew Morton,
Linux Memory Management List, schwidefsky, heiko.carstens, os,
borntraeger, hollisb, EHRHARDT, jeroney, aliguori, jblunck,
rvdheij, rusty, arnd, Zhang, Xiantao, oliver.paukstadt
Carsten Otte wrote:
> Many thanks for the review feedback we have received so far,
> and many thanks to Andrew for reviewing our common code memory
> management changes. I do greatly appreciate that :-).
>
> All important parts have been reviewed, all review feedback has been
> integrated in the code. Therefore we would like to ask for inclusion of
> our work into kvm.git.
>
> Changes from Version 1:
> - include feedback from Randy Dunlap on the documentation
> - include feedback from Jeremy Fitzhardinge, the prototype for dup_mm
> has moved to include/linux/sched.h
> - rebase to current kvm.git hash g361be34. Thank you Avi for pulling
> in the fix we need, and for moving KVM_MAX_VCPUS to include/arch :-).
>
> Changes from Version 2:
> - include feedback from Rusty Russell on the virtio patch
> - include fix for race s390_enable_sie() versus ptrace spotted by Dave
> Hansen: we now do task_lock() to protect mm_users from update while
> we're growing the page table. Good catch, Dave :-).
> - rebase to current kvm.git hash g680615e
>
>
Applied all, thanks.
--
error compiling committee.c: too many arguments to function
Thread overview: 9+ messages
[not found] <1206030270.6690.51.camel@cotte.boeblingen.de.ibm.com>
2008-03-22 17:02 ` [RFC/PATCH 00/15 v2] kvm on big iron Carsten Otte
2008-03-25 17:47 ` [RFC/PATCH 00/15 v3] " Carsten Otte
2008-03-27 12:02 ` Avi Kivity
[not found] ` <1206458154.6217.12.camel@cotte.boeblingen.de.ibm.com>
2008-03-25 17:47 ` [RFC/PATCH 01/15 v3] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky, Carsten Otte
2008-03-25 17:47 ` [RFC/PATCH 02/15 v3] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
[not found] ` <1206203560.7177.45.camel@cotte.boeblingen.de.ibm.com>
2008-03-22 17:02 ` [RFC/PATCH 01/15 v2] preparation: provide hook to enable pgstes in user pagetable Carsten Otte, Martin Schwidefsky
2008-03-24 21:50 ` Andrew Morton
2008-03-22 17:02 ` [RFC/PATCH 02/15 v2] preparation: host memory management changes for s390 kvm Carsten Otte, Heiko Carstens, Christian Borntraeger
2008-03-24 21:52 ` Andrew Morton