* [PATCH 0/2] s390: Remove uses of page->index
@ 2024-12-19 16:22 Matthew Wilcox (Oracle)
2024-12-19 16:22 ` [PATCH 1/2] s390: Convert gmap code to use ptdesc Matthew Wilcox (Oracle)
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-12-19 16:22 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: Matthew Wilcox (Oracle), David Hildenbrand, linux-mm, linux-s390
These two patches compile ... I can promise nothing more than that.
David suggested to me that the gmap code really should be using ptdesc,
and I think I agree with him. The vsie code looks quite different
and probably shouldn't be using a ptdesc, but we can use page->private
instead of page->index. It's not yet clear to me if we'll ever manage
to get rid of page->private.
Matthew Wilcox (Oracle) (2):
s390: Convert gmap code to use ptdesc
s390: Convert vsie code to use page->private
arch/s390/kvm/vsie.c | 6 +-
arch/s390/mm/gmap.c | 181 +++++++++++++++++++++----------------------
2 files changed, 92 insertions(+), 95 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH 1/2] s390: Convert gmap code to use ptdesc 2024-12-19 16:22 [PATCH 0/2] s390: Remove uses of page->index Matthew Wilcox (Oracle) @ 2024-12-19 16:22 ` Matthew Wilcox (Oracle) 2024-12-19 21:23 ` Matthew Wilcox 2024-12-19 16:22 ` [PATCH 2/2] s390: Convert vsie code to use page->private Matthew Wilcox (Oracle) ` (2 subsequent siblings) 3 siblings, 1 reply; 12+ messages in thread From: Matthew Wilcox (Oracle) @ 2024-12-19 16:22 UTC (permalink / raw) To: Claudio Imbrenda Cc: Matthew Wilcox (Oracle), David Hildenbrand, linux-mm, linux-s390 There was originally some doubt about whether these page tables should be represented by a vanilla struct page or whether they should be a ptdesc. As we continue on our quest to shrink struct page, we seem to have crossed the line into believing that thse page tables should be a ptdesc. At least for now. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- arch/s390/mm/gmap.c | 181 ++++++++++++++++++++++---------------------- 1 file changed, 89 insertions(+), 92 deletions(-) diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 16b8a36c56de..2ca100aae1f7 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -26,15 +26,15 @@ #define GMAP_SHADOW_FAKE_TABLE 1ULL -static struct page *gmap_alloc_crst(void) +static struct ptdesc *gmap_alloc_crst(void) { - struct page *page; + struct ptdesc *ptdesc; - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); - if (!page) + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); + if (!ptdesc) return NULL; - __arch_set_page_dat(page_to_virt(page), 1UL << CRST_ALLOC_ORDER); - return page; + __arch_set_page_dat(ptdesc_to_virt(ptdesc), 1UL << CRST_ALLOC_ORDER); + return ptdesc; } /** @@ -46,7 +46,7 @@ static struct page *gmap_alloc_crst(void) static struct gmap *gmap_alloc(unsigned long limit) { struct gmap *gmap; - struct page *page; + struct ptdesc *ptdesc; unsigned long *table; unsigned long 
etype, atype; @@ -79,12 +79,12 @@ static struct gmap *gmap_alloc(unsigned long limit) spin_lock_init(&gmap->guest_table_lock); spin_lock_init(&gmap->shadow_lock); refcount_set(&gmap->ref_count, 1); - page = gmap_alloc_crst(); - if (!page) + ptdesc = gmap_alloc_crst(); + if (!ptdesc) goto out_free; - page->index = 0; - list_add(&page->lru, &gmap->crst_list); - table = page_to_virt(page); + ptdesc->pt_index = 0; + list_add(&ptdesc->pt_list, &gmap->crst_list); + table = ptdesc_to_virt(ptdesc); crst_table_init(table, etype); gmap->table = table; gmap->asce = atype | _ASCE_TABLE_LENGTH | @@ -193,23 +193,21 @@ static void gmap_rmap_radix_tree_free(struct radix_tree_root *root) */ static void gmap_free(struct gmap *gmap) { - struct page *page, *next; + struct ptdesc *ptdesc, *next; /* Flush tlb of all gmaps (if not already done for shadows) */ if (!(gmap_is_shadow(gmap) && gmap->removed)) gmap_flush_tlb(gmap); /* Free all segment & region tables. */ - list_for_each_entry_safe(page, next, &gmap->crst_list, lru) - __free_pages(page, CRST_ALLOC_ORDER); + list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) + pagetable_free(ptdesc); gmap_radix_tree_free(&gmap->guest_to_host); gmap_radix_tree_free(&gmap->host_to_guest); /* Free additional data for a shadow gmap */ if (gmap_is_shadow(gmap)) { - struct ptdesc *ptdesc, *n; - /* Free all page tables. 
*/ - list_for_each_entry_safe(ptdesc, n, &gmap->pt_list, pt_list) + list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list) page_table_free_pgste(ptdesc); gmap_rmap_radix_tree_free(&gmap->host_to_rmap); /* Release reference to the parent */ @@ -287,26 +285,26 @@ EXPORT_SYMBOL_GPL(gmap_remove); static int gmap_alloc_table(struct gmap *gmap, unsigned long *table, unsigned long init, unsigned long gaddr) { - struct page *page; + struct ptdesc *ptdesc; unsigned long *new; /* since we dont free the gmap table until gmap_free we can unlock */ - page = gmap_alloc_crst(); - if (!page) + ptdesc = gmap_alloc_crst(); + if (!ptdesc) return -ENOMEM; - new = page_to_virt(page); + new = ptdesc_to_virt(ptdesc); crst_table_init(new, init); spin_lock(&gmap->guest_table_lock); if (*table & _REGION_ENTRY_INVALID) { - list_add(&page->lru, &gmap->crst_list); + list_add(&ptdesc->pt_list, &gmap->crst_list); *table = __pa(new) | _REGION_ENTRY_LENGTH | (*table & _REGION_ENTRY_TYPE_MASK); - page->index = gaddr; - page = NULL; + ptdesc->pt_index = gaddr; + ptdesc = NULL; } spin_unlock(&gmap->guest_table_lock); - if (page) - __free_pages(page, CRST_ALLOC_ORDER); + if (ptdesc) + pagetable_free(ptdesc); return 0; } @@ -318,13 +316,13 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned long *table, */ static unsigned long __gmap_segment_gaddr(unsigned long *entry) { - struct page *page; + struct ptdesc *ptdesc; unsigned long offset; offset = (unsigned long) entry / sizeof(unsigned long); offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE; - page = pmd_pgtable_page((pmd_t *) entry); - return page->index + offset; + ptdesc = pmd_ptdesc((pmd_t *) entry); + return ptdesc->pt_index + offset; } /** @@ -1458,7 +1456,7 @@ static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr) { unsigned long r3o, *r3e; phys_addr_t sgt; - struct page *page; + struct ptdesc *ptdesc; BUG_ON(!gmap_is_shadow(sg)); r3e = gmap_table_walk(sg, raddr, 2); /* get region-3 pointer */ @@ -1471,9 +1469,9 @@ 
static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr) *r3e = _REGION3_ENTRY_EMPTY; __gmap_unshadow_sgt(sg, raddr, __va(sgt)); /* Free segment table */ - page = phys_to_page(sgt); - list_del(&page->lru); - __free_pages(page, CRST_ALLOC_ORDER); + ptdesc = page_ptdesc(phys_to_page(sgt)); + list_del(&ptdesc->pt_list); + pagetable_free(ptdesc); } /** @@ -1487,7 +1485,7 @@ static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr) static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr, unsigned long *r3t) { - struct page *page; + struct ptdesc *ptdesc; phys_addr_t sgt; int i; @@ -1499,9 +1497,9 @@ static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr, r3t[i] = _REGION3_ENTRY_EMPTY; __gmap_unshadow_sgt(sg, raddr, __va(sgt)); /* Free segment table */ - page = phys_to_page(sgt); - list_del(&page->lru); - __free_pages(page, CRST_ALLOC_ORDER); + ptdesc = page_ptdesc(phys_to_page(sgt)); + list_del(&ptdesc->pt_list); + pagetable_free(ptdesc); } } @@ -1516,7 +1514,7 @@ static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr) { unsigned long r2o, *r2e; phys_addr_t r3t; - struct page *page; + struct ptdesc *ptdesc; BUG_ON(!gmap_is_shadow(sg)); r2e = gmap_table_walk(sg, raddr, 3); /* get region-2 pointer */ @@ -1529,9 +1527,9 @@ static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr) *r2e = _REGION2_ENTRY_EMPTY; __gmap_unshadow_r3t(sg, raddr, __va(r3t)); /* Free region 3 table */ - page = phys_to_page(r3t); - list_del(&page->lru); - __free_pages(page, CRST_ALLOC_ORDER); + ptdesc = page_ptdesc(phys_to_page(r3t)); + list_del(&ptdesc->pt_list); + pagetable_free(ptdesc); } /** @@ -1546,7 +1544,7 @@ static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr, unsigned long *r2t) { phys_addr_t r3t; - struct page *page; + struct ptdesc *ptdesc; int i; BUG_ON(!gmap_is_shadow(sg)); @@ -1557,9 +1555,9 @@ static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr, r2t[i] = _REGION2_ENTRY_EMPTY; 
__gmap_unshadow_r3t(sg, raddr, __va(r3t)); /* Free region 3 table */ - page = phys_to_page(r3t); - list_del(&page->lru); - __free_pages(page, CRST_ALLOC_ORDER); + ptdesc = page_ptdesc(phys_to_page(r3t)); + list_del(&ptdesc->pt_list); + pagetable_free(ptdesc); } } @@ -1573,7 +1571,7 @@ static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr, static void gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr) { unsigned long r1o, *r1e; - struct page *page; + struct ptdesc *ptdesc; phys_addr_t r2t; BUG_ON(!gmap_is_shadow(sg)); @@ -1587,9 +1585,9 @@ static void gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr) *r1e = _REGION1_ENTRY_EMPTY; __gmap_unshadow_r2t(sg, raddr, __va(r2t)); /* Free region 2 table */ - page = phys_to_page(r2t); - list_del(&page->lru); - __free_pages(page, CRST_ALLOC_ORDER); + ptdesc = page_ptdesc(phys_to_page(r2t)); + list_del(&ptdesc->pt_list); + pagetable_free(ptdesc); } /** @@ -1604,7 +1602,7 @@ static void __gmap_unshadow_r1t(struct gmap *sg, unsigned long raddr, unsigned long *r1t) { unsigned long asce; - struct page *page; + struct ptdesc *ptdesc; phys_addr_t r2t; int i; @@ -1619,9 +1617,9 @@ static void __gmap_unshadow_r1t(struct gmap *sg, unsigned long raddr, gmap_idte_one(asce, raddr); r1t[i] = _REGION1_ENTRY_EMPTY; /* Free region 2 table */ - page = phys_to_page(r2t); - list_del(&page->lru); - __free_pages(page, CRST_ALLOC_ORDER); + ptdesc = page_ptdesc(phys_to_page(r2t)); + list_del(&ptdesc->pt_list); + pagetable_free(ptdesc); } } @@ -1819,18 +1817,18 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t, unsigned long raddr, origin, offset, len; unsigned long *table; phys_addr_t s_r2t; - struct page *page; + struct ptdesc *ptdesc; int rc; BUG_ON(!gmap_is_shadow(sg)); /* Allocate a shadow region second table */ - page = gmap_alloc_crst(); - if (!page) + ptdesc = gmap_alloc_crst(); + if (!ptdesc) return -ENOMEM; - page->index = r2t & _REGION_ENTRY_ORIGIN; + ptdesc->pt_index = r2t & 
_REGION_ENTRY_ORIGIN; if (fake) - page->index |= GMAP_SHADOW_FAKE_TABLE; - s_r2t = page_to_phys(page); + ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE; + s_r2t = page_to_phys(ptdesc_page(ptdesc)); /* Install shadow region second table */ spin_lock(&sg->guest_table_lock); table = gmap_table_walk(sg, saddr, 4); /* get region-1 pointer */ @@ -1851,7 +1849,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t, _REGION_ENTRY_TYPE_R1 | _REGION_ENTRY_INVALID; if (sg->edat_level >= 1) *table |= (r2t & _REGION_ENTRY_PROTECT); - list_add(&page->lru, &sg->crst_list); + list_add(&ptdesc->pt_list, &sg->crst_list); if (fake) { /* nothing to protect for fake tables */ *table &= ~_REGION_ENTRY_INVALID; @@ -1879,7 +1877,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t, return rc; out_free: spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); + pagetable_free(ptdesc); return rc; } EXPORT_SYMBOL_GPL(gmap_shadow_r2t); @@ -1903,18 +1901,18 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t, unsigned long raddr, origin, offset, len; unsigned long *table; phys_addr_t s_r3t; - struct page *page; + struct ptdesc *ptdesc; int rc; BUG_ON(!gmap_is_shadow(sg)); /* Allocate a shadow region second table */ - page = gmap_alloc_crst(); - if (!page) + ptdesc = gmap_alloc_crst(); + if (!ptdesc) return -ENOMEM; - page->index = r3t & _REGION_ENTRY_ORIGIN; + ptdesc->pt_index = r3t & _REGION_ENTRY_ORIGIN; if (fake) - page->index |= GMAP_SHADOW_FAKE_TABLE; - s_r3t = page_to_phys(page); + ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE; + s_r3t = page_to_phys(ptdesc_page(ptdesc)); /* Install shadow region second table */ spin_lock(&sg->guest_table_lock); table = gmap_table_walk(sg, saddr, 3); /* get region-2 pointer */ @@ -1935,7 +1933,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t, _REGION_ENTRY_TYPE_R2 | _REGION_ENTRY_INVALID; if (sg->edat_level >= 1) *table |= (r3t & 
_REGION_ENTRY_PROTECT); - list_add(&page->lru, &sg->crst_list); + list_add(&ptdesc->pt_list, &sg->crst_list); if (fake) { /* nothing to protect for fake tables */ *table &= ~_REGION_ENTRY_INVALID; @@ -1963,7 +1961,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t, return rc; out_free: spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); + pagetable_free(ptdesc); return rc; } EXPORT_SYMBOL_GPL(gmap_shadow_r3t); @@ -1987,18 +1985,18 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt, unsigned long raddr, origin, offset, len; unsigned long *table; phys_addr_t s_sgt; - struct page *page; + struct ptdesc *ptdesc; int rc; BUG_ON(!gmap_is_shadow(sg) || (sgt & _REGION3_ENTRY_LARGE)); /* Allocate a shadow segment table */ - page = gmap_alloc_crst(); - if (!page) + ptdesc = gmap_alloc_crst(); + if (!ptdesc) return -ENOMEM; - page->index = sgt & _REGION_ENTRY_ORIGIN; + ptdesc->pt_index = sgt & _REGION_ENTRY_ORIGIN; if (fake) - page->index |= GMAP_SHADOW_FAKE_TABLE; - s_sgt = page_to_phys(page); + ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE; + s_sgt = page_to_phys(ptdesc_page(ptdesc)); /* Install shadow region second table */ spin_lock(&sg->guest_table_lock); table = gmap_table_walk(sg, saddr, 2); /* get region-3 pointer */ @@ -2019,7 +2017,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt, _REGION_ENTRY_TYPE_R3 | _REGION_ENTRY_INVALID; if (sg->edat_level >= 1) *table |= sgt & _REGION_ENTRY_PROTECT; - list_add(&page->lru, &sg->crst_list); + list_add(&ptdesc->pt_list, &sg->crst_list); if (fake) { /* nothing to protect for fake tables */ *table &= ~_REGION_ENTRY_INVALID; @@ -2047,7 +2045,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt, return rc; out_free: spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); + pagetable_free(ptdesc); return rc; } EXPORT_SYMBOL_GPL(gmap_shadow_sgt); @@ -2070,7 +2068,7 @@ int 
gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, int *fake) { unsigned long *table; - struct page *page; + struct ptdesc *ptdesc; int rc; BUG_ON(!gmap_is_shadow(sg)); @@ -2078,10 +2076,10 @@ int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */ if (table && !(*table & _SEGMENT_ENTRY_INVALID)) { /* Shadow page tables are full pages (pte+pgste) */ - page = pfn_to_page(*table >> PAGE_SHIFT); - *pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE; + ptdesc = page_ptdesc(pfn_to_page(*table >> PAGE_SHIFT)); + *pgt = ptdesc->pt_index & ~GMAP_SHADOW_FAKE_TABLE; *dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT); - *fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE); + *fake = !!(ptdesc->pt_index & GMAP_SHADOW_FAKE_TABLE); rc = 0; } else { rc = -EAGAIN; @@ -2961,11 +2959,10 @@ EXPORT_SYMBOL_GPL(__s390_uv_destroy_range); */ void s390_unlist_old_asce(struct gmap *gmap) { - struct page *old; + struct ptdesc *old; - old = virt_to_page(gmap->table); + old = virt_to_ptdesc(gmap->table); spin_lock(&gmap->guest_table_lock); - list_del(&old->lru); /* * Sometimes the topmost page might need to be "removed" multiple * times, for example if the VM is rebooted into secure mode several @@ -2980,7 +2977,7 @@ void s390_unlist_old_asce(struct gmap *gmap) * pointers, so list_del can work (and do nothing) without * dereferencing stale or invalid pointers. 
*/ - INIT_LIST_HEAD(&old->lru); + list_del_init(&old->pt_list); spin_unlock(&gmap->guest_table_lock); } EXPORT_SYMBOL_GPL(s390_unlist_old_asce); @@ -3001,7 +2998,7 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce); int s390_replace_asce(struct gmap *gmap) { unsigned long asce; - struct page *page; + struct ptdesc *ptdesc; void *table; s390_unlist_old_asce(gmap); @@ -3010,11 +3007,11 @@ int s390_replace_asce(struct gmap *gmap) if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) return -EINVAL; - page = gmap_alloc_crst(); - if (!page) + ptdesc = gmap_alloc_crst(); + if (!ptdesc) return -ENOMEM; - page->index = 0; - table = page_to_virt(page); + ptdesc->pt_index = 0; + table = ptdesc_to_virt(ptdesc); memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT)); /* @@ -3023,7 +3020,7 @@ int s390_replace_asce(struct gmap *gmap) * it will be freed when the VM is torn down. */ spin_lock(&gmap->guest_table_lock); - list_add(&page->lru, &gmap->crst_list); + list_add(&ptdesc->pt_list, &gmap->crst_list); spin_unlock(&gmap->guest_table_lock); /* Set new table origin while preserving existing ASCE control bits */ -- 2.45.2 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] s390: Convert gmap code to use ptdesc 2024-12-19 16:22 ` [PATCH 1/2] s390: Convert gmap code to use ptdesc Matthew Wilcox (Oracle) @ 2024-12-19 21:23 ` Matthew Wilcox 2024-12-20 9:22 ` David Hildenbrand 0 siblings, 1 reply; 12+ messages in thread From: Matthew Wilcox @ 2024-12-19 21:23 UTC (permalink / raw) To: Claudio Imbrenda Cc: David Hildenbrand, linux-mm, linux-s390, Vishal Moola (Oracle) On Thu, Dec 19, 2024 at 04:22:49PM +0000, Matthew Wilcox (Oracle) wrote: > There was originally some doubt about whether these page tables > should be represented by a vanilla struct page or whether they > should be a ptdesc. As we continue on our quest to shrink > struct page, we seem to have crossed the line into believing that > thse page tables should be a ptdesc. At least for now. Looking at this patch some more, I'm not sure that pt_index is really what we should be calling this. Would pt_gmap_addr make sense? If not, what's the right name? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] s390: Convert gmap code to use ptdesc 2024-12-19 21:23 ` Matthew Wilcox @ 2024-12-20 9:22 ` David Hildenbrand 0 siblings, 0 replies; 12+ messages in thread From: David Hildenbrand @ 2024-12-20 9:22 UTC (permalink / raw) To: Matthew Wilcox, Claudio Imbrenda Cc: linux-mm, linux-s390, Vishal Moola (Oracle) On 19.12.24 22:23, Matthew Wilcox wrote: > On Thu, Dec 19, 2024 at 04:22:49PM +0000, Matthew Wilcox (Oracle) wrote: >> There was originally some doubt about whether these page tables >> should be represented by a vanilla struct page or whether they >> should be a ptdesc. As we continue on our quest to shrink >> struct page, we seem to have crossed the line into believing that >> thse page tables should be a ptdesc. At least for now. > > Looking at this patch some more, I'm not sure that pt_index is really > what we should be calling this. Would pt_gmap_addr make sense? If > not, what's the right name? Let me try to refresh my memory. (1) Ordinary gmap for running virtual machines We set it to the guest physical address that this page table is responsible for. So we can easily translate from a given gmap page table + offset back to the guest physical address. We set it for PGD/P4D/PUD/PMD page tables in gmap_alloc_table(), and to 0 for the highest level (which logically corresponds to guest physical address 0: page->index = 0). We really only use it for PMD page tables in __gmap_segment_gaddr(), to calculate the guest physical address. (PTE page tables are shared with the process space page tables, which is why we don't maually allocate them or set/get page->index) (2) Shadow gmaps for nested virtualization We set it to the guest physical address of the page table we are shadowing. Meaning, this gmap page table shadows a page table in guest physical address space, and we have to We set it for all page table levels when shadowing them (when marking the to-be-shadowed page table R/O in the gmap so we can catch modifications). 
We use it in gmap_shadow_pgt_lookup() to translate from a nested guest physical address to a guest physical address, required to resolve faults for our nested guest. We only used "ptdesc->pt_index" in gmap_shadow_pgt() so far, because there we allocate PTE page tables as a shadow page table (not for user space page tables, though!). Long story short, they are "guest physical addresses", which we simply call "gaddr" in that code. So "pt_gaddr" or "pt_gmap_gaddr" make sense. (we should fixup that one pt_index user) -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/2] s390: Convert vsie code to use page->private 2024-12-19 16:22 [PATCH 0/2] s390: Remove uses of page->index Matthew Wilcox (Oracle) 2024-12-19 16:22 ` [PATCH 1/2] s390: Convert gmap code to use ptdesc Matthew Wilcox (Oracle) @ 2024-12-19 16:22 ` Matthew Wilcox (Oracle) 2024-12-20 9:55 ` David Hildenbrand 2024-12-19 16:33 ` [PATCH 0/2] s390: Remove uses of page->index David Hildenbrand 2025-01-03 14:53 ` Claudio Imbrenda 3 siblings, 1 reply; 12+ messages in thread From: Matthew Wilcox (Oracle) @ 2024-12-19 16:22 UTC (permalink / raw) To: Claudio Imbrenda Cc: Matthew Wilcox (Oracle), David Hildenbrand, linux-mm, linux-s390 The vsie pages are not standard page tables, so do not convert them to use ptdesc. Howver, page->index is going away so use page->private to store the address rather than page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- arch/s390/kvm/vsie.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 150b9387860a..26cbd69eb06d 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -1393,9 +1393,9 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr) kvm->arch.vsie.next++; kvm->arch.vsie.next %= nr_vcpus; } - radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9); + radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->private >> 9); } - page->index = addr; + page->private = addr; /* double use of the same address */ if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, page)) { page_ref_dec(page); @@ -1496,7 +1496,7 @@ void kvm_s390_vsie_destroy(struct kvm *kvm) vsie_page = page_to_virt(page); release_gmap_shadow(vsie_page); /* free the radix tree entry */ - radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9); + radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->private >> 9); __free_page(page); } kvm->arch.vsie.page_count = 0; -- 2.45.2 ^ permalink raw reply [flat|nested] 12+ 
messages in thread
* Re: [PATCH 2/2] s390: Convert vsie code to use page->private 2024-12-19 16:22 ` [PATCH 2/2] s390: Convert vsie code to use page->private Matthew Wilcox (Oracle) @ 2024-12-20 9:55 ` David Hildenbrand 2024-12-20 11:40 ` Claudio Imbrenda 0 siblings, 1 reply; 12+ messages in thread From: David Hildenbrand @ 2024-12-20 9:55 UTC (permalink / raw) To: Matthew Wilcox (Oracle), Claudio Imbrenda; +Cc: linux-mm, linux-s390 On 19.12.24 17:22, Matthew Wilcox (Oracle) wrote: > The vsie pages are not standard page tables, so do not convert them to > use ptdesc. They are not page tables "at all" :) A vsie_page is used when emulating hardware virtualization (SIE -- Start Interpretive Execution) for a CPU: vSIE -- virtual SIE. These pages primarily hold the shadowed "SIE control block" (SCB) that tells the hardware what to do when entering hardware virtualization (SIE), and include things like CPU state such as registers. We store some other information in the same page. We have to shadow the SCB provided by the VM VCPU when running the nested VM VCPU. The SCB resides in guest physical memory. So, similar to shadowing of page tables, we have to detect modifications to the SCB, so we can update our shadowed version accordingly. We grab a page and fill in a vsie page when a VCPU wants to execute the SIE instruction (executing the nested VM VCPU), and return it to the pool when we are done emulating the SIE instruction. We try to reuse the same vsie pages over various runs; that's why we store the address of the last SCB we shadowed, to look it up. (improves HW performance) So page->index will hold the "guest physical address of the SCB we shadowed the last time this vsie page was used". We seem to have space in the vsie page, so I think we can avoid messing with page-> completely!
From c94e4ecd6ee791ef9cda1c0577a1e765e5ce2867 Mon Sep 17 00:00:00 2001 From: David Hildenbrand <david@redhat.com> Date: Fri, 20 Dec 2024 10:53:46 +0100 Subject: [PATCH] tmp Signed-off-by: David Hildenbrand <david@redhat.com> --- arch/s390/kvm/vsie.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 150b9387860ad..0a8cffe9b80bf 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -46,7 +46,13 @@ struct vsie_page { gpa_t gvrd_gpa; /* 0x0240 */ gpa_t riccbd_gpa; /* 0x0248 */ gpa_t sdnx_gpa; /* 0x0250 */ - __u8 reserved[0x0700 - 0x0258]; /* 0x0258 */ + /* + * guest address of the original SCB. Remains set for free vsie + * pages, so we can properly look them up in our addr_to_page + * radix tree. + */ + gpa_t scb_gpa; /* 0x0258 */ + __u8 reserved[0x0700 - 0x0260]; /* 0x0260 */ struct kvm_s390_crypto_cb crycb; /* 0x0700 */ __u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE]; /* 0x0800 */ }; @@ -1383,6 +1389,7 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr) page_ref_inc(page); kvm->arch.vsie.pages[kvm->arch.vsie.page_count] = page; kvm->arch.vsie.page_count++; + vsie_page = page_to_virt(page); } else { /* reuse an existing entry that belongs to nobody */ while (true) { @@ -1393,9 +1400,11 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr) kvm->arch.vsie.next++; kvm->arch.vsie.next %= nr_vcpus; } - radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9); + vsie_page = page_to_virt(page); + radix_tree_delete(&kvm->arch.vsie.addr_to_page, + vsie_page->scb_gpa >> 9); } - page->index = addr; + vsie_page->scb_gpa = addr; /* double use of the same address */ if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, page)) { page_ref_dec(page); @@ -1404,7 +1413,6 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr) } mutex_unlock(&kvm->arch.vsie.mutex); - vsie_page = page_to_virt(page); 
memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block)); release_gmap_shadow(vsie_page); vsie_page->fault_addr = 0; @@ -1496,7 +1504,8 @@ void kvm_s390_vsie_destroy(struct kvm *kvm) vsie_page = page_to_virt(page); release_gmap_shadow(vsie_page); /* free the radix tree entry */ - radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9); + radix_tree_delete(&kvm->arch.vsie.addr_to_page, + vsie_page->scb_gpa >> 9); __free_page(page); } kvm->arch.vsie.page_count = 0; -- 2.47.1 Howver, page->index is going away so use page->private to > store the address rather than page->index. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > arch/s390/kvm/vsie.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c > index 150b9387860a..26cbd69eb06d 100644 > --- a/arch/s390/kvm/vsie.c > +++ b/arch/s390/kvm/vsie.c > @@ -1393,9 +1393,9 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr) > kvm->arch.vsie.next++; > kvm->arch.vsie.next %= nr_vcpus; > } > - radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9); > + radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->private >> 9); > } > - page->index = addr; > + page->private = addr; > /* double use of the same address */ > if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, page)) { > page_ref_dec(page); > @@ -1496,7 +1496,7 @@ void kvm_s390_vsie_destroy(struct kvm *kvm) > vsie_page = page_to_virt(page); > release_gmap_shadow(vsie_page); > /* free the radix tree entry */ > - radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9); > + radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->private >> 9); > __free_page(page); > } > kvm->arch.vsie.page_count = 0; -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] s390: Convert vsie code to use page->private
  2024-12-20  9:55 ` David Hildenbrand
@ 2024-12-20 11:40   ` Claudio Imbrenda
  0 siblings, 0 replies; 12+ messages in thread
From: Claudio Imbrenda @ 2024-12-20 11:40 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-s390

On Fri, 20 Dec 2024 10:55:44 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 19.12.24 17:22, Matthew Wilcox (Oracle) wrote:
> > The vsie pages are not standard page tables, so do not convert them to
> > use ptdesc.
>
> They are not page tables "at all" :)
>
> A vsie_page is used when emulating hardware virtualization (SIE --
> Start Interpretive Execution) for a CPU: vSIE -- virtual SIE.
>
> These pages primarily hold the shadowed "SIE control block" (SCB) that tells
> the hardware what to do when entering hardware virtualization (SIE), and
> include things like CPU state such as registers. We store some other
> information in the same page.
>
> We have to shadow the SCB provided by the VM VCPU when running the nested
> VM VCPU. The SCB resides in guest physical memory. So, similar to shadowing
> of page tables, we have to detect modifications to the SCB, so we can
> update our shadowed version accordingly.
>
> We grab a page and fill a vsie page when a VCPU wants to execute the SIE
> instruction (executing the nested VM VCPU), and return it to the pool when
> we are done emulating the SIE instruction. We try to reuse the same vsie
> pages over various runs; that's why we store the address of the last SCB
> we shadowed, to look it up. (This improves HW performance.)
>
> So page->index will hold the "guest physical address of the SCB we shadowed
> the last time this vsie page was used".
>
> We seem to have space in the vsie page, so I think we can avoid messing
> with page-> completely!

I really like this.

> From c94e4ecd6ee791ef9cda1c0577a1e765e5ce2867 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 20 Dec 2024 10:53:46 +0100
> Subject: [PATCH] tmp
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/kvm/vsie.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index 150b9387860ad..0a8cffe9b80bf 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -46,7 +46,13 @@ struct vsie_page {
>  	gpa_t gvrd_gpa;				/* 0x0240 */
>  	gpa_t riccbd_gpa;			/* 0x0248 */
>  	gpa_t sdnx_gpa;				/* 0x0250 */
> -	__u8 reserved[0x0700 - 0x0258];		/* 0x0258 */
> +	/*
> +	 * guest address of the original SCB. Remains set for free vsie
> +	 * pages, so we can properly look them up in our addr_to_page
> +	 * radix tree.
> +	 */
> +	gpa_t scb_gpa;				/* 0x0258 */
> +	__u8 reserved[0x0700 - 0x0260];		/* 0x0260 */
>  	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
>  	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
>  };
> @@ -1383,6 +1389,7 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr)
>  		page_ref_inc(page);
>  		kvm->arch.vsie.pages[kvm->arch.vsie.page_count] = page;
>  		kvm->arch.vsie.page_count++;
> +		vsie_page = page_to_virt(page);
>  	} else {
>  		/* reuse an existing entry that belongs to nobody */
>  		while (true) {
> @@ -1393,9 +1400,11 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr)
>  			kvm->arch.vsie.next++;
>  			kvm->arch.vsie.next %= nr_vcpus;
>  		}
> -		radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9);
> +		vsie_page = page_to_virt(page);
> +		radix_tree_delete(&kvm->arch.vsie.addr_to_page,
> +				  vsie_page->scb_gpa >> 9);
>  	}
> -	page->index = addr;
> +	vsie_page->scb_gpa = addr;
>  	/* double use of the same address */
>  	if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, page)) {
>  		page_ref_dec(page);
> @@ -1404,7 +1413,6 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr)
>  	}
>  	mutex_unlock(&kvm->arch.vsie.mutex);
>
> -	vsie_page = page_to_virt(page);
>  	memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block));
>  	release_gmap_shadow(vsie_page);
>  	vsie_page->fault_addr = 0;
> @@ -1496,7 +1504,8 @@ void kvm_s390_vsie_destroy(struct kvm *kvm)
>  		vsie_page = page_to_virt(page);
>  		release_gmap_shadow(vsie_page);
>  		/* free the radix tree entry */
> -		radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9);
> +		radix_tree_delete(&kvm->arch.vsie.addr_to_page,
> +				  vsie_page->scb_gpa >> 9);
>  		__free_page(page);
>  	}
>  	kvm->arch.vsie.page_count = 0;

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] s390: Remove uses of page->index
  2024-12-19 16:22 [PATCH 0/2] s390: Remove uses of page->index Matthew Wilcox (Oracle)
  2024-12-19 16:22 ` [PATCH 1/2] s390: Convert gmap code to use ptdesc Matthew Wilcox (Oracle)
  2024-12-19 16:22 ` [PATCH 2/2] s390: Convert vsie code to use page->private Matthew Wilcox (Oracle)
@ 2024-12-19 16:33 ` David Hildenbrand
  2024-12-19 16:52   ` Matthew Wilcox
  2025-01-03 14:53 ` Claudio Imbrenda
  3 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2024-12-19 16:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Claudio Imbrenda; +Cc: linux-mm, linux-s390

On 19.12.24 17:22, Matthew Wilcox (Oracle) wrote:
> These two patches compile ... I can promise nothing more than that.
>
> David suggested to me that the gmap code really should be using ptdesc,
> and I think I agree with him. The vsie code looks quite different
> and probably shouldn't be using a ptdesc, but we can use page->private
> instead of page->index. It's not yet clear to me if we'll ever manage
> to get rid of page->private.

Just curious, does that mean that memdesc would always contain these
additional 8 bytes?

--
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] s390: Remove uses of page->index
  2024-12-19 16:33 ` [PATCH 0/2] s390: Remove uses of page->index David Hildenbrand
@ 2024-12-19 16:52   ` Matthew Wilcox
  2024-12-19 16:56     ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2024-12-19 16:52 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Claudio Imbrenda, linux-mm, linux-s390

On Thu, Dec 19, 2024 at 05:33:32PM +0100, David Hildenbrand wrote:
> On 19.12.24 17:22, Matthew Wilcox (Oracle) wrote:
> > These two patches compile ... I can promise nothing more than that.
> >
> > David suggested to me that the gmap code really should be using ptdesc,
> > and I think I agree with him. The vsie code looks quite different
> > and probably shouldn't be using a ptdesc, but we can use page->private
> > instead of page->index. It's not yet clear to me if we'll ever manage
> > to get rid of page->private.
>
> Just curious, does that mean that memdesc would always contain these
> additional 8 bytes?

Eventually we'll have a choice to make.

1. Shrink struct page to 8 bytes, but we'll need to allocate a 32 byte
   memdesc for all the current users of page->private
2. Shrink struct page to 16 bytes

I genuinely don't know which will be better for the whole system.

If you're asking because we can defer some of the mapcount
work by using the 8 bytes of page->private to store mapcount, then
yes, let's do that.

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] s390: Remove uses of page->index
  2024-12-19 16:52 ` Matthew Wilcox
@ 2024-12-19 16:56   ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2024-12-19 16:56 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Claudio Imbrenda, linux-mm, linux-s390

On 19.12.24 17:52, Matthew Wilcox wrote:
> On Thu, Dec 19, 2024 at 05:33:32PM +0100, David Hildenbrand wrote:
>> On 19.12.24 17:22, Matthew Wilcox (Oracle) wrote:
>>> These two patches compile ... I can promise nothing more than that.
>>>
>>> David suggested to me that the gmap code really should be using ptdesc,
>>> and I think I agree with him. The vsie code looks quite different
>>> and probably shouldn't be using a ptdesc, but we can use page->private
>>> instead of page->index. It's not yet clear to me if we'll ever manage
>>> to get rid of page->private.
>>
>> Just curious, does that mean that memdesc would always contain these
>> additional 8 bytes?
>
> Eventually we'll have a choice to make.
>
> 1. Shrink struct page to 8 bytes, but we'll need to allocate a 32 byte
>    memdesc for all the current users of page->private
> 2. Shrink struct page to 16 bytes
>
> I genuinely don't know which will be better for the whole system.
>
> If you're asking because we can defer some of the mapcount
> work by using the 8 bytes of page->private to store mapcount, then
> yes, let's do that.

Yeah, it could give us a bit more time, until we know that we can just
get rid of it in all kernel configs. But let's see if that is still a
problem when we're getting closer to removing them.

--
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] s390: Remove uses of page->index
  2024-12-19 16:22 [PATCH 0/2] s390: Remove uses of page->index Matthew Wilcox (Oracle)
  ` (2 preceding siblings ...)
  2024-12-19 16:33 ` [PATCH 0/2] s390: Remove uses of page->index David Hildenbrand
@ 2025-01-03 14:53 ` Claudio Imbrenda
  2025-01-07 10:23   ` David Hildenbrand
  3 siblings, 1 reply; 12+ messages in thread
From: Claudio Imbrenda @ 2025-01-03 14:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: David Hildenbrand, linux-mm, linux-s390

On Thu, 19 Dec 2024 16:22:48 +0000
"Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> These two patches compile ... I can promise nothing more than that.

hi,

I appreciate the effort, but I'm actually working on a series to
completely stop using page->index and page->lru. It works, only needs
reviews and some proper testing; I will send it out next week.

If David can send a polished version of his patch, I can pick it;
otherwise I'll polish the WIP he has already sent.

@David: can you please polish and resend your patch?

With that done, this whole series becomes moot (sorry)

> David suggested to me that the gmap code really should be using ptdesc,
> and I think I agree with him. The vsie code looks quite different
> and probably shouldn't be using a ptdesc, but we can use page->private
> instead of page->index. It's not yet clear to me if we'll ever manage
> to get rid of page->private.
>
> Matthew Wilcox (Oracle) (2):
>   s390: Convert gmap code to use ptdesc
>   s390: Convert vsie code to use page->private
>
>  arch/s390/kvm/vsie.c |   6 +-
>  arch/s390/mm/gmap.c  | 181 +++++++++++++++++++++----------------------
>  2 files changed, 92 insertions(+), 95 deletions(-)

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] s390: Remove uses of page->index
  2025-01-03 14:53 ` Claudio Imbrenda
@ 2025-01-07 10:23   ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2025-01-07 10:23 UTC (permalink / raw)
  To: Claudio Imbrenda, Matthew Wilcox (Oracle); +Cc: linux-mm, linux-s390

On 03.01.25 15:53, Claudio Imbrenda wrote:
> On Thu, 19 Dec 2024 16:22:48 +0000
> "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
>
>> These two patches compile ... I can promise nothing more than that.
>
> hi,
>
> I appreciate the effort, but I'm actually working on a series to
> completely stop using page->index and page->lru. It works, only needs
> reviews and some proper testing; I will send it out next week.
>
> if David can send a polished version of his patch, I can pick it,
> otherwise I'll polish the WIP he has already sent.
>
> @David: can you please polish and resend your patch?

Yes, I can send an official patch today.

--
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-01-07 10:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-19 16:22 [PATCH 0/2] s390: Remove uses of page->index Matthew Wilcox (Oracle)
2024-12-19 16:22 ` [PATCH 1/2] s390: Convert gmap code to use ptdesc Matthew Wilcox (Oracle)
2024-12-19 21:23   ` Matthew Wilcox
2024-12-20  9:22     ` David Hildenbrand
2024-12-19 16:22 ` [PATCH 2/2] s390: Convert vsie code to use page->private Matthew Wilcox (Oracle)
2024-12-20  9:55   ` David Hildenbrand
2024-12-20 11:40     ` Claudio Imbrenda
2024-12-19 16:33 ` [PATCH 0/2] s390: Remove uses of page->index David Hildenbrand
2024-12-19 16:52   ` Matthew Wilcox
2024-12-19 16:56     ` David Hildenbrand
2025-01-03 14:53 ` Claudio Imbrenda
2025-01-07 10:23   ` David Hildenbrand