linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1
@ 2024-04-04 16:36 David Hildenbrand
  2024-04-04 16:36 ` [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: David Hildenbrand @ 2024-04-04 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-s390, kvm, David Hildenbrand, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Claudio Imbrenda, Gerald Schaefer, Thomas Huth

On my journey to remove page_mapcount(), I got hooked up on other folio
cleanups that Willy most certainly will enjoy.

This series removes the s390x usage of:
* page_mapcount() [patches WIP]
* page_has_private() [have patches to remove it]

... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
pages of large folios).

Further, one "easy" fix upfront.

... unfortunately there is one other issue I spotted that I am not
tackling in this series, because I am not 100% sure what we want to
do: the usage of page_ref_freeze()/folio_ref_freeze() in
make_folio_secure() is unsafe. :(

In make_folio_secure(), we're holding the folio lock, the mmap lock and
the PT lock. So we are protected against concurrent fork(), zap, GUP,
swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
also block concurrent GUP-fast very reliably.

But if the folio is mapped into multiple page tables, we could see
concurrent zapping of the folio, a pagecache folios could get mapped/
accessed concurrent, we could see fork() sharing the page in another
process, GUP ... trying to adjust the folio refcount while we froze it.
Very bad.

For anonymous folios, it would likely be sufficient to check that
folio_mapcount() == 1. For pagecache folios, that's insufficient, likely
we would have to lock the pagecache. To handle folios mapped into
multiple page tables, we would have to do what
split_huge_page_to_list_to_order() does (temporary migration entries).

So it's a bit more involved, and I'll have to leave that to s390x folks to
figure out. There are othe reasonable cleanups I think, but I'll have to
focus on other stuff.

Compile tested, but not runtime tested, I'll appreiate some testing help
from people with UV access and experience.

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Huth <thuth@redhat.com>

David Hildenbrand (5):
  s390/uv: don't call wait_on_page_writeback() without a reference
  s390/uv: convert gmap_make_secure() to work on folios
  s390/uv: convert PG_arch_1 users to only work on small folios
  s390/uv: update PG_arch_1 comment
  s390/hugetlb: convert PG_arch_1 code to work on folio->flags

 arch/s390/include/asm/page.h |   2 +
 arch/s390/kernel/uv.c        | 112 ++++++++++++++++++++++-------------
 arch/s390/mm/gmap.c          |   4 +-
 arch/s390/mm/hugetlbpage.c   |   8 +--
 4 files changed, 79 insertions(+), 47 deletions(-)

-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference
  2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
@ 2024-04-04 16:36 ` David Hildenbrand
  2024-04-10 17:21   ` Claudio Imbrenda
  2024-04-04 16:36 ` [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-04-04 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-s390, kvm, David Hildenbrand, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Claudio Imbrenda, Gerald Schaefer, Thomas Huth

wait_on_page_writeback() requires that no spinlocks are held and that
a page reference is held, as documented for folio_wait_writeback(). After
we dropped the PTL, the page could get freed concurrently. So grab a
temporary reference.

Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/kernel/uv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index fc07bc39e698..7401838b960b 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -314,6 +314,13 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 			rc = make_page_secure(page, uvcb);
 			unlock_page(page);
 		}
+
+		/*
+		 * Once we drop the PTL, the page may get unmapped and
+		 * freed immediately. We need a temporary reference.
+		 */
+		if (rc == -EAGAIN)
+			get_page(page);
 	}
 	pte_unmap_unlock(ptep, ptelock);
 out:
@@ -325,6 +332,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 		 * completion, this is just a useless check, but it is safe.
 		 */
 		wait_on_page_writeback(page);
+		put_page(page);
 	} else if (rc == -EBUSY) {
 		/*
 		 * If we have tried a local drain and the page refcount
-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios
  2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
  2024-04-04 16:36 ` [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference David Hildenbrand
@ 2024-04-04 16:36 ` David Hildenbrand
  2024-04-05  3:29   ` Matthew Wilcox
  2024-04-10 17:31   ` Claudio Imbrenda
  2024-04-04 16:36 ` [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: David Hildenbrand @ 2024-04-04 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-s390, kvm, David Hildenbrand, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Claudio Imbrenda, Gerald Schaefer, Thomas Huth

We have various goals that require gmap_make_secure() to only work on
folios. We want to limit the use of page_mapcount() to the places where it
is absolutely necessary, we want to avoid using page flags of tail
pages, and we want to remove page_has_private().

So, let's convert gmap_make_secure() to folios. While s390x makes sure
to never have PMD-mapped THP in processes that use KVM -- by remapping
them using PTEs in thp_split_walk_pmd_entry()->split_huge_pmd() -- we might
still find PTE-mapped THPs and could end up working on tail pages of
such large folios for now.

To handle that cleanly, let's simply split any PTE-mapped large folio,
so we can be sure that we are always working with small folios and never
on tail pages.

There is no real change: splitting will similarly fail on unexpected folio
references, just like it would already when we try to freeze the folio
refcount.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/include/asm/page.h |  1 +
 arch/s390/kernel/uv.c        | 66 ++++++++++++++++++++++--------------
 2 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 9381879f7ecf..54d015bcd8e3 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -215,6 +215,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
 
 #define phys_to_page(phys)	pfn_to_page(phys_to_pfn(phys))
 #define page_to_phys(page)	pfn_to_phys(page_to_pfn(page))
+#define folio_to_phys(page)	pfn_to_phys(folio_pfn(folio))
 
 static inline void *pfn_to_virt(unsigned long pfn)
 {
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 7401838b960b..adcbd4b13035 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -181,36 +181,36 @@ int uv_convert_owned_from_secure(unsigned long paddr)
 }
 
 /*
- * Calculate the expected ref_count for a page that would otherwise have no
+ * Calculate the expected ref_count for a folio that would otherwise have no
  * further pins. This was cribbed from similar functions in other places in
  * the kernel, but with some slight modifications. We know that a secure
- * page can not be a huge page for example.
+ * folio can only be a small folio for example.
  */
-static int expected_page_refs(struct page *page)
+static int expected_folio_refs(struct folio *folio)
 {
 	int res;
 
-	res = page_mapcount(page);
-	if (PageSwapCache(page)) {
+	res = folio_mapcount(folio);
+	if (folio_test_swapcache(folio)) {
 		res++;
-	} else if (page_mapping(page)) {
+	} else if (folio_mapping(folio)) {
 		res++;
-		if (page_has_private(page))
+		if (folio_has_private(folio))
 			res++;
 	}
 	return res;
 }
 
-static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
+static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 {
 	int expected, cc = 0;
 
-	if (PageWriteback(page))
+	if (folio_test_writeback(folio))
 		return -EAGAIN;
-	expected = expected_page_refs(page);
-	if (!page_ref_freeze(page, expected))
+	expected = expected_folio_refs(folio);
+	if (!folio_ref_freeze(folio, expected))
 		return -EBUSY;
-	set_bit(PG_arch_1, &page->flags);
+	set_bit(PG_arch_1, &folio->flags);
 	/*
 	 * If the UVC does not succeed or fail immediately, we don't want to
 	 * loop for long, or we might get stall notifications.
@@ -220,9 +220,9 @@ static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
 	 * -EAGAIN and we let the callers deal with it.
 	 */
 	cc = __uv_call(0, (u64)uvcb);
-	page_ref_unfreeze(page, expected);
+	folio_ref_unfreeze(folio, expected);
 	/*
-	 * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
+	 * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
 	 * If busy or partially completed, return -EAGAIN.
 	 */
 	if (cc == UVC_CC_OK)
@@ -277,7 +277,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 	bool local_drain = false;
 	spinlock_t *ptelock;
 	unsigned long uaddr;
-	struct page *page;
+	struct folio *folio;
 	pte_t *ptep;
 	int rc;
 
@@ -306,33 +306,49 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 	if (!ptep)
 		goto out;
 	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
-		page = pte_page(*ptep);
+		folio = page_folio(pte_page(*ptep));
 		rc = -EAGAIN;
-		if (trylock_page(page)) {
+
+		/* We might get PTE-mapped large folios; split them first. */
+		if (folio_test_large(folio)) {
+			rc = -E2BIG;
+		} else if (folio_trylock(folio)) {
 			if (should_export_before_import(uvcb, gmap->mm))
-				uv_convert_from_secure(page_to_phys(page));
-			rc = make_page_secure(page, uvcb);
-			unlock_page(page);
+				uv_convert_from_secure(folio_to_phys(folio));
+			rc = make_folio_secure(folio, uvcb);
+			folio_unlock(folio);
 		}
 
 		/*
-		 * Once we drop the PTL, the page may get unmapped and
+		 * Once we drop the PTL, the folio may get unmapped and
 		 * freed immediately. We need a temporary reference.
 		 */
-		if (rc == -EAGAIN)
-			get_page(page);
+		if (rc == -EAGAIN || rc == -E2BIG)
+			folio_get(folio);
 	}
 	pte_unmap_unlock(ptep, ptelock);
 out:
 	mmap_read_unlock(gmap->mm);
 
+	if (rc == -E2BIG) {
+		/*
+		 * Splitting might fail with -EBUSY due to unexpected folio
+		 * references, just like make_folio_secure(). So handle it
+		 * ahead of time without the PTL being held.
+		 */
+		folio_lock(folio);
+		rc = split_folio(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
 	if (rc == -EAGAIN) {
 		/*
 		 * If we are here because the UVC returned busy or partial
 		 * completion, this is just a useless check, but it is safe.
 		 */
-		wait_on_page_writeback(page);
-		put_page(page);
+		folio_wait_writeback(folio);
+		folio_put(folio);
 	} else if (rc == -EBUSY) {
 		/*
 		 * If we have tried a local drain and the page refcount
-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios
  2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
  2024-04-04 16:36 ` [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference David Hildenbrand
  2024-04-04 16:36 ` [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios David Hildenbrand
@ 2024-04-04 16:36 ` David Hildenbrand
  2024-04-05  3:36   ` Matthew Wilcox
  2024-04-10 17:42   ` Claudio Imbrenda
  2024-04-04 16:36 ` [PATCH v1 4/5] s390/uv: update PG_arch_1 comment David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: David Hildenbrand @ 2024-04-04 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-s390, kvm, David Hildenbrand, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Claudio Imbrenda, Gerald Schaefer, Thomas Huth

Now that make_folio_secure() may only set PG_arch_1 for small folios,
let's convert relevant remaining UV code to only work on (small) folios
and simply reject large folios early. This way, we'll never end up
touching PG_arch_1 on tail pages of a large folio in UV code.

The folio_get()/folio_put() for functions that are documented to already
hold a folio reference look weird and it should probably be removed.
Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
should really consume a folio reference instead. But these are cleanups for
another day.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/include/asm/page.h |  1 +
 arch/s390/kernel/uv.c        | 39 +++++++++++++++++++++---------------
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 54d015bcd8e3..b64384872c0f 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -214,6 +214,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
 #define pfn_to_phys(pfn)	((pfn) << PAGE_SHIFT)
 
 #define phys_to_page(phys)	pfn_to_page(phys_to_pfn(phys))
+#define phys_to_folio(phys)	page_folio(phys_to_page(phys))
 #define page_to_phys(page)	pfn_to_phys(page_to_pfn(page))
 #define folio_to_phys(page)	pfn_to_phys(folio_pfn(folio))
 
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index adcbd4b13035..9c0113b26735 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -134,14 +134,17 @@ static int uv_destroy_page(unsigned long paddr)
  */
 int uv_destroy_owned_page(unsigned long paddr)
 {
-	struct page *page = phys_to_page(paddr);
+	struct folio *folio = phys_to_folio(paddr);
 	int rc;
 
-	get_page(page);
+	if (unlikely(folio_test_large(folio)))
+		return 0;
+
+	folio_get(folio);
 	rc = uv_destroy_page(paddr);
 	if (!rc)
-		clear_bit(PG_arch_1, &page->flags);
-	put_page(page);
+		clear_bit(PG_arch_1, &folio->flags);
+	folio_put(folio);
 	return rc;
 }
 
@@ -169,14 +172,17 @@ int uv_convert_from_secure(unsigned long paddr)
  */
 int uv_convert_owned_from_secure(unsigned long paddr)
 {
-	struct page *page = phys_to_page(paddr);
+	struct folio *folio = phys_to_folio(paddr);
 	int rc;
 
-	get_page(page);
+	if (unlikely(folio_test_large(folio)))
+		return 0;
+
+	folio_get(folio);
 	rc = uv_convert_from_secure(paddr);
 	if (!rc)
-		clear_bit(PG_arch_1, &page->flags);
-	put_page(page);
+		clear_bit(PG_arch_1, &folio->flags);
+	folio_put(folio);
 	return rc;
 }
 
@@ -457,33 +463,34 @@ EXPORT_SYMBOL_GPL(gmap_destroy_page);
  */
 int arch_make_page_accessible(struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	int rc = 0;
 
-	/* Hugepage cannot be protected, so nothing to do */
-	if (PageHuge(page))
+	/* Large folios cannot be protected, so nothing to do */
+	if (unlikely(folio_test_large(folio)))
 		return 0;
 
 	/*
 	 * PG_arch_1 is used in 3 places:
 	 * 1. for kernel page tables during early boot
 	 * 2. for storage keys of huge pages and KVM
-	 * 3. As an indication that this page might be secure. This can
+	 * 3. As an indication that this small folio might be secure. This can
 	 *    overindicate, e.g. we set the bit before calling
 	 *    convert_to_secure.
 	 * As secure pages are never huge, all 3 variants can co-exists.
 	 */
-	if (!test_bit(PG_arch_1, &page->flags))
+	if (!test_bit(PG_arch_1, &folio->flags))
 		return 0;
 
-	rc = uv_pin_shared(page_to_phys(page));
+	rc = uv_pin_shared(folio_to_phys(folio));
 	if (!rc) {
-		clear_bit(PG_arch_1, &page->flags);
+		clear_bit(PG_arch_1, &folio->flags);
 		return 0;
 	}
 
-	rc = uv_convert_from_secure(page_to_phys(page));
+	rc = uv_convert_from_secure(folio_to_phys(folio));
 	if (!rc) {
-		clear_bit(PG_arch_1, &page->flags);
+		clear_bit(PG_arch_1, &folio->flags);
 		return 0;
 	}
 
-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 4/5] s390/uv: update PG_arch_1 comment
  2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
                   ` (2 preceding siblings ...)
  2024-04-04 16:36 ` [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios David Hildenbrand
@ 2024-04-04 16:36 ` David Hildenbrand
  2024-04-10 17:19   ` Claudio Imbrenda
  2024-04-04 16:36 ` [PATCH v1 5/5] s390/hugetlb: convert PG_arch_1 code to work on folio->flags David Hildenbrand
  2024-04-05  3:42 ` [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 Matthew Wilcox
  5 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-04-04 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-s390, kvm, David Hildenbrand, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Claudio Imbrenda, Gerald Schaefer, Thomas Huth

We removed the usage of PG_arch_1 for page tables in commit
a51324c430db ("s390/cmma: rework no-dat handling").

Let's update the comment in UV to reflect that.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/kernel/uv.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 9c0113b26735..76fc61333fae 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -471,13 +471,12 @@ int arch_make_page_accessible(struct page *page)
 		return 0;
 
 	/*
-	 * PG_arch_1 is used in 3 places:
-	 * 1. for kernel page tables during early boot
-	 * 2. for storage keys of huge pages and KVM
-	 * 3. As an indication that this small folio might be secure. This can
+	 * PG_arch_1 is used in 2 places:
+	 * 1. for storage keys of hugetlb folios and KVM
+	 * 2. As an indication that this small folio might be secure. This can
 	 *    overindicate, e.g. we set the bit before calling
 	 *    convert_to_secure.
-	 * As secure pages are never huge, all 3 variants can co-exists.
+	 * As secure pages are never large folios, both variants can co-exists.
 	 */
 	if (!test_bit(PG_arch_1, &folio->flags))
 		return 0;
-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 5/5] s390/hugetlb: convert PG_arch_1 code to work on folio->flags
  2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
                   ` (3 preceding siblings ...)
  2024-04-04 16:36 ` [PATCH v1 4/5] s390/uv: update PG_arch_1 comment David Hildenbrand
@ 2024-04-04 16:36 ` David Hildenbrand
  2024-04-05  3:42 ` [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 Matthew Wilcox
  5 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2024-04-04 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-s390, kvm, David Hildenbrand, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Claudio Imbrenda, Gerald Schaefer, Thomas Huth

Let's make it clearer that we are always working on folio flags and
never page flags of tail pages.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/gmap.c        | 4 ++--
 arch/s390/mm/hugetlbpage.c | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 9233b0acac89..ca31f2143bc0 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2731,7 +2731,7 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
 {
 	pmd_t *pmd = (pmd_t *)pte;
 	unsigned long start, end;
-	struct page *page = pmd_page(*pmd);
+	struct folio *folio = pmd_folio(*pmd);
 
 	/*
 	 * The write check makes sure we do not set a key on shared
@@ -2746,7 +2746,7 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
 	start = pmd_val(*pmd) & HPAGE_MASK;
 	end = start + HPAGE_SIZE - 1;
 	__storage_key_init_range(start, end);
-	set_bit(PG_arch_1, &page->flags);
+	set_bit(PG_arch_1, &folio->flags);
 	cond_resched();
 	return 0;
 }
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index e1e63dc1b23d..21ed6ac5f1c5 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -121,7 +121,7 @@ static inline pte_t __rste_to_pte(unsigned long rste)
 
 static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
 {
-	struct page *page;
+	struct folio *folio;
 	unsigned long size, paddr;
 
 	if (!mm_uses_skeys(mm) ||
@@ -129,16 +129,16 @@ static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
 		return;
 
 	if ((rste & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) {
-		page = pud_page(__pud(rste));
+		folio = page_folio(pud_page(__pud(rste)));
 		size = PUD_SIZE;
 		paddr = rste & PUD_MASK;
 	} else {
-		page = pmd_page(__pmd(rste));
+		folio = pmd_folio(__pmd(rste));
 		size = PMD_SIZE;
 		paddr = rste & PMD_MASK;
 	}
 
-	if (!test_and_set_bit(PG_arch_1, &page->flags))
+	if (!test_and_set_bit(PG_arch_1, &folio->flags))
 		__storage_key_init_range(paddr, paddr + size - 1);
 }
 
-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios
  2024-04-04 16:36 ` [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios David Hildenbrand
@ 2024-04-05  3:29   ` Matthew Wilcox
       [not found]     ` <67557c5b-afd8-4578-a00d-6750accc1026@redhat.com>
  2024-04-10 17:31   ` Claudio Imbrenda
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2024-04-05  3:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Janosch Frank, Claudio Imbrenda, Gerald Schaefer,
	Thomas Huth

On Thu, Apr 04, 2024 at 06:36:39PM +0200, David Hildenbrand wrote:
> +		/* We might get PTE-mapped large folios; split them first. */
> +		if (folio_test_large(folio)) {
> +			rc = -E2BIG;

We agree to this point.  I just turned this into -EINVAL.

>  
> +	if (rc == -E2BIG) {
> +		/*
> +		 * Splitting might fail with -EBUSY due to unexpected folio
> +		 * references, just like make_folio_secure(). So handle it
> +		 * ahead of time without the PTL being held.
> +		 */
> +		folio_lock(folio);
> +		rc = split_folio(folio);
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}

Ummm ... if split_folio() succeeds, aren't we going to return 0 from
this function, which will be interpreted as make_folio_secure() having
succeeded?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios
  2024-04-04 16:36 ` [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios David Hildenbrand
@ 2024-04-05  3:36   ` Matthew Wilcox
  2024-04-10 17:42   ` Claudio Imbrenda
  1 sibling, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2024-04-05  3:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Janosch Frank, Claudio Imbrenda, Gerald Schaefer,
	Thomas Huth

On Thu, Apr 04, 2024 at 06:36:40PM +0200, David Hildenbrand wrote:
> Now that make_folio_secure() may only set PG_arch_1 for small folios,
> let's convert relevant remaining UV code to only work on (small) folios
> and simply reject large folios early. This way, we'll never end up
> touching PG_arch_1 on tail pages of a large folio in UV code.
> 
> The folio_get()/folio_put() for functions that are documented to already
> hold a folio reference look weird and it should probably be removed.
> Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
> should really consume a folio reference instead. But these are cleanups for
> another day.

Yes, and we should convert arch_make_page_accessible() to
arch_make_folio_accessible() ... one of the two callers already has the
folio (and page-writeback already calls arch_make_folio_accessible()


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1
  2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
                   ` (4 preceding siblings ...)
  2024-04-04 16:36 ` [PATCH v1 5/5] s390/hugetlb: convert PG_arch_1 code to work on folio->flags David Hildenbrand
@ 2024-04-05  3:42 ` Matthew Wilcox
  5 siblings, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2024-04-05  3:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Janosch Frank, Claudio Imbrenda, Gerald Schaefer,
	Thomas Huth

On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote:
> On my journey to remove page_mapcount(), I got hooked up on other folio
> cleanups that Willy most certainly will enjoy.
> 
> This series removes the s390x usage of:
> * page_mapcount() [patches WIP]
> * page_has_private() [have patches to remove it]
> 
> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
> pages of large folios).
> 
> Further, one "easy" fix upfront.

Looks like you didn't see:

https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/

> ... unfortunately there is one other issue I spotted that I am not
> tackling in this series, because I am not 100% sure what we want to
> do: the usage of page_ref_freeze()/folio_ref_freeze() in
> make_folio_secure() is unsafe. :(
> 
> In make_folio_secure(), we're holding the folio lock, the mmap lock and
> the PT lock. So we are protected against concurrent fork(), zap, GUP,
> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
> also block concurrent GUP-fast very reliably.
> 
> But if the folio is mapped into multiple page tables, we could see
> concurrent zapping of the folio, a pagecache folios could get mapped/
> accessed concurrent, we could see fork() sharing the page in another
> process, GUP ... trying to adjust the folio refcount while we froze it.
> Very bad.

Hmmm.  Why is that not then a problem for, eg, splitting or migrating?
Is it because they unmap first and then try to freeze?

> For anonymous folios, it would likely be sufficient to check that
> folio_mapcount() == 1. For pagecache folios, that's insufficient, likely
> we would have to lock the pagecache. To handle folios mapped into
> multiple page tables, we would have to do what
> split_huge_page_to_list_to_order() does (temporary migration entries).
> 
> So it's a bit more involved, and I'll have to leave that to s390x folks to
> figure out. There are othe reasonable cleanups I think, but I'll have to
> focus on other stuff.
> 
> Compile tested, but not runtime tested, I'll appreiate some testing help
> from people with UV access and experience.
> 
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Thomas Huth <thuth@redhat.com>
> 
> David Hildenbrand (5):
>   s390/uv: don't call wait_on_page_writeback() without a reference
>   s390/uv: convert gmap_make_secure() to work on folios
>   s390/uv: convert PG_arch_1 users to only work on small folios
>   s390/uv: update PG_arch_1 comment
>   s390/hugetlb: convert PG_arch_1 code to work on folio->flags
> 
>  arch/s390/include/asm/page.h |   2 +
>  arch/s390/kernel/uv.c        | 112 ++++++++++++++++++++++-------------
>  arch/s390/mm/gmap.c          |   4 +-
>  arch/s390/mm/hugetlbpage.c   |   8 +--
>  4 files changed, 79 insertions(+), 47 deletions(-)
> 
> -- 
> 2.44.0
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 4/5] s390/uv: update PG_arch_1 comment
  2024-04-04 16:36 ` [PATCH v1 4/5] s390/uv: update PG_arch_1 comment David Hildenbrand
@ 2024-04-10 17:19   ` Claudio Imbrenda
  0 siblings, 0 replies; 17+ messages in thread
From: Claudio Imbrenda @ 2024-04-10 17:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On Thu,  4 Apr 2024 18:36:41 +0200
David Hildenbrand <david@redhat.com> wrote:

> We removed the usage of PG_arch_1 for page tables in commit
> a51324c430db ("s390/cmma: rework no-dat handling").
> 
> Let's update the comment in UV to reflect that.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  arch/s390/kernel/uv.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 9c0113b26735..76fc61333fae 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -471,13 +471,12 @@ int arch_make_page_accessible(struct page *page)
>  		return 0;
>  
>  	/*
> -	 * PG_arch_1 is used in 3 places:
> -	 * 1. for kernel page tables during early boot
> -	 * 2. for storage keys of huge pages and KVM
> -	 * 3. As an indication that this small folio might be secure. This can
> +	 * PG_arch_1 is used in 2 places:
> +	 * 1. for storage keys of hugetlb folios and KVM
> +	 * 2. As an indication that this small folio might be secure. This can
>  	 *    overindicate, e.g. we set the bit before calling
>  	 *    convert_to_secure.
> -	 * As secure pages are never huge, all 3 variants can co-exists.
> +	 * As secure pages are never large folios, both variants can co-exists.
>  	 */
>  	if (!test_bit(PG_arch_1, &folio->flags))
>  		return 0;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference
  2024-04-04 16:36 ` [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference David Hildenbrand
@ 2024-04-10 17:21   ` Claudio Imbrenda
  2024-04-11  8:24     ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Claudio Imbrenda @ 2024-04-10 17:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On Thu,  4 Apr 2024 18:36:38 +0200
David Hildenbrand <david@redhat.com> wrote:

> wait_on_page_writeback() requires that no spinlocks are held and that
> a page reference is held, as documented for folio_wait_writeback(). After

oops

> we dropped the PTL, the page could get freed concurrently. So grab a
> temporary reference.
> 
> Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  arch/s390/kernel/uv.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index fc07bc39e698..7401838b960b 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -314,6 +314,13 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>  			rc = make_page_secure(page, uvcb);
>  			unlock_page(page);
>  		}
> +
> +		/*
> +		 * Once we drop the PTL, the page may get unmapped and
> +		 * freed immediately. We need a temporary reference.
> +		 */
> +		if (rc == -EAGAIN)
> +			get_page(page);
>  	}
>  	pte_unmap_unlock(ptep, ptelock);
>  out:
> @@ -325,6 +332,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>  		 * completion, this is just a useless check, but it is safe.
>  		 */
>  		wait_on_page_writeback(page);
> +		put_page(page);
>  	} else if (rc == -EBUSY) {
>  		/*
>  		 * If we have tried a local drain and the page refcount



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios
  2024-04-04 16:36 ` [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios David Hildenbrand
  2024-04-05  3:29   ` Matthew Wilcox
@ 2024-04-10 17:31   ` Claudio Imbrenda
  2024-04-11  9:09     ` David Hildenbrand
  1 sibling, 1 reply; 17+ messages in thread
From: Claudio Imbrenda @ 2024-04-10 17:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On Thu,  4 Apr 2024 18:36:39 +0200
David Hildenbrand <david@redhat.com> wrote:

> We have various goals that require gmap_make_secure() to only work on
> folios. We want to limit the use of page_mapcount() to the places where it
> is absolutely necessary, we want to avoid using page flags of tail
> pages, and we want to remove page_has_private().
> 
> So, let's convert gmap_make_secure() to folios. While s390x makes sure
> to never have PMD-mapped THP in processes that use KVM -- by remapping
> them using PTEs in thp_split_walk_pmd_entry()->split_huge_pmd() -- we might
> still find PTE-mapped THPs and could end up working on tail pages of
> such large folios for now.
> 
> To handle that cleanly, let's simply split any PTE-mapped large folio,
> so we can be sure that we are always working with small folios and never
> on tail pages.
> 
> There is no real change: splitting will similarly fail on unexpected folio
> references, just like it would already when we try to freeze the folio
> refcount.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/include/asm/page.h |  1 +
>  arch/s390/kernel/uv.c        | 66 ++++++++++++++++++++++--------------
>  2 files changed, 42 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 9381879f7ecf..54d015bcd8e3 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -215,6 +215,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
>  
>  #define phys_to_page(phys)	pfn_to_page(phys_to_pfn(phys))
>  #define page_to_phys(page)	pfn_to_phys(page_to_pfn(page))
> +#define folio_to_phys(page)	pfn_to_phys(folio_pfn(folio))
>  
>  static inline void *pfn_to_virt(unsigned long pfn)
>  {
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 7401838b960b..adcbd4b13035 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -181,36 +181,36 @@ int uv_convert_owned_from_secure(unsigned long paddr)
>  }
>  
>  /*
> - * Calculate the expected ref_count for a page that would otherwise have no
> + * Calculate the expected ref_count for a folio that would otherwise have no
>   * further pins. This was cribbed from similar functions in other places in
>   * the kernel, but with some slight modifications. We know that a secure
> - * page can not be a huge page for example.
> + * folio can only be a small folio for example.
>   */
> -static int expected_page_refs(struct page *page)
> +static int expected_folio_refs(struct folio *folio)
>  {
>  	int res;
>  
> -	res = page_mapcount(page);
> -	if (PageSwapCache(page)) {
> +	res = folio_mapcount(folio);
> +	if (folio_test_swapcache(folio)) {
>  		res++;
> -	} else if (page_mapping(page)) {
> +	} else if (folio_mapping(folio)) {
>  		res++;
> -		if (page_has_private(page))
> +		if (folio_has_private(folio))
>  			res++;
>  	}
>  	return res;
>  }
>  
> -static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
> +static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
>  {
>  	int expected, cc = 0;
>  
> -	if (PageWriteback(page))
> +	if (folio_test_writeback(folio))
>  		return -EAGAIN;
> -	expected = expected_page_refs(page);
> -	if (!page_ref_freeze(page, expected))
> +	expected = expected_folio_refs(folio);
> +	if (!folio_ref_freeze(folio, expected))
>  		return -EBUSY;
> -	set_bit(PG_arch_1, &page->flags);
> +	set_bit(PG_arch_1, &folio->flags);
>  	/*
>  	 * If the UVC does not succeed or fail immediately, we don't want to
>  	 * loop for long, or we might get stall notifications.
> @@ -220,9 +220,9 @@ static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
>  	 * -EAGAIN and we let the callers deal with it.
>  	 */
>  	cc = __uv_call(0, (u64)uvcb);
> -	page_ref_unfreeze(page, expected);
> +	folio_ref_unfreeze(folio, expected);
>  	/*
> -	 * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
> +	 * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
>  	 * If busy or partially completed, return -EAGAIN.
>  	 */
>  	if (cc == UVC_CC_OK)
> @@ -277,7 +277,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>  	bool local_drain = false;
>  	spinlock_t *ptelock;
>  	unsigned long uaddr;
> -	struct page *page;
> +	struct folio *folio;
>  	pte_t *ptep;
>  	int rc;
>  
> @@ -306,33 +306,49 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>  	if (!ptep)
>  		goto out;
>  	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
> -		page = pte_page(*ptep);
> +		folio = page_folio(pte_page(*ptep));
>  		rc = -EAGAIN;
> -		if (trylock_page(page)) {
> +
> +		/* We might get PTE-mapped large folios; split them first. */
> +		if (folio_test_large(folio)) {
> +			rc = -E2BIG;
> +		} else if (folio_trylock(folio)) {
>  			if (should_export_before_import(uvcb, gmap->mm))
> -				uv_convert_from_secure(page_to_phys(page));
> -			rc = make_page_secure(page, uvcb);
> -			unlock_page(page);
> +				uv_convert_from_secure(folio_to_phys(folio));
> +			rc = make_folio_secure(folio, uvcb);
> +			folio_unlock(folio);
>  		}
>  
>  		/*
> -		 * Once we drop the PTL, the page may get unmapped and
> +		 * Once we drop the PTL, the folio may get unmapped and
>  		 * freed immediately. We need a temporary reference.
>  		 */
> -		if (rc == -EAGAIN)
> -			get_page(page);
> +		if (rc == -EAGAIN || rc == -E2BIG)
> +			folio_get(folio);
>  	}
>  	pte_unmap_unlock(ptep, ptelock);
>  out:
>  	mmap_read_unlock(gmap->mm);
>  
> +	if (rc == -E2BIG) {
> +		/*
> +		 * Splitting might fail with -EBUSY due to unexpected folio
> +		 * references, just like make_folio_secure(). So handle it
> +		 * ahead of time without the PTL being held.
> +		 */
> +		folio_lock(folio);
> +		rc = split_folio(folio);

if split_folio returns -EAGAIN...

> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
>  	if (rc == -EAGAIN) {

... we will not skip this ...

>  		/*
>  		 * If we are here because the UVC returned busy or partial
>  		 * completion, this is just a useless check, but it is safe.
>  		 */
> -		wait_on_page_writeback(page);
> -		put_page(page);
> +		folio_wait_writeback(folio);
> +		folio_put(folio);

... and we will do one folio_put() too many

>  	} else if (rc == -EBUSY) {
>  		/*
>  		 * If we have tried a local drain and the page refcount

are we sure that split_folio() can never return -EAGAIN now and in the
future too?

maybe just change it to  } else if (...   ?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios
       [not found]     ` <67557c5b-afd8-4578-a00d-6750accc1026@redhat.com>
@ 2024-04-10 17:32       ` Claudio Imbrenda
  0 siblings, 0 replies; 17+ messages in thread
From: Claudio Imbrenda @ 2024-04-10 17:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, linux-kernel, linux-mm, linux-s390, kvm,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On Fri, 5 Apr 2024 09:09:30 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 05.04.24 05:29, Matthew Wilcox wrote:
> > On Thu, Apr 04, 2024 at 06:36:39PM +0200, David Hildenbrand wrote:  
> >> +		/* We might get PTE-mapped large folios; split them first. */
> >> +		if (folio_test_large(folio)) {
> >> +			rc = -E2BIG;  
> > 
> > We agree to this point.  I just turned this into -EINVAL.
> >   
> >>   
> >> +	if (rc == -E2BIG) {
> >> +		/*
> >> +		 * Splitting might fail with -EBUSY due to unexpected folio
> >> +		 * references, just like make_folio_secure(). So handle it
> >> +		 * ahead of time without the PTL being held.
> >> +		 */
> >> +		folio_lock(folio);
> >> +		rc = split_folio(folio);
> >> +		folio_unlock(folio);
> >> +		folio_put(folio);
> >> +	}  
> > 
> > Ummm ... if split_folio() succeeds, aren't we going to return 0 from
> > this function, which will be interpreted as make_folio_secure() having
> > succeeded?  
> 
> I assume the code would have to handle that, because it must deal with 
> possible races that would try to convert the folio page.
> 
> But the right thing to do is
> 
> if (!rc)
> 	goto again;
> 
> after the put.

yes please



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios
  2024-04-04 16:36 ` [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios David Hildenbrand
  2024-04-05  3:36   ` Matthew Wilcox
@ 2024-04-10 17:42   ` Claudio Imbrenda
  1 sibling, 0 replies; 17+ messages in thread
From: Claudio Imbrenda @ 2024-04-10 17:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On Thu,  4 Apr 2024 18:36:40 +0200
David Hildenbrand <david@redhat.com> wrote:

> Now that make_folio_secure() may only set PG_arch_1 for small folios,
> let's convert relevant remaining UV code to only work on (small) folios
> and simply reject large folios early. This way, we'll never end up
> touching PG_arch_1 on tail pages of a large folio in UV code.
> 
> The folio_get()/folio_put() for functions that are documented to already
> hold a folio reference look weird and it should probably be removed.
> Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
> should really consume a folio reference instead. But these are cleanups for
> another day.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/include/asm/page.h |  1 +
>  arch/s390/kernel/uv.c        | 39 +++++++++++++++++++++---------------
>  2 files changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 54d015bcd8e3..b64384872c0f 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -214,6 +214,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
>  #define pfn_to_phys(pfn)	((pfn) << PAGE_SHIFT)
>  
>  #define phys_to_page(phys)	pfn_to_page(phys_to_pfn(phys))
> +#define phys_to_folio(phys)	page_folio(phys_to_page(phys))
>  #define page_to_phys(page)	pfn_to_phys(page_to_pfn(page))
>  #define folio_to_phys(page)	pfn_to_phys(folio_pfn(folio))
>  
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index adcbd4b13035..9c0113b26735 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -134,14 +134,17 @@ static int uv_destroy_page(unsigned long paddr)
>   */
>  int uv_destroy_owned_page(unsigned long paddr)
>  {
> -	struct page *page = phys_to_page(paddr);
> +	struct folio *folio = phys_to_folio(paddr);
>  	int rc;
>  
> -	get_page(page);
> +	if (unlikely(folio_test_large(folio)))
> +		return 0;

please add a comment here to explain why it's ok to just return 0
here...

> +
> +	folio_get(folio);
>  	rc = uv_destroy_page(paddr);
>  	if (!rc)
> -		clear_bit(PG_arch_1, &page->flags);
> -	put_page(page);
> +		clear_bit(PG_arch_1, &folio->flags);
> +	folio_put(folio);
>  	return rc;
>  }
>  
> @@ -169,14 +172,17 @@ int uv_convert_from_secure(unsigned long paddr)
>   */
>  int uv_convert_owned_from_secure(unsigned long paddr)
>  {
> -	struct page *page = phys_to_page(paddr);
> +	struct folio *folio = phys_to_folio(paddr);
>  	int rc;
>  
> -	get_page(page);
> +	if (unlikely(folio_test_large(folio)))
> +		return 0;

... and here

> +
> +	folio_get(folio);
>  	rc = uv_convert_from_secure(paddr);
>  	if (!rc)
> -		clear_bit(PG_arch_1, &page->flags);
> -	put_page(page);
> +		clear_bit(PG_arch_1, &folio->flags);
> +	folio_put(folio);
>  	return rc;
>  }
>  
> @@ -457,33 +463,34 @@ EXPORT_SYMBOL_GPL(gmap_destroy_page);
>   */
>  int arch_make_page_accessible(struct page *page)
>  {
> +	struct folio *folio = page_folio(page);
>  	int rc = 0;
>  
> -	/* Hugepage cannot be protected, so nothing to do */
> -	if (PageHuge(page))
> +	/* Large folios cannot be protected, so nothing to do */
> +	if (unlikely(folio_test_large(folio)))
>  		return 0;
>  
>  	/*
>  	 * PG_arch_1 is used in 3 places:
>  	 * 1. for kernel page tables during early boot
>  	 * 2. for storage keys of huge pages and KVM
> -	 * 3. As an indication that this page might be secure. This can
> +	 * 3. As an indication that this small folio might be secure. This can
>  	 *    overindicate, e.g. we set the bit before calling
>  	 *    convert_to_secure.
>  	 * As secure pages are never huge, all 3 variants can co-exists.
>  	 */
> -	if (!test_bit(PG_arch_1, &page->flags))
> +	if (!test_bit(PG_arch_1, &folio->flags))
>  		return 0;
>  
> -	rc = uv_pin_shared(page_to_phys(page));
> +	rc = uv_pin_shared(folio_to_phys(folio));
>  	if (!rc) {
> -		clear_bit(PG_arch_1, &page->flags);
> +		clear_bit(PG_arch_1, &folio->flags);
>  		return 0;
>  	}
>  
> -	rc = uv_convert_from_secure(page_to_phys(page));
> +	rc = uv_convert_from_secure(folio_to_phys(folio));
>  	if (!rc) {
> -		clear_bit(PG_arch_1, &page->flags);
> +		clear_bit(PG_arch_1, &folio->flags);
>  		return 0;
>  	}
>  



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference
  2024-04-10 17:21   ` Claudio Imbrenda
@ 2024-04-11  8:24     ` David Hildenbrand
  2024-04-11 13:13       ` Alexander Gordeev
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-04-11  8:24 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On 10.04.24 19:21, Claudio Imbrenda wrote:
> Reviewed-by: Claudio Imbrenda<imbrenda@linux.ibm.com>

Thanks! I'll rebase this series on top of the s390/features tree for 
now, where Willy's cleanups already reside.

If maintainers want to have that fix first, I can send it out with 
Willy's patches rebased on this fix. Whatever people prefer.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios
  2024-04-10 17:31   ` Claudio Imbrenda
@ 2024-04-11  9:09     ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2024-04-11  9:09 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-kernel, linux-mm, linux-s390, kvm, Matthew Wilcox,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

[...]

>> +	if (rc == -E2BIG) {
>> +		/*
>> +		 * Splitting might fail with -EBUSY due to unexpected folio
>> +		 * references, just like make_folio_secure(). So handle it
>> +		 * ahead of time without the PTL being held.
>> +		 */
>> +		folio_lock(folio);
>> +		rc = split_folio(folio);
> 
> if split_folio returns -EAGAIN...
> 
>> +		folio_unlock(folio);
>> +		folio_put(folio);
>> +	}
>> +
>>   	if (rc == -EAGAIN) {
> 
> ... we will not skip this ...
> 
>>   		/*
>>   		 * If we are here because the UVC returned busy or partial
>>   		 * completion, this is just a useless check, but it is safe.
>>   		 */
>> -		wait_on_page_writeback(page);
>> -		put_page(page);
>> +		folio_wait_writeback(folio);
>> +		folio_put(folio);
> 
> ... and we will do one folio_put() too many
> 
>>   	} else if (rc == -EBUSY) {
>>   		/*
>>   		 * If we have tried a local drain and the page refcount
> 
> are we sure that split_folio() can never return -EAGAIN now and in the
> future too?

Yes, and in contrast to documentation, that can actually happen! The 
documentation is even wrong: we return -EAGAIN if there are unexpected 
folio references (e.g., pinned), thanks for raising that.

> 
> maybe just change it to  } else if (...   ?


I think I'll make it all clearer by handling split_folio() return values 
separately.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference
  2024-04-11  8:24     ` David Hildenbrand
@ 2024-04-11 13:13       ` Alexander Gordeev
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Gordeev @ 2024-04-11 13:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Claudio Imbrenda, linux-kernel, linux-mm, linux-s390, kvm,
	Matthew Wilcox, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Janosch Frank,
	Gerald Schaefer, Thomas Huth

On Thu, Apr 11, 2024 at 10:24:23AM +0200, David Hildenbrand wrote:
> Thanks! I'll rebase this series on top of the s390/features tree for now,
> where Willy's cleanups already reside.

Yes, rebase it on to of s390/features, please.

> If maintainers want to have that fix first, I can send it out with Willy's
> patches rebased on this fix. Whatever people prefer.

Many thanks!

> -- 
> Cheers,
> 
> David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-04-11 13:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-04 16:36 [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 David Hildenbrand
2024-04-04 16:36 ` [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference David Hildenbrand
2024-04-10 17:21   ` Claudio Imbrenda
2024-04-11  8:24     ` David Hildenbrand
2024-04-11 13:13       ` Alexander Gordeev
2024-04-04 16:36 ` [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios David Hildenbrand
2024-04-05  3:29   ` Matthew Wilcox
     [not found]     ` <67557c5b-afd8-4578-a00d-6750accc1026@redhat.com>
2024-04-10 17:32       ` Claudio Imbrenda
2024-04-10 17:31   ` Claudio Imbrenda
2024-04-11  9:09     ` David Hildenbrand
2024-04-04 16:36 ` [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios David Hildenbrand
2024-04-05  3:36   ` Matthew Wilcox
2024-04-10 17:42   ` Claudio Imbrenda
2024-04-04 16:36 ` [PATCH v1 4/5] s390/uv: update PG_arch_1 comment David Hildenbrand
2024-04-10 17:19   ` Claudio Imbrenda
2024-04-04 16:36 ` [PATCH v1 5/5] s390/hugetlb: convert PG_arch_1 code to work on folio->flags David Hildenbrand
2024-04-05  3:42 ` [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1 Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox