From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	David Hildenbrand, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Sven Schnelle, Thomas Huth, Matthew Wilcox, Zi Yan, Sebastian Mitterle
Subject: [PATCH v1 3/3] s390/uv: improve splitting of large folios that cannot be split while dirty
Date: Fri, 16 May 2025 14:39:46 +0200
Message-ID: <20250516123946.1648026-4-david@redhat.com>
In-Reply-To: <20250516123946.1648026-1-david@redhat.com>
References: <20250516123946.1648026-1-david@redhat.com>

Currently, starting a PV VM on an iomap-based filesystem with large
folio support, such as XFS, will not work. We'll be stuck in
unpack_one()->gmap_make_secure(), because we can't seem to make progress
splitting the large folio.

The problem is that we require a writable PTE but a writable PTE under
such filesystems will imply a dirty folio.

So whenever we have a writable PTE, we'll have a dirty folio, and dirty
iomap folios cannot currently get split, because
split_folio()->split_huge_page_to_list_to_order()->filemap_release_folio()
will fail in iomap_release_folio().

So we will not make any progress splitting such large folios.

Until dirty folios can be split more reliably, let's manually trigger
writeback of the problematic folio using
filemap_write_and_wait_range(), and retry the split immediately
afterwards exactly once, before looking up the folio again.

Should this logic be part of split_folio()? Likely not; most split users
don't have to split so eagerly to make any progress.

For now, this seems to affect xfs, zonefs and erofs, and this patch
makes it work again (tested on xfs only).

While this could be considered a fix for 6795801366da ("xfs: Support
large folios"), df2f9708ff1f ("zonefs: enable support for large folios")
and ce529cc25b18 ("erofs: enable large folios for iomap mode"), before
commit eef88fe45ac9 ("s390/uv: Split large folios in gmap_make_secure()"),
we did not try splitting large folios at all. So it's all rather part of
making SE compatible with file systems that support large folios. But to
have some "Fixes:" tag, let's just use eef88fe45ac9.

Not CCing stable, because there are a lot of dependencies, and it simply
not working is not critical in stable kernels.

Reported-by: Sebastian Mitterle
Closes: https://issues.redhat.com/browse/RHEL-58218
Fixes: eef88fe45ac9 ("s390/uv: Split large folios in gmap_make_secure()")
Signed-off-by: David Hildenbrand
---
 arch/s390/kernel/uv.c | 66 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 60 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index f6ddb2b54032e..d278bf0c09d1b 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -338,22 +339,75 @@ static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct u
  */
 static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
 {
-	int rc;
+	int rc, tried_splits;
 
 	lockdep_assert_not_held(&mm->mmap_lock);
 	folio_wait_writeback(folio);
 	lru_add_drain_all();
 
-	if (folio_test_large(folio)) {
+	if (!folio_test_large(folio))
+		return 0;
+
+	for (tried_splits = 0; tried_splits < 2; tried_splits++) {
+		struct address_space *mapping;
+		loff_t lstart, lend;
+		struct inode *inode;
+
 		folio_lock(folio);
 		rc = split_folio(folio);
+		if (rc != -EBUSY) {
+			folio_unlock(folio);
+			return rc;
+		}
+
+		/*
+		 * Splitting with -EBUSY can fail for various reasons, but we
+		 * have to handle one case explicitly for now: some mappings
+		 * don't allow for splitting dirty folios; writeback will
+		 * mark them clean again, including marking all page table
+		 * entries mapping the folio read-only, to catch future write
+		 * attempts.
+		 *
+		 * While the system should be writing back dirty folios in the
+		 * background, we obtained this folio by looking up a writable
+		 * page table entry. On these problematic mappings, writable
+		 * page table entries imply dirty folios, preventing the
+		 * split in the first place.
+		 *
+		 * To prevent a livelock when triggering writeback manually and
+		 * letting the caller look up the folio again in the page
+		 * table (turning it dirty), immediately try to split again.
+		 *
+		 * This is only a problem for some mappings (e.g., XFS);
+		 * mappings that do not support writeback (e.g., shmem) do not
+		 * apply.
+		 */
+		if (!folio_test_dirty(folio) || folio_test_anon(folio) ||
+		    !folio->mapping || !mapping_can_writeback(folio->mapping)) {
+			folio_unlock(folio);
+			break;
+		}
+
+		/*
+		 * Ideally, we'd only trigger writeback on this exact folio. But
+		 * there is no easy way to do that, so we'll stabilize the
+		 * mapping while we still hold the folio lock, so we can drop
+		 * the folio lock to trigger writeback on the range currently
+		 * covered by the folio instead.
+		 */
+		mapping = folio->mapping;
+		lstart = folio_pos(folio);
+		lend = lstart + folio_size(folio) - 1;
+		inode = igrab(mapping->host);
 		folio_unlock(folio);
 
-		if (rc != -EBUSY)
-			return rc;
-		return -EAGAIN;
+		if (unlikely(!inode))
+			break;
+
+		filemap_write_and_wait_range(mapping, lstart, lend);
+		iput(mapping->host);
 	}
-	return 0;
+	return -EAGAIN;
 }
 
 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb)
-- 
2.49.0
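
Not part of the patch: the retry logic described above can be sketched in plain userspace C. This is only a toy model under stated assumptions, not kernel code. The names struct mock_folio, mock_split(), mock_writeback() and wiggle_split() are hypothetical stand-ins, and the model assumes that writeback always leaves the folio clean and that no locking or inode reference handling is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a large pagecache folio; not a kernel type. */
struct mock_folio {
	bool large;		/* not yet split */
	bool dirty;		/* in this model, dirty folios cannot be split */
	bool pinned;		/* some unrelated reason for -EBUSY */
	bool can_writeback;	/* models mapping_can_writeback() */
	int writebacks;		/* how often writeback was triggered */
};

/* Models split_folio(): -EBUSY while the folio is dirty or pinned. */
static int mock_split(struct mock_folio *f)
{
	if (f->dirty || f->pinned)
		return -EBUSY;
	f->large = false;
	return 0;
}

/* Models filemap_write_and_wait_range(): assumed to clean the folio. */
static void mock_writeback(struct mock_folio *f)
{
	f->dirty = false;
	f->writebacks++;
}

/*
 * Follows the control flow of the patched function: try to split, and
 * if that fails with -EBUSY on a dirty folio that supports writeback,
 * write it back and retry the split right away, at most once more.
 */
static int wiggle_split(struct mock_folio *f)
{
	int rc, tried_splits;

	assert(f != NULL);
	if (!f->large)
		return 0;

	for (tried_splits = 0; tried_splits < 2; tried_splits++) {
		rc = mock_split(f);
		if (rc != -EBUSY)
			return rc;

		/* -EBUSY for another reason: let the caller retry later. */
		if (!f->dirty || !f->can_writeback)
			break;

		mock_writeback(f);
	}
	return -EAGAIN;
}
```

In this model, a dirty folio on a writeback-capable mapping fails the first split, gets written back once, and is split by the second attempt before anything can dirty it again. A folio that is busy for another reason, or whose mapping cannot do writeback, leaves the loop early and the caller sees -EAGAIN.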