From: Alex Mastro
Subject: [BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO
Date: Thu, 29 May 2025 14:44:14 -0700
Message-ID: <20250529214414.1508155-1-amastro@fb.com>

Hello,

We are running user space drivers in production on top of VFIO, and after
upgrading from v6.9.0 to v6.13.2 noticed intermittent, slow performance leading
to "rcu_sched self-detected stall" when issuing VFIO_IOMMU_MAP_DMA on ~64 GiB
mmap-ed BAR regions. When doing this on enough devices concurrently, we
triggered softlockup_panic. The mmap-ed BAR regions were obtained from mmap on
a VFIO device fd.

We map regions > 1G, which sometimes do not start at 1G-aligned BAR offsets,
but they are always aligned to at least 2 MiB.

We determined that slow, stalling runs were correlated with 4 KiB-aligned
addresses returned by mmap, and normal runs with >= 2 MiB alignment.

Inspired by QEMU's mmap-alloc.c, we are handling this by reserving VA with an
oversized mmap, and then clobbering with MAP_FIXED at a good address inside
the reservation with the mmap on the VFIO device fd.

At first we settled for aligning the mmap address to {1 GiB, 2 MiB} exactly,
and the stalls disappeared, but then improved performance with the following:

We found that the best addresses to pass to VFIO_IOMMU_MAP_DMA have the
following properties, where va_align and va_offset are chosen based on the
size and BAR offsets of the desired mapping.

    va_align = {1 GiB, 2 MiB, 4 KiB}
    va_offset = mmap_offset % va_align
    (addr_to_mmap % va_align) == va_offset

Using addresses with the above properties seems to optimize the count and
granularity of faults, as confirmed by bpftrace-ing vfio_pci_mmap_huge_fault.

We then backported "Improve DMA mapping performance for huge pfnmaps" [1] to
our 6.13 tree, and saw further performance improvements consistent with those
described in the patch (thank you!). However, even with the backport, we still
need to align mmap addresses manually, otherwise we see stalls.

We are wondering the following:
- Is all of the above expected behavior, and expected usage of VFIO?
- Is there an expected minimum alignment greater than 4K (our system page
  size) for non-MAP_FIXED mmap on a VFIO device fd?
- Was there an unintended regression to our use-case in between 6.9 and 6.13?

Thanks,
Alex Mastro

[1] https://lore.kernel.org/all/20250205231728.2527186-1-alex.williamson@redhat.com/
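
For concreteness, here is a minimal sketch of the reserve-then-MAP_FIXED
approach and the address selection described above. It is an illustration
under stated assumptions, not our production code: the names pick_va_align,
congruent_addr and mmap_congruent are hypothetical, the size-only policy in
pick_va_align is an assumption (the real choice may also depend on the BAR
offset), and size and mmap_offset are assumed to be multiples of a 4 KiB
system page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* ASSUMPTION: hypothetical size-only policy; pick the largest of
 * {1 GiB, 2 MiB, 4 KiB} that does not exceed the mapping size. */
uint64_t pick_va_align(uint64_t size)
{
    if (size >= SZ_1G)
        return SZ_1G;
    if (size >= SZ_2M)
        return SZ_2M;
    return SZ_4K;
}

/* Lowest addr >= base such that
 * (addr % va_align) == (mmap_offset % va_align). */
uint64_t congruent_addr(uint64_t base, uint64_t va_align, uint64_t mmap_offset)
{
    assert(va_align != 0 && (va_align & (va_align - 1)) == 0);
    uint64_t va_offset = mmap_offset % va_align;
    uint64_t delta = (va_offset + va_align - base % va_align) % va_align;
    return base + delta;
}

/* Reserve VA with an oversized PROT_NONE mapping, map fd over a congruent
 * address inside it with MAP_FIXED, then trim the unused head and tail.
 * Returns MAP_FAILED on error. */
void *mmap_congruent(int fd, off_t mmap_offset, size_t size, int prot)
{
    uint64_t va_align = pick_va_align(size);
    size_t resv_size = size + va_align; /* delta is always < va_align */

    void *resv = mmap(NULL, resv_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (resv == MAP_FAILED)
        return MAP_FAILED;

    uint64_t base = (uint64_t)(uintptr_t)resv;
    uint64_t addr = congruent_addr(base, va_align, (uint64_t)mmap_offset);

    void *p = mmap((void *)(uintptr_t)addr, size, prot,
                   MAP_SHARED | MAP_FIXED, fd, mmap_offset);
    if (p == MAP_FAILED) {
        munmap(resv, resv_size);
        return MAP_FAILED;
    }

    /* Release the parts of the reservation the mapping does not cover. */
    if (addr > base)
        munmap(resv, addr - base);
    uint64_t end = addr + size;
    uint64_t resv_end = base + resv_size;
    if (resv_end > end)
        munmap((void *)(uintptr_t)end, resv_end - end);
    return p;
}
```

A caller would pass the VFIO device fd, the region offset and the region
size to mmap_congruent, and then use the returned address as the vaddr for
VFIO_IOMMU_MAP_DMA. Oversizing the reservation by va_align is enough because
the chosen address is always less than va_align bytes past the start of the
reservation.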