Date: Fri, 30 May 2025 10:25:01 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: Alex Mastro, linux-pci@vger.kernel.org, alex.williamson@redhat.com,
 kbusch@kernel.org, linux-mm@kvack.org
Subject: Re: [BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO
References: <20250529214414.1508155-1-amastro@fb.com> <20250530131050.GA233377@nvidia.com>
In-Reply-To: <20250530131050.GA233377@nvidia.com>

On Fri, May 30, 2025 at 10:10:50AM -0300, Jason Gunthorpe wrote:
> On Thu, May 29, 2025 at 02:44:14PM -0700, Alex Mastro wrote:
>
> > We are wondering the following:
> > - Is all of the above expected behavior, and usage of VFIO?
> > - Is there an expected minimum alignment greater than 4K (our system page size)
> >   for non-MAP_FIXED mmap on a VFIO device fd?
> > - Was there an unintended regression to our use-case in between 6.9 and 6.13?

Probably due to commit aac6db75a9fc ("vfio/pci: Use
unmap_mapping_range()").  IIUC the plan was that huge faults would bring
back the lost perf, but getting the alignment right every time is indeed
still a challenge.

>
> I think this is something we have missed. VFIO should automatically
> align the VMA's address if not MAP_FIXED, otherwise it can't use the
> efficient huge page sizes anymore. qemu uses MAP_FIXED so we've left
> out the non-qemu users from this performance optimization.
>
> To fix it, the flow from the mm side is something like what
> shmem_get_unmapped_area() does. VFIO would probably want to align all
> BAR's to their size.

Good point!  I overlooked the VA hints, since QEMU doesn't need them.  I
can have a closer look if nobody else will.

>
> Which seems to me probably wants some refactoring and a core helper
> 'mm_get_aligned_unmapped_area()'..
>
> I think if you are mmaping a huge huge BAR it is not surprising that
> it will take a huge amount of time to write out all of the 4K
> PTEs. The stalls on old kernels should probably be addressed by having
> cond_resched() inside the remap_pfnmap().

Right, but then that'll be a stable-only fix.  If VFIO can provide a
valid get_unmapped_area(), then with huge faults maybe we don't even need
it, and such a change can be copied to stable too.

Meanwhile, just to mention that there's one more commit that the vfio
huge_fault stable branches would like to have soon, in which Alex fixed
yet another alignment-related issue to make huge faults reliable:

  commit c1d9dac0db168198b6f63f460665256dedad9b6e
  Author: Alex Williamson
  Date:   Fri May 2 16:40:31 2025 -0600

      vfio/pci: Align huge faults to order

I think if your trace shows correct huge faults when you did correct
alignment, it should mean that commit doesn't affect your case (likely
your app faults in the BAR region sequentially, and there are likely no
concurrent, especially unaligned, faults when pre-faulting everything).
But just FYI, and IIUC that commit will land in 6.13.z soon.

Thanks,

-- 
Peter Xu