Date: Thu, 24 Oct 2024 15:19:03 -0300
From: Jason Gunthorpe
To: Alex Williamson
Cc: Qinyun Tan, Andrew Morton, linux-mm@kvack.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1: vfio: avoid unnecessary pin memory when dma map io address space 0/2]
Message-ID: <20241024181903.GA20281@ziepe.ca>
References: <20241024110624.63871cfa.alex.williamson@redhat.com>
In-Reply-To: <20241024110624.63871cfa.alex.williamson@redhat.com>

On Thu, Oct 24, 2024 at 11:06:24AM -0600, Alex Williamson wrote:
> On Thu, 24 Oct 2024 17:34:42 +0800
> Qinyun Tan wrote:
>
> > When user application call ioctl(VFIO_IOMMU_MAP_DMA) to map a dma address,
> > the general handler 'vfio_pin_map_dma' attempts to pin the memory and
> > then create the mapping in the iommu.
> >
> > However, some mappings aren't backed by a struct page, for example an
> > mmap'd MMIO range for our own or another device.
> > In this scenario, a vma
> > with flag VM_IO | VM_PFNMAP, the pin operation will fail. Moreover, the
> > pin operation incurs a large overhead which will result in a longer
> > startup time for the VM. We don't actually need a pin in this scenario.
> >
> > To address this issue, we introduce a new DMA MAP flag
> > 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN' to skip the 'vfio_pin_pages_remote'
> > operation in the DMA map process for mmio memory. Additionally, we add
> > the 'VM_PGOFF_IS_PFN' flag for vfio_pci_mmap address, ensuring that we can
> > directly obtain the pfn through vma->vm_pgoff.
> >
> > This approach allows us to avoid unnecessary memory pinning operations,
> > which would otherwise introduce additional overhead during DMA mapping.
> >
> > In my tests, using vfio to pass through an 8-card AMD GPU which with a
> > large bar size (128GB*8), the time mapping the 192GB*8 bar was reduced
> > from about 50.79s to 1.57s.
>
> If the vma has a flag to indicate pfnmap, why does the user need to
> provide a mapping flag to indicate not to pin? We generally cannot
> trust such a user directive anyway, nor do we in this series, so it all
> seems rather redundant.

The best answer is to map from DMABUF, not from a VMA, and then you get
perfect aggregation cheaply.

> What about simply improving the batching of pfnmap ranges rather than
> imposing any sort of mm or uapi changes? Or perhaps, since we're now
> using huge_fault to populate the vma, maybe we can iterate at PMD or
> PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
> optimizations to pursue that could be done transparently to the
> user.

I don't want to add more stuff to support the security-broken
follow_pfn path. It needs to be replaced.

Leon's work to improve the DMA API is so close that we may be near the
end!

There are two versions of the dmabuf patches on the list; it would be
good to get that in good shape. We could make a full solution,
including the vfio/iommufd map side, while waiting.

Jason