Date: Wed, 25 Jun 2025 15:26:44 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: "Liam R. Howlett", Lorenzo Stoakes, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, kvm@vger.kernel.org, Andrew Morton,
    Alex Williamson, Zi Yan, Alex Mastro, David Hildenbrand, Nico Pache
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
References: <20250619135852.GC1643312@nvidia.com>
    <20250619184041.GA10191@nvidia.com>
    <20250624234032.GC167785@nvidia.com>
    <20250625130711.GH167785@nvidia.com>
    <20250625184154.GI167785@nvidia.com>
In-Reply-To: <20250625184154.GI167785@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Wed, Jun 25, 2025 at 03:41:54PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 25, 2025 at 01:12:11PM -0400, Peter Xu wrote:
>
> > After I read the two use cases, I mostly agree. Just one trivial thing to
> > mention, it may not be direct map but vmap() (see io_region_init_ptr()).
>
> If it is vmapped then this is all silly, you should vmap and mmap
> using the same cache colouring and, AFAIK, pgoff is how this works for
> purely userspace.
>
> Once vmap'd it should determine the cache colour and set the pgoff
> properly, then everything should already work no?

I don't yet see how to set the pgoff.  Here pgoff is passed in from
userspace, and it follows io_uring's definition (per io_uring_mmap).

For example, on parisc one could map the completion queue with
pgoff=IORING_OFF_CQ_RING (0x8000000), but then the VA alignment needs to be
adjusted to match the address that vmap() returned for the completion
queue's io_mapped_region.ptr.

>
> > It already does, see (io_uring_get_unmapped_area(), of parisc):
> >
> >         /*
> >          * Do not allow to map to user-provided address to avoid breaking the
> >          * aliasing rules. Userspace is not able to guess the offset address of
> >          * kernel kmalloc()ed memory area.
> >          */
> >         if (addr)
> >                 return -EINVAL;
> >
> > I do not know whoever would use MAP_FIXED but with addr=0. So failing
> > addr!=0 should literally stop almost all MAP_FIXED already.
>
> Maybe but also it is not right to not check MAP_FIXED directly.. And
> addr is supposed to be a hint for non-fixed mode so it is weird to
> -EINVAL when you can ignore the hint??

I agree on both points here.

>
> > Going back to the topic of this series - I think the new API would work for
> > io_uring and parisc too if I can return phys_pgoff, here what parisc would
> > need is:
>
> The best solution is to fix the selection of normal pgoff so it has
> consistent colouring of user VMAs and kernel vmaps. Either compute a
> pgoff that matches the vmap (hopefully easy if it is not uABI) or
> teach the kernel vmap how to respect a "pgoff" to set the cache
> colouring just like the user VMA's do (AFAICR).
>
> But I think this is getting maybe too big and I'd just introduce the
> new API and not try to convert this hard stuff. The above explanation
> how it could be fixed should be enough??

I never planned to do it myself.  However, if I'm going to sign off on and
propose an API, I want to be crystal clear about the goal of the API, and
about the feasibility of that goal, even if I'm not going to work on it.

We don't want to introduce something, then find that it won't work even for
some MMU use cases, and end up maintaining both or reverting.  I wish we
could have stuck with get_unmapped_area() for now and left the API for
later.

So if we want the new API to be proposed here, and make VFIO use it first
(while considering it applicable to all existing MMU users at least; I have
checked all of them so far), I think this would be proper:

    int (*mmap_va_hint)(struct file *file, unsigned long *pgoff, size_t len);

The changes compared to the previous version:

    (1) Merged the pgoff and *phys_pgoff parameters into a single
        unsigned long *pgoff, so the hook can adjust the pgoff that the
        VA allocator will use.  The adjustment will not be visible to the
        later mmap() when the VMA is created.
    (2) Renamed it to mmap_va_hint(), because *pgoff can now be updated,
        so it's not only about ordering: it carries both the "order" and
        the "pgoff adjustment" hints that the core mm will use when
        calculating the VA.

Does it look ok to you?

-- 
Peter Xu
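
For illustration only, here is a userspace sketch of how a hook with the
signature above might feed a VA allocator.  It is not code from the series.
Everything except the hook's signature is an assumption: the return value
being a mapping order, the toy struct file with a region_phys_pgoff field,
the demo_mmap_va_hint() and pick_va() helpers, 4K pages and the order
values 9 and 18.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL /* assumption: 4K base pages */
#define PMD_ORDER  9    /* assumption: 2M huge mappings */
#define PUD_ORDER  18   /* assumption: 1G huge mappings */

/* Hypothetical stand-in for the kernel's struct file. */
struct file {
	unsigned long region_phys_pgoff; /* page offset where the region starts */
};

/*
 * Hypothetical hook with the proposed signature.  It rewrites *pgoff
 * into the offset the VA allocator should align against and returns
 * the largest order worth aligning to (assumed meaning of the int).
 */
static int demo_mmap_va_hint(struct file *file, unsigned long *pgoff,
			     size_t len)
{
	*pgoff += file->region_phys_pgoff;

	if (len >= ((size_t)1 << (PUD_ORDER + PAGE_SHIFT)))
		return PUD_ORDER;
	if (len >= ((size_t)1 << (PMD_ORDER + PAGE_SHIFT)))
		return PMD_ORDER;
	return 0;
}

/*
 * Toy VA allocator: return the lowest address >= base whose offset
 * inside a (1 << order)-page block equals that of pgoff, so that a
 * huge mapping can line up with the backing offset.
 */
static unsigned long pick_va(unsigned long base, unsigned long pgoff,
			     int order)
{
	unsigned long align, off, addr;

	assert(order >= 0 && order <= PUD_ORDER);

	align = 1UL << (order + PAGE_SHIFT);
	off = (pgoff << PAGE_SHIFT) & (align - 1);
	addr = (base & ~(align - 1)) + off;
	if (addr < base)
		addr += align;
	return addr;
}
```

A caller in this model would copy the pgoff passed to mmap() into a
temporary, hand the temporary to the hook, and use only the adjusted copy
for pick_va(), so the value stored for the VMA stays untouched.  That is
how point (1) above is read here.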