Date: Thu, 30 Jan 2025 14:00:19 +0100
From: Simona Vetter
To: David Hildenbrand
Cc: Alistair Popple, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
 nouveau@lists.freedesktop.org, Andrew Morton, Jérôme Glisse,
 Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul,
 Danilo Krummrich, David Airlie, Simona Vetter, Liam R. Howlett,
 Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
 Jason Gunthorpe
Subject: Re: [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk
In-Reply-To: <54a55ff7-38c8-42c2-886f-d6d1985072a9@redhat.com>

On Thu, Jan 30, 2025 at 10:47:29AM +0100, David Hildenbrand wrote:
> On 30.01.25 10:40, Simona Vetter wrote:
> > On Thu, Jan 30, 2025 at 05:11:49PM +1100, Alistair Popple wrote:
> > > On Wed, Jan 29, 2025 at 12:54:02PM +0100, David Hildenbrand wrote:
> > > > We require a writable PTE and only support anonymous folios: we can only
> > > > have exactly one PTE pointing at that page, which we can just look up
> > > > using a folio walk, avoiding the rmap walk and the anon VMA lock.
> > > >
> > > > So let's stop doing an rmap walk and perform a folio walk instead, so we
> > > > can easily just modify a single PTE and avoid relying on rmap/mapcounts.
> > > >
> > > > We now effectively work on a single PTE instead of multiple PTEs of
> > > > a large folio, allowing for conversion of individual PTEs from
> > > > non-exclusive to device-exclusive -- note that the other way always
> > > > worked on single PTEs.
> > > >
> > > > We can drop the MMU_NOTIFY_EXCLUSIVE MMU notifier call and document why
> > > > that is not required: GUP will already take care of the
> > > > MMU_NOTIFY_EXCLUSIVE call if required (there is already a device-exclusive
> > > > entry) when not finding a present PTE and having to trigger a fault and
> > > > ending up in remove_device_exclusive_entry().
> > >
> > > I will have to look at this a bit more closely tomorrow but this doesn't seem
> > > right to me. We may be transitioning from a present PTE (i.e. a writable
> > > anonymous mapping) to a non-present PTE (i.e. a device-exclusive entry) and
> > > therefore any secondary processors (e.g. other GPUs, IOMMUs, etc.) will need to
> > > update their copies of the PTE. So I think the notifier call is needed.
> >
> > I guess this is a question of the semantics we want: for multiple GPUs, do
> > we require that device-exclusive also excludes other GPUs or not? I'm
> > leaning towards agreeing with you here.
>
> See my reply; it's also relevant for non-device users, such as KVM. So it's
> the right thing to do.

Yeah, sounds good.

> > > > Note that the PTE is
> > > > always writable, and we can always create a writable-device-exclusive
> > > > entry.
> > > >
> > > > With this change, device-exclusive is fully compatible with THPs /
> > > > large folios. We still require PMD-sized THPs to get PTE-mapped, and
> > > > supporting PMD-mapped THP (without the PTE-remapping) is a different
> > > > endeavour that might not be worth it at this point.
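To illustrate the notifier question, here is a toy userspace model in C. Everything in it (toy_pte, toy_mm, toy_make_device_exclusive() and friends) is hypothetical and does not use or mirror the kernel's folio_walk or mmu_notifier APIs. It looks up the single PTE directly, as the patch description proposes, and, following Alistair's objection, invalidates every registered secondary MMU before a present PTE becomes a non-present device-exclusive entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical and do not mirror the
 * kernel's pte_t, folio_walk or mmu_notifier structures. */
enum toy_pte_kind { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_DEVICE_EXCLUSIVE };

struct toy_pte {
	enum toy_pte_kind kind;
	bool writable;
	unsigned long pfn;
};

/* A secondary MMU (another GPU, an IOMMU, KVM) caching one translation. */
struct toy_secondary_mmu {
	bool has_cached_entry;
	size_t cached_idx;
	int invalidations;
};

#define TOY_NR_PTES 16
#define TOY_MAX_SECONDARY 4

struct toy_mm {
	struct toy_pte ptes[TOY_NR_PTES];	/* indexed by virtual page number */
	struct toy_secondary_mmu *secondary[TOY_MAX_SECONDARY];
	int nr_secondary;
};

/* Tell every secondary MMU to drop its copy of the translation at idx. */
static void toy_notify_invalidate(struct toy_mm *mm, size_t idx)
{
	for (int i = 0; i < mm->nr_secondary; i++) {
		struct toy_secondary_mmu *smmu = mm->secondary[i];

		if (smmu->has_cached_entry && smmu->cached_idx == idx)
			smmu->has_cached_entry = false;
		smmu->invalidations++;
	}
}

/*
 * Convert exactly one PTE to a device-exclusive entry. The PTE is found
 * directly from the virtual page number (standing in for a folio walk)
 * instead of walking reverse mappings: a writable, exclusive anonymous
 * page can only be mapped by this one PTE.
 */
static bool toy_make_device_exclusive(struct toy_mm *mm, size_t idx)
{
	struct toy_pte *pte;

	assert(idx < TOY_NR_PTES);
	pte = &mm->ptes[idx];
	if (pte->kind != TOY_PTE_PRESENT || !pte->writable)
		return false;	/* only present, writable PTEs qualify */

	/* present -> non-present: secondary MMUs must invalidate first */
	toy_notify_invalidate(mm, idx);
	pte->kind = TOY_PTE_DEVICE_EXCLUSIVE;
	return true;
}
```

Dropping the toy_notify_invalidate() call would leave a secondary MMU holding a stale translation for a page the CPU no longer maps as present, which is the failure mode Alistair describes.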
> >
> > I'm not sure we actually want hugepages for device exclusive, since it has
> > an impact on what's allowed and what's not. If we only ever do 4k entries
> > then userspace can assume that as long as atomics are in separate 4k pages
> > there's no issue when both the GPU and CPU hammer on them. If we try to
> > keep THP entries then suddenly a workload that worked before will result
> > in endless ping-pong between GPU and CPU because the separate atomic
> > counters (or whatever) now all sit in the same 2M page.
>
> Agreed. And the conversion + mapping into the device gets trickier.
>
> >
> > So going with THP might result in userspace having to spread out atomics
> > even more, which is just wasting memory and not saving any TLB entries
> > since often you don't need that many.
> >
> > tldr; I think not supporting THP entries for device exclusive is a
> > feature, not a bug.
>
> So, you agree with my "different endeavour that might not be worth it"
> statement?

Yes. Well, I think we should go further and clearly document that we
intentionally return split pages, because it's part of the UAPI contract
with users of all this.

And if someone needs PMD entries for performance or whatever, we need two
things:
a) userspace must mmap that memory as hugepage memory, to clearly signal
   the promise that atomics are split up on hugepage sizes and not just
   page size
b) we need to extend make_device_exclusive and drivers to handle the
   hugetlb folio case

I think THP is simply not going to work here: it's impossible (without
potentially causing fault storms) to figure out what userspace might want.

Cheers,
Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
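The page-granularity argument above comes down to simple address arithmetic. The following C sketch assumes 4 KiB base pages and 2 MiB PMD-sized THPs (the x86-64 defaults); the helper names and the padded counter layout are hypothetical, not an existing kernel or libc API. It counts how many pairs of counters would share one device-exclusive entry at each granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 4 KiB base pages, 2 MiB PMD-sized THP (x86-64). */
#define TOY_SZ_4K ((uintptr_t)4096)
#define TOY_SZ_2M ((uintptr_t)2 << 20)

/* True if a and b fall in the same naturally aligned block of `size`
 * bytes, i.e. would be covered by the same device-exclusive entry. */
static bool toy_same_entry(uintptr_t a, uintptr_t b, uintptr_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);	/* power of two */
	return (a & ~(size - 1)) == (b & ~(size - 1));
}

/* Count pairs of distinct counters sharing an entry of `size` bytes; each
 * such pair can ping-pong if the CPU hammers one and the GPU the other. */
static size_t toy_conflicting_pairs(const uintptr_t *addrs, size_t n,
				    uintptr_t size)
{
	size_t pairs = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (toy_same_entry(addrs[i], addrs[j], size))
				pairs++;
	return pairs;
}

/* Hypothetical userspace layout: one counter per 4 KiB page. */
struct toy_padded_counter {
	_Alignas(4096) long value;
};
_Static_assert(sizeof(struct toy_padded_counter) == 4096,
	       "each counter occupies its own 4 KiB page");
```

With four counters placed one per 4 KiB page inside a single 2 MiB region, 4k entries give no shared pairs, while a single 2M entry covers all six pairs, which is the ping-pong case described above. The padded struct is the "spread out atomics" cost paid by userspace: one page of memory per counter.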