From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 22 Jun 2023 17:27:44 +0200
From: Danilo Krummrich <dakr@redhat.com>
Organization: RedHat
Subject: Re: [PATCH drm-next v5 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
To: Boris Brezillon
Cc: matthew.brost@intel.com, airlied@gmail.com, daniel@ffwll.ch,
 tzimmermann@suse.de, mripard@kernel.org, corbet@lwn.net,
 christian.koenig@amd.com, bskeggs@redhat.com, Liam.Howlett@oracle.com,
 alexdeucher@gmail.com, ogabbay@kernel.org, bagasdotme@gmail.com,
 willy@infradead.org, jason@jlekstrand.net, dri-devel@lists.freedesktop.org,
 nouveau@lists.freedesktop.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230620004217.4700-1-dakr@redhat.com>
 <20230620112540.19142ef3@collabora.com>
 <94adfd82-e77d-f99c-1d94-8b6397d39310@redhat.com>
 <20230622150101.229391e5@collabora.com>
 <20230622171931.1c46f745@collabora.com>
In-Reply-To: <20230622171931.1c46f745@collabora.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 6/22/23 17:19, Boris Brezillon wrote:
> Hi Danilo,
>
> On Thu, 22 Jun 2023 15:58:23 +0200
> Danilo Krummrich wrote:
>
>> Hi Boris,
>>
>> On 6/22/23 15:01, Boris Brezillon wrote:
>>> Hi Danilo,
>>>
>>> On Tue, 20 Jun 2023 14:46:07 +0200
>>> Danilo Krummrich wrote:
>>>
>>>>> The only thing I'm
>>>>> worried about is the 'sync mapping requests have to
>>>>> go through the async path and wait for all previous async requests to
>>>>> be processed' problem I mentioned in one of your previous submissions,
>>>>> but I'm happy to leave that for later.
>>>>
>>>> Yes, I'm aware of this limitation.
>>>>
>>>> Let me quickly try to explain where this limitation comes from and how I
>>>> intend to address it.
>>>>
>>>> In order to be able to allocate the required page tables for a mapping
>>>> request, and in order to free the corresponding page tables once the
>>>> (async) job has finished, I need to know the corresponding sequence of
>>>> operations (drm_gpuva_ops) to fulfill the mapping request.
>>>>
>>>> This requires me to update the GPUVA space in the ioctl() rather than in
>>>> the async stage, because otherwise I would need to wait for previous
>>>> jobs to finish before being able to submit subsequent jobs to the job
>>>> queue, since I need an up-to-date view of the GPUVA space in order to
>>>> calculate the sequence of operations to fulfill a mapping request.
>>>>
>>>> As a consequence, all jobs need to be processed in the order they were
>>>> submitted, including synchronous jobs.
>>>>
>>>> @Matt: I think you will have the same limitation with synchronous jobs,
>>>> as your implementation in XE should be similar?
>>>>
>>>> In order to address it, I want to switch to using callbacks rather than
>>>> 'pre-allocated' drm_gpuva_ops and update the GPUVA space within the
>>>> asynchronous stage. This would allow me to 'fit' synchronous jobs
>>>> between jobs waiting in the async job queue. However, to do this I have
>>>> to rework how the page table handling in Nouveau is implemented, since
>>>> this would require me to be able to manage the page tables without
>>>> knowing the exact sequence of operations to fulfill a mapping request.
>>>
>>> Ok, so I think that's more or less what we're trying to do right
>>> now in PowerVR.
>>>
>>> - First, we make sure we reserve enough MMU page tables for a given map
>>>   operation to succeed no matter the VM state in the VM_BIND job
>>>   submission path (our VM_BIND ioctl). That means we're always
>>>   over-provisioning and returning unused memory back when the operation
>>>   is done if we end up using less memory.
>>> - We pre-allocate for the maple-tree insertions.
>>> - Then we map using drm_gpuva_sm_map() and the callbacks we provided in
>>>   the drm_sched::run_job() path. We guarantee that no memory is
>>>   allocated in that path thanks to the pre-allocation/reservation we've
>>>   done at VM_BIND job submission time.
>>>
>>> The problem I see with this v5 is that:
>>>
>>> 1/ We now have a dma_resv_lock_held() in drm_gpuva_{link,unlink}(),
>>>    which, in our case, is called in the async drm_sched::run_job() path,
>>>    and we don't hold the lock in that path (it's been released just
>>>    after the job submission).
>>
>> My solution to this, as of now, is to pre-link and pre-unlink, in the
>> same way we pre-allocate, and then fix things up in the cleanup path.
>>
>> However, depending on the driver, this might require you to set a flag
>> in the driver-specific structure (embedding struct drm_gpuva) indicating
>> whether the gpuva is actually mapped (as in: has active page table
>> entries). Maybe we could also just add such a flag to struct drm_gpuva.
>> But yeah, that doesn't sound too nice, to be honest...
>>
>>> 2/ I'm worried that Liam's plan to only reserve what's actually needed
>>>    based on the maple-tree state is going to play against us, because
>>>    the maple-tree is only modified at job exec time, and we might have
>>>    several unmaps happening between the moment we created and queued the
>>>    jobs and the moment they actually get executed, meaning the
>>>    maple-tree reservation might no longer fit the bill.
>>
>> Yes, I'm aware, and I explained to Liam in detail why we need the
>> mas_preallocate_worst_case() way of doing it.
>>
>> See this mail:
>> https://lore.kernel.org/nouveau/68cd25de-e767-725e-2e7b-703217230bb0@redhat.com/T/#ma326e200b1de1e3c9df4e9fcb3bf243061fee8b5
>>
>> He hasn't answered yet, but I hope we can just get (or actually keep)
>> such a function (hopefully with better naming), since it shouldn't
>> interfere with anything else.
>
> My bad, I started reading your reply and got interrupted. Never got
> back to it, which I should definitely have done before posting my
> questions. Anyway, glad to hear we're on the same page regarding the
> mas_preallocate_worst_case() thing.

No worries, I should probably also reply to Liam's patch introducing the
change. I will do that in a minute.

>
>>
>>> For issue #1, it shouldn't be too problematic if we use a regular lock
>>> to insert to/remove from the GEM gpuva list.
>>
>> Yes, that's why I had a separate GEM gpuva list lock in the first place.
>> However, this doesn't really work when generating ops rather than using
>> the callback interface.
>>
>> Have a look at drm_gpuva_gem_unmap_ops_create(), requested by Matt for
>> XE. This function generates drm_gpuva_ops to unmap all mappings of a
>> given GEM. In order to do that, the function must iterate the GEM's
>> gpuva list and allocate operations for each mapping. As a consequence,
>> the gpuva list lock wouldn't be allowed to be taken in the fence
>> signalling path (run_job()) any longer. Hence, we can just protect the
>> list with the GEM's dma-resv lock.
>
> Yeah, I see why using dma_resv when pre-inserting the mapping is
> useful, it just didn't really work with late mapping insertion.
>
>>
>> However, I can understand that it might be inconvenient for the callback
>> interface, and admittedly my solution to that isn't that nice either.
>> Hence the following idea:
>>
>> For drivers to be able to use their own lock for that, it would be
>> enough to get rid of the lockdep checks.
>> We could just add a flag
>> to the GPUVA manager to let the driver indicate it wants to do its own
>> locking for the GPUVA list, and skip the lockdep checks for the dma-resv
>> lock in that case.
>
> Sounds good to me.

I think it's way better than the pre-link / pre-unlink mess. I will add
this to v6.

>
>>
>>> For issue #2, I can see a way out if, instead of freeing gpuva nodes,
>>> we flag those as unused when we see that something happening later in
>>> the queue is going to map a section being unmapped. All of this implies
>>> keeping access to already queued VM_BIND jobs (using the spsc queue at
>>> the entity level is not practical), and iterating over them every time
>>> a new sync or async job is queued to flag what needs to be retained. It
>>> would obviously be easier if we could tell the maple-tree API
>>> 'provision as if the tree was empty', so all we have to do is just
>>> over-provision for both the page tables and maple-tree insertion, and
>>> free the unused mem when the operation is done.
>>>
>>> Don't know if you already thought about that and/or have solutions to
>>> solve these issues.
>>
>> As already mentioned above, I'd just expect we can keep it the
>> over-provisioning way, as you say. I think it's a legit use case to not
>> know the state of the maple tree at the time the pre-allocated nodes
>> will be used, and keeping that should not interfere with Liam's plan to
>> (hopefully separately) optimize for the pre-allocation use case they
>> have within -mm.
>>
>> But let's wait for his take on that.
>
> Sure. As I said, I'm fine getting this version merged, we can sort out
> the changes needed for PowerVR later. Just thought I'd mention those
> issues early, so you're not surprised when we come back with crazy
> requests (which apparently are not that crazy ;-)).

They're not crazy at all; in fact, they entirely represent what the
callback interface was designed for. :-)

- Danilo

>
> Regards,
>
> Boris
>