linux-mm.kvack.org archive mirror
* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
From: Maíra Canal @ 2024-03-18 14:24 UTC
  To: Christian König, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti,
	Hugh Dickins, linux-mm

Not that the CC list wasn't big enough, but I'm adding MM folks
in the CC list.

On 3/18/24 11:04, Christian König wrote:
> On 18.03.24 at 14:28, Maíra Canal wrote:
>> Hi Christian,
>>
>> On 3/18/24 10:10, Christian König wrote:
>>> On 18.03.24 at 13:42, Maíra Canal wrote:
>>>> Hi Christian,
>>>>
>>>> On 3/12/24 10:48, Christian König wrote:
>>>>> On 12.03.24 at 14:09, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>>> On 12.03.24 at 11:31, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>>> On 12.03.24 at 10:30, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>>> On 12.03.24 at 09:51, Tvrtko Ursulin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>>> For some applications, such as using huge pages, we might
>>>>>>>>>>>>> want to have a different mountpoint, for which we pass in
>>>>>>>>>>>>> mount flags that better match our use case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that
>>>>>>>>>>>>> allows us to define the tmpfs mountpoint where the GEM object
>>>>>>>>>>>>> will be created. If this parameter is NULL, then we fall back
>>>>>>>>>>>>> to shmem_file_setup().
>>>>>>>>>>>>
>>>>>>>>>>>> One strategy for reducing churn, and so the number of
>>>>>>>>>>>> drivers this patch touches, could be to add a lower-level
>>>>>>>>>>>> drm_gem_object_init() which takes a vfsmount (call it
>>>>>>>>>>>> __drm_gem_object_init() or drm_gem_object_init_mnt()) and
>>>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
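
For illustration, a minimal sketch of that split, modelled on the current
drm_gem_object_init(); the drm_gem_object_init_mnt() name and its exact
placement are assumptions, not the code from this series:

#include <linux/err.h>
#include <linux/mm.h>
#include <linux/mount.h>
#include <linux/shmem_fs.h>

#include <drm/drm_gem.h>

/* Hypothetical lower-level init that takes an optional tmpfs mount. */
int drm_gem_object_init_mnt(struct drm_device *dev,
                            struct drm_gem_object *obj,
                            size_t size, struct vfsmount *gemfs)
{
        struct file *filp;

        drm_gem_private_object_init(dev, obj, size);

        if (gemfs)
                /* driver-provided tmpfs, e.g. mounted with huge=within_size */
                filp = shmem_file_setup_with_mnt(gemfs, "drm mm object",
                                                 size, VM_NORESERVE);
        else
                /* today's behaviour: the kernel-internal shmem mount */
                filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);

        if (IS_ERR(filp))
                return PTR_ERR(filp);

        obj->filp = filp;
        return 0;
}

/* The existing entry point keeps its signature, so callers stay untouched. */
int drm_gem_object_init(struct drm_device *dev,
                        struct drm_gem_object *obj, size_t size)
{
        return drm_gem_object_init_mnt(dev, obj, size, NULL);
}

With that shape, only drivers that actually want a dedicated mount need to
pass one; everyone else keeps calling drm_gem_object_init() unchanged.
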
>>>>>>>>>>>
>>>>>>>>>>> I would even go a step further in the other direction. The
>>>>>>>>>>> shmem-backed GEM object is just some special handling as far
>>>>>>>>>>> as I can see.
>>>>>>>>>>>
>>>>>>>>>>> So I would rather suggest renaming all drm_gem_* functions
>>>>>>>>>>> which only deal with the shmem-backed GEM object to
>>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>>
>>>>>>>>>> That makes sense although it would be very churny. I at least 
>>>>>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>>>>>
>>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>>
>>>>>>>>>> Not satisfying as you think it is not detailed enough to say
>>>>>>>>>> the driver wants to use huge pages for performance? Or not
>>>>>>>>>> satisfying as you question why huge pages would help?
>>>>>>>>>
>>>>>>>>> That huge pages are beneficial is clear to me, but I'm missing 
>>>>>>>>> the connection why a different mount point helps with using 
>>>>>>>>> huge pages.
>>>>>>>>
>>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>>> 'never', see man 5 tmpfs.
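
For reference, a rough sketch of how a driver could create such a tmpfs
instance from kernel code, loosely modelled on i915's gemfs setup (error
handling trimmed; the helper name and the exact option plumbing here are
assumptions, not code from any driver):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/fs_context.h>
#include <linux/mount.h>
#include <linux/string.h>

/* Create a driver-private tmpfs instance with THP enabled (sketch). */
static struct vfsmount *example_gemfs_mount(void)
{
        struct file_system_type *type;
        struct fs_context *fc;
        struct vfsmount *mnt;

        type = get_fs_type("tmpfs");
        if (!type)
                return ERR_PTR(-ENODEV);

        fc = fs_context_for_mount(type, SB_KERNMOUNT);
        if (IS_ERR(fc))
                return ERR_CAST(fc);

        /* in-kernel equivalent of "mount -t tmpfs -o huge=within_size" */
        vfs_parse_fs_string(fc, "source", "tmpfs", strlen("tmpfs"));
        vfs_parse_fs_string(fc, "huge", "within_size", strlen("within_size"));

        mnt = fc_mount(fc);
        put_fs_context(fc);

        return mnt;     /* released again with kern_unmount(mnt) */
}

Objects created against that mount with shmem_file_setup_with_mnt() then
inherit the mount's huge= policy instead of the system-wide default.
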
>>>>>>>
>>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>>
>>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for a 
>>>>>>> DRM device to not use huge pages with the shmem backend?
>>>>>>
>>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>>> back then the understanding was within_size may overallocate, 
>>>>>> meaning there would be some space wastage, until the memory 
>>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>>> haven't checked if that still applies.
>>>>>>
>>>>>> Other than that I don't know if some drivers/platforms could have 
>>>>>> problems if they have some limitations or hardcoded assumptions 
>>>>>> when they iterate the sg list.
>>>>>
>>>>> Yeah, that was the whole point behind my question. As far as I can 
>>>>> see this isn't driver specific, but platform specific.
>>>>>
>>>>> I might be wrong here, but I think we should then probably not have 
>>>>> that handling in each individual driver, but rather centralized in 
>>>>> the DRM code.
>>>>
>>>> I don't see a point in enabling THP for all shmem drivers. A huge page
>>>> is only useful if the driver is going to use it. On V3D, for example,
>>>> I only need huge pages because I need the memory contiguously allocated
>>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>>> support implemented in the driver, I would be creating memory pressure
>>>> without any performance gain.
>>>
>>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>>> create much extra memory pressure for this use case.
>>>
>>> As far as I can see background for the option is that files in tmpfs 
>>> usually have a varying size, so it usually isn't beneficial to 
>>> allocate a huge page just to find that the shmem file is much smaller 
>>> than what's needed.
>>>
>>> But GEM objects have a fixed size. So we know up front whether we need
>>> 4KiB or 1GiB and can therefore directly allocate huge pages if they are
>>> available and the object is large enough to be backed by them.
>>>
>>> If the memory pressure is so high that we don't have huge pages 
>>> available the shmem code falls back to standard pages anyway.
>>
>> The matter is: how do we define the point where the memory pressure is 
>> high?
> 
> Well as driver developers/maintainers we simply don't do that. This is 
> the job of the shmem code.
> 
>> For example, notice that in this implementation of Super Pages
>> for the V3D driver, I only use a Super Page if the BO is bigger than 
>> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
>> available for the GPU. If I created huge pages for every BO allocation 
>> (and initially, I tried that), I would end up with hangs in some 
>> applications.
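
For illustration, a policy like that could be as small as this size gate
(hypothetical helper and field names, reusing the drm_gem_object_init_mnt()
sketch from earlier in the thread; not the actual v3d code):

#include <linux/sizes.h>

#include "v3d_drv.h"    /* assumes the v3d_dev definition is in scope */

#define V3D_SUPER_PAGE_SIZE     SZ_2M   /* assumed threshold, per the text above */

static int example_v3d_bo_init(struct v3d_dev *v3d,
                               struct drm_gem_object *obj, size_t size)
{
        /* Only BOs that can fill a super page use the huge-page mount. */
        struct vfsmount *mnt = size >= V3D_SUPER_PAGE_SIZE ? v3d->gemfs : NULL;

        /* v3d->gemfs and drm_gem_object_init_mnt() are the hypothetical
         * pieces sketched earlier in this thread. */
        return drm_gem_object_init_mnt(&v3d->drm, obj, size, mnt);
}

Small BOs stay on the default shmem mount, so they add no THP pressure; only
BOs large enough to fill at least one super page go to the huge-page mount.
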
> 
> Yeah, that is what I meant by the trivial optimisation to the shmem
> code. Essentially, when you have 2MiB+4KiB as the BO size, the shmem
> code should use a 2MiB page and a 4KiB page to back it, but what it
> currently does is use two 2MiB pages and then split up the second page
> when it finds that it isn't needed.
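
(Worked out: a 2 MiB + 4 KiB BO needs 2052 KiB of backing, but with
huge=within_size it is currently first backed by two 2 MiB huge pages,
i.e. 4096 KiB, and the roughly 2 MiB of excess only goes away once the
second huge page is split; the ideal backing would be one 2 MiB huge page
plus a single 4 KiB tail page.)
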
> 
> That is wasteful and leads to fragmentation, but as soon as we stop 
> doing that we should be fine to enable it unconditionally for all drivers.

I see your point, but I believe that it would be tangential to the goal
of this series. As you mentioned, we currently have a lot of memory
fragmentation when using THP, and until that is solved (at the tmpfs
level), I believe we could move on with the current implementation
(with the improvements proposed by Tvrtko).

Best Regards,
- Maíra

> 
> TTM has been doing essentially the same thing for years.
> 
> Regards,
> Christian.
> 
>>
>>
>> At least, for V3D, I wouldn't like to see THP being used for all the 
>> allocations. But, we have maintainers of other drivers in the CC.
>>
>> Best Regards,
>> - Maíra
>>
>>>
>>> So THP is almost always beneficial for GEM even if the driver doesn't 
>>> actually need it. The only potential case I can think of which might 
>>> not be handled gracefully is the tail pages, e.g. huge + 4kib.
>>>
>>> But that is trivial to optimize in the shmem code when the final size 
>>> of the file is known beforehand.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Best Regards,
>>>> - Maíra
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>>
>>>>>> The Cc is plenty large, so perhaps someone else will have
>>>>>> additional information. :)
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>>>>
>>>>>>>
>>>>>>> I mean it would make this patch here even smaller.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Tvrtko
>>>>>>>
>>>>>
>>>
> 



* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
From: Christian König @ 2024-03-18 15:05 UTC
  To: Maíra Canal, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti,
	Hugh Dickins, linux-mm

On 18.03.24 at 15:24, Maíra Canal wrote:
> Not that the CC list wasn't big enough, but I'm adding MM folks
> in the CC list.
>
> On 3/18/24 11:04, Christian König wrote:
>> On 18.03.24 at 14:28, Maíra Canal wrote:
>>> Hi Christian,
>>>
>>> On 3/18/24 10:10, Christian König wrote:
>>>> On 18.03.24 at 13:42, Maíra Canal wrote:
>>>>> Hi Christian,
>>>>>
>>>>> On 3/12/24 10:48, Christian König wrote:
>>>>>> On 12.03.24 at 14:09, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>>>> On 12.03.24 at 11:31, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>>>> On 12.03.24 at 10:30, Tvrtko Ursulin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>>>> On 12.03.24 at 09:51, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>>>> For some applications, such as using huge pages, we might
>>>>>>>>>>>>>> want to have a different mountpoint, for which we pass in
>>>>>>>>>>>>>> mount flags that better match our use case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that
>>>>>>>>>>>>>> allows us to define the tmpfs mountpoint where the GEM object
>>>>>>>>>>>>>> will be created. If this parameter is NULL, then we fall back
>>>>>>>>>>>>>> to shmem_file_setup().
>>>>>>>>>>>>>
>>>>>>>>>>>>> One strategy for reducing churn, and so the number of
>>>>>>>>>>>>> drivers this patch touches, could be to add a lower-level
>>>>>>>>>>>>> drm_gem_object_init() which takes a vfsmount (call it
>>>>>>>>>>>>> __drm_gem_object_init() or drm_gem_object_init_mnt()) and
>>>>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>>>>>
>>>>>>>>>>>> I would even go a step further in the other direction. The
>>>>>>>>>>>> shmem-backed GEM object is just some special handling as far
>>>>>>>>>>>> as I can see.
>>>>>>>>>>>>
>>>>>>>>>>>> So I would rather suggest renaming all drm_gem_* functions
>>>>>>>>>>>> which only deal with the shmem-backed GEM object to
>>>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>>>
>>>>>>>>>>> That makes sense although it would be very churny. I at 
>>>>>>>>>>> least would be on the fence regarding the cost vs benefit.
>>>>>>>>>>
>>>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>>>
>>>>>>>>>>> Not satisfying as you think it is not detailed enough to say
>>>>>>>>>>> the driver wants to use huge pages for performance? Or not
>>>>>>>>>>> satisfying as you question why huge pages would help?
>>>>>>>>>>
>>>>>>>>>> That huge pages are beneficial is clear to me, but I'm 
>>>>>>>>>> missing the connection why a different mount point helps with 
>>>>>>>>>> using huge pages.
>>>>>>>>>
>>>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>>>> 'never', see man 5 tmpfs.
>>>>>>>>
>>>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>>>
>>>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for 
>>>>>>>> a DRM device to not use huge pages with the shmem backend?
>>>>>>>
>>>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>>>> back then the understanding was within_size may overallocate, 
>>>>>>> meaning there would be some space wastage, until the memory 
>>>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>>>> haven't checked if that still applies.
>>>>>>>
>>>>>>> Other than that I don't know if some drivers/platforms could 
>>>>>>> have problems if they have some limitations or hardcoded 
>>>>>>> assumptions when they iterate the sg list.
>>>>>>
>>>>>> Yeah, that was the whole point behind my question. As far as I 
>>>>>> can see this isn't driver specific, but platform specific.
>>>>>>
>>>>>> I might be wrong here, but I think we should then probably not 
>>>>>> have that handling in each individual driver, but rather 
>>>>>> centralized in the DRM code.
>>>>>
>>>>> I don't see a point in enabling THP for all shmem drivers. A huge 
>>>>> page
>>>>> is only useful if the driver is going to use it. On V3D, for example,
>>>>> I only need huge pages because I need the memory contiguously 
>>>>> allocated
>>>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>>>> support implemented in the driver, I would be creating memory 
>>>>> pressure
>>>>> without any performance gain.
>>>>
>>>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>>>> create much extra memory pressure for this use case.
>>>>
>>>> As far as I can see background for the option is that files in 
>>>> tmpfs usually have a varying size, so it usually isn't beneficial 
>>>> to allocate a huge page just to find that the shmem file is much 
>>>> smaller than what's needed.
>>>>
>>>> But GEM objects have a fixed size. So we know up front whether we need
>>>> 4KiB or 1GiB and can therefore directly allocate huge pages if they are
>>>> available and the object is large enough to be backed by them.
>>>>
>>>> If the memory pressure is so high that we don't have huge pages 
>>>> available the shmem code falls back to standard pages anyway.
>>>
>>> The matter is: how do we define the point where the memory pressure 
>>> is high?
>>
>> Well as driver developers/maintainers we simply don't do that. This 
>> is the job of the shmem code.
>>
>>> For example, notice that in this implementation of Super Pages
>>> for the V3D driver, I only use a Super Page if the BO is bigger than 
>>> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
>>> available for the GPU. If I created huge pages for every BO 
>>> allocation (and initially, I tried that), I would end up with hangs 
>>> in some applications.
>>
>> Yeah, that is what I meant by the trivial optimisation to the shmem
>> code. Essentially, when you have 2MiB+4KiB as the BO size, the shmem
>> code should use a 2MiB page and a 4KiB page to back it, but what it
>> currently does is use two 2MiB pages and then split up the second page
>> when it finds that it isn't needed.
>>
>> That is wasteful and leads to fragmentation, but as soon as we stop 
>> doing that we should be fine to enable it unconditionally for all 
>> drivers.
>
> I see your point, but I believe that it would be tangential to the goal
> of this series. As you mentioned, we currently have a lot of memory
> fragmentation when using THP, and until that is solved (at the tmpfs
> level), I believe we could move on with the current implementation
> (with the improvements proposed by Tvrtko).

Oh, I seriously don't want to block this patch set here. Just asking if 
it's the right approach.

Point is we might need to revert the driver changes again when THP is 
further optimized and the options aren't needed any more.

But if and how that should happen is perfectly up to Tvrtko.

Regards,
Christian.

>
> Best Regards,
> - Maíra
>
>>
>> TTM has been doing essentially the same thing for years.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> At least, for V3D, I wouldn't like to see THP being used for all the 
>>> allocations. But, we have maintainers of other drivers in the CC.
>>>
>>> Best Regards,
>>> - Maíra
>>>
>>>>
>>>> So THP is almost always beneficial for GEM even if the driver 
>>>> doesn't actually need it. The only potential case I can think of 
>>>> which might not be handled gracefully is the tail pages, e.g. huge 
>>>> + 4kib.
>>>>
>>>> But that is trivial to optimize in the shmem code when the final 
>>>> size of the file is known beforehand.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Best Regards,
>>>>> - Maíra
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> The Cc is plenty large, so perhaps someone else will have
>>>>>>> additional information. :)
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tvrtko
>>>>>>>
>>>>>>>>
>>>>>>>> I mean it would make this patch here even smaller.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Tvrtko
>>>>>>>>
>>>>>>
>>>>
>>




* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
From: Tvrtko Ursulin @ 2024-03-18 15:27 UTC
  To: Christian König, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti,
	Hugh Dickins, linux-mm


On 18/03/2024 15:05, Christian König wrote:
> On 18.03.24 at 15:24, Maíra Canal wrote:
>> Not that the CC list wasn't big enough, but I'm adding MM folks
>> in the CC list.
>>
>> On 3/18/24 11:04, Christian König wrote:
>>> On 18.03.24 at 14:28, Maíra Canal wrote:
>>>> Hi Christian,
>>>>
>>>> On 3/18/24 10:10, Christian König wrote:
>>>>> On 18.03.24 at 13:42, Maíra Canal wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> On 3/12/24 10:48, Christian König wrote:
>>>>>>> On 12.03.24 at 14:09, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>>>>> On 12.03.24 at 11:31, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>>>>> On 12.03.24 at 10:30, Tvrtko Ursulin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>>>>> On 12.03.24 at 09:51, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>>>>> For some applications, such as using huge pages, we might
>>>>>>>>>>>>>>> want to have a different mountpoint, for which we pass in
>>>>>>>>>>>>>>> mount flags that better match our use case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that
>>>>>>>>>>>>>>> allows us to define the tmpfs mountpoint where the GEM object
>>>>>>>>>>>>>>> will be created. If this parameter is NULL, then we fall back
>>>>>>>>>>>>>>> to shmem_file_setup().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One strategy for reducing churn, and so the number of
>>>>>>>>>>>>>> drivers this patch touches, could be to add a lower-level
>>>>>>>>>>>>>> drm_gem_object_init() which takes a vfsmount (call it
>>>>>>>>>>>>>> __drm_gem_object_init() or drm_gem_object_init_mnt()) and
>>>>>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would even go a step further in the other direction. The
>>>>>>>>>>>>> shmem-backed GEM object is just some special handling as far
>>>>>>>>>>>>> as I can see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I would rather suggest renaming all drm_gem_* functions
>>>>>>>>>>>>> which only deal with the shmem-backed GEM object to
>>>>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>>>>
>>>>>>>>>>>> That makes sense although it would be very churny. I at 
>>>>>>>>>>>> least would be on the fence regarding the cost vs benefit.
>>>>>>>>>>>
>>>>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>>>>
>>>>>>>>>>>> Not satisfying as you think it is not detailed enough to say
>>>>>>>>>>>> the driver wants to use huge pages for performance? Or not
>>>>>>>>>>>> satisfying as you question why huge pages would help?
>>>>>>>>>>>
>>>>>>>>>>> That huge pages are beneficial is clear to me, but I'm 
>>>>>>>>>>> missing the connection why a different mount point helps with 
>>>>>>>>>>> using huge pages.
>>>>>>>>>>
>>>>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>>>>> 'never', see man 5 tmpfs.
>>>>>>>>>
>>>>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>>>>
>>>>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for 
>>>>>>>>> a DRM device to not use huge pages with the shmem backend?
>>>>>>>>
>>>>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>>>>> back then the understanding was within_size may overallocate, 
>>>>>>>> meaning there would be some space wastage, until the memory 
>>>>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>>>>> haven't checked if that still applies.
>>>>>>>>
>>>>>>>> Other than that I don't know if some drivers/platforms could 
>>>>>>>> have problems if they have some limitations or hardcoded 
>>>>>>>> assumptions when they iterate the sg list.
>>>>>>>
>>>>>>> Yeah, that was the whole point behind my question. As far as I 
>>>>>>> can see this isn't driver specific, but platform specific.
>>>>>>>
>>>>>>> I might be wrong here, but I think we should then probably not 
>>>>>>> have that handling in each individual driver, but rather 
>>>>>>> centralized in the DRM code.
>>>>>>
>>>>>> I don't see a point in enabling THP for all shmem drivers. A huge 
>>>>>> page
>>>>>> is only useful if the driver is going to use it. On V3D, for example,
>>>>>> I only need huge pages because I need the memory contiguously 
>>>>>> allocated
>>>>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>>>>> support implemented in the driver, I would be creating memory 
>>>>>> pressure
>>>>>> without any performance gain.
>>>>>
>>>>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>>>>> create much extra memory pressure for this use case.
>>>>>
>>>>> As far as I can see background for the option is that files in 
>>>>> tmpfs usually have a varying size, so it usually isn't beneficial 
>>>>> to allocate a huge page just to find that the shmem file is much 
>>>>> smaller than what's needed.
>>>>>
>>>>> But GEM objects have a fixed size. So we know up front whether we need
>>>>> 4KiB or 1GiB and can therefore directly allocate huge pages if they are
>>>>> available and the object is large enough to be backed by them.
>>>>>
>>>>> If the memory pressure is so high that we don't have huge pages 
>>>>> available the shmem code falls back to standard pages anyway.
>>>>
>>>> The matter is: how do we define the point where the memory pressure 
>>>> is high?
>>>
>>> Well as driver developers/maintainers we simply don't do that. This 
>>> is the job of the shmem code.
>>>
>>>> For example, notice that in this implementation of Super Pages
>>>> for the V3D driver, I only use a Super Page if the BO is bigger than 
>>>> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
>>>> available for the GPU. If I created huge pages for every BO 
>>>> allocation (and initially, I tried that), I would end up with hangs 
>>>> in some applications.
>>>
>>> Yeah, that is what I meant by the trivial optimisation to the shmem
>>> code. Essentially, when you have 2MiB+4KiB as the BO size, the shmem
>>> code should use a 2MiB page and a 4KiB page to back it, but what it
>>> currently does is use two 2MiB pages and then split up the second page
>>> when it finds that it isn't needed.
>>>
>>> That is wasteful and leads to fragmentation, but as soon as we stop 
>>> doing that we should be fine to enable it unconditionally for all 
>>> drivers.
>>
>> I see your point, but I believe that it would be tangential to the goal
>> of this series. As you mentioned, we currently have a lot of memory
>> fragmentation when using THP, and until that is solved (at the tmpfs
>> level), I believe we could move on with the current implementation
>> (with the improvements proposed by Tvrtko).
> 
> Oh, I seriously don't want to block this patch set here. Just asking if 
> it's the right approach.
> 
> Point is we might need to revert the driver changes again when THP is 
> further optimized and the options aren't needed any more.
> 
> But if and how that should happen is perfectly up to Tvrtko.

Seems I got some unintended voting powers here. :)

What I think would work best is if Maíra solves the v3d part first, and
then she can propose/float the wider DRM/GEM change to always use THP,
because that one will require some wider testing and acks.

My concern there would be the behaviour of the within_size mode. I was
reading 465c403cb508 ("drm/i915: introduce simple gemfs") the other day,
which made me think that even then THP would initially over-allocate. Or
maybe the comment added in that commit wasn't fully accurate. If
within_size is not as high-impact as that, then the GEM-wide change might
be feasible, with or without some opt-out facility, I don't know.

In any case, as long as the v3d work can be done without too much churn
to the common code, I think separating that out as a follow-up should not
cause a lot of additional churn. It should all be hidden in the
implementation details if all goes to plan.

Regards,

Tvrtko


