linux-mm.kvack.org archive mirror
* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
       [not found]   ` <ZRqeoiZ2ayrAR6AV@debian.me>
@ 2023-10-02 11:02     ` Oleksandr Natalenko
  2023-10-02 14:32       ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Oleksandr Natalenko @ 2023-10-02 11:02 UTC (permalink / raw)
  To: linux-kernel, Bagas Sanjaya
  Cc: linux-media, linaro-mm-sig, dri-devel, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	Sumit Semwal, Christian König, Linux Regressions,
	Matthew Wilcox, Andrew Morton, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3746 bytes --]

/cc Matthew, Andrew (please see below)

On Monday, October 2, 2023 12:42:42 CEST Bagas Sanjaya wrote:
> On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
> > Hello.
> > 
> > On Monday, October 2, 2023 1:45:44 CEST Bagas Sanjaya wrote:
> > > On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> > > > Hello.
> > > > 
> > > > I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
> > > > 
> > > > ```
> > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > > 
> > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > >  drm_gem_put_pages+0x186/0x250
> > > >  drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > >  drm_gem_shmem_object_vunmap+0x83/0xe0
> > > >  drm_gem_vunmap_unlocked+0x46/0xb0
> > > >  drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > >  drm_fb_helper_damage_work+0x96/0x170
> > > >  process_one_work+0x254/0x470
> > > >  worker_thread+0x55/0x4f0
> > > >  kthread+0xe8/0x120
> > > >  ret_from_fork+0x34/0x50
> > > >  ret_from_fork_asm+0x1b/0x30
> > > > 
> > > > kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
> > > > 
> > > > allocated by task 51 on cpu 0 at 14.668667s:
> > > >  drm_gem_get_pages+0x94/0x2b0
> > > >  drm_gem_shmem_get_pages+0x5d/0x110
> > > >  drm_gem_shmem_object_vmap+0xc4/0x1e0
> > > >  drm_gem_vmap_unlocked+0x3c/0x70
> > > >  drm_client_buffer_vmap+0x23/0x50
> > > >  drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> > > >  drm_fb_helper_damage_work+0x96/0x170
> > > >  process_one_work+0x254/0x470
> > > >  worker_thread+0x55/0x4f0
> > > >  kthread+0xe8/0x120
> > > >  ret_from_fork+0x34/0x50
> > > >  ret_from_fork_asm+0x1b/0x30
> > > > 
> > > > freed by task 51 on cpu 0 at 14.668697s:
> > > >  drm_gem_put_pages+0x186/0x250
> > > >  drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > >  drm_gem_shmem_object_vunmap+0x83/0xe0
> > > >  drm_gem_vunmap_unlocked+0x46/0xb0
> > > >  drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > >  drm_fb_helper_damage_work+0x96/0x170
> > > >  process_one_work+0x254/0x470
> > > >  worker_thread+0x55/0x4f0
> > > >  kthread+0xe8/0x120
> > > >  ret_from_fork+0x34/0x50
> > > >  ret_from_fork_asm+0x1b/0x30
> > > > 
> > > > CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> > > > Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> > > > Workqueue: events drm_fb_helper_damage_work
> > > > ```
> > > > 
> > > > This repeats a couple of times and then stops.
> > > > 
> > > > Currently, I'm running v6.5.5. So far, there's no impact on how the VM functions for me.
> > > > 
> > > > The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> > > > 
> > > 
> > > Do you have this issue on v6.4?
> > 
> > No, I did not have this issue with v6.4.
> > 
> 
> Then proceed with kernel bisection. You can refer to
> Documentation/admin-guide/bug-bisect.rst in the kernel sources for the
> process.

Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?

In the git log between v6.4 and v6.5 I see this:

```
commit 3291e09a463870610b8227f32b16b19a587edf33
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 21 17:45:49 2023 +0100

    drm: convert drm_gem_put_pages() to use a folio_batch

    Remove a few hidden compound_head() calls by converting the returned page
    to a folio once and using the folio APIs.
```

Thanks.

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-02 11:02     ` [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Oleksandr Natalenko
@ 2023-10-02 14:32       ` Matthew Wilcox
  2023-10-02 15:38         ` Oleksandr Natalenko
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2023-10-02 14:32 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: linux-kernel, Bagas Sanjaya, linux-media, linaro-mm-sig,
	dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sumit Semwal, Christian König,
	Linux Regressions, Andrew Morton, linux-mm

On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > > > 
> > > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > > >  drm_gem_put_pages+0x186/0x250
> > > > >  drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > >  drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > >  drm_gem_vunmap_unlocked+0x46/0xb0
> > > > >  drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > >  drm_fb_helper_damage_work+0x96/0x170
> 
> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?

Yes, entirely plausible.  I think you have two useful points to look at
before delving into a full bisect -- 863a8e and the parent of 0b62af.
If either of them work, I think you have no more work to do.





* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-02 14:32       ` Matthew Wilcox
@ 2023-10-02 15:38         ` Oleksandr Natalenko
  2023-10-05  7:44           ` Thomas Zimmermann
  0 siblings, 1 reply; 9+ messages in thread
From: Oleksandr Natalenko @ 2023-10-02 15:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Bagas Sanjaya, linux-media, linaro-mm-sig,
	dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sumit Semwal, Christian König,
	Linux Regressions, Andrew Morton, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1545 bytes --]

On Monday, October 2, 2023 16:32:45 CEST Matthew Wilcox wrote:
> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > > > > 
> > > > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > > > >  drm_gem_put_pages+0x186/0x250
> > > > > >  drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > > >  drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > > >  drm_gem_vunmap_unlocked+0x46/0xb0
> > > > > >  drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > > >  drm_fb_helper_damage_work+0x96/0x170
> > 
> > Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> 
> Yes, entirely plausible.  I think you have two useful points to look at
> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> If either of them work, I think you have no more work to do.

OK, I did this against v6.5.5:

```
git log --oneline HEAD~3..
7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
```

then rebooted the host multiple times, and the issue is not seen any more.

So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.

Thanks.

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-02 15:38         ` Oleksandr Natalenko
@ 2023-10-05  7:44           ` Thomas Zimmermann
  2023-10-05  7:56             ` Oleksandr Natalenko
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Zimmermann @ 2023-10-05  7:44 UTC (permalink / raw)
  To: Oleksandr Natalenko, Matthew Wilcox
  Cc: Linux Regressions, linux-kernel, dri-devel, Christian König,
	linaro-mm-sig, linux-mm, Maxime Ripard, Bagas Sanjaya,
	Andrew Morton, Sumit Semwal, linux-media


[-- Attachment #1.1: Type: text/plain, Size: 1892 bytes --]

Hi

On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> On Monday, October 2, 2023 16:32:45 CEST Matthew Wilcox wrote:
>> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>>>>>>
>>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>>>>>>>   drm_gem_put_pages+0x186/0x250
>>>>>>>   drm_gem_shmem_put_pages_locked+0x43/0xc0
>>>>>>>   drm_gem_shmem_object_vunmap+0x83/0xe0
>>>>>>>   drm_gem_vunmap_unlocked+0x46/0xb0
>>>>>>>   drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>>>>>>>   drm_fb_helper_damage_work+0x96/0x170
>>>
>>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
>>
>> Yes, entirely plausible.  I think you have two useful points to look at
>> before delving into a full bisect -- 863a8e and the parent of 0b62af.
>> If either of them work, I think you have no more work to do.
> 
> OK, I did this against v6.5.5:
> 
> ```
> git log --oneline HEAD~3..
> 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> ```
> 
> then rebooted the host multiple times, and the issue is not seen any more.
> 
> So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.

Ignore my other email. It's apparently been fixed already. Thanks!

Best regards
Thomas

> 
> Thanks.
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]


* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-05  7:44           ` Thomas Zimmermann
@ 2023-10-05  7:56             ` Oleksandr Natalenko
  2023-10-05 12:19               ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Oleksandr Natalenko @ 2023-10-05  7:56 UTC (permalink / raw)
  To: Matthew Wilcox, Thomas Zimmermann
  Cc: Linux Regressions, linux-kernel, dri-devel, Christian König,
	linaro-mm-sig, linux-mm, Maxime Ripard, Bagas Sanjaya,
	Andrew Morton, Sumit Semwal, linux-media

[-- Attachment #1: Type: text/plain, Size: 2222 bytes --]

Hello.

On Thursday, October 5, 2023 9:44:42 CEST Thomas Zimmermann wrote:
> Hi
> 
> On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > On Monday, October 2, 2023 16:32:45 CEST Matthew Wilcox wrote:
> >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> >>>>>>>
> >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> >>>>>>>   drm_gem_put_pages+0x186/0x250
> >>>>>>>   drm_gem_shmem_put_pages_locked+0x43/0xc0
> >>>>>>>   drm_gem_shmem_object_vunmap+0x83/0xe0
> >>>>>>>   drm_gem_vunmap_unlocked+0x46/0xb0
> >>>>>>>   drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> >>>>>>>   drm_fb_helper_damage_work+0x96/0x170
> >>>
> >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> >>
> >> Yes, entirely plausible.  I think you have two useful points to look at
> >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> >> If either of them work, I think you have no more work to do.
> > 
> > OK, I did this against v6.5.5:
> > 
> > ```
> > git log --oneline HEAD~3..
> > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > ```
> > 
> > then rebooted the host multiple times, and the issue is not seen any more.
> > 
> > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
> 
> Ignore my other email. It's apparently been fixed already. Thanks!

Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.

Thanks.

> Best regards
> Thomas
> 
> > 
> > Thanks.
> > 
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 


-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-05  7:56             ` Oleksandr Natalenko
@ 2023-10-05 12:19               ` Matthew Wilcox
  2023-10-05 12:30                 ` Oleksandr Natalenko
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2023-10-05 12:19 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Thomas Zimmermann, Linux Regressions, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, linux-mm, Maxime Ripard,
	Bagas Sanjaya, Andrew Morton, Sumit Semwal, linux-media

On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
> Hello.
> 
> On Thursday, October 5, 2023 9:44:42 CEST Thomas Zimmermann wrote:
> > Hi
> > 
> > On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > > On Monday, October 2, 2023 16:32:45 CEST Matthew Wilcox wrote:
> > >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > >>>>>>>
> > >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > >>>>>>>   drm_gem_put_pages+0x186/0x250
> > >>>>>>>   drm_gem_shmem_put_pages_locked+0x43/0xc0
> > >>>>>>>   drm_gem_shmem_object_vunmap+0x83/0xe0
> > >>>>>>>   drm_gem_vunmap_unlocked+0x46/0xb0
> > >>>>>>>   drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > >>>>>>>   drm_fb_helper_damage_work+0x96/0x170
> > >>>
> > >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> > >>
> > >> Yes, entirely plausible.  I think you have two useful points to look at
> > >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> > >> If either of them work, I think you have no more work to do.
> > > 
> > > OK, I did this against v6.5.5:
> > > 
> > > ```
> > > git log --oneline HEAD~3..
> > > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > > ```
> > > 
> > > then rebooted the host multiple times, and the issue is not seen any more.
> > > 
> > > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
> > 
> > Ignore my other email. It's apparently been fixed already. Thanks!
> 
> Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.

I don't understand; you said reverting those DRM commits fixed the
problem, so 863a8eb3f270 is the solution.  No?




* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-05 12:19               ` Matthew Wilcox
@ 2023-10-05 12:30                 ` Oleksandr Natalenko
  2023-10-05 13:05                   ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Oleksandr Natalenko @ 2023-10-05 12:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Thomas Zimmermann, Linux Regressions, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, linux-mm, Maxime Ripard,
	Bagas Sanjaya, Andrew Morton, Sumit Semwal, linux-media

[-- Attachment #1: Type: text/plain, Size: 2967 bytes --]

Hello.

On Thursday, October 5, 2023 14:19:44 CEST Matthew Wilcox wrote:
> On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
> > Hello.
> > 
> > On Thursday, October 5, 2023 9:44:42 CEST Thomas Zimmermann wrote:
> > > Hi
> > > 
> > > On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > > > On Monday, October 2, 2023 16:32:45 CEST Matthew Wilcox wrote:
> > > >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > >>>>>>>
> > > >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > >>>>>>>   drm_gem_put_pages+0x186/0x250
> > > >>>>>>>   drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > >>>>>>>   drm_gem_shmem_object_vunmap+0x83/0xe0
> > > >>>>>>>   drm_gem_vunmap_unlocked+0x46/0xb0
> > > >>>>>>>   drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > >>>>>>>   drm_fb_helper_damage_work+0x96/0x170
> > > >>>
> > > >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> > > >>
> > > >> Yes, entirely plausible.  I think you have two useful points to look at
> > > >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> > > >> If either of them work, I think you have no more work to do.
> > > > 
> > > > OK, I did this against v6.5.5:
> > > > 
> > > > ```
> > > > git log --oneline HEAD~3..
> > > > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > > > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > > > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > > > ```
> > > > 
> > > > then rebooted the host multiple times, and the issue is not seen any more.
> > > > 
> > > > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
> > > 
> > > Ignore my other email. It's apparently been fixed already. Thanks!
> > 
> > Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
> 
> I don't understand; you said reverting those DRM commits fixed the
> problem, so 863a8eb3f270 is the solution.  No?

No-no, sorry for the possible confusion. Let me explain again:

1. we had an issue with i915, which was introduced by 0b62af28f249 and later fixed by 863a8eb3f270
2. now I've discovered another issue, which looks very similar to the first, but in a VM with a Cirrus VGA adapter, and it happens even with 863a8eb3f270 applied
3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with the Cirrus VGA, but clearly no fix for it has been discussed

IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.

Hope this makes things clear.

Thanks.

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-05 12:30                 ` Oleksandr Natalenko
@ 2023-10-05 13:05                   ` Matthew Wilcox
  2023-10-05 13:34                     ` Oleksandr Natalenko
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2023-10-05 13:05 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Thomas Zimmermann, Linux Regressions, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, linux-mm, Maxime Ripard,
	Bagas Sanjaya, Andrew Morton, Sumit Semwal, linux-media

On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
> No-no, sorry for the possible confusion. Let me explain again:
> 
> 1. we had an issue with i915, which was introduced by 0b62af28f249 and later fixed by 863a8eb3f270
> 2. now I've discovered another issue, which looks very similar to the first, but in a VM with a Cirrus VGA adapter, and it happens even with 863a8eb3f270 applied
> 3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with the Cirrus VGA, but clearly no fix for it has been discussed
> 
> IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.

Thank you!  Sorry about the misunderstanding.  Try this:

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6129b89bb366..44a948b80ee1 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 	struct page **pages;
 	struct folio *folio;
 	struct folio_batch fbatch;
-	int i, j, npages;
+	long i, j, npages;
 
 	if (WARN_ON(!obj->filp))
 		return ERR_PTR(-EINVAL);
@@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
 	i = 0;
 	while (i < npages) {
+		long nr;
 		folio = shmem_read_folio_gfp(mapping, i,
 				mapping_gfp_mask(mapping));
 		if (IS_ERR(folio))
 			goto fail;
-		for (j = 0; j < folio_nr_pages(folio); j++, i++)
+		nr = min(npages - i, folio_nr_pages(folio));
+		for (j = 0; j < nr; j++, i++)
 			pages[i] = folio_file_page(folio, i);
 
 		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the





* Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
  2023-10-05 13:05                   ` Matthew Wilcox
@ 2023-10-05 13:34                     ` Oleksandr Natalenko
  0 siblings, 0 replies; 9+ messages in thread
From: Oleksandr Natalenko @ 2023-10-05 13:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Thomas Zimmermann, Linux Regressions, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, linux-mm, Maxime Ripard,
	Bagas Sanjaya, Andrew Morton, Sumit Semwal, linux-media

[-- Attachment #1: Type: text/plain, Size: 2252 bytes --]

On Thursday, October 5, 2023 15:05:27 CEST Matthew Wilcox wrote:
> On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
> > No-no, sorry for the possible confusion. Let me explain again:
> > 
> > 1. we had an issue with i915, which was introduced by 0b62af28f249 and later fixed by 863a8eb3f270
> > 2. now I've discovered another issue, which looks very similar to the first, but in a VM with a Cirrus VGA adapter, and it happens even with 863a8eb3f270 applied
> > 3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with the Cirrus VGA, but clearly no fix for it has been discussed
> > 
> > IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
> 
> Thank you!  Sorry about the misunderstanding.  Try this:
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 6129b89bb366..44a948b80ee1 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
>  	struct page **pages;
>  	struct folio *folio;
>  	struct folio_batch fbatch;
> -	int i, j, npages;
> +	long i, j, npages;
>  
>  	if (WARN_ON(!obj->filp))
>  		return ERR_PTR(-EINVAL);
> @@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
>  
>  	i = 0;
>  	while (i < npages) {
> +		long nr;
>  		folio = shmem_read_folio_gfp(mapping, i,
>  				mapping_gfp_mask(mapping));
>  		if (IS_ERR(folio))
>  			goto fail;
> -		for (j = 0; j < folio_nr_pages(folio); j++, i++)
> +		nr = min(npages - i, folio_nr_pages(folio));
> +		for (j = 0; j < nr; j++, i++)
>  			pages[i] = folio_file_page(folio, i);
>  
>  		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the

No issues after five reboots with this patch applied on top of v6.5.5.

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Link: https://lore.kernel.org/lkml/13360591.uLZWGnKmhe@natalenko.name/
Fixes: 3291e09a4638 ("drm: convert drm_gem_put_pages() to use a folio_batch")
Cc: stable@vger.kernel.org # 6.5.x

Thank you!

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


end of thread, other threads:[~2023-10-05 13:34 UTC | newest]

Thread overview: 9+ messages
     [not found] <13360591.uLZWGnKmhe@natalenko.name>
     [not found] ` <2701570.mvXUDI8C0e@natalenko.name>
     [not found]   ` <ZRqeoiZ2ayrAR6AV@debian.me>
2023-10-02 11:02     ` [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Oleksandr Natalenko
2023-10-02 14:32       ` Matthew Wilcox
2023-10-02 15:38         ` Oleksandr Natalenko
2023-10-05  7:44           ` Thomas Zimmermann
2023-10-05  7:56             ` Oleksandr Natalenko
2023-10-05 12:19               ` Matthew Wilcox
2023-10-05 12:30                 ` Oleksandr Natalenko
2023-10-05 13:05                   ` Matthew Wilcox
2023-10-05 13:34                     ` Oleksandr Natalenko
