* Re: [PATCH] fbdev: defio: fix the pagelist corruption
[not found] ` <BL1PR11MB5445633C68B3039320FE780E97E19@BL1PR11MB5445.namprd11.prod.outlook.com>
@ 2022-03-31 8:21 ` Paul Menzel
2022-03-31 8:39 ` Liu, Chuansheng
0 siblings, 1 reply; 2+ messages in thread
From: Paul Menzel @ 2022-03-31 8:21 UTC (permalink / raw)
To: Chuansheng Liu
Cc: tzimmermann, linux-fbdev, deller, dri-devel, Song Liu, linux-mm,
bpf, netdev, x86, ast, daniel, andrii, kernel-team, akpm,
rick.p.edgecombe, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
Dear Chuansheng,
Am 31.03.22 um 02:06 schrieb Liu, Chuansheng:
>> -----Original Message-----
>> From: Paul Menzel <pmenzel@molgen.mpg.de>
>> Sent: Thursday, March 31, 2022 12:47 AM
[…]
>> Am 29.03.22 um 01:58 schrieb Liu, Chuansheng:
>>
>>>> -----Original Message-----
>>>> From: Paul Menzel
>>>> Sent: Monday, March 28, 2022 2:15 PM
>>
>>>> Am 28.03.22 um 02:58 schrieb Liu, Chuansheng:
>>>>
>>>>>> -----Original Message-----
>>>>
>>>>>> Sent: Saturday, March 26, 2022 4:11 PM
>>>>
>>>>>> Am 17.03.22 um 06:46 schrieb Chuansheng Liu:
>>>>>>> Easily hit the below list corruption:
>>>>>>> ==
>>>>>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but
>>>>>>> was ffffec604507edc8. (prev=ffffec604507edc8).
>>>>>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
>>>>>>> __list_add_valid+0x53/0x80
>>>>>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G U
>>>>>>> RIP: 0010:__list_add_valid+0x53/0x80
>>>>>>> Call Trace:
>>>>>>> <TASK>
>>>>>>> fb_deferred_io_mkwrite+0xea/0x150
>>>>>>> do_page_mkwrite+0x57/0xc0
>>>>>>> do_wp_page+0x278/0x2f0
>>>>>>> __handle_mm_fault+0xdc2/0x1590
>>>>>>> handle_mm_fault+0xdd/0x2c0
>>>>>>> do_user_addr_fault+0x1d3/0x650
>>>>>>> exc_page_fault+0x77/0x180
>>>>>>> ? asm_exc_page_fault+0x8/0x30
>>>>>>> asm_exc_page_fault+0x1e/0x30
>>>>>>> RIP: 0033:0x7fd98fc8fad1
>>>>>>> ==
>>>>>>>
>>>>>>> Figure out the race happens when one process is adding &page->lru into
>>>>>>> the pagelist tail in fb_deferred_io_mkwrite(), another process is
>>>>>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is
>>>>>>> not protected by the lock.
>>>>>>>
>>>>>>> This fix is to init all the page lists one time during initialization,
>>>>>>> it not only fixes the list corruption, but also avoids INIT_LIST_HEAD()
>>>>>>> redundantly.
>>>>>>>
>>>>>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
>>>>>>> Cc: Thomas Zimmermann <tzimmermann@suse.de>
>>>>>>> Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
>>>>>>> ---
>>>>>>> drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
>>>>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>>>>>>> index 98b0f23bf5e2..eafb66ca4f28 100644
>>>>>>> --- a/drivers/video/fbdev/core/fb_defio.c
>>>>>>> +++ b/drivers/video/fbdev/core/fb_defio.c
>>>>>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>>>>>>> printk(KERN_ERR "no mapping available\n");
>>>>>>>
>>>>>>> BUG_ON(!page->mapping);
>>>>>>> - INIT_LIST_HEAD(&page->lru);
>>>>>>> page->index = vmf->pgoff;
>>>>>>>
>>>>>>> vmf->page = page;
>>>>>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
>>>>>>> void fb_deferred_io_init(struct fb_info *info)
>>>>>>> {
>>>>>>> struct fb_deferred_io *fbdefio = info->fbdefio;
>>>>>>> + struct page *page;
>>>>>>> + int i;
>>>>>>>
>>>>>>> BUG_ON(!fbdefio);
>>>>>>> mutex_init(&fbdefio->lock);
>>>>>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
>>>>>>> INIT_LIST_HEAD(&fbdefio->pagelist);
>>>>>>> if (fbdefio->delay == 0) /* set a default of 1 s */
>>>>>>> fbdefio->delay = HZ;
>>>>>>> +
>>>>>>> + /* initialize all the page lists one time */
>>>>>>> + for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
>>>>>>> + page = fb_deferred_io_page(info, i);
>>>>>>> + INIT_LIST_HEAD(&page->lru);
>>>>>>> + }
>>>>>>> }
>>>>>>> EXPORT_SYMBOL_GPL(fb_deferred_io_init);
>>>>>>>
>>>>>> Applying your patch on top of current Linus’ master branch, tty0 is
>>>>>> unusable and looks frozen. Sometimes network card still works, sometimes
>>>>>> not.
>>>>>
>>>>> I don't see how the patch would cause below BUG call stack, need some time to
>>>>> debug. Just few comments:
>>>>> 1. Will the system work well without this patch?
>>>>
>>>> Yes, the framebuffer works well without the patch.
>>>>
>>>>> 2. When you are sure the patch causes the regression you saw, please get free
>>>> to submit one reverted patch, thanks : )
>>>>
>>>> I think you for patch wasn’t submitted yet – at least not pulled by Linus.
>>> The patch has been in drm-tip, could you have a try with the latest drm-tip to see
>>> if the Framebuffer works well, in that case, we could revert it in drm-tip then.
>>
>> With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration
>> manifest) everything works fine. (I had to disable amdgpu driver, as it
>> failed to build.) Is anyone able to explain that?
>
> My patch is for fixing another patch which is in the drm-tip at least,
The referenced commit 105a940416fc in the Fixes tag is also in Linus’
master branch.
> so I assume applying my patch into Linus tree directly is not
> completely proper. That's my intention of asking your help for
> retesting drm-tip.
If there were such a relation, that would need to be documented in the
commit message.
> You mean everything working fine means another issue you hit is also
> gone?
No, I just mean the hang when applying your patch.
Anyway, after figuring out, that drm-tip, is actually not behind Linus’
master branch, I tried to figure out the differences, and it turns out
it’s also related to commit fac54e2bfb5b (x86/Kconfig: Select
HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP) [1], which is in Linus’
master branch, but not drm-tip. Note, I am using a 32-bit user space and
a 64-bit Linux kernel. Reverting commit fac54e2bfb5b, and having your
patch a applied, the hang is gone.
I am adding the people involved in the other discussion to make them
aware of this failure case.
Kind regards,
Paul
[1]: https://linux-regtracking.leemhuis.info/regzbot/mainline/
^ permalink raw reply [flat|nested] 2+ messages in thread
* RE: [PATCH] fbdev: defio: fix the pagelist corruption
2022-03-31 8:21 ` [PATCH] fbdev: defio: fix the pagelist corruption Paul Menzel
@ 2022-03-31 8:39 ` Liu, Chuansheng
0 siblings, 0 replies; 2+ messages in thread
From: Liu, Chuansheng @ 2022-03-31 8:39 UTC (permalink / raw)
To: Paul Menzel
Cc: linux-fbdev, Dave Hansen, akpm, daniel, linux-mm, netdev, deller,
x86, ast, dri-devel, andrii, Song Liu, Ingo Molnar,
Thomas Gleixner, tzimmermann, Borislav Petkov, bpf, Edgecombe,
Rick P, kernel-team
Hi Paul,
> -----Original Message-----
> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of Paul
> Menzel
> Sent: Thursday, March 31, 2022 4:22 PM
> To: Liu, Chuansheng <chuansheng.liu@intel.com>
> Cc: linux-fbdev@vger.kernel.org; Dave Hansen <dave.hansen@linux.intel.com>;
> akpm@linux-foundation.org; daniel@iogearbox.net; linux-mm@kvack.org;
> netdev@vger.kernel.org; deller@gmx.de; x86@kernel.org; ast@kernel.org; dri-
> devel@lists.freedesktop.org; andrii@kernel.org; Song Liu <song@kernel.org>;
> Ingo Molnar <mingo@redhat.com>; Thomas Gleixner <tglx@linutronix.de>;
> tzimmermann@suse.de; Borislav Petkov <bp@alien8.de>; bpf@vger.kernel.org;
> Edgecombe, Rick P <rick.p.edgecombe@intel.com>; kernel-team@fb.com
> Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption
>
> Dear Chuansheng,
>
>
> Am 31.03.22 um 02:06 schrieb Liu, Chuansheng:
>
> >> -----Original Message-----
> >> From: Paul Menzel <pmenzel@molgen.mpg.de>
> >> Sent: Thursday, March 31, 2022 12:47 AM
>
> […]
>
> >> Am 29.03.22 um 01:58 schrieb Liu, Chuansheng:
> >>
> >>>> -----Original Message-----
> >>>> From: Paul Menzel
> >>>> Sent: Monday, March 28, 2022 2:15 PM
> >>
> >>>> Am 28.03.22 um 02:58 schrieb Liu, Chuansheng:
> >>>>
> >>>>>> -----Original Message-----
> >>>>
> >>>>>> Sent: Saturday, March 26, 2022 4:11 PM
> >>>>
> >>>>>> Am 17.03.22 um 06:46 schrieb Chuansheng Liu:
> >>>>>>> Easily hit the below list corruption:
> >>>>>>> ==
> >>>>>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but
> >>>>>>> was ffffec604507edc8. (prev=ffffec604507edc8).
> >>>>>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
> >>>>>>> __list_add_valid+0x53/0x80
> >>>>>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G U
> >>>>>>> RIP: 0010:__list_add_valid+0x53/0x80
> >>>>>>> Call Trace:
> >>>>>>> <TASK>
> >>>>>>> fb_deferred_io_mkwrite+0xea/0x150
> >>>>>>> do_page_mkwrite+0x57/0xc0
> >>>>>>> do_wp_page+0x278/0x2f0
> >>>>>>> __handle_mm_fault+0xdc2/0x1590
> >>>>>>> handle_mm_fault+0xdd/0x2c0
> >>>>>>> do_user_addr_fault+0x1d3/0x650
> >>>>>>> exc_page_fault+0x77/0x180
> >>>>>>> ? asm_exc_page_fault+0x8/0x30
> >>>>>>> asm_exc_page_fault+0x1e/0x30
> >>>>>>> RIP: 0033:0x7fd98fc8fad1
> >>>>>>> ==
> >>>>>>>
> >>>>>>> Figure out the race happens when one process is adding &page->lru
> into
> >>>>>>> the pagelist tail in fb_deferred_io_mkwrite(), another process is
> >>>>>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is
> >>>>>>> not protected by the lock.
> >>>>>>>
> >>>>>>> This fix is to init all the page lists one time during initialization,
> >>>>>>> it not only fixes the list corruption, but also avoids INIT_LIST_HEAD()
> >>>>>>> redundantly.
> >>>>>>>
> >>>>>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already
> enlisted")
> >>>>>>> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> >>>>>>> Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
> >>>>>>> ---
> >>>>>>> drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
> >>>>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/video/fbdev/core/fb_defio.c
> b/drivers/video/fbdev/core/fb_defio.c
> >>>>>>> index 98b0f23bf5e2..eafb66ca4f28 100644
> >>>>>>> --- a/drivers/video/fbdev/core/fb_defio.c
> >>>>>>> +++ b/drivers/video/fbdev/core/fb_defio.c
> >>>>>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct
> vm_fault *vmf)
> >>>>>>> printk(KERN_ERR "no mapping available\n");
> >>>>>>>
> >>>>>>> BUG_ON(!page->mapping);
> >>>>>>> - INIT_LIST_HEAD(&page->lru);
> >>>>>>> page->index = vmf->pgoff;
> >>>>>>>
> >>>>>>> vmf->page = page;
> >>>>>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct
> work_struct *work)
> >>>>>>> void fb_deferred_io_init(struct fb_info *info)
> >>>>>>> {
> >>>>>>> struct fb_deferred_io *fbdefio = info->fbdefio;
> >>>>>>> + struct page *page;
> >>>>>>> + int i;
> >>>>>>>
> >>>>>>> BUG_ON(!fbdefio);
> >>>>>>> mutex_init(&fbdefio->lock);
> >>>>>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
> >>>>>>> INIT_LIST_HEAD(&fbdefio->pagelist);
> >>>>>>> if (fbdefio->delay == 0) /* set a default of 1 s */
> >>>>>>> fbdefio->delay = HZ;
> >>>>>>> +
> >>>>>>> + /* initialize all the page lists one time */
> >>>>>>> + for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
> >>>>>>> + page = fb_deferred_io_page(info, i);
> >>>>>>> + INIT_LIST_HEAD(&page->lru);
> >>>>>>> + }
> >>>>>>> }
> >>>>>>> EXPORT_SYMBOL_GPL(fb_deferred_io_init);
> >>>>>>>
> >>>>>> Applying your patch on top of current Linus’ master branch, tty0 is
> >>>>>> unusable and looks frozen. Sometimes network card still works,
> sometimes
> >>>>>> not.
> >>>>>
> >>>>> I don't see how the patch would cause below BUG call stack, need some
> time to
> >>>>> debug. Just few comments:
> >>>>> 1. Will the system work well without this patch?
> >>>>
> >>>> Yes, the framebuffer works well without the patch.
> >>>>
> >>>>> 2. When you are sure the patch causes the regression you saw, please get
> free
> >>>> to submit one reverted patch, thanks : )
> >>>>
> >>>> I think you for patch wasn’t submitted yet – at least not pulled by Linus.
> >>> The patch has been in drm-tip, could you have a try with the latest drm-tip
> to see
> >>> if the Framebuffer works well, in that case, we could revert it in drm-tip then.
> >>
> >> With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration
> >> manifest) everything works fine. (I had to disable amdgpu driver, as it
> >> failed to build.) Is anyone able to explain that?
> >
> > My patch is for fixing another patch which is in the drm-tip at least,
>
> The referenced commit 105a940416fc in the Fixes tag is also in Linus’
> master branch.
>
> > so I assume applying my patch into Linus tree directly is not
> > completely proper. That's my intention of asking your help for
> > retesting drm-tip.
> If there were such a relation, that would need to be documented in the
> commit message.
You should have seen it : )
>
> > You mean everything working fine means another issue you hit is also
> > gone?
> No, I just mean the hang when applying your patch.
>
> Anyway, after figuring out, that drm-tip, is actually not behind Linus’
> master branch, I tried to figure out the differences, and it turns out
> it’s also related to commit fac54e2bfb5b (x86/Kconfig: Select
> HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP) [1], which is in
> Linus’
> master branch, but not drm-tip. Note, I am using a 32-bit user space and
> a 64-bit Linux kernel. Reverting commit fac54e2bfb5b, and having your
> patch a applied, the hang is gone.
Good to know you have figured it out, and the issue you hit is not related to
my patch : )
>
> I am adding the people involved in the other discussion to make them
> aware of this failure case.
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://linux-regtracking.leemhuis.info/regzbot/mainline/
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2022-03-31 8:40 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20220317054602.28846-1-chuansheng.liu@intel.com>
[not found] ` <c058f18b-3dae-9ceb-57b4-ed62fedef50a@molgen.mpg.de>
[not found] ` <BL1PR11MB54455684D2A1B4F0A666F861971D9@BL1PR11MB5445.namprd11.prod.outlook.com>
[not found] ` <502adc88-740f-fd68-d870-4f5577e1254d@molgen.mpg.de>
[not found] ` <BL1PR11MB544534F78BE2AB3502981AE5971D9@BL1PR11MB5445.namprd11.prod.outlook.com>
[not found] ` <baebc9c2-a8fc-9b36-6133-7fa8368a93d5@molgen.mpg.de>
[not found] ` <BL1PR11MB5445633C68B3039320FE780E97E19@BL1PR11MB5445.namprd11.prod.outlook.com>
2022-03-31 8:21 ` [PATCH] fbdev: defio: fix the pagelist corruption Paul Menzel
2022-03-31 8:39 ` Liu, Chuansheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox