Message-ID: <208cb9f0-09e1-094f-5bca-9a9effbf1da8@molgen.mpg.de>
Date: Thu, 31 Mar 2022 10:21:36 +0200
Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption
To: Chuansheng Liu
Cc: tzimmermann@suse.de, linux-fbdev@vger.kernel.org, deller@gmx.de,
 dri-devel@lists.freedesktop.org, Song Liu, linux-mm@kvack.org,
 bpf@vger.kernel.org, netdev@vger.kernel.org, x86@kernel.org,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 kernel-team@fb.com, akpm@linux-foundation.org,
 rick.p.edgecombe@intel.com, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen
References: <20220317054602.28846-1-chuansheng.liu@intel.com>
 <502adc88-740f-fd68-d870-4f5577e1254d@molgen.mpg.de>
From: Paul Menzel
In-Reply-To:

Dear Chuansheng,


On 31.03.22 at 02:06, Liu, Chuansheng wrote:

>> -----Original Message-----
>> From: Paul Menzel
>> Sent: Thursday, March 31, 2022 12:47 AM

[…]

>> On 29.03.22 at 01:58, Liu, Chuansheng wrote:
>>
>>>> -----Original Message-----
>>>> From: Paul Menzel
>>>> Sent: Monday, March 28, 2022 2:15 PM
>>
>>>> On 28.03.22 at 02:58, Liu, Chuansheng wrote:
>>>>
>>>>>> -----Original Message-----
>>>>
>>>>>> Sent: Saturday, March 26, 2022 4:11 PM
>>>>
>>>>>> On 17.03.22 at 06:46, Chuansheng Liu wrote:
>>>>>>> Easily hit the below list corruption:
>>>>>>> ==
>>>>>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but
>>>>>>> was ffffec604507edc8. (prev=ffffec604507edc8).
>>>>>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
>>>>>>> __list_add_valid+0x53/0x80
>>>>>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G U
>>>>>>> RIP: 0010:__list_add_valid+0x53/0x80
>>>>>>> Call Trace:
>>>>>>>
>>>>>>>  fb_deferred_io_mkwrite+0xea/0x150
>>>>>>>  do_page_mkwrite+0x57/0xc0
>>>>>>>  do_wp_page+0x278/0x2f0
>>>>>>>  __handle_mm_fault+0xdc2/0x1590
>>>>>>>  handle_mm_fault+0xdd/0x2c0
>>>>>>>  do_user_addr_fault+0x1d3/0x650
>>>>>>>  exc_page_fault+0x77/0x180
>>>>>>>  ? asm_exc_page_fault+0x8/0x30
>>>>>>>  asm_exc_page_fault+0x1e/0x30
>>>>>>> RIP: 0033:0x7fd98fc8fad1
>>>>>>> ==
>>>>>>>
>>>>>>> Figure out the race happens when one process is adding &page->lru into
>>>>>>> the pagelist tail in fb_deferred_io_mkwrite(), another process is
>>>>>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is
>>>>>>> not protected by the lock.
>>>>>>>
>>>>>>> This fix is to init all the page lists one time during initialization,
>>>>>>> it not only fixes the list corruption, but also avoids INIT_LIST_HEAD()
>>>>>>> redundantly.
>>>>>>>
>>>>>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
>>>>>>> Cc: Thomas Zimmermann
>>>>>>> Signed-off-by: Chuansheng Liu
>>>>>>> ---
>>>>>>>  drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
>>>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>>>>>>> index 98b0f23bf5e2..eafb66ca4f28 100644
>>>>>>> --- a/drivers/video/fbdev/core/fb_defio.c
>>>>>>> +++ b/drivers/video/fbdev/core/fb_defio.c
>>>>>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>>>>>>>          printk(KERN_ERR "no mapping available\n");
>>>>>>>
>>>>>>>      BUG_ON(!page->mapping);
>>>>>>> -    INIT_LIST_HEAD(&page->lru);
>>>>>>>      page->index = vmf->pgoff;
>>>>>>>
>>>>>>>      vmf->page = page;
>>>>>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
>>>>>>>  void fb_deferred_io_init(struct fb_info *info)
>>>>>>>  {
>>>>>>>      struct fb_deferred_io *fbdefio = info->fbdefio;
>>>>>>> +    struct page *page;
>>>>>>> +    int i;
>>>>>>>
>>>>>>>      BUG_ON(!fbdefio);
>>>>>>>      mutex_init(&fbdefio->lock);
>>>>>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
>>>>>>>      INIT_LIST_HEAD(&fbdefio->pagelist);
>>>>>>>      if (fbdefio->delay == 0) /* set a default of 1 s */
>>>>>>>          fbdefio->delay = HZ;
>>>>>>> +
>>>>>>> +    /* initialize all the page lists one time */
>>>>>>> +    for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
>>>>>>> +        page = fb_deferred_io_page(info, i);
>>>>>>> +        INIT_LIST_HEAD(&page->lru);
>>>>>>> +    }
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(fb_deferred_io_init);
>>>>>>>
>>>>>> Applying your patch on top of current Linus’ master branch, tty0 is
>>>>>> unusable and looks frozen. Sometimes network card still works, sometimes
>>>>>> not.
>>>>>
>>>>> I don't see how the patch would cause below BUG call stack, need some time to
>>>>> debug. Just few comments:
>>>>> 1. Will the system work well without this patch?
>>>>
>>>> Yes, the framebuffer works well without the patch.
>>>>
>>>>> 2. When you are sure the patch causes the regression you saw, please get free
>>>>> to submit one reverted patch, thanks : )
>>>>
>>>> I think your patch wasn’t submitted yet – at least not pulled by Linus.
>>> The patch has been in drm-tip, could you have a try with the latest drm-tip to see
>>> if the Framebuffer works well, in that case, we could revert it in drm-tip then.
>>
>> With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration
>> manifest) everything works fine. (I had to disable amdgpu driver, as it
>> failed to build.) Is anyone able to explain that?
>
> My patch is for fixing another patch which is in the drm-tip at least,

The referenced commit 105a940416fc in the Fixes tag is also in Linus’
master branch.

> so I assume applying my patch into Linus tree directly is not
> completely proper. That's my intention of asking your help for
> retesting drm-tip.

If there were such a relation, that would need to be documented in the
commit message.

> You mean everything working fine means another issue you hit is also
> gone?

No, I just mean the hang when applying your patch.

Anyway, after figuring out that drm-tip is actually not behind Linus’
master branch, I tried to figure out the differences, and it turns out
it’s also related to commit fac54e2bfb5b (x86/Kconfig: Select
HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP) [1], which is in Linus’
master branch, but not drm-tip. Note, I am using a 32-bit user space and
a 64-bit Linux kernel. With commit fac54e2bfb5b reverted and your patch
applied, the hang is gone.

I am adding the people involved in the other discussion to make them
aware of this failure case.


Kind regards,

Paul


[1]: https://linux-regtracking.leemhuis.info/regzbot/mainline/
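

For readers joining the thread: the quoted warning reports prev->next
equal to prev itself, which is what a list node looks like right after
INIT_LIST_HEAD(). The userspace sketch below replays, serially, the
interleaving described in the quoted commit message (enlist a page,
re-initialize its node, enlist another page). It is not kernel code and
does not model the hang reported in this mail. The names
init_list_head(), list_add_valid(), list_add_tail_checked() and
reinit_corrupts_list() are made up for illustration; only the
consistency rule is modelled on the kernel's __list_add_valid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make a node point at itself, i.e. an empty list / unlinked node. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Consistency rule modelled on the kernel's __list_add_valid(). */
static bool list_add_valid(const struct list_head *prev,
                           const struct list_head *next)
{
    return next->prev == prev && prev->next == next;
}

/* Append entry at the tail; refuse if the neighbours are inconsistent. */
static bool list_add_tail_checked(struct list_head *entry,
                                  struct list_head *head)
{
    struct list_head *prev;

    assert(entry != NULL && head != NULL);
    prev = head->prev;
    if (!list_add_valid(prev, head))
        return false;
    entry->prev = prev;
    entry->next = head;
    prev->next = entry;
    head->prev = entry;
    return true;
}

/*
 * Serial replay of the race (hypothetical helper).
 * Returns true if the second insertion is rejected.
 */
static bool reinit_corrupts_list(void)
{
    struct list_head pagelist, page_a, page_b;

    init_list_head(&pagelist);
    init_list_head(&page_a);
    init_list_head(&page_b);

    /* "mkwrite" path: page_a goes onto the pagelist. */
    if (!list_add_tail_checked(&page_a, &pagelist))
        return false;

    /* "fault" path: page_a is re-initialized while still enlisted. */
    init_list_head(&page_a);

    /* Next "mkwrite": prev is page_a, but page_a.next == &page_a. */
    return !list_add_tail_checked(&page_b, &pagelist);
}
```

Calling reinit_corrupts_list() from a small main() shows the second
insertion being rejected because the tail node points back at itself
instead of at the list head, the same inconsistency the warning prints.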