From: Guo Ren
Date: Fri, 17 Jun 2022 11:17:30 +0800
Subject: Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
To: David Hildenbrand, Zi Yan
Cc: Xianting Tian, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka, Linux-MM, Linux Kernel Mailing List, stable@vger.kernel.org, huanyi.xj@alibaba-inc.com, zjb194813@alibaba-inc.com, tianhu.hh@alibaba-inc.com, Hanjun Guo, Joonsoo Kim, Laura Abbott
In-Reply-To: <20f49e70-32e0-a141-907c-5f58c543d70b@redhat.com>

Hi David & Zi Yan

On Thu, Jun 16, 2022 at 11:04 PM David Hildenbrand wrote:
>
> On 16.06.22 16:01, Zi Yan wrote:
> > On 15 Jun 2022, at 12:15, Xianting Tian wrote:
> >
> >> On 2022/6/15 at 9:55 PM, Zi Yan wrote:
> >>> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
> >>>
> >>>> On 2022/6/14 at 8:14 AM, Zi Yan wrote:
> >>>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
> >>>>>
> >>>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan wrote:
> >>>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
> >>>>>>>
> >>>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan wrote:
> >>>>>>>>> Hi Xianting,
> >>>>>>>>>
> >>>>>>>>> Thanks for your patch.
> >>>>>>>>>
> >>>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
> >>>>>>>>>
> >>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> >>>>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
> >>>>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
> >>>>>>>>>> fixes message:
> >>>>>>>>>>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >>>>>>>>>>                            pageblocks with others")
> >>>>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
> >>>>>>>> I think Xianting is right. The "Fixes:" tag is not accurate and the
> >>>>>>>> page_is_buddy() is necessary here.
> >>>>>>>>
> >>>>>>>> This patch could be applied to the early version of the stable tree
> >>>>>>>> (eg: Linux-5.10.y, not the master tree)
> >>>>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
> >>>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
> >>>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
> >>>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
> >>>>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
> >>>>>>>
> >>>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
> >>>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
> >>>>>>> the fixes message is wrong just misleads people, making them think there is
> >>>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
> >>>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
> >>>>>> origin fixes could cover the Linux-5.0.y tree if they give the
> >>>>>> accurate commit number and that is the cause we want to point out.
> >>>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
> >>>>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
> >>>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
> >>>>> message is misleading. This is the point I want to make.
> >>>>>
> >>>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
> >>>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
> >>>>>> problem.
> >>>>> But it is not and does not apply to d9dddbf55667 cleanly.
> >>>>>
> >>>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
> >>>>>> (Fixup a bug which only belongs to the former stable branch.)
> >>>>>>
> >>>>> I think you just need to send this patch without saying "commit
> >>>>> 787af64d05cd fixes message is wrong" would be a good start. You also
> >>>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
> >>>>> (inclusive). So there will need to be two patches:
> >>>>>
> >>>>> 1) your patch to stable tree prior to 5.15 and
> >>>>>
> >>>>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
> >>>>> between 5.15 and 5.17.
> >>>>>
> >>>>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
> >>>>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
> >>>>>>> in the mm/page_isolation.c code I mentioned below.
> >>>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
> >>>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
> >>>>>> buddy_pfn=0.
> >>>>>> The root cause comes from __find_buddy_pfn():
> >>>>>> return page_pfn ^ (1 << order);
> >>>>> Right.
> >>>>> But pfn_valid_within() was removed since 5.15. So your fix is
> >>>>> required for kernels between 5.15 and 5.17 (inclusive).
> >>>>>
> >>>>>> When page_pfn is the same as the order size, it will return the
> >>>>>> previous buddy not the next. That is the only exception for this
> >>>>>> algorithm, right?
> >>>>>>
> >>>>>> In fact, the bug is a very long time to reproduce and is not easy to
> >>>>>> debug, so we want to contribute it to the community to prevent other
> >>>>>> guys from wasting time. Although there is no new patch at all.
> >>>>> Thanks for your reporting and sending out the patch. I really
> >>>>> appreciate it. We definitely need your inputs. Throughout the email
> >>>>> thread, I am trying to help you clarify the bug and how to fix it
> >>>>> properly:
> >>>>>
> >>>>> 1. The commit 787af64d05cd does not apply cleanly to commits
> >>>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
> >>>>> fix the issue. That is why we need your patch to fix the issue.
> >>>>> And saying it has a wrong fixes message in this patch's git log is
> >>>>> misleading.
> >>>>>
> >>>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
> >>>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
> >>>>> removed since 5.15 and the issue can appear during page isolation.
> >>>>>
> >>>>> 3. For kernels before 5.15, this patch will apply.
> >>>> Zi Yan, Guo Ren,
> >>>>
> >>>> I think we still need some improvement for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
> >>>>
> >>>> within page_is_buddy(), it still uses the illegal buddy page to do the check. I think in most cases, page_is_buddy() can return false, but it still may return true with very low probability.
> >>> Can you elaborate more on this? What kind of page can lead to page_is_buddy()
> >>> returning true?
> >>> You said it is buddy_pfn is 0, but if the page is reserved,
> >>> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
> >>> Maybe show us the dump_page() that offending page.
> >>>
> >>> Thanks.
> >>
> >> Let's take the issue we met on RISC-V arch for example,
> >>
> >> pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 0xffffffe07e205000, which is the page address of PFN 512.
> >>
> >> __find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
> >> We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.
> >>
> >> Then it uses the code below to get the buddy page with buddy_pfn 0:
> >> buddy = page + (buddy_pfn - pfn);
> >> So buddy page address is:
> >> 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)
> >>
> >> we can know this buddy page's address is less than mem_map (0xffffffe07e1fe000 < 0xffffffe07e205000),
> >> actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000
> >> as the page's address to extract the value of a member in 'struct page', we may get an uncertain value.
> >> That's why I say page_is_buddy() may return true with very low probability.
> >>
> >> So I think we need to add the code to verify buddy_pfn in the first place:
> >> pfn_valid(buddy_pfn)
> >>
> >
> > +DavidH on how memory section works.
> >
> > This 2MB RAM reservation does not sound right to me. How does it work in sparsemem?
> > RISC-V has SECTION_SIZE_BITS=27, i.e., 128MB a section. All pages within
> > a section should have their corresponding struct page (mem_map). So in this case,
> > the first 2MB pages should have mem_map and can be marked as PageReserved. As a
> > result, page_is_buddy() will return false.

Actually, we had a patch to fix that, have a look:
https://lore.kernel.org/linux-riscv/20211123015717.542631-2-guoren@kernel.org/

What do you think of the above patch?
A lot of arch maintainers do not recognize that the buddy system has an
implied limitation: the start of the physical RAM address must be aligned
to (1 << (MAX_ORDER - 1)) pages.

>
> Yes. Unless there is a BUG :)
>
> init_unavailable_range() is supposed to initialize the memmap of
> unavailable ranges and mark it reserved.
>
> I wonder if we're missing a case in memmap_init(), to also initialize
> holes at the beginning of a section, before RAM (we do handle sections
> in a special way if the end of RAM falls in the middle of a section).
>
> If it's not initialized, it might contain garbage.
>
> --
> Thanks,
>
> David / dhildenb
>

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
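
P.S. For anyone who wants to re-check the arithmetic quoted in this thread, here is a minimal userspace sketch in C. It is not kernel code. All function names are hypothetical, addresses are modelled as plain 64-bit integers, and the 56-byte `struct page` size is an assumption inferred from the two addresses quoted above ((0xffffffe07e26e000 - 0xffffffe07e205000) / (0x2000 - 512) = 56). The base check is only a stand-in for the idea of validating buddy_pfn and is not how pfn_valid() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sizeof(struct page) inferred from the addresses in the thread. */
#define STRUCT_PAGE_SIZE_SKETCH 56ULL

/* Same XOR as the quoted __find_buddy_pfn(): flip bit 'order' of the PFN. */
static inline uint64_t buddy_pfn_sketch(uint64_t pfn, unsigned int order)
{
	assert(order < 64);
	return pfn ^ (1ULL << order);
}

/*
 * Models "buddy = page + (buddy_pfn - pfn)" on integer addresses.
 * Unsigned wraparound makes a negative PFN delta come out right.
 */
static inline uint64_t buddy_page_addr_sketch(uint64_t page_addr,
					      uint64_t pfn,
					      uint64_t buddy_pfn)
{
	return page_addr + (buddy_pfn - pfn) * STRUCT_PAGE_SIZE_SKETCH;
}

/* Hypothetical stand-in for a validity check: is the PFN at or above pfn_base? */
static inline bool pfn_at_or_above_base(uint64_t pfn, uint64_t pfn_base)
{
	return pfn >= pfn_base;
}

/* Is start_pfn a multiple of the largest buddy block, 1 << (max_order - 1) pages? */
static inline bool start_pfn_is_aligned(uint64_t start_pfn,
					unsigned int max_order)
{
	assert(max_order >= 1 && max_order <= 64);
	return (start_pfn & ((1ULL << (max_order - 1)) - 1)) == 0;
}
```

With the numbers from the thread, buddy_pfn_sketch(0x2000, 13) gives 0, and buddy_page_addr_sketch(0xffffffe07e26e000, 0x2000, 0) gives 0xffffffe07e1fe000, which lies below the quoted mem_map value 0xffffffe07e205000. A start PFN of 512 is not aligned for a max order of 14 (order 0xd being the largest block), which is the alignment limitation described above.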