From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C32B0C433C1 for ; Tue, 30 Mar 2021 04:59:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 12EC361924 for ; Tue, 30 Mar 2021 04:59:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 12EC361924 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=wanyeetech.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A777C6B007E; Tue, 30 Mar 2021 00:59:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A27B16B0080; Tue, 30 Mar 2021 00:59:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8A14A6B0081; Tue, 30 Mar 2021 00:59:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0218.hostedemail.com [216.40.44.218]) by kanga.kvack.org (Postfix) with ESMTP id 6A2806B007E for ; Tue, 30 Mar 2021 00:59:33 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 19C64824999B for ; Tue, 30 Mar 2021 04:59:33 +0000 (UTC) X-FDA: 77975337426.22.E1EC09B Received: from out28-121.mail.aliyun.com (out28-121.mail.aliyun.com [115.124.28.121]) by imf19.hostedemail.com (Postfix) with ESMTP id 876B090009E2 for ; Tue, 30 Mar 2021 04:59:24 +0000 (UTC) X-Alimail-AntiSpam:AC=CONTINUE;BC=0.07441275|-1;CH=green;DM=|CONTINUE|false|;DS=CONTINUE|ham_regular_dialog|0.00478279-0.00110423-0.994113;FP=0|0|0|0|0|-1|-1|-1;HT=ay29a033018047201;MF=zhouyanjie@wanyeetech.com;NM=1;PH=DS;RN=7;RT=7;SR=0;TI=SMTPD_---.JsM3E1E_1617080368; Received: from 192.168.123.38(mailfrom:zhouyanjie@wanyeetech.com fp:SMTPD_---.JsM3E1E_1617080368) by smtp.aliyun-inc.com(10.147.42.22); Tue, 30 Mar 2021 12:59:28 +0800 Subject: Re: [PATCH v3] mm: fix race by making init_zero_pfn() early_initcall To: Ilya Lipnitskiy , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Hugh Dickins , "Eric W. Biederman" , stable@vger.kernel.org References: <20210329052922.1130493-1-ilya.lipnitskiy@gmail.com> <20210330044208.8305-1-ilya.lipnitskiy@gmail.com> From: Zhou Yanjie Message-ID: <51e3affb-ea09-65a4-99e1-daba968e6dc8@wanyeetech.com> Date: Tue, 30 Mar 2021 12:59:27 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <20210330044208.8305-1-ilya.lipnitskiy@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-Rspamd-Queue-Id: 876B090009E2 X-Stat-Signature: 185mkaf59ozr3b1i5snuau8dm1xepf75 X-Rspamd-Server: rspam02 Received-SPF: none (wanyeetech.com>: No applicable sender policy available) receiver=imf19; identity=mailfrom; envelope-from=""; helo=out28-121.mail.aliyun.com; client-ip=115.124.28.121 X-HE-DKIM-Result: none/none X-HE-Tag: 1617080364-71211 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Ilya, On 2021/3/30 =E4=B8=8B=E5=8D=8812:42, Ilya Lipnitskiy wrote: > There are code paths that rely on zero_pfn to be fully initialized > before core_initcall. For example, wq_sysfs_init() is a core_initcall > function that eventually results in a call to kernel_execve, which > causes a page fault with a subsequent mmput. If zero_pfn is not > initialized by then it may not get cleaned up properly and result in an > error: > BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1 > > Here is an analysis of the race as seen on a MIPS device. On this > particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until > initialized, at which point it becomes PFN 5120: > 1. wq_sysfs_init calls into kobject_uevent_env at core_initcall: > [<80340dc8>] kobject_uevent_env+0x7e4/0x7ec > [<8033f8b8>] kset_register+0x68/0x88 > [<803cf824>] bus_register+0xdc/0x34c > [<803cfac8>] subsys_virtual_register+0x34/0x78 > [<8086afb0>] wq_sysfs_init+0x1c/0x4c > [<80001648>] do_one_initcall+0x50/0x1a8 > [<8086503c>] kernel_init_freeable+0x230/0x2c8 > [<8066bca0>] kernel_init+0x10/0x100 > [<80003038>] ret_from_kernel_thread+0x14/0x1c > > 2. kobject_uevent_env() calls call_usermodehelper_exec() which execu= tes > kernel_execve asynchronously. > > 3. Memory allocations in kernel_execve cause a page fault, bumping t= he > MM reference counter: > [<8015adb4>] add_mm_counter_fast+0xb4/0xc0 > [<80160d58>] handle_mm_fault+0x6e4/0xea0 > [<80158aa4>] __get_user_pages.part.78+0x190/0x37c > [<8015992c>] __get_user_pages_remote+0x128/0x360 > [<801a6d9c>] get_arg_page+0x34/0xa0 > [<801a7394>] copy_string_kernel+0x194/0x2a4 > [<801a880c>] kernel_execve+0x11c/0x298 > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194 > > 4. In case zero_pfn has not been initialized yet, zap_pte_range does > not decrement the MM_ANONPAGES RSS counter and the BUG message is > triggered shortly afterwards when __mmdrop checks the ref counter= s: > [<800285e8>] __mmdrop+0x98/0x1d0 > [<801a6de8>] free_bprm+0x44/0x118 > [<801a86a8>] kernel_execve+0x160/0x1d8 > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194 > [<80003198>] ret_from_kernel_thread+0x14/0x1c > > To avoid races such as described above, initialize init_zero_pfn at > early_initcall level. Depending on the architecture, ZERO_PAGE is eithe= r > constant or gets initialized even earlier, at paging_init, so there is > no issue with initializing zero_pfn earlier. > > Discussion: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=3DhafjhHCtTHVod= Chv1qpM=3DniAXOpqEbt7w@mail.gmail.com > > Signed-off-by: Ilya Lipnitskiy > Cc: Hugh Dickins > Cc: "Eric W. Biederman" > Cc: stable@vger.kernel.org > --- > mm/memory.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Tested-by: =E5=91=A8=E7=90=B0=E6=9D=B0 (Zhou Yanjie) # on=20 CU1000-Neo/X1000E and CU1830-Neo/X1830 > diff --git a/mm/memory.c b/mm/memory.c > index 5c3b29d3af66..e66b11ac1659 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -166,7 +166,7 @@ static int __init init_zero_pfn(void) > zero_pfn =3D page_to_pfn(ZERO_PAGE(0)); > return 0; > } > -core_initcall(init_zero_pfn); > +early_initcall(init_zero_pfn); > =20 > void mm_trace_rss_stat(struct mm_struct *mm, int member, long count) > {