From mboxrd@z Thu Jan 1 00:00:00 1970
To: George Kennedy, Andrey Konovalov
Cc: Andrew Morton, Catalin Marinas, Vincenzo Frascino, Dmitry Vyukov,
 Konrad Rzeszutek Wilk, Will Deacon, Andrey Ryabinin, Alexander Potapenko,
 Marco Elver, Peter Collingbourne, Evgenii Stepanov, Branislav Rankov,
 Kevin Brodsky, Christoph Hellwig, kasan-dev, Linux ARM,
 Linux Memory Management List, LKML, Dhaval Giani
References: <487751e1ccec8fcd32e25a06ce000617e96d7ae1.1613595269.git.andreyknvl@google.com>
 <797fae72-e3ea-c0b0-036a-9283fa7f2317@oracle.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH] mm, kasan: don't poison boot memory
Message-ID: <1ac78f02-d0af-c3ff-cc5e-72d6b074fc43@redhat.com>
Date: Mon, 22 Feb 2021 10:52:23 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0
In-Reply-To: <797fae72-e3ea-c0b0-036a-9283fa7f2317@oracle.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US

On 20.02.21 00:04, George Kennedy wrote:
> 
> 
> On 2/19/2021 11:45 AM, George Kennedy wrote:
>>
>>
>> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>>> wrote:
>>>>
>>>>
>>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>>> During boot, all non-reserved memblock memory is exposed to the buddy
>>>>>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>>>>>> especially on systems with a large amount of RAM. This patch makes
>>>>>> page_alloc not call kasan_free_pages() on all new memory.
>>>>>>
>>>>>> __free_pages_core() is used when exposing fresh memory during system
>>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>>>>>> free_pages_prepare() from __free_pages_core().
>>>>>>
>>>>>> This has little impact on KASAN memory tracking.
>>>>>>
>>>>>> Assuming that there are no references to newly exposed pages before
>>>>>> they are ever allocated, there won't be any intended (but buggy)
>>>>>> accesses to that memory that KASAN would normally detect.
>>>>>>
>>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>>> out-of-bounds accesses that happen to land on a fresh memory page
>>>>>> that was never allocated. This is taken as an acceptable trade-off.
>>>>>>
>>>>>> All memory allocated normally once boot is over keeps getting
>>>>>> poisoned as usual.
>>>>>>
>>>>>> Signed-off-by: Andrey Konovalov
>>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>>> Not sure this is the right thing to do, see
>>>>>
>>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>>>>>
>>>>> Reversing the order in which memory gets allocated + used during boot
>>>>> (in a patch by me) might have revealed an invalid memory access during
>>>>> boot.
>>>>>
>>>>> I suspect that that issue would no longer get detected with your
>>>>> patch, as the invalid memory access would simply not get detected.
>>>>> Now, I cannot prove that :)
>>>> Since David's patch we're having trouble with the iBFT ACPI table,
>>>> which is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c".
>>>> KASAN detects that it is being used after free when ibft_init()
>>>> accesses the iBFT table, but as of yet we can't find where it gets
>>>> freed (we've instrumented calls to kunmap()).
>>> Maybe it doesn't get freed, but what you see is a wild or a large
>>> out-of-bounds access. Since KASAN marks all memory as freed during the
>>> memblock->page_alloc transition, such bugs can manifest as
>>> use-after-frees.
>>
>> It gets freed and re-used. By the time the iBFT table is accessed by
>> ibft_init() the page has been over-written.
>>
>> Setting page flags like the following before the call to kmap()
>> prevents the iBFT table page from being freed:
> 
> Cleaned up version:
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 0418feb..8f0a8e7 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned long pg_sz)
> 
>      pfn = pg_off >> PAGE_SHIFT;
>      if (should_use_kmap(pfn)) {
> +        struct page *page = pfn_to_page(pfn);
> +
>          if (pg_sz > PAGE_SIZE)
>              return NULL;
> -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
> +        SetPageReserved(page);
> +        return (void __iomem __force *)kmap(page);
>      } else
>          return acpi_os_ioremap(pg_off, pg_sz);
>  }
> @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address pg_off, void __iomem *vaddr)
>      unsigned long pfn;
> 
>      pfn = pg_off >> PAGE_SHIFT;
> -    if (should_use_kmap(pfn))
> -        kunmap(pfn_to_page(pfn));
> -    else
> +    if (should_use_kmap(pfn)) {
> +        struct page *page = pfn_to_page(pfn);
> +
> +        ClearPageReserved(page);
> +        kunmap(page);
> +    } else
>          iounmap(vaddr);
>  }
> 
> David, the above works, but I'm wondering why it is now necessary.
> kunmap() is not hit. What other ways could a page mapped via kmap() be
> unmapped?
> 

Let me look into the code ... I have little experience with ACPI 
details, so bear with me.

I assume that acpi_map()/acpi_unmap() map some firmware blob that is 
provided via firmware/BIOS/... to us.

should_use_kmap() tells us whether

a) we have a "struct page" and should kmap() that one
b) we don't have a "struct page" and should ioremap().

As it is a blob, the firmware should always reserve that memory region 
via memblock (e.g., memblock_reserve()), such that we either

1) don't create a memmap ("struct page") at all (-> case b) )

2) if we have to create a memmap, we mark the page PG_reserved and 
*never* expose it to the buddy (-> case a) )
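
For illustration, what I'd expect is that some early boot code reserves 
that blob roughly along these lines (just an untested sketch to show 
what I mean; reserve_firmware_blob() is a made-up name, and the real 
iBFT/ACPI code may well do this elsewhere or differently):

    #include <linux/init.h>
    #include <linux/memblock.h>

    /*
     * Sketch only: reserve the firmware blob before the buddy allocator
     * ever sees that range. Reserved memblock ranges either never get a
     * memmap at all, or their struct pages are marked PG_reserved when
     * the memmap is initialized, so they are never handed to the buddy.
     */
    static void __init reserve_firmware_blob(phys_addr_t start, phys_addr_t size)
    {
            memblock_reserve(start, size);
    }

With something like that in place, acpi_map() should only ever see 
either no memmap (case b) ) or a PG_reserved memmap (case a) ).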
Are you telling me that in this case we might have a memmap for the HW 
blob that is *not* PG_reserved? In that case it most probably got 
exposed to the buddy, where it can happily get allocated/freed.

The latent BUG would be that the blob gets exposed to the system like 
ordinary RAM instead of being reserved via memblock early during boot. 
Assuming the blob has a low physical address, with my patch it will get 
allocated/used a lot earlier - which would mean we now trigger this 
latent BUG more easily.

There have been similar latent BUGs on ARM boards that my patch 
discovered, where special RAM regions did not get marked as reserved 
via the device tree properly.

Now, this is just a wild guess :) Can you dump the page when mapping 
(before the SetPageReserved()) and when unmapping, to see what the 
state of that memmap is?

-- 
Thanks,

David / dhildenb
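
P.S.: by "dump the page" I mean something along these lines (untested 
sketch against the acpi_map() quoted above; dump_page() needs 
#include <linux/mmdebug.h>, and the placement is only an example):

        pfn = pg_off >> PAGE_SHIFT;
        if (should_use_kmap(pfn)) {
                struct page *page = pfn_to_page(pfn);

                if (pg_sz > PAGE_SIZE)
                        return NULL;
                /* print the memmap state before any SetPageReserved() hack */
                dump_page(page, "acpi_map");
                return (void __iomem __force *)kmap(page);
        } else
                return acpi_os_ioremap(pg_off, pg_sz);

plus a matching dump_page(page, "acpi_unmap") right before the kunmap() 
in acpi_unmap().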