From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id 510878E0002 for ; Thu, 17 Jan 2019 17:29:41 -0500 (EST)
Received: by mail-pg1-f197.google.com with SMTP id m16so7132412pgd.0 for ; Thu, 17 Jan 2019 14:29:41 -0800 (PST)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id o19sor4075884pll.44.2019.01.17.14.29.39 for (Google Transport Security); Thu, 17 Jan 2019 14:29:39 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: [PATCH 06/17] x86/alternative: use temporary mm for text poking
From: Nadav Amit
In-Reply-To:
Date: Thu, 17 Jan 2019 14:29:36 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <32219CAE-7D49-4848-9497-A17E0D809B3E@gmail.com>
References: <20190117003259.23141-1-rick.p.edgecombe@intel.com> <20190117003259.23141-7-rick.p.edgecombe@intel.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andy Lutomirski
Cc: Rick Edgecombe, Ingo Molnar, LKML, X86 ML, "H. Peter Anvin", Thomas Gleixner, Borislav Petkov, Dave Hansen, Peter Zijlstra, Damian Tometzki, linux-integrity, LSM List, Andrew Morton, Kernel Hardening, Linux-MM, Will Deacon, Ard Biesheuvel, Kristen Carlson Accardi, "Dock, Deneen T", Kees Cook, Dave Hansen, Masami Hiramatsu

> On Jan 17, 2019, at 1:43 PM, Nadav Amit wrote:
>
>> On Jan 17, 2019, at 12:47 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jan 17, 2019 at 12:27 PM Andy Lutomirski wrote:
>>> On Wed, Jan 16, 2019 at 4:33 PM Rick Edgecombe
>>> wrote:
>>>> From: Nadav Amit
>>>>
>>>> text_poke() can potentially compromise the security as it sets temporary
>>>> PTEs in the fixmap. These PTEs might be used to rewrite the kernel code
>>>> from other cores accidentally or maliciously, if an attacker gains the
>>>> ability to write onto kernel memory.
>>>
>>> I think this may be sufficient, but barely.
>>>
>>>> +        pte_clear(poking_mm, poking_addr, ptep);
>>>> +
>>>> +        /*
>>>> +         * __flush_tlb_one_user() performs a redundant TLB flush when PTI is on,
>>>> +         * as it also flushes the corresponding "user" address spaces, which
>>>> +         * does not exist.
>>>> +         *
>>>> +         * Poking, however, is already very inefficient since it does not try to
>>>> +         * batch updates, so we ignore this problem for the time being.
>>>> +         *
>>>> +         * Since the PTEs do not exist in other kernel address-spaces, we do
>>>> +         * not use __flush_tlb_one_kernel(), which when PTI is on would cause
>>>> +         * more unwarranted TLB flushes.
>>>> +         *
>>>> +         * There is a slight anomaly here: the PTE is a supervisor-only and
>>>> +         * (potentially) global and we use __flush_tlb_one_user() but this
>>>> +         * should be fine.
>>>> +         */
>>>> +        __flush_tlb_one_user(poking_addr);
>>>> +        if (cross_page_boundary) {
>>>> +                pte_clear(poking_mm, poking_addr + PAGE_SIZE, ptep + 1);
>>>> +                __flush_tlb_one_user(poking_addr + PAGE_SIZE);
>>>> +        }
>>>
>>> In principle, another CPU could still have the old translation. Your
>>> mutex probably makes this impossible, but it makes me nervous.
>>> Ideally you'd use flush_tlb_mm_range(), but I guess you can't do that
>>> with IRQs off. Hmm. I think you should add an inc_mm_tlb_gen() here.
>>> Arguably, if you did that, you could omit the flushes, but maybe
>>> that's silly.
>>>
>>> If we start getting new users of use_temporary_mm(), we should give
>>> some serious thought to the SMP semantics.
>>>
>>> Also, you're using PAGE_KERNEL. Please tell me that the global bit
>>> isn't set in there.
>>
>> Much better solution: do unuse_temporary_mm() and *then*
>> flush_tlb_mm_range(). This is entirely non-sketchy and should be just
>> about optimal, too.
>
> This solution sounds nice and clean.
> The fact the global-bit was set didn’t
> matter before (since __flush_tlb_one_user would get rid of it no matter
> what), but would matter now, so I’ll change it too.

Err.. so actually text_poke() might be called with disabled IRQs (by kgdb).
flush_tlb_mm_range() should still work fine even with disabled IRQs since no
core would use poking_mm at this point. I can add a comment to
flush_tlb_mm_range(), but all in all it is actually not very pretty.
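
To make the proposed ordering concrete, here is a rough userspace model of it
(clear the PTE, do unuse_temporary_mm(), and only then flush_tlb_mm_range()).
It is only a sketch: every model_*() function and the step enum are
hypothetical stand-ins that record call order, not the kernel functions, whose
real signatures and behavior differ.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. These stubs record the order of operations and do
 * NOT touch page tables or TLBs; all names here are hypothetical stand-ins.
 */
enum step {
	STEP_USE_MM,		/* switch to the temporary poking mm */
	STEP_WRITE,		/* write the new instruction bytes */
	STEP_CLEAR_PTE,		/* remove the temporary mapping */
	STEP_UNUSE_MM,		/* switch back to the previous mm */
	STEP_FLUSH_RANGE,	/* flush the poking range of the poking mm */
	STEP_MAX
};

static enum step trace[STEP_MAX];
static size_t trace_len;

static void record(enum step s)
{
	if (trace_len < STEP_MAX)
		trace[trace_len++] = s;
}

static void model_use_temporary_mm(void)   { record(STEP_USE_MM); }
static void model_write_text(void)         { record(STEP_WRITE); }
static void model_pte_clear(void)          { record(STEP_CLEAR_PTE); }
static void model_unuse_temporary_mm(void) { record(STEP_UNUSE_MM); }
static void model_flush_tlb_mm_range(void) { record(STEP_FLUSH_RANGE); }

/* Runs one modeled poke and returns the number of recorded steps. */
static size_t model_text_poke(void)
{
	trace_len = 0;

	model_use_temporary_mm();
	model_write_text();
	model_pte_clear();

	/*
	 * Ordering under discussion: leave the temporary mm first, so that
	 * no core is using it, and only then flush its range.
	 */
	model_unuse_temporary_mm();
	model_flush_tlb_mm_range();

	return trace_len;
}
```

After one call to model_text_poke(), the last two entries of trace[] are the
unuse step followed by the flush step. The model says nothing about whether the
flush is safe with IRQs disabled; that still rests on the argument above that
no core uses poking_mm at that point.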