From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197])
    by kanga.kvack.org (Postfix) with ESMTP id A118E6B4E57
    for ; Wed, 28 Nov 2018 13:29:56 -0500 (EST)
Received: by mail-pg1-f197.google.com with SMTP id f9so12619402pgs.13
    for ; Wed, 28 Nov 2018 10:29:56 -0800 (PST)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
    by mx.google.com with SMTPS id d10-v6sor11335174pfj.29.2018.11.28.10.29.55
    for (Google Transport Security); Wed, 28 Nov 2018 10:29:55 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 12.1 \(3445.101.1\))
Subject: Re: [PATCH 0/2] Don’t leave executable TLB entries to freed pages
From: Nadav Amit
In-Reply-To: <20181128095734.GA23467@arm.com>
Date: Wed, 28 Nov 2018 10:29:52 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id:
References: <20181128000754.18056-1-rick.p.edgecombe@intel.com> <449E6648-5599-476D-8136-EE570101F930@gmail.com> <20181128095734.GA23467@arm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Will Deacon
Cc: Rick Edgecombe, Andrew Morton, Andy Lutomirski, linux-mm, LKML, Kernel Hardening, naveen.n.rao@linux.vnet.ibm.com, anil.s.keshavamurthy@intel.com, David Miller, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, ast@kernel.org, Daniel Borkmann, jeyu@kernel.org, netdev@vger.kernel.org, Ard Biesheuvel, Jann Horn, kristen@linux.intel.com, dave.hansen@intel.com, deneen.t.dock@intel.com

> On Nov 28, 2018, at 1:57 AM, Will Deacon wrote:
>
> On Tue, Nov 27, 2018 at 05:21:08PM -0800, Nadav Amit wrote:
>>> On Nov 27, 2018, at 5:06 PM, Nadav Amit wrote:
>>>
>>>> On Nov 27, 2018, at 4:07 PM, Rick Edgecombe wrote:
>>>>
>>>> Sometimes when memory is freed via the module subsystem, an executable
>>>> permissioned TLB entry can remain to a freed page. If the page is re-used to
>>>> back an address that will receive data from userspace, it can result in user
>>>> data being mapped as executable in the kernel. The root of this behavior is
>>>> vfree lazily flushing the TLB, but not lazily freeing the underlying pages.
>>>>
>>>> There are sort of three categories of this which show up across modules, bpf,
>>>> kprobes and ftrace:
>>>>
>>>> 1. When executable memory is touched and then immediately freed
>>>>
>>>> This shows up in a couple error conditions in the module loader and BPF JIT
>>>> compiler.
>>>
>>> Interesting!
>>>
>>> Note that this may cause conflict with "x86: avoid W^X being broken during
>>> modules loading", which I recently submitted.
>>
>> I actually have not looked at the vmalloc() code too much recently, but it
>> seems … strange:
>>
>> void vm_unmap_aliases(void)
>> {
>>
>> ...
>> 	mutex_lock(&vmap_purge_lock);
>> 	purge_fragmented_blocks_allcpus();
>> 	if (!__purge_vmap_area_lazy(start, end) && flush)
>> 		flush_tlb_kernel_range(start, end);
>> 	mutex_unlock(&vmap_purge_lock);
>> }
>>
>> Since __purge_vmap_area_lazy() releases the memory, it seems there is a time
>> window between the release of the region and the TLB flush, in which the
>> area can be allocated for another purpose. This can result in a
>> (theoretical) correctness issue. No?
>
> If __purge_vmap_area_lazy() returns false, then it hasn't freed the memory,
> so we only invalidate the TLB if 'flush' is true in that case. If
> __purge_vmap_area_lazy() returns true instead, then it takes care of the TLB
> invalidation before the freeing.

Right. Sorry for my misunderstanding.

Thanks,
Nadav
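
For reference, the ordering Will describes can be sketched as a toy
user-space model. This is not kernel code: the model_* helpers, the event
log, and the have_pending parameter are hypothetical stand-ins introduced
only for illustration, and the behavior assumed for __purge_vmap_area_lazy()
(flush first, then free, return true; otherwise return false without
freeing) is taken from the explanation quoted above, not from the kernel
source.

```c
#include <stdbool.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */
enum event { EV_FLUSH, EV_FREE };

#define MAX_EVENTS 4
static enum event log_buf[MAX_EVENTS];
static int log_len;

static void record(enum event e)
{
	if (log_len < MAX_EVENTS)
		log_buf[log_len++] = e;
}

/* Stand-in for flush_tlb_kernel_range(): only records that a flush ran. */
static void model_flush_tlb(void)
{
	record(EV_FLUSH);
}

/*
 * Stand-in for __purge_vmap_area_lazy(), as described in the thread:
 * with lazily freed areas pending it flushes, then frees, and returns
 * true; with nothing pending it frees nothing and returns false.
 */
static bool model_purge_lazy(bool have_pending)
{
	if (!have_pending)
		return false;
	model_flush_tlb();
	record(EV_FREE);
	return true;
}

/* Mirrors the tail of the quoted vm_unmap_aliases(). */
static void model_unmap_aliases(bool have_pending, bool flush)
{
	log_len = 0;
	if (!model_purge_lazy(have_pending) && flush)
		model_flush_tlb();
}

/* True if any free was recorded before the first flush. */
static bool freed_before_flush(void)
{
	bool flushed = false;

	for (int i = 0; i < log_len; i++) {
		if (log_buf[i] == EV_FLUSH)
			flushed = true;
		else if (!flushed)
			return true;
	}
	return false;
}

/* Counts the (have_pending, flush) combinations with an unsafe ordering. */
static int count_unsafe_orderings(void)
{
	int unsafe = 0;

	for (int pending = 0; pending <= 1; pending++)
		for (int flush = 0; flush <= 1; flush++) {
			model_unmap_aliases(pending, flush);
			if (freed_before_flush())
				unsafe++;
		}
	return unsafe;
}
```

Under these assumptions, count_unsafe_orderings() returns 0: in none of the
four combinations is memory released ahead of the TLB flush, which is why
the window suggested earlier in the thread does not open.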