From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199])
	by kanga.kvack.org (Postfix) with ESMTP id E254F6B0006
	for ; Wed, 8 Aug 2018 12:54:17 -0400 (EDT)
Received: by mail-pf1-f199.google.com with SMTP id g26-v6so1686270pfo.7
	for ; Wed, 08 Aug 2018 09:54:17 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
	by mx.google.com with SMTPS id 29-v6sor1230972pfm.7.2018.08.08.09.54.16
	for (Google Transport Security); Wed, 08 Aug 2018 09:54:16 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20180808162752.GA26592@arm.com>
References: <20180629110709.GA17859@arm.com> <20180703173608.GF27243@arm.com>
	<20180801163538.GA10800@arm.com> <20180803092312.GA17798@arm.com>
	<20180808162752.GA26592@arm.com>
From: Dmitry Vyukov
Date: Wed, 8 Aug 2018 18:53:54 +0200
Message-ID:
Subject: Re: [PATCH v4 00/17] khwasan: kernel hardware assisted address sanitizer
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
List-ID:
To: Will Deacon
Cc: Andrey Konovalov, Andrew Morton, Catalin Marinas, Dave Martin,
	Andrey Ryabinin, Alexander Potapenko, Christoph Lameter, Mark Rutland,
	Nick Desaulniers, Marc Zyngier, Ard Biesheuvel, "Eric W . Biederman",
	Ingo Molnar, Paul Lawrence, Geert Uytterhoeven, Arnd Bergmann,
	"Kirill A . Shutemov", Greg Kroah-Hartman, Kate Stewart, Mike Rapoport,
	kasan-dev, linux-doc@vger.kernel.org, LKML, Linux ARM,
	linux-sparse@vger.kernel.org, Linux Memory Management List,
	Linux Kbuild mailing list, Chintan Pandya, Jacob Bramley, Jann Horn,
	Ruben Ayrapetyan, Lee Smith, Kostya Serebryany, Mark Brand,
	Ramana Radhakrishnan, Evgeniy Stepanov

On Wed, Aug 8, 2018 at 6:27 PM, Will Deacon wrote:
>> >> > Thanks for tracking these cases down and going through each of them. The
>> >> > obvious follow-up question is: how do we ensure that we keep on top of
>> >> > this in mainline?
>> >> > Are you going to repeat your experiment at every kernel
>> >> > release or every -rc or something else? I really can't see how we can
>> >> > maintain this in the long run, especially given that the coverage we have
>> >> > is only dynamic -- do you have an idea of how much coverage you're actually
>> >> > getting for, say, a defconfig+modules build?
>> >> >
>> >> > I'd really like to enable pointer tagging in the kernel, I'm just still
>> >> > failing to see how we can do it in a controlled manner where we can reason
>> >> > about the semantic changes using something other than a best-effort,
>> >> > case-by-case basis which is likely to be fragile and error-prone.
>> >> > Unfortunately, if that's all we have, then this gets relegated to a
>> >> > debug feature, which sort of defeats the point in my opinion.
>> >>
>> >> Well, in some cases there is no other way as resorting to dynamic testing.
>> >> How do we ensure that kernel does not dereference NULL pointers, does
>> >> not access objects after free or out of bounds? Nohow. And, yes, it's
>> >> constant maintenance burden resolved via dynamic testing.
>> >
>> > ... and the advantage of NULL pointer issues is that you're likely to see
>> > them as a synchronous exception at runtime, regardless of architecture and
>> > regardless of Kconfig options. With pointer tagging, that's certainly not
>> > the case, and so I don't think we can just treat issues there like we do for
>> > NULL pointers.
>>
>> Well, let's take use-after-frees, out-of-bounds, info leaks, data
>> races is a good example, deadlocks and just logical bugs...
>
> Ok, but it was you that brought up NULL pointers, so there's some goalpost
> moving here.

I moved it only because our views on bugs seem to be somewhat
different. I would put all of them, including NULL derefs, into the
same bucket of bugs.
But the point I wanted to make still holds if we take NULL derefs out
of the equation, so I took them out so that we don't concentrate on
"synchronous exceptions" only.

> And as with NULL pointers, all of the issues you mention above
> apply to other architectures and the majority of their configurations, so my
> concerns about this feature remain.
>
>> > If you want to enable khwasan in "production" and since enabling it
>> > could potentially change the behaviour of existing code paths, the
>> > run-time validation space doubles as we'd need to get the same code
>> > coverage with and without the feature being enabled.
>>
>> This is true for just any change in configs, sysctls or just a
>> different workload. Any of this can enable new code, exiting code
>> working differently, or just working with data in new states. And we
>> have tens of thousands of bugs, so blindly deploying anything new to
>> production without proper testing is a bad idea. It's not specific to
>> HWASAN in any way. And when you enable HWASAN you actually do mean to
>> retest everything as hard as possible.
>
> I suppose I'm trying to understand whether we have to resort to testing, or
> whether we can do better. I'm really uncomfortable with testing as our only
> means of getting this right because this is a non-standard, arm64-specific
> option and I don't think it will get very much testing in mainline at all.
> Rather, we'll get spurious bug reports from forks of -stable many releases
> later and we'll actually be worse-off for it.
>
>> And in the end we do not seem to have any action points here, right?
>
> Right now, it feels like this series trades one set of bugs for another,
> so I'd like to get to a position where this new set of bugs is genuinely
> more manageable (i.e. detectable, fixable, preventable) than the old set.
> Unfortunately, the only suggestion seems to be "testing", which I really
> don't find convincing :(
>
> Could we do things like:
>
>   - Set up a dedicated arm64 test farm, running mainline and with a public
>     frontend, aimed at getting maximum coverage of the kernel with KHWASAN
>     enabled?

FWIW we could try to set up a syzbot instance with qemu/arm64 emulation.
We have run such a combination a few times, but I am not sure how stable
it will be wrt flaky timeouts/stalls/etc. If it works, it will give
instant coverage of about 1MLOC.

>   - Have an implementation of KHWASAN for other architectures? (Is this even
>     possible?)
>
>   - Have a compiler plugin to clear out the tag for pointer arithmetic?
>     Could we WARN if two pointers are compared with different tags?
>     Could we manipulate the tag on cast-to-pointer so that a mismatch would
>     be qualifier to say that pointer was created via a cast?
>
>   - ...
>
> ?
>
> Will
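
To make the last bullet concrete, here is a small user-space model of
the comparison problem. It is only a sketch and not KHWASAN code: it
assumes the tag lives in the top byte (bits 63:56) of a 64-bit address
and treats addresses as plain integers. All names (set_tag, get_tag,
strip_tag, checked_equal) are hypothetical, and stripping a tag here
means zeroing the top byte, which is a simplification of what a kernel
would do with its native tag.

```python
# User-space model of top-byte pointer tags. Addresses are 64-bit integers;
# nothing here dereferences memory. All helper names are hypothetical.

TAG_SHIFT = 56                    # assumption: tag occupies bits 63:56
TAG_MASK = 0xFF << TAG_SHIFT
U64_MASK = (1 << 64) - 1


def set_tag(ptr, tag):
    """Return ptr with its top byte replaced by tag."""
    return ((ptr & ~TAG_MASK) | ((tag & 0xFF) << TAG_SHIFT)) & U64_MASK


def get_tag(ptr):
    """Return the tag stored in the top byte of ptr."""
    return (ptr >> TAG_SHIFT) & 0xFF


def strip_tag(ptr):
    """Return ptr with the tag byte cleared (simplified to zero here)."""
    return ptr & ~TAG_MASK & U64_MASK


def checked_equal(a, b):
    """Compare two tagged pointers by address only.

    Returns (same_address, tag_mismatch). tag_mismatch is the condition a
    WARN-style check could report: same address, different tags.
    """
    same_address = strip_tag(a) == strip_tag(b)
    tag_mismatch = same_address and get_tag(a) != get_tag(b)
    return same_address, tag_mismatch


if __name__ == "__main__":
    base = 0x0000_8000_1234_5000  # arbitrary example address
    a = set_tag(base, 0x2A)
    b = set_tag(base, 0x7C)

    # A raw comparison sees two different values for the same object.
    print("raw equal:     ", a == b)
    # Comparing stripped addresses recovers the intended answer and
    # exposes the tag mismatch as a separate signal.
    print("checked_equal: ", checked_equal(a, b))
    # Ordinary arithmetic on the low bits leaves the tag untouched.
    print("tag after +16: ", hex(get_tag(a + 16)))
```

The point of returning the mismatch separately is that code comparing
raw values silently gets a different answer once tags are enabled,
whereas a checked comparison can both give the untagged answer and flag
the case for inspection.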