From: Alexander Potapenko
Date: Fri, 13 May 2022 17:28:12 +0200
Subject: Re: [RFCv2 00/10] Linear Address Masking enabling
To: David Laight
Cc: Dave Hansen, Thomas Gleixner, Peter Zijlstra, "Kirill A. Shutemov",
    Dave Hansen, Andy Lutomirski, "the arch/x86 maintainers",
    Dmitry Vyukov, "H . J . Lu", Andi Kleen, Rick Edgecombe,
    Linux Memory Management List, LKML

On Fri, May 13, 2022 at 4:26 PM David Laight wrote:
>
> From: Alexander Potapenko
> > Sent: 13 May 2022 13:26
> >
> > On Fri, May 13, 2022 at 1:28 PM David Laight wrote:
> > >
> > > ...
> > > > Once we have the possibility to store tags in the pointers, we don't
> > > > need redzones for heap/stack objects anymore, which saves quite a bit
> > > > of memory.
> > >
> > > You still need redzones.
> > > The high bits are ignored for actual memory accesses.
> > >
> > > To do otherwise you'd need the high bits to be in the PTE,
> > > copied to the TLB and finally get into the cache tag.
> > >
> > > Then you'd have to use the correct tags for each page.
> >
> > Sorry, I don't understand how this is relevant to HWASan in the userspace.
> > Like in ASan, we have a custom allocator that assigns tags to heap
> > objects. The assigned tag is stored in both the shadow memory for the
> > object and the pointer returned by the allocator.
> > Instrumentation inserted by the compiler checks the pointer before
> > every memory access and ensures that its tag matches the tag of the
> > object in the shadow memory.
>
> Doesn't that add so much overhead that the system runs like a sick pig?
> I don't see any point adding overhead to a generic kernel to support
> such operation.

Let me make sure we are on the same page here. Right now we are talking
about LAM support for userspace addresses.
At this point nobody is going to add instrumentation to a generic
kernel - just a prctl (leaving aside how exactly it works) that makes
the CPU ignore certain address bits in a particular userspace process.
The whole system is not supposed to run slower because of that - even
if one or many processes choose to enable LAM.

Now let's consider ASan (https://clang.llvm.org/docs/AddressSanitizer.html).
It is a powerful detector of memory corruptions in the userspace, but
it comes with a cost:
 - compiler instrumentation bloats the code (by roughly 50%) and slows
   down the execution (by up to 2x);
 - redzones around stack and heap objects, memory quarantine and shadow
   memory increase the memory consumption (by up to 4x).

In short, for each 8 bytes of app memory ASan stores one byte of
metadata (shadow memory) indicating the addressability of those 8 bytes.
It then uses compiler instrumentation to verify that every memory
access in the program touches only addressable memory.

ASan is widely used for testing and to some extent can be used in
production, but for big server-side apps the RAM overhead becomes
critical.

This is where HWASan
(https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html)
comes to the rescue.
Instead of storing addressability info in the shadow memory, it stores
a 1-byte tag for every 16 bytes of app memory (see
https://arxiv.org/pdf/1802.09517.pdf for other options).
As I mentioned before, the custom userspace memory allocator assigns
the tags to memory chunks and returns tagged pointers.

Like ASan, HWASan uses compiler instrumentation to verify that every
memory access is touching valid memory, but in this case it must
ensure that the pointer tag matches the tag stored in the shadow
memory.
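To make the tag check concrete, here is a rough toy model in C. It is
only a sketch and not the real HWASan runtime: the toy_* names, the
256-byte static heap, the bump allocator, the sequential (rather than
random) tags and the choice of the top pointer byte for the tag are
all assumptions made for illustration. The actual bit positions depend
on the hardware feature in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TAG_SHIFT 56                      /* assumption: tag in the top byte */
#define TOY_TAG_MASK  (0xffULL << TOY_TAG_SHIFT)
#define TOY_GRANULE   16                      /* one shadow tag per 16 bytes */
#define TOY_HEAP_SIZE 256

static unsigned char toy_heap[TOY_HEAP_SIZE];
static uint8_t toy_shadow[TOY_HEAP_SIZE / TOY_GRANULE]; /* 0 = not allocated */
static size_t  toy_next;                      /* bump-allocator offset */
static uint8_t toy_last_tag;                  /* last tag handed out */

/* Allocate `size` bytes; store the tag in the shadow and in the pointer. */
uint64_t toy_malloc(size_t size)
{
    if (size == 0 || size > TOY_HEAP_SIZE)
        return 0;
    size_t rounded = (size + TOY_GRANULE - 1) / TOY_GRANULE * TOY_GRANULE;
    if (rounded > TOY_HEAP_SIZE - toy_next)
        return 0;
    if (++toy_last_tag == 0)                  /* tag 0 is reserved for "free" */
        toy_last_tag = 1;
    memset(&toy_shadow[toy_next / TOY_GRANULE], toy_last_tag,
           rounded / TOY_GRANULE);
    uint64_t addr = (uint64_t)(uintptr_t)(toy_heap + toy_next);
    toy_next += rounded;
    return (addr & ~TOY_TAG_MASK) | ((uint64_t)toy_last_tag << TOY_TAG_SHIFT);
}

/* What the instrumentation would do before an access of `size` bytes:
 * returns 1 if the pointer tag matches the shadow tag of every granule
 * touched, 0 on a mismatch. */
int toy_check(uint64_t tagged, size_t size)
{
    uint8_t  tag  = (uint8_t)(tagged >> TOY_TAG_SHIFT);
    uint64_t addr = tagged & ~TOY_TAG_MASK;
    uint64_t base = (uint64_t)(uintptr_t)toy_heap;

    if (tag == 0 || size == 0 || size > TOY_HEAP_SIZE ||
        addr < base || addr - base > TOY_HEAP_SIZE - size)
        return 0;
    size_t first = (size_t)(addr - base) / TOY_GRANULE;
    size_t last  = (size_t)(addr - base + size - 1) / TOY_GRANULE;
    for (size_t g = first; g <= last; g++)
        if (toy_shadow[g] != tag)
            return 0;
    return 1;
}

/* Free: reset the shadow tags so that stale pointers no longer match. */
void toy_free(uint64_t tagged, size_t size)
{
    assert(toy_check(tagged, size));          /* rejects invalid or double free */
    uint64_t off = (tagged & ~TOY_TAG_MASK) - (uint64_t)(uintptr_t)toy_heap;
    size_t rounded = (size + TOY_GRANULE - 1) / TOY_GRANULE * TOY_GRANULE;
    memset(&toy_shadow[off / TOY_GRANULE], 0, rounded / TOY_GRANULE);
}

/* Without LAM the tag must be stripped in software before dereferencing;
 * with LAM the CPU ignores those bits, so this step becomes unnecessary. */
void *toy_untag(uint64_t tagged)
{
    return (void *)(uintptr_t)(tagged & ~TOY_TAG_MASK);
}
```

In this model, two back-to-back allocations get different tags, so a
check on a pointer that runs past the end of the first object into the
second one fails without any redzone between them, and a check on a
pointer to freed memory fails because the shadow tag has been reset.
Note that detection here works at 16-byte granularity.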
Because of instrumentation, HWASan still has comparable code size and
execution overheads, but it uses way less memory (an extra 10-35% on
top of the original app memory consumption).
This lets us test beefy applications, e.g. feeding real-world queries
to production services. Even smaller applications benefit from it,
e.g. because of reduced cache pressure.
HWASan has been available for Android for a while now, and has proved
itself useful.

> > A tag mismatch is reported as an out-of-bounds or a use-after-free,
> > depending on whether the accessed memory is still considered
> > allocated.
> > Because objects with different tags follow each other, there is no
> > need to add extra redzones to the objects to detect buffer overflows.
> > (We might need to increase the object alignment though, but that's a
> > different story).
>
> How does all that help if a system call (eg read()) is given
> an invalid length.

Neither ASan nor HWASan cares about what happens in the kernel.
Instead, they wrap system calls (along with some important libc
functions) and check their arguments to ensure there are no buffer
overflows.

HTH,
Alex

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Liana Sebastian
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to
the wrong person.