From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4030AC6FD1D for ; Tue, 4 Apr 2023 10:29:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A0B536B0075; Tue, 4 Apr 2023 06:29:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9BBF76B0078; Tue, 4 Apr 2023 06:29:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 85C1D900002; Tue, 4 Apr 2023 06:29:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 74D756B0075 for ; Tue, 4 Apr 2023 06:29:33 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 41F6440114 for ; Tue, 4 Apr 2023 10:29:33 +0000 (UTC) X-FDA: 80643337026.28.7B256D0 Received: from mail-io1-f53.google.com (mail-io1-f53.google.com [209.85.166.53]) by imf27.hostedemail.com (Postfix) with ESMTP id 5E71240015 for ; Tue, 4 Apr 2023 10:29:30 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="UT/fEhLm"; spf=pass (imf27.hostedemail.com: domain of elver@google.com designates 209.85.166.53 as permitted sender) smtp.mailfrom=elver@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1680604170; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=4EmkZjYvZJVj3QnIxJ1/YfnG3WC3vCHhD00FgzBktzc=; b=hHHL0XOpaNwaC6Cx2Ky4Y0Rpl0izgHH50IYv/xMAHewSjglEFRdGUm464lvBmU+Yea7SFt FIFQwcrweV6kSkT6yyikY87ghYWg4QAbTbTzwEjJkrgirT6cGqwDVdMpHGcJ5vRSf6ehZx frGuuG9fzkrOGumKpoJ7kSMXbpt5OWc= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="UT/fEhLm"; spf=pass (imf27.hostedemail.com: domain of elver@google.com designates 209.85.166.53 as permitted sender) smtp.mailfrom=elver@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1680604170; a=rsa-sha256; cv=none; b=qgtoZO9o6WOXiCIXkE3IFjFy2DeDEvxhsC0s3WSGA9qAGFOfEuHvLnvXVfQ0Enr5gN8tTA JVc7AnBNPXYg1vjGyqMvDIrEQYO/UsLg9dcJrmkSM9tx/bTHznC8S3EhntkLdEjtzAUeqa PTUjtnHDT+Q8ZyVzuoHbPdLlgWxaAcw= Received: by mail-io1-f53.google.com with SMTP id g16so3566231iom.11 for ; Tue, 04 Apr 2023 03:29:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680604169; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=4EmkZjYvZJVj3QnIxJ1/YfnG3WC3vCHhD00FgzBktzc=; b=UT/fEhLm0SC/fiIsmvTNeoOgXz8ZErqOQFzejfAdFEvfdEfjgHfylVxuCGYJnFUJD7 VztWelG82Vzgkg2W03Zc0/ORaguOUDfPMG0Ul3sN9Jo1kIaior/siMoErdhdWtnOFhlY +sn4xUXe/pRvKOStXyh3o8TwB/kMxWLqE0kI5VcMySufwTC1hCSfYNww0DRoNA8yn1CT 0iQA4hExGURB1QR/ZLJc92CWj/OB+2Ur+EhomhjsPrAXvwSXoELaSyxolnSzRNdymJpz Mqb/qHNqAvLcv+8tv4F7OnW4/KrsCtNKKAG/aHMb9cwWyGL2B6MiMz3hFZ9SZ6ZrC9Cd nPdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680604169; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=4EmkZjYvZJVj3QnIxJ1/YfnG3WC3vCHhD00FgzBktzc=; b=RtC4dVhXrLMmcjiJmO8wiMYbtfJn+esMUZCdcDf4JYpss/npLf0K1sjPWtdQByNRwl CQFhrfil4upDtlrVYyIifVul/gn+WbQ1bXmaFMwFetnYF+2vR2FZzTFuGb4ImAtKXHEN vEnt7BE5RNW/zM9SSo+hynMrIl2hMLHu5K+2XFBusFD0v/44C0oaK+1aL5aOAT+MdouP z9JDLPTCKRMb7nXkAtFOo7efSTxQA5uC3V57A+9xhbXnMRe+gs9pOK55SIoOT2+Z1fo6 LymHR8p00IUkuI0n9+x2rKvJa23fXTWLi5EukR13TjSQpyshekODvnZ5sn0XW/7p6hI/ vMPQ== X-Gm-Message-State: AAQBX9f21GlO5KsMCVVnolBsworIOHkrOgA2FondEv4STGKwe196FzvW Wof/P+wmMJaDu9F8yHvoROnmmBV7JjqxHFHn3FCCdQ== X-Google-Smtp-Source: AKy350awmYuEjUdfWn5va245UWflf0zZx/dwv6y8E/EYdNes0zj/Q+XcXS4dcKFkmbTVXdB6Y7z6Xb0t/VHgYkRttGw= X-Received: by 2002:a5d:9b0a:0:b0:753:ee63:3dc with SMTP id y10-20020a5d9b0a000000b00753ee6303dcmr1835631ion.20.1680604169337; Tue, 04 Apr 2023 03:29:29 -0700 (PDT) MIME-Version: 1.0 References: <20230403122738.6006-1-zhangpeng.00@bytedance.com> In-Reply-To: <20230403122738.6006-1-zhangpeng.00@bytedance.com> From: Marco Elver Date: Tue, 4 Apr 2023 12:28:52 +0200 Message-ID: Subject: Re: [PATCH v2] mm: kfence: Improve the performance of __kfence_alloc() and __kfence_free() To: Peng Zhang Cc: glider@google.com, dvyukov@google.com, akpm@linux-foundation.org, kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: 6tgkjrjsh1caozqjun4yrzsuzkzwqfft X-Rspamd-Queue-Id: 5E71240015 X-HE-Tag: 1680604170-749729 X-HE-Meta: U2FsdGVkX18v8lZ917GwMr8OYfnDqQ5xZyzTLQ8Ew+i2usDFycgzDyaXzSHa6wBSIjBFRWLgWaNrKWHMFB9Cx6HMW18Uq08LsEZJPARBg4ToWvJ+AyURzgFocNk2KJaCgDxRdvu5wV524ZPlLowpnG0xqfzYKVqAgEdM6nL6q6uhdOS9Bg6uck0SmumNjCA+tXYnLhQGGB0iMFJzyx/zHHotEYxsDtGQR5D0RVcP7CRnV2j/gLncAHcHS9DyS5b4iVfXvtwYDs4oXnUPrBWykcyMB16UEAI5LIk3RAydLJyto39IWyHiPyyDdxhZCZSIfEB3dWgQdOYjQhMZxI3RM+OcIzSj0bIl6r0R4xWuY6tPzLuPSQI1sF2oSRxHppIvyF7hxicWbB4B5DPeip7D+7EJNmgIrLbL1KFbmbtiXhUKn86RDG1xLCkO+LGVHkVQaV5pz6GtfIDSryxUecqjOR1zECZdfS0Mql4XGeClJb40wdyYu0kpKYgZ+QTVjOrYGfVp2TW66WpGrfr3zfVuQi6peOiq/yrkt6nA/J2VbeBgsGrNrx/a3rjGEDxgRrxPp1XLckMAum9dp4g9nl0XdJr35pH3w8zpjGDlD7IvckZtYr6WAUB41aTO95xeuMrCLk6e7q6uXwQgRYdmLN7cfSnZ7i1m500MKcznnc/vR7YEhJNQyJoFx7W8QDcdhC7p7o8PSW2nez+0Nu7U5eYtfjt5ZCczu74OrydDUI2UamEuICR94/p8D+ZEHIxQOQgcSS2WIhJexesn4CRJKN4h7M9H6io+YA/qPuZhqFIUNY2Cl7YVarUv0bYFP9CZQrrrxfVoqUIY3tgjFnhzsWVipMunsqgFC92OUVw2PQfzUbM7QpX9hoWRA9cx5cVeNd+1VttCIoTKGhuRWla9q5ZKQDoS21y4YfHygiSC1U5DIS8AwcEwgOTaTKq3ThHCK6IChAdVq+LLd9hbiOTLx7c NBUh/whj AoU9sL/muLWLjXOkiNl54gd2T1CXQc/31+2zLMqIEtVDqhttFezeZvTBUQGQeBGtXYwt2yczHp4MOcsEiFA6JOzmcwEsgJ//CiX93PDN+L/bNdMl558s5Y3aWO/Nyaecf5bmlKcbmxdEgs4lGAiw3s/VbNypAvUjdMaj+Vzs70rd94QZGO3pWlyhFxgvr9GH9Jm7f5F44PPC374ngvvbBy4G+UIgttoV/3usHmYre1FCzLiQLILG9nQIYmG4HkD8JtTSx6BtzfPN6DnxsHMBlyVHu1jNAhN9mXvQ3rqUh3t7yb7ttKgGKmWXYiL+Jet73V1U8iXB68S+9ODurgLda5F3eVmdFkYNyI1wr/oG6qkvRoWz4MSere+35iG7o2VDmY/iajILS0rRKxsJlhLdMoiYPFsgSzpaICA16PdVW/69skxo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 3 Apr 2023 at 14:27, 'Peng Zhang' via kasan-dev wrote: > > In __kfence_alloc() and __kfence_free(), we will set and check canary. > Assuming that the size of the object is close to 0, nearly 4k memory > accesses are required because setting and checking canary is executed > byte by byte. > > canary is now defined like this: > KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7)) > > Observe that canary is only related to the lower three bits of the > address, so every 8 bytes of canary are the same. We can access 8-byte > canary each time instead of byte-by-byte, thereby optimizing nearly 4k > memory accesses to 4k/8 times. > > Use the bcc tool funclatency to measure the latency of __kfence_alloc() > and __kfence_free(), the numbers (deleted the distribution of latency) > is posted below. Though different object sizes will have an impact on the > measurement, we ignore it for now and assume the average object size is > roughly equal. > > Before patching: > __kfence_alloc: > avg = 5055 nsecs, total: 5515252 nsecs, count: 1091 > __kfence_free: > avg = 5319 nsecs, total: 9735130 nsecs, count: 1830 > > After patching: > __kfence_alloc: > avg = 3597 nsecs, total: 6428491 nsecs, count: 1787 > __kfence_free: > avg = 3046 nsecs, total: 3415390 nsecs, count: 1121 > > The numbers indicate that there is ~30% - ~40% performance improvement. > > Signed-off-by: Peng Zhang Reviewed-by: Marco Elver > --- > mm/kfence/core.c | 70 ++++++++++++++++++++++++++++++++-------------- > mm/kfence/kfence.h | 10 ++++++- > mm/kfence/report.c | 2 +- > 3 files changed, 59 insertions(+), 23 deletions(-) > > diff --git a/mm/kfence/core.c b/mm/kfence/core.c > index 79c94ee55f97..b7fe2a2493a0 100644 > --- a/mm/kfence/core.c > +++ b/mm/kfence/core.c > @@ -297,20 +297,13 @@ metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state nex > WRITE_ONCE(meta->state, next); > } > > -/* Write canary byte to @addr. */ > -static inline bool set_canary_byte(u8 *addr) > -{ > - *addr = KFENCE_CANARY_PATTERN(addr); > - return true; > -} > - > /* Check canary byte at @addr. */ > static inline bool check_canary_byte(u8 *addr) > { > struct kfence_metadata *meta; > unsigned long flags; > > - if (likely(*addr == KFENCE_CANARY_PATTERN(addr))) > + if (likely(*addr == KFENCE_CANARY_PATTERN_U8(addr))) > return true; > > atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); > @@ -323,15 +316,31 @@ static inline bool check_canary_byte(u8 *addr) > return false; > } > > -/* __always_inline this to ensure we won't do an indirect call to fn. */ > -static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *)) > +static inline void set_canary(const struct kfence_metadata *meta) > { > const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE); > - unsigned long addr; > + unsigned long addr = pageaddr; > + > + /* > + * The canary may be written to part of the object memory, but it does > + * not affect it. The user should initialize the object before using it. > + */ > + for (; addr < meta->addr; addr += sizeof(u64)) > + *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64; > + > + addr = ALIGN_DOWN(meta->addr + meta->size, sizeof(u64)); > + for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64)) > + *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64; > +} > + > +static inline void check_canary(const struct kfence_metadata *meta) > +{ > + const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE); > + unsigned long addr = pageaddr; > > /* > - * We'll iterate over each canary byte per-side until fn() returns > - * false. However, we'll still iterate over the canary bytes to the > + * We'll iterate over each canary byte per-side until a corrupted byte > + * is found. However, we'll still iterate over the canary bytes to the > * right of the object even if there was an error in the canary bytes to > * the left of the object. Specifically, if check_canary_byte() > * generates an error, showing both sides might give more clues as to > @@ -339,16 +348,35 @@ static __always_inline void for_each_canary(const struct kfence_metadata *meta, > */ > > /* Apply to left of object. */ > - for (addr = pageaddr; addr < meta->addr; addr++) { > - if (!fn((u8 *)addr)) > + for (; meta->addr - addr >= sizeof(u64); addr += sizeof(u64)) { > + if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64)) > break; > } > > - /* Apply to right of object. */ > - for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) { > - if (!fn((u8 *)addr)) > + /* > + * If the canary is corrupted in a certain 64 bytes, or the canary > + * memory cannot be completely covered by multiple consecutive 64 bytes, > + * it needs to be checked one by one. > + */ > + for (; addr < meta->addr; addr++) { > + if (unlikely(!check_canary_byte((u8 *)addr))) > break; > } > + > + /* Apply to right of object. */ > + for (addr = meta->addr + meta->size; addr % sizeof(u64) != 0; addr++) { > + if (unlikely(!check_canary_byte((u8 *)addr))) > + return; > + } > + for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64)) { > + if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64)) { > + Unnecessary blank line, remove. > + for (; addr - pageaddr < PAGE_SIZE; addr++) { > + if (!check_canary_byte((u8 *)addr)) > + return; > + } > + } > + } > } > > static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp, > @@ -434,7 +462,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g > #endif > > /* Memory initialization. */ > - for_each_canary(meta, set_canary_byte); > + set_canary(meta); > > /* > * We check slab_want_init_on_alloc() ourselves, rather than letting > @@ -495,7 +523,7 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z > alloc_covered_add(meta->alloc_stack_hash, -1); > > /* Check canary bytes for memory corruption. */ > - for_each_canary(meta, check_canary_byte); > + check_canary(meta); > > /* > * Clear memory if init-on-free is set. While we protect the page, the > @@ -751,7 +779,7 @@ static void kfence_check_all_canary(void) > struct kfence_metadata *meta = &kfence_metadata[i]; > > if (meta->state == KFENCE_OBJECT_ALLOCATED) > - for_each_canary(meta, check_canary_byte); > + check_canary(meta); > } > } > > diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h > index 600f2e2431d6..2aafc46a4aaf 100644 > --- a/mm/kfence/kfence.h > +++ b/mm/kfence/kfence.h > @@ -21,7 +21,15 @@ > * lower 3 bits of the address, to detect memory corruptions with higher > * probability, where similar constants are used. > */ > -#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7)) > +#define KFENCE_CANARY_PATTERN_U8(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7)) > + > +/* > + * Define a continuous 8-byte canary starting from a multiple of 8. The canary > + * of each byte is only related to the lowest three bits of its address, so the > + * canary of every 8 bytes is the same. 64-bit memory can be filled and checked > + * at a time instead of byte by byte to improve performance. > + */ > +#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(0x0706050403020100)) > > /* Maximum stack depth for reports. */ > #define KFENCE_STACK_DEPTH 64 > diff --git a/mm/kfence/report.c b/mm/kfence/report.c > index 60205f1257ef..197430a5be4a 100644 > --- a/mm/kfence/report.c > +++ b/mm/kfence/report.c > @@ -168,7 +168,7 @@ static void print_diff_canary(unsigned long address, size_t bytes_to_show, > > pr_cont("["); > for (cur = (const u8 *)address; cur < end; cur++) { > - if (*cur == KFENCE_CANARY_PATTERN(cur)) > + if (*cur == KFENCE_CANARY_PATTERN_U8(cur)) > pr_cont(" ."); > else if (no_hash_pointers) > pr_cont(" 0x%02x", *cur); > -- > 2.20.1