From: Alexander Potapenko
Date: Wed, 15 Jan 2025 16:16:57 +0100
Subject: Re: [PATCH 0/7] kcov: Introduce New Unique PC|EDGE|CMP Modes
To: Joey Jiao
Cc: Marco Elver, Dmitry Vyukov, Andrey Konovalov, Jonathan Corbet,
    Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
    Catalin Marinas, Will Deacon, kasan-dev@googlegroups.com,
    linux-kernel@vger.kernel.org, workflows@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, kernel@quicinc.com
References: <20250114-kcov-v1-0-004294b931a2@quicinc.com>

On Tue, Jan 14, 2025 at 2:00 PM Joey Jiao wrote:
>
> On Tue, Jan 14, 2025 at 11:43:08AM +0100, Marco Elver wrote:
> > On Tue, 14 Jan 2025 at 06:35, Jiao, Joey wrote:
> > >
> > > Hi,
> > >
> > > This patch series introduces new kcov unique modes:
> > > `KCOV_TRACE_UNIQ_[PC|EDGE|CMP]`, which are used to collect unique PC, EDGE,
> > > CMP information.
> > >
> > > Background
> > > ----------
> > >
> > > In the current kcov implementation, when `__sanitizer_cov_trace_pc` is hit,
> > > the instruction pointer (IP) is stored sequentially in an area. Userspace
> > > programs then read this area to record covered PCs and calculate covered
> > > edges. However, recent syzkaller runs show that many syscalls likely have
> > > `pos > t->kcov_size`, leading to kcov overflow. To address this issue, we
> > > introduce new kcov unique modes.

Hi Joey,

Sorry for not responding earlier; I thought I'd come back with a working
proposal, but it is taking a while.
You are right that kcov is prone to overflows, and we might be missing
interesting coverage because of that.

Recently we've been discussing the applicability of
-fsanitize-coverage=trace-pc-guard to this problem, and it is almost
working already.
The idea is as follows:
- -fsanitize-coverage=trace-pc-guard instruments basic blocks with
  calls to `__sanitizer_cov_trace_pc_guard(u32 *guard)`, each taking a
  unique 32-bit global in the __sancov_guards section;
- these globals are zero-initialized, but upon the first call to
  __sanitizer_cov_trace_pc_guard() from each callsite, the corresponding
  global will receive a unique consecutive number;
- now we have a mapping of PCs into indices, which we can use to
  deduplicate the coverage by either:
  -- storing PCs by their index taken from *guard directly in the
     user-supplied buffer (whose size will not exceed several megabytes
     in practice); or
  -- using a per-task bitmap (at most hundreds of kilobytes) to mark
     visited basic blocks, and appending newly encountered PCs to the
     user-supplied buffer like it's done now.

I think this approach is more promising than using hashmaps in kcov:
- direct mapping should be way faster than a hashmap (and the overhead
  of index allocation is amortized, because the indices are persistent
  between program runs);
- there cannot be collisions;
- no additional complexity from pool allocations and RCU synchronization.
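
To make the bitmap variant concrete, here is a rough userspace sketch in C.
It is not the prototype itself: `struct cov_task`, `MAX_GUARDS`, and
`cov_trace_pc_guard()` are hypothetical names, the PC is passed in explicitly
(the real callback would take only the guard and derive the PC from the
return address), and index allocation would need to be atomic in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GUARDS 1024u /* hypothetical bound on instrumented callsites */

/* Hypothetical per-task state; these names are illustrative, not kcov's. */
struct cov_task {
	uint64_t bitmap[MAX_GUARDS / 64]; /* one bit per guard index */
	uintptr_t *area;                  /* user buffer; area[0] holds the PC count */
	size_t area_size;                 /* capacity of area, in words */
};

/* 0 is reserved to mean "guard not initialized yet". */
static uint32_t next_guard_index = 1;

/*
 * Record pc at most once per task.
 * Returns 1 if pc was appended to the buffer, 0 if it was a duplicate
 * or the buffer is full.
 */
static int cov_trace_pc_guard(struct cov_task *t, uint32_t *guard, uintptr_t pc)
{
	if (*guard == 0)
		*guard = next_guard_index++; /* first hit: assign consecutive index */

	uint32_t idx = *guard;
	assert(idx < MAX_GUARDS);

	uint64_t bit = 1ull << (idx % 64);
	if (t->bitmap[idx / 64] & bit)
		return 0; /* this task has already reported the callsite */

	size_t pos = t->area[0] + 1;
	if (pos >= t->area_size)
		return 0; /* buffer full: drop the PC */

	t->bitmap[idx / 64] |= bit;
	t->area[pos] = pc;
	t->area[0] = pos;
	return 1;
}
```

With a zeroed `struct cov_task` and two zero-initialized guards, hitting the
first guard twice and the second once leaves exactly two PCs in the buffer;
the repeated hit costs only a bitmap test.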

The above approach will naturally break edge coverage, as there will
be no notion of a program trace anymore.
But it is still an open question whether edges are helping the fuzzer,
and correctly deduplicating them may not be worth the effort.

If you don't object, I would like to finish prototyping coverage
guards for kcov before proceeding with this review.

Alex

> > > 2. [P 2-3] Introduce `KCOV_TRACE_UNIQ_EDGE` Mode:
> > >    - Save `prev_pc` to calculate edges with the current IP.
> > >    - Add unique edges to the hashmap.
> > >    - Use a lower 12-bit mask to make hash independent of module offsets.

Note that on ARM64 instructions are 4-byte aligned, so the two lowest
bits of a PC are always zero and this will effectively be using bits
11:2. If I am understanding correctly, more than a million coverage
callbacks will be mapped into one of 1024 buckets.
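
The bucket arithmetic can be checked with a small userspace sketch in C. It
assumes the bucket key is simply `pc & 0xfff`, which may not match exactly
how the patch derives its hash; `HASH_MASK` and `count_buckets()` are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_MASK 0xfffu /* lower 12 bits, per the quoted patch description */

/*
 * Count how many distinct values (pc & HASH_MASK) takes over n PCs
 * that start at base and are spaced stride bytes apart.
 */
static size_t count_buckets(uint64_t base, size_t n, unsigned stride)
{
	unsigned char seen[HASH_MASK + 1] = {0};
	size_t distinct = 0;

	assert(stride != 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t key = (base + (uint64_t)i * stride) & HASH_MASK;

		if (!seen[key]) {
			seen[key] = 1;
			distinct++;
		}
	}
	return distinct;
}
```

For a million PCs at a 4-byte stride (the ARM64 instruction size),
`count_buckets()` returns 1024; only byte-granular PCs could reach all 4096
values of the 12-bit mask.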