From: Linus Torvalds
Date: Tue, 17 Jan 2023 12:43:57 -0800
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
To: Nick Desaulniers
Cc: Peter Zijlstra, "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski,
 x86@kernel.org, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
 Alexander Potapenko, Taras Madan, Dmitry Vyukov, "H . J . Lu", Andi Kleen,
 Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Sami Tolvanen, joao@overdrivepizza.com
References: <20230111123736.20025-1-kirill.shutemov@linux.intel.com>
 <20230111123736.20025-9-kirill.shutemov@linux.intel.com>
 <20230117135703.voaumisreld7crfb@box>

On Tue, Jan 17, 2023 at 12:10 PM Linus Torvalds wrote:
>
> That said, clang still generates more register pressure than gcc,
> causing the function prologue and epilogue to be rather bigger
> (pushing and popping six registers, as opposed to gcc that only needs
> three)

.. and at least part of that is the same thing with the bad byte mask
generation (see that "clang *still* messes up" link for details).

Basically, the byte mask is computed by

        mask = bytemask_from_count(tcount);

where we have

  #define bytemask_from_count(cnt)       (~(~0ul << (cnt)*8))

and clang tries very very hard to avoid that "multiply by 8", so
instead it keeps a shadow copy of that "(cnt)*8" value in the loop.

That is wrong for a couple of reasons:

 (a) it adds register pressure for no good reason

 (b) when you shift left by that value, only the low 6 bits of that
     value matter

And guess how that "tcount" is updated? It's this:

        tcount -= sizeof(unsigned long);

in the loop, and thus the update of that shadow value of "(cnt)*8" is
done as

        addl $-64, %ecx

inside that loop.

This is truly stupid and wasted work, because the low 6 bits of the
value - remember, the only part that matters - DOES NOT CHANGE when you
do that.

So clang has decided that it needs to

 (a) avoid the "expensive" multiply-by-8 at the end by turning it into
     a repeated "add $-64" inside the loop

 (b) add register pressure and one extra instruction inside the loop

 (c) not realize that that extra instruction doesn't actually *do*
     anything, because it only affects the bits that don't actually
     matter in the end.

which is all kind of silly, wouldn't you agree?

Every single step there was pointless.

But with my other simplifications, the fact that clang does these
extra things is no longer all that noticeable. It *used* to be a
horrible disaster because the extra register pressure ended up meaning
that you had spills and all kinds of nastiness. Now the function is
simple enough that even with the extra register pressure, there's no
need for spills.

.. until you look at the 32-bit version, which still needs spills. Gcc
does too, but clang just makes it worse by having the extra pointless
shadow variable.

If I cared about 32-bit, I might write up a bugzilla entry. As it is,
it's just "clang tries to be clever, and in the process is actually
being stupid".

                Linus
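
To make the point about the low 6 bits concrete, here is a minimal sketch in user-space C, not kernel code. It assumes a 64-bit shift that, like the x86-64 shift instructions, uses only the low 6 bits of the count. The helpers shl64_hw(), bytemask_from_shadow() and shift_count_unchanged() are hypothetical names introduced purely for illustration, and the macro is restated with UINT64_C(0) in place of 0ul so that the width is fixed regardless of platform.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the macro quoted in the mail, but with a fixed 64-bit type.
 * Only valid for cnt in 0..7: a shift by 64 is undefined behavior in C. */
#define bytemask_from_count(cnt) (~(~UINT64_C(0) << ((cnt) * 8)))

/* Hypothetical model of a 64-bit hardware shift (assumption: as on x86-64,
 * only the low 6 bits of the count are used). */
static uint64_t shl64_hw(uint64_t value, uint32_t count)
{
    return value << (count & 63);
}

/* Hypothetical helper: build the byte mask from a "shadow" copy of cnt*8,
 * the way the generated code described in the mail would. */
static uint64_t bytemask_from_shadow(uint32_t shadow_bits)
{
    return ~shl64_hw(~UINT64_C(0), shadow_bits);
}

/* Hypothetical helper: apply the in-loop update ("addl $-64, %ecx") and
 * report whether the bits the shift actually looks at stayed the same. */
static int shift_count_unchanged(uint32_t shadow_bits)
{
    uint32_t updated = shadow_bits + (uint32_t)-64;

    return (updated & 63) == (shadow_bits & 63);
}
```

Under these assumptions, shift_count_unchanged() returns 1 for every 32-bit input, since adding a multiple of 64 cannot touch bits 0..5, and bytemask_from_shadow(24 + 64) yields the same mask as bytemask_from_count(3). That is the sense in which the per-iteration update does no useful work.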