References: <20230111123736.20025-1-kirill.shutemov@linux.intel.com>
 <20230111123736.20025-9-kirill.shutemov@linux.intel.com>
 <20230117135703.voaumisreld7crfb@box>
From: Nick Desaulniers
Date: Tue, 17 Jan 2023 11:17:15 -0800
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
To: Linus Torvalds
Cc: Peter Zijlstra, "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski,
 x86@kernel.org, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
 Alexander Potapenko, Taras Madan, Dmitry Vyukov, "H . J . Lu", Andi Kleen,
 Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Sami Tolvanen, joao@overdrivepizza.com

On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds wrote:
>
> On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
> wrote:
> >
> > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> > wrote:
> > >
> > > Side note: that's not something new or unusual.
> > > It's been the case
> > > since I started testing clang - we have several code-paths where we
> > > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > > and clang just mostly ignores it, or treats it as a very weak hint. I
> > > think the only way to get clang to treat it as a *strong* hint is to
> > > use PGO.
> >
> > I'd be surprised if that were intentional or by design.
> >
> > Do you guys have a bug report we could look at?
>
> Heh. I actually sent you an example long ago. Let me go fish it out of
> my mail archives and quote some of it below so that you can find it in
> yours..
>
>               Linus
>
> [ Time passes. Found this email to you and Bill Wendling from Feb 16,
> 2020, Message-ID
> CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@mail.gmail.com ]:
>
> Anyway, I'm looking at clang code generation, and comparing it with
> gcc on one of my "this has been optimized to hell and back" functions:
> __d_lookup_rcu().
>
> It looks like clang does frame pointers, and ignores our
> likely/unlikely annotations.
>
> So this code:
>
>         if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
>                 int tlen;
>                 const char *tname;
>                 ......
>
> doesn't actually jump out of line, but instead generates the unlikely
> case as the fallthrough:
>
>         testb   $2, (%r12)
>         je      .LBB50_9
>         ... unlikely code goes here...

Perhaps that was compiler version or config specific?

$ make LLVM=1 -j128 defconfig fs/dcache.o
$ llvm-objdump -d --no-show-raw-insn --disassemble-symbols=__d_lookup_rcu fs/dcache.o
0000000000003210 <__d_lookup_rcu>:
    3210: endbr64
    3214: pushq %rbp
    3215: pushq %r15
    3217: pushq %r14
    3219: pushq %r12
    321b: pushq %rbx
    321c: testb $0x2, (%rdi)
    321f: jne 0x32d7 <__d_lookup_rcu+0xc7>
...
    32d7: popq %rbx
    32d8: popq %r12
    32da: popq %r14
    32dc: popq %r15
    32de: popq %rbp
    32df: jmp 0x3300 <__d_lookup_rcu_op_compare>

That looks like what you want, yeah?
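To make the pattern in that listing concrete, here is a minimal userspace sketch (not the fs/dcache.c code). It assumes the usual definition of likely()/unlikely() on top of __builtin_expect(). The names struct entry, ENTRY_OP_COMPARE, lookup() and lookup_op_compare() are hypothetical, and the noinline/cold attributes are an assumption about one way to keep the rare path out of line, not a claim about how the kernel does it.

```c
#include <ctype.h>
#include <string.h>

/* Hint macros in the usual shape: thin wrappers around the GCC/Clang
 * builtin __builtin_expect(). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag and struct, standing in for DCACHE_OP_COMPARE and a dentry. */
#define ENTRY_OP_COMPARE 0x2u

struct entry {
	unsigned int flags;
	const char *name;
};

/* Rare path in its own function. noinline and cold ask the compiler to keep
 * it out of the hot function's body (an assumption for this sketch). The
 * "custom compare" here is an arbitrary case-insensitive match. */
static __attribute__((noinline, cold))
int lookup_op_compare(const struct entry *e, const char *name)
{
	const char *a = e->name, *b = name;

	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;	/* equal only if both strings ended together */
}

/* Hot path: test the flag first, hand the rare case to the helper, and
 * otherwise fall through to the plain comparison. */
int lookup(const struct entry *e, const char *name)
{
	if (unlikely(e->flags & ENTRY_OP_COMPARE))
		return lookup_op_compare(e, name);

	return strcmp(e->name, name) == 0;
}
```

Building this at -O2 with clang or gcc and disassembling lookup() is one way to check whether the rare branch becomes a jump away from the hot path; the layout you get depends on compiler version and flags.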
Your original report was from nearly 3 years ago; we could have fixed a
few instances of branch weights not getting propagated since then.

What's going on in this case in this thread? I know we don't support
hot/cold attributes on labels yet, but if static_branch_likely (or
friends) is being used, we assign the indirect branches a 0%
likeliness/branch-weight.

>
> and then the likely case ends up having unfortunate reloads inside the
> hot loop. Possibly because it has one fewer free registers than gcc
> because of the frame pointer.
>
> I didn't look into _why_ clang generates frame pointers but gcc
> doesn't. It may be just a compiler default, I think we don't end up
> explicitly asking either way.
>
> [ And then Bill replied with this ]
>
> It's not a no-op. We add branch probabilities to the IR, whether
> they're honored or not depends on the transformation. But they
> *should* be honored when available. I've seen in the past that instead
> of moving unlikely blocks out of the way (like gcc, which moves them
> below the function's "ret" instruction), LLVM will do something like
> this:
>
>
>
>
>
>
> <...>
>
> I.e. the loop is rotated and the unlikely code is first and the hotter
> code is closer together but between the unlikely and conditional test.
> Could this be what's going on? Otherwise, maybe clang decided that
> it's not beneficial to move the code out-of-line because the benefit
> was minimal? (I'm guessing here.)

-- 
Thanks,
~Nick Desaulniers