From: Linus Torvalds
Date: Tue, 17 Jan 2023 12:10:03 -0800
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
To: Nick Desaulniers
Cc: Peter Zijlstra, "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, x86@kernel.org, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko, Taras Madan, Dmitry Vyukov, "H. J. Lu", Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sami Tolvanen, joao@overdrivepizza.com
References: <20230111123736.20025-1-kirill.shutemov@linux.intel.com> <20230111123736.20025-9-kirill.shutemov@linux.intel.com> <20230117135703.voaumisreld7crfb@box>

On Tue, Jan 17, 2023 at 11:17 AM Nick Desaulniers wrote:
>
> Perhaps that was compiler version or config specific?

Possible, but...

The clang code generation annoyed me enough that I actually ended up
rewriting the unlikely test to be outside the loop in commit
ae2a823643d7 ("dcache: move the DCACHE_OP_COMPARE case out of the
__d_lookup_rcu loop").

I think that then made clang no longer have the whole "rotate loop
with unlikely case in the middle" issue.

And then because clang *still* messed up by trying to be too clever
(see

   https://lore.kernel.org/all/CAHk-=wjyOB66pofW0mfzDN7SO8zS1EMRZuR-_2aHeO+7kuSrAg@mail.gmail.com/

for details), I also ended up doing commit c4e34dd99f2e ("x86:
simplify load_unaligned_zeropad() implementation").

The end result is that now the compiler almost *cannot* mess up any
more. So the reason clang now does a good job on __d_lookup_rcu() is
largely that I took away all the places where it did badly ;)

That said, clang still generates more register pressure than gcc,
making the function prologue and epilogue quite a bit bigger
(pushing and popping six registers, as opposed to gcc, which only
needs three).

Gcc is also better able to schedule the prologue and epilogue together
with the work of the function, whereas clang seems to always emit them
as one "push all" and one "pop all" sequence.

That scheduling doesn't matter in that particular place (although it
does make the unlikely case of calling __d_lookup_rcu_op_compare
pointlessly push all registers only to then pop them), but I've seen a
few other cases where it ends up meaning that clang always does that
full function prologue even when the *likely* case then returns early
and doesn't actually need any of that work, because it didn't use any
of those registers.

But yeah, the RCU pathname lookup looks fine these days. And I don't
actually think it was due to clang changes ;)

             Linus
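A minimal C sketch of the kind of restructuring described for commit ae2a823643d7: a rare, loop-invariant test is checked once before the loop and its case is split into a separate function, so the hot loop only contains the common path. This is not the actual fs/dcache.c code; all names here (struct table, lookup_mixed, lookup_hoisted, lookup_custom, lookup_checked) are hypothetical, and unlikely() is defined the way the kernel defines it on top of __builtin_expect (a GCC/clang builtin).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Branch hint, defined like the kernel's unlikely() (GCC/clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical singly linked list standing in for a hash chain. */
struct entry {
    const char *name;
    struct entry *next;
};

struct table {
    bool custom_compare;                         /* rare, loop-invariant flag */
    int (*compare)(const char *, const char *);  /* used only if custom_compare */
    struct entry *head;
};

/* Before: the unlikely, loop-invariant test sits in the middle of the hot loop. */
static struct entry *lookup_mixed(const struct table *t, const char *name)
{
    for (struct entry *e = t->head; e; e = e->next) {
        if (unlikely(t->custom_compare)) {
            if (t->compare(e->name, name) == 0)
                return e;
            continue;
        }
        if (strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}

/* The rare case, split out into its own function. */
static struct entry *lookup_custom(const struct table *t, const char *name)
{
    for (struct entry *e = t->head; e; e = e->next)
        if (t->compare(e->name, name) == 0)
            return e;
    return NULL;
}

/* After: test once up front; the loop body only has the common case. */
static struct entry *lookup_hoisted(const struct table *t, const char *name)
{
    if (unlikely(t->custom_compare))
        return lookup_custom(t, name);
    for (struct entry *e = t->head; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Debug helper: both shapes must find the same entry. */
static struct entry *lookup_checked(const struct table *t, const char *name)
{
    struct entry *e = lookup_hoisted(t, name);
    assert(e == lookup_mixed(t, name));
    return e;
}
```

Both functions return the same results; the difference is only in what the compiler has to lay out inside the loop. In the second shape there is no unlikely branch left in the middle of the loop to rotate around, although the code actually generated still depends on the compiler.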