From: Linus Torvalds
Date: Wed, 22 Jun 2022 10:06:42 -0500
Subject: Re: [PATCH] x86/clear_user: Make it faster
To: Borislav Petkov
Cc: Mark Hemment, Andrew Morton, "the arch/x86 maintainers",
 Peter Zijlstra, patrice.chotard@foss.st.com, Mikulas Patocka,
 Lukas Czerner, Christoph Hellwig, "Darrick J. Wong", Chuck Lever,
 Hugh Dickins, patches@lists.linux.dev, Linux-MM,
 mm-commits@vger.kernel.org, Mel Gorman

On Wed, Jun 22, 2022 at 9:21 AM
Borislav Petkov wrote:
>
> and frankly, this looks really weird:

I'm not sure how valid the TSC thing is, with the extra
synchronization maybe interacting with the whole microcode engine
startup/stop thing.

I'm also not sure the rdtsc is doing the same thing on your AMD tests
vs your Intel tests - I suspect you end up both using 'rdtscp' (as
opposed to the 'lfence' variant we also have), but I don't think the
ordering really is all that well defined architecturally, so AMD may
have very different serialization rules than Intel does.

.. and that serialization may well be different wrt normal load/stores
and microcode.

So those numbers look like they have a 3% difference, but I'm not 100%
convinced that it isn't due to measurement artifacts. The fact that it
worked well for you on your AMD platform doesn't necessarily mean that
it has to work on icelake-x.

But it could equally easily be that "rep stosb" really just isn't any
better on that platform, and the numbers are just giving the plain
reality.

Or it could mean that it makes some cache access decision ("this is
big enough that let's not pollute L1 caches, do stores directly to
L2") that might be better for actual performance afterwards, but that
makes the clearing itself that bit slower.

IOW, I do think that microbenchmarks are kind of suspect to begin
with, and the rdtsc thing in particular may work better on some
microarchitectures than it does on others.

Very hard to make a judgment call - I think the only thing that really
ends up mattering is the macro-benchmarks, but I think when you tried
that it was way too noisy to actually show any real signal.

That is, of course, a problem with memcpy and memset in general. It's
easy to do microbenchmarks for them, it's just not clear whether said
microbenchmarks give numbers that are actually meaningful, exactly
because of things like cache replacement policy etc.

And finally, I will repeat that this particular code probably just
isn't that important.
The memory clearing for page allocation and regular memcpy is where
most of the real time is spent, so I don't think that you should
necessarily worry too much about this special case.

              Linus