From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 89B26C43334
	for <linux-mm@archiver.kernel.org>; Wed, 22 Jun 2022 21:07:48 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id E4E7A8E00F2; Wed, 22 Jun 2022 17:07:47 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id DD4FD8E00E7; Wed, 22 Jun 2022 17:07:47 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id C75818E00F2; Wed, 22 Jun 2022 17:07:47 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14])
	by kanga.kvack.org (Postfix) with ESMTP id B22678E00E7
	for <linux-mm@kvack.org>; Wed, 22 Jun 2022 17:07:47 -0400 (EDT)
Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay13.hostedemail.com (Postfix) with ESMTP id 8B4F0611B5
	for <linux-mm@kvack.org>; Wed, 22 Jun 2022 21:07:47 +0000 (UTC)
X-FDA: 79607108574.03.E4D66DA
Received: from mail-ej1-f47.google.com (mail-ej1-f47.google.com [209.85.218.47])
	by imf12.hostedemail.com (Postfix) with ESMTP id C974D400AF
	for <linux-mm@kvack.org>; Wed, 22 Jun 2022 21:07:39 +0000 (UTC)
Received: by mail-ej1-f47.google.com with SMTP id u12so36848906eja.8
        for <linux-mm@kvack.org>; Wed, 22 Jun 2022 14:07:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linux-foundation.org; s=google;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=vdV4khN9jbInqLI4CoGJr0aFm/O0NmyVKymv5AnToq8=;
        b=XLKkNjFRt1C0ChxBi2kAg8Go9eskGrlfI4csxSmooiP/W7x9Ujt3ZyNelAZ686R9pW
         N498trRQ8h/Q27iBMd+gsaT7IOwDVGZU5ZS48n09MfaRcVZHXqpBa6ZAlDFUIh3F1ViM
         MR6hh0LV1ZnhBVGrTq0GaUQEN6nCffHi5cKQ8=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=vdV4khN9jbInqLI4CoGJr0aFm/O0NmyVKymv5AnToq8=;
        b=mTfG4CUpTBNviUHjRj/VuLyAchup7ZDcREnKmUH4vk0hLF53fYJBQFB+iUs5wsTeVm
         Ku92hYMpkNFJvCCE+dVWBKuY0ZPfcVmwjmGOSNr5sllRrid2TYAo7rfrKhC2kbxcFObv
         g6f6Qn+G1dVsS2UyVO2GIOdQHDkD7y4HAbyZZN6o2P9tYmN1DZ5RcIuwgGV5jG2ImUE8
         SUI0M5S58RobYbED43CueGz3iM/Nv/oToGPTCGcQcwPF445W2hcYkaIYr+4BAH6kjbev
         Xyy08SfkchDPcuVF5ELYpZZWbGMVa0tSMLVqNxNAQOnAbxTamCZM1+K5u5VJaCf15b1k
         R9gw==
X-Gm-Message-State: AJIora9RWeUJRZU26b2/kWhg0NE6vAt1SaBxXla1Zd4lN+oJIgxyfaxf
	f8SIkfQ1WC08ij+76M+IDoIAAWc9qQt+O1hn
X-Google-Smtp-Source: AGRyM1sFsqPVdSMNbr3UjEevBF2l05q/kvc6t/MqX1H3UqYTKsCctlmAiYgnKVICbShzoLZRIHR0+A==
X-Received: by 2002:a17:906:6a28:b0:711:d032:caa4 with SMTP id qw40-20020a1709066a2800b00711d032caa4mr4955882ejc.80.1655932057990;
        Wed, 22 Jun 2022 14:07:37 -0700 (PDT)
Received: from mail-wm1-f52.google.com (mail-wm1-f52.google.com. [209.85.128.52])
        by smtp.gmail.com with ESMTPSA id p1-20020a17090653c100b00722e771007fsm2796450ejo.37.2022.06.22.14.07.36
        for <linux-mm@kvack.org>
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Wed, 22 Jun 2022 14:07:36 -0700 (PDT)
Received: by mail-wm1-f52.google.com with SMTP id x6-20020a1c7c06000000b003972dfca96cso333808wmc.4
        for <linux-mm@kvack.org>; Wed, 22 Jun 2022 14:07:36 -0700 (PDT)
X-Received: by 2002:a05:600c:354c:b0:39c:7e86:6ff5 with SMTP id
 i12-20020a05600c354c00b0039c7e866ff5mr270015wmq.145.1655932056146; Wed, 22
 Jun 2022 14:07:36 -0700 (PDT)
MIME-Version: 1.0
References: <CAHk-=wiGbrJMim6EWncZUQBzguqy-vtNd+grfNizm5L8Vcmu+w@mail.gmail.com>
 <YnLplKy0Y66SsvQw@zn.tnic> <CAHk-=wjUX5DGNSiBYvPC8fQJGRe5_RWR8NW=gYF4=UpPiwCE8A@mail.gmail.com>
 <Ynow8F3G8Kl6V3gu@zn.tnic> <CAHk-=whCmmipbBDips0OJ=UiBUjZfgBGYruoOsqcq2TVd5kBSA@mail.gmail.com>
 <YnqqhmYv75p+xl73@zn.tnic> <Ynq1nVpu1xCpjnXm@zn.tnic> <YozQZMyQ0NDdD8cH@zn.tnic>
 <YrMlVBoDxB21l/kD@zn.tnic> <CAHk-=wgmOfipHDvshwooTV81hMh6FHieSvhgGVWZMX8w+E-2DQ@mail.gmail.com>
 <YrN4DdR9HN0srNWe@zn.tnic>
In-Reply-To: <YrN4DdR9HN0srNWe@zn.tnic>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 22 Jun 2022 16:07:19 -0500
X-Gmail-Original-Message-ID: <CAHk-=wj_MeMUnKyRDuQTiU1OmQ=gfZVZhcD=G7Uma=1gkKkzxg@mail.gmail.com>
Message-ID: <CAHk-=wj_MeMUnKyRDuQTiU1OmQ=gfZVZhcD=G7Uma=1gkKkzxg@mail.gmail.com>
Subject: Re: [PATCH] x86/clear_user: Make it faster
To: Borislav Petkov <bp@alien8.de>
Cc: Mark Hemment <markhemm@googlemail.com>, Andrew Morton <akpm@linux-foundation.org>, 
	"the arch/x86 maintainers" <x86@kernel.org>, Peter Zijlstra <peterz@infradead.org>, patrice.chotard@foss.st.com, 
	Mikulas Patocka <mpatocka@redhat.com>, Lukas Czerner <lczerner@redhat.com>, 
	Christoph Hellwig <hch@lst.de>, "Darrick J. Wong" <djwong@kernel.org>, Chuck Lever <chuck.lever@oracle.com>, 
	Hugh Dickins <hughd@google.com>, patches@lists.linux.dev, Linux-MM <linux-mm@kvack.org>, 
	mm-commits@vger.kernel.org, Mel Gorman <mgorman@suse.de>
Content-Type: text/plain; charset="UTF-8"
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655932063; a=rsa-sha256;
	cv=none;
	b=8hq0qVJeAeG5BgA0xpZrK5lURhyhXxqbQQH2Ho+MS7c3t0N8Z7XJyB0z1YWUbbnAUbynJ1
	XevUZEXp6sODtjpGWPWXA9EnMyYbEdNOe48Ws37oFoBxI7CZE2+vC4e3VZJqkYlbL8KsVV
	rog0Zrp+YunfmykwCF6eJNeXHogH+Jg=
ARC-Authentication-Results: i=1;
	imf12.hostedemail.com;
	dkim=pass header.d=linux-foundation.org header.s=google header.b=XLKkNjFR;
	dmarc=none;
	spf=pass (imf12.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.218.47 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1655932063;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=vdV4khN9jbInqLI4CoGJr0aFm/O0NmyVKymv5AnToq8=;
	b=AriVPT4hNHuRqs921iumo+s6cOx2c3M3ALLvUf9P0Y8PyweZsYEQ8e97soitZHBQTRDXH0
	XIm5CV18pCSaTVq2V1KjhElwVd6JeWE244VLc+HwBC8TEdVwn0pK7RlN0046u2hAmHrN7G
	57/BynK9RD8hPsg6pen+9zCFAnnqoww=
X-Rspamd-Queue-Id: C974D400AF
X-Rspam-User: 
Authentication-Results: imf12.hostedemail.com;
	dkim=pass header.d=linux-foundation.org header.s=google header.b=XLKkNjFR;
	dmarc=none;
	spf=pass (imf12.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.218.47 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org
X-Rspamd-Server: rspam03
X-Stat-Signature: gb5zhm9mme4kayhhp1u8tzndjentkg88
X-HE-Tag: 1655932059-227932
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed, Jun 22, 2022 at 3:14 PM Borislav Petkov <bp@alien8.de> wrote:
>
> before:
>
> $ dd if=/dev/zero of=/dev/null bs=1024k status=progress
> 400823418880 bytes (401 GB, 373 GiB) copied, 17 s, 23.6 GB/s
>
> after:
>
> $ dd if=/dev/zero of=/dev/null bs=1024k status=progress
> 2696274771968 bytes (2.7 TB, 2.5 TiB) copied, 50 s, 53.9 GB/s
>
> So that's very persuasive in my book.

Heh. Your numbers are very confusing, because apparently you just ^C'd
the thing randomly and they do different sizes (and the GB/s number is
what matters).

Might I suggest just using "count=XYZ" to make the sizes the same and
the numbers a bit more comparable? Because when I first looked at the
numbers I was like "oh, the first one finished in 17s, the second one
was three times slower!"

But yes, apparently that "rep stos" is *much* better with that /dev/zero test.

That does imply that what it does is to avoid polluting some cache
hierarchy, since your 'dd' test case doesn't actually ever *use* the
end result of the zeroing.

So yeah, memset and memcpy are just fundamentally hard to benchmark,
because what matters more than the cost of the op itself is often how
the end result interacts with the code around it.
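
To make that concrete, here is a rough user-space sketch (not the
kernel's clear_user(), and not anything from the patch) that times a
plain memset() with and without reading the buffer back right
afterwards. The name bench_clear() and its parameters are made up for
illustration, the empty asm statement is a GNU C compiler barrier (so
gcc or clang is assumed), and which memset implementation actually
runs depends on your libc:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: zero `buf` `reps` times. When `use_result` is
 * nonzero, read the whole buffer back right after each clear.
 * Returns nanoseconds per repetition and stores the accumulated byte
 * sum (always 0 for a correct memset) in *sum_out.
 */
static double bench_clear(unsigned char *buf, size_t len, size_t reps,
                          int use_result, uint64_t *sum_out)
{
    struct timespec t0, t1;
    uint64_t sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t r = 0; r < reps; r++) {
        memset(buf, 0, len);
        /* GNU C compiler barrier: keeps the memset from being dropped. */
        __asm__ volatile("" : : "r"(buf) : "memory");
        if (use_result)
            for (size_t i = 0; i < len; i++)
                sum += buf[i];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sum_out = sum;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)reps;
}
```

Comparing the two modes across a range of `len` values (well inside
L1, then larger than the last-level cache) is the interesting part.
The `use_result == 0` case is the closest analogue of the dd test
above, since nothing ever consumes the zeroed data.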

For example, one of the things that I hope FSRM really does well is
when small copies (or memsets) are then used immediately afterwards -
does the data just stored by the microcode get nicely forwarded from
the store buffers (like it would if it was a loop of stores) or does
it mean that the store buffer is bypassed and subsequent loads will
then hit the L1 cache?

That is *not* an issue in this situation, since any clear_user() won't
be immediately loaded just a few instructions later, but it's
traditionally an issue for the "small memset/memcpy" case, where the
memset/memcpy destination is possibly accessed immediately afterwards
(either to make further modifications, or to just be read).

In a perfect world, you get all the memory forwarding logic kicking
in, which can really shortcircuit things on an OoO core and take the
memory pipeline out of the critical path, which then helps IPC.

And that's an area that legacy microcoded 'rep stosb' has not been
good at. Whether FSRM is quite there yet, I don't know.

(Somebody could test: do a 'store register to memory', then do a
'memcpy()' of that memory to another memory area, and then do a
register load from that new area - at least in _theory_ a very
aggressive microarchitecture could actually do that whole forwarding,
and make the latency from the original memory store to the final
memory load be zero cycles. I know AMD was supposedly doing that for
some of the simpler cases, and it *does* actually matter for real
world loads, because that memory indirection is often due to passing
data in structures as function arguments. So it sounds stupid to store
to memory and then immediately load it again, but it actually happens
_all_the_time_ even for smart software).
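
A minimal user-space sketch of that test could look like the code
below. Everything in it is an assumption for illustration: the names
store_copy_load_chain() and ns_per_roundtrip(), the 64-byte copy
size, and the use of the libc memcpy(), which may or may not end up
as 'rep movsb' on a given machine. Each loaded value feeds the next
store, so the round trips are serialized and the time per iteration
approximates the store -> memcpy -> load latency:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

enum { BUF_LEN = 64 };  /* assumed "small copy" size; vary it */

/*
 * Run `iters` dependent store -> memcpy -> load round trips.
 * Returns seed + iters, so the caller can verify that the data
 * really travelled through both buffers on every round.
 */
static uint64_t store_copy_load_chain(uint64_t seed, size_t iters)
{
    static unsigned char src[BUF_LEN], dst[BUF_LEN];
    volatile size_t len = BUF_LEN;  /* hide the size from the compiler */
    uint64_t v = seed;

    for (size_t i = 0; i < iters; i++) {
        memcpy(src, &v, sizeof(v));  /* store register to memory */
        memcpy(dst, src, len);       /* memcpy() to another area */
        memcpy(&v, dst, sizeof(v));  /* register load from that area */
        v += 1;                      /* next store depends on this load */
    }
    return v;
}

/* Average wall-clock nanoseconds per round trip. */
static double ns_per_roundtrip(size_t iters)
{
    struct timespec t0, t1;
    volatile uint64_t sink;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = store_copy_load_chain(1, iters);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

To isolate the cost of the copy, compare against the same loop with
the middle memcpy() removed (loading straight back from `src`), and
to test FSRM specifically, swap the middle memcpy() for an explicit
'rep movsb'. A few million iterations should be enough to average
out the timer overhead.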

            Linus