Date: Wed, 22 Apr 2020 09:18:53 +0100
From: Will Deacon
To: Vlastimil Babka
Cc: Prathu Baronia, catalin.marinas@arm.com, alexander.duyck@gmail.com, chintan.pandya@oneplus.com, mhocko@suse.com, akpm@linux-foundation.org, linux-mm@kvack.org, gregkh@linuxfoundation.com, gthelen@google.com, jack@suse.cz, ken.lin@oneplus.com, gasine.xu@oneplus.com, ying.huang@intel.com, mark.rutland@arm.com
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
Message-ID: <20200422081852.GB29541@willie-the-truck>
References: <20200414153829.GA15230@oneplus.com> <87r1wpzavo.fsf@yhuang-dev.intel.com> <20200419155856.dtwxomdkyujljdfi@oneplus.com> <87k12bt3ff.fsf@yhuang-dev.intel.com> <20200421093621.3fuptvf2qbyfzwfz@oneplus.com> <20200421100932.GC17256@willie-the-truck> <02d5daa8-ee7b-7d2d-6753-5191a7d761b9@suse.cz> <20200421133935.GC17875@willie-the-truck> <5e334947-22e9-e59d-f7bb-63e04cc8caf0@suse.cz>
In-Reply-To: <5e334947-22e9-e59d-f7bb-63e04cc8caf0@suse.cz>

On Tue, Apr 21, 2020 at 03:48:04PM +0200, Vlastimil Babka wrote:
> On 4/21/20 3:39 PM, Will Deacon wrote:
> > On Tue, Apr 21, 2020 at 02:48:04PM +0200, Vlastimil Babka wrote:
> >> On 4/21/20 2:47 PM, Vlastimil Babka wrote:
> >> >
> >> > It was suspected that current Intel can prefetch forward and backwards, and the
> >> > tested ARM64 microarchitecture only backwards, can it be true? The current code
> >>
> >> Oops, tested ARM64 microarchitecture I meant "only forwards".
> >
> > I'd be surprised if that's the case, but it could be that there's an erratum
> > workaround in play which hampers the prefetch behaviour. We generally try
> > not to assume too much about the prefetcher on arm64 because they're not
> > well documented and vary wildly between different micro-architectures.
>
> Yeah it's probably not as simple as I thought, as the test code [1] shows the
> page iteration goes backwards, but per-page memsets are not special. So maybe
> it's not hardware specifics, but x86 memtest implementation is also done
> backwards, so it fits the backwards outer loop, but arm64 memset is forward, so
> the resulting pattern is non-linear?

A straightforward linear prefetcher would probably be defeated by that sort
of thing, yes, but I'd have thought that the recent CPUs (e.g. A76 which I
think is the "big" CPU in the SoC mentioned at the start of the thread)
would still have a fighting chance at prefetching based on non-linear
histories. However, to my earlier point, we're making this more difficult
than it needs to be for the hardware and we shouldn't assume that all
prefetchers will handle it gracefully, so keeping the core code relatively
straightforward does seem to be the best bet. Alarm bells just rang
initially when it appeared that we were optimising code under arch/arm64
rather than improving the core code, but I now have a better picture of
what's going on (thanks).
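
To make the "non-linear" pattern concrete, here is a rough userspace sketch of the cache-line order you get when the outer loop walks pages from last to first while each page is filled either front to back or back to front. It is only an illustration: the 4096-byte page, the 64-byte line, and the names trace_backward_pages() and count_stride_breaks() are assumptions made up for this example, not anything taken from the kernel or from the test code in the thread, and it says nothing about how a real prefetcher reacts.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u                      /* assumed base page size */
#define LINE_SZ 64u                        /* assumed cache line size */
#define LINES_PER_PAGE (PAGE_SZ / LINE_SZ)

/*
 * Hypothetical helper: record the order in which cache lines are touched
 * when pages are visited from last to first. Within each page the lines
 * are visited forwards (fill_backward == 0) or backwards (non-zero).
 * out[] receives line indices relative to the start of the buffer and
 * must have room for npages * LINES_PER_PAGE entries.
 */
static size_t trace_backward_pages(size_t npages, int fill_backward,
                                   size_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t p = npages; p-- > 0; ) {
        for (size_t i = 0; i < LINES_PER_PAGE; i++) {
            size_t line = fill_backward ? LINES_PER_PAGE - 1 - i : i;

            out[n++] = p * LINES_PER_PAGE + line;
        }
    }
    return n;
}

/* Count steps that are not exactly one line away from the previous one. */
static size_t count_stride_breaks(const size_t *trace, size_t n)
{
    size_t breaks = 0;

    for (size_t i = 1; i < n; i++) {
        size_t prev = trace[i - 1], cur = trace[i];
        size_t dist = cur > prev ? cur - prev : prev - cur;

        if (dist != 1)
            breaks++;
    }
    return breaks;
}
```

For four pages, the forward per-page fill produces a jump at every page boundary (three breaks in the sequence), whereas the backward per-page fill produces one continuous descending run with no breaks. Whether those jumps matter in practice depends entirely on the prefetcher, which is the point made above.
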

Alternatively, we could switch our memset() around, but I'm worried that we
could end up hurting something else by doing that. I guess we could add a
memset_backwards() version if we *had* to...

> In that case it's also a question if the measurement was done in kernel or
> userspace, and if userspace memset have any implications for kernel memset...

Sounds like it was done in userspace. If I get a chance later on, I'll try
to give it a spin here on some of the boards I have kicking around.

Will