Date: Mon, 30 Aug 2021 14:12:49 +0200
From: Petr Mladek
To: Yury Norov
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mmc@vger.kernel.org, linux-perf-users@vger.kernel.org,
	kvm@vger.kernel.org, "James E.J. Bottomley", Alexander Lobakin,
	Alexander Shishkin, Alexey Klimov, Andrea Merello, Andy Shevchenko,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Ben Gardon,
	Benjamin Herrenschmidt, Brian Cain, Catalin Marinas,
	Christoph Lameter, Daniel Bristot de Oliveira, David Hildenbrand,
	Dennis Zhou, Geert Uytterhoeven, Heiko Carstens, Ian Rogers,
	Ingo Molnar, Jaegeuk Kim, Jakub Kicinski, Jiri Olsa, Joe Perches,
	Jonas Bonn, Leo Yan, Mark Rutland, Namhyung Kim, Palmer Dabbelt,
	Paolo Bonzini, Peter Xu, Peter Zijlstra, Rasmus Villemoes,
	Rich Felker, Samuel Mendoza-Jonas, Sean Christopherson,
	Sergey Senozhatsky, Shuah Khan, Stefan Kristiansson, Steven Rostedt,
	Tejun Heo, Thomas Bogendoerfer, Ulf Hansson, Will Deacon,
	Wolfram Sang, Yoshinori Sato
Subject: Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
Message-ID: <20210830121249.2fgyvf47py2tz5s5@pathway.suse.cz>
References: <20210814211713.180533-1-yury.norov@gmail.com>
	<20210814211713.180533-12-yury.norov@gmail.com>

On Thu 2021-08-26 14:09:55, Yury Norov wrote:
> On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> > On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > > The macros iterate thru all set/clear bits in a bitmap.
> > > They search a first bit using find_first_bit(), and the rest bits using
> > > find_next_bit().
> > >
> > > Since find_next_bit() is called shortly after find_first_bit(), we can
> > > save few lines of I-cache by not using find_first_bit().
> >
> > Is this only a speculation or does it fix a real performance problem?
> >
> > The macro is used like:
> >
> >     for_each_set_bit(bit, addr, size) {
> >             fn(bit);
> >     }
> >
> > IMHO, the micro-opimization does not help when fn() is non-trivial.
>
> The effect is measurable:
>
> Start testing for_each_bit()
> for_each_set_bit:                15296 ns,   1000 iterations
> for_each_set_bit_from:           15225 ns,   1000 iterations
>
> Start testing for_each_bit() with cash flushing
> for_each_set_bit:               547626 ns,   1000 iterations
> for_each_set_bit_from:          497899 ns,   1000 iterations
>
> Refer this:
>
> https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html

I see. The results look convincing at first glance.

But I am still not sure. This patch basically contradicts several
other patches in this patchset:

  + The 5th patch optimizes find_first_and_bit() and shows that it is
    much faster:

    Before (#define find_first_and_bit(...) find_next_and_bit(..., 0)):
    Start testing find_bit() with random-filled bitmap
    [  140.291468] find_first_and_bit:   46890919 ns,  32671 iterations
    Start testing find_bit() with sparse bitmap
    [  140.295028] find_first_and_bit:       7103 ns,      1 iterations

    After:
    Start testing find_bit() with random-filled bitmap
    [  162.574907] find_first_and_bit:   25045813 ns,  32846 iterations
    Start testing find_bit() with sparse bitmap
    [  162.578458] find_first_and_bit:       4900 ns,      1 iterations

    => saves 46% in random bitmap
       saves 31% in sparse bitmap

  + The 6th, 7th, and 9th patches make the code use find_first_bit()
    because it is faster than find_next_bit(mask, size, 0);

  + Now, the 11th (this) patch replaces find_first_bit() with
    find_next_bit(mask, size, 0) because find_first_bit()
    makes things slower. This is suspicious, to say the least.

In other words:
The I-cache effect could save roughly 10% in one case. But
find_first_bit() might save 46% in the random-bitmap case.

Does the extra I-cache footprint cost more than the faster code gains?

Or was for_each_set_bit() tested only with a bitmap where the
find_first_bit() optimization did not help much?

How would for_each_set_bit() behave with a random bitmap?
How does it behave with larger bitmaps?

Best Regards,
Petr
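
For reference, the two loop shapes being compared can be sketched in plain
userspace C. This is only an illustration under stated assumptions: every
sketch_* name below is hypothetical, the helpers are simplified stand-ins and
not the kernel's find_first_bit()/find_next_bit() implementations, and the
"next only" macro is my reading of what the 11th patch does as described
above. The point is just that both forms visit the same bits; they differ
only in which helper performs the first search.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for find_next_bit(): index of the first set bit at or after
 * 'start', or 'size' if there is none. Deliberately naive, bit by bit. */
static size_t sketch_find_next_bit(const unsigned long *addr, size_t size,
				   size_t start)
{
	for (size_t bit = start; bit < size; bit++)
		if (addr[bit / SKETCH_BITS_PER_LONG] &
		    (1UL << (bit % SKETCH_BITS_PER_LONG)))
			return bit;
	return size;
}

/* Stand-in for find_first_bit(): a separate word-at-a-time scan that
 * needs no 'start' handling, which is why it can be the faster helper. */
static size_t sketch_find_first_bit(const unsigned long *addr, size_t size)
{
	for (size_t idx = 0; idx * SKETCH_BITS_PER_LONG < size; idx++) {
		unsigned long word = addr[idx];
		size_t bit = idx * SKETCH_BITS_PER_LONG;

		if (!word)
			continue;
		while (!(word & 1UL)) {
			word >>= 1;
			bit++;
		}
		return bit < size ? bit : size;
	}
	return size;
}

/* Shape before the patch: the first search uses find_first_bit(). */
#define sketch_for_each_set_bit(bit, addr, size)			\
	for ((bit) = sketch_find_first_bit((addr), (size));		\
	     (bit) < (size);						\
	     (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Shape after the patch (assumed): every search uses find_next_bit(),
 * so only one helper has to stay hot in the I-cache. */
#define sketch_for_each_set_bit_next_only(bit, addr, size)		\
	for ((bit) = sketch_find_next_bit((addr), (size), 0);		\
	     (bit) < (size);						\
	     (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Sum the visited bit indices so the two loop shapes can be compared. */
static size_t sketch_sum_first(const unsigned long *addr, size_t size)
{
	size_t bit, sum = 0;

	sketch_for_each_set_bit(bit, addr, size)
		sum += bit;
	return sum;
}

static size_t sketch_sum_next_only(const unsigned long *addr, size_t size)
{
	size_t bit, sum = 0;

	sketch_for_each_set_bit_next_only(bit, addr, size)
		sum += bit;
	return sum;
}
```

Calling sketch_sum_first() and sketch_sum_next_only() on the same bitmap
should give the same sum. The open question in this thread is not
correctness but which cost dominates on real workloads: the faster first
search or the smaller code footprint.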