Subject: Re: [PATCH 2/2] mm/pageblock: remove false sharing in pageblock_flags
From: Alex Shi <alex.shi@linux.alibaba.com>
To: Alexander Duyck
Cc: Matthew Wilcox, Andrew Morton, Hugh Dickins, Alexander Duyck, LKML, linux-mm
Date: Tue, 18 Aug 2020 15:15:43 +0800
References: <1597549677-7480-1-git-send-email-alex.shi@linux.alibaba.com> <1597549677-7480-2-git-send-email-alex.shi@linux.alibaba.com> <20200816041720.GG17456@casper.infradead.org> <957eee62-1f46-49b6-4d5a-9671dc07c562@linux.alibaba.com>

On 2020/8/16 11:56 PM, Alexander Duyck wrote:
> On Sun, Aug 16, 2020 at 7:11 AM Alex Shi wrote:
>>
>> On 2020/8/16 12:17 PM, Matthew Wilcox wrote:
>>> On Sun, Aug 16, 2020 at 11:47:57AM +0800, Alex Shi wrote:
>>>> The current pageblock_flags is only 4 bits, so it has to share a
>>>> char-sized range in the cmpxchg on get/set; the false sharing causes
>>>> a perf drop.
>>>>
>>>> If we increase the bits up to 8, the false sharing in cmpxchg would be
>>>> gone, and the only cost is half a char per pageblock, which is half a
>>>> char per 128MB on x86, 4 chars in 1GB.
>>>
>>> I don't believe this patch has that effect, mostly because it still does
>>> cmpxchg() on words instead of bytes.
>>
>> Hi Matthew,
>>
>> Thanks a lot for the comments!
>>
>> Sorry, I must have overlooked something; would you point out why the
>> cmpxchg is still on words after patch 1 is applied?
>>
>
> I would take it one step further. You still have false sharing, as the
> pageblock bits still occupy the same cacheline, so you are going to
> see them cache-bouncing regardless.

Right, there are two levels of false sharing here: the cacheline and the
cmpxchg comparison range. This patch fixes the cmpxchg level at a very
cheap price; the cacheline level is too large to resolve here.

>
> What it seems like you are attempting to address is the fact that
> multiple threads could all be attempting to update the same long
> value. As I pointed out, for the migrate type it seems to be protected
> by the zone lock, but for compaction the skip bit doesn't have the
> same protection, as there are some threads using the zone lock and
> others using the LRU lock. I'm still not sure it makes much of a
> difference though.

It looks like, with this patch, locks are no longer needed for the flags.

>
>>>
>>> But which functions would benefit? It seems to me this cmpxchg() is
>>> only called from the set_pageblock_migratetype() morass of functions,
>>> none of which are called in hot paths as far as I can make out.
>>>
>>> So are you just reasoning by analogy with the previous patch where you
>>> have measured a performance improvement, or did you send the wrong patch,
>>> or did I overlook a hot path that calls one of the pageblock migration
>>> functions?
>>>
>>
>> Uh, I was reading compaction.c and found that the following commit
>> introduced test_and_set_skip under a lock. It looks like the
>> pageblock_flags setting has false sharing in cmpxchg, but I have no
>> solid data on this yet.
>>
>> Thanks
>> Alex
>>
>> e380bebe4771548  mm, compaction: keep migration source private to a single compaction instance
>>
>>                 if (!locked) {
>>                         locked = compact_trylock_irqsave(zone_lru_lock(zone),
>>                                                          &flags, cc);
>> -                       if (!locked)
>> +
>> +                       /* Allow future scanning if the lock is contended */
>> +                       if (!locked) {
>> +                               clear_pageblock_skip(page);
>>                                 break;
>> +                       }
>> +
>> +                       /* Try get exclusive access under lock */
>> +                       if (!skip_updated) {
>> +                               skip_updated = true;
>> +                               if (test_and_set_skip(cc, page, low_pfn))
>> +                                       goto isolate_abort;
>> +                       }
>>
>
> I'm not sure that is good grounds for doubling the size of the
> pageblock flags. If you look further down in the code, there are places
> that set these bits without taking the lock. The assumption here is
> that by taking the lock, the test_and_set_skip will be performed
> atomically, since another thread cannot perform it while the zone lock
> is held. If you look in the function itself, it only does anything if
> the skip bits are checked and if the page is the first page in the
> pageblock.
>
> I think you might be confusing some of my earlier comments. I still
> believe the 3% regression you reported with my patch is not directly
> related to test_and_set_skip, as the test you ran seems unlikely to
> trigger compaction. However, with that said, one of the advantages of
> using the locked section to perform these kinds of tests is that it
> reduces the number of times the test is run, since it will only run on
> the first unlocked page in any batch of pages, and the first page in
> the pageblock is always going to be handled without the lock held
> since it is the first page processed.
>
> Until we can get a test such as thpscale that does a good job of
> stressing the compaction code, I don't think we can rely on just
> observations to say whether this is an improvement or not.
I am still struggling to get a meaningful thpscale run. But if the patch
is clearly right in theory, do we have to wait on a benchmark result?

Thanks
Alex