Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault
From: Yang Shi <yang.shi@linux.alibaba.com>
To: Yu Xu, Catalin Marinas
Cc: Linus Torvalds, Andrew Morton, Johannes Weiner, Hillf Danton, Hugh Dickins, Josef Bacik, Kirill A. Shutemov, Linux-MM, mm-commits@vger.kernel.org, Will Deacon, Matthew Wilcox
Date: Mon, 27 Jul 2020 11:04:01 -0700
In-Reply-To: <39560818-463f-da3a-fc9e-3a4a0a082f61@linux.alibaba.com>

On 7/27/20 10:12 AM, Yu Xu wrote:
> On 7/27/20 7:05 PM, Catalin Marinas wrote:
>> On Mon, Jul 27, 2020 at 03:31:16PM +0800, Yu Xu wrote:
>>> On 7/25/20 4:22 AM, Linus Torvalds wrote:
>>>> On Fri, Jul 24, 2020 at 12:27 PM Linus Torvalds wrote:
>>>>>
>>>>> It *may* make sense to say "ok, don't bother flushing the TLB if this
>>>>> is a retry, because we already did that originally". MAYBE.
>> [...]
>>>> We could say that we never need it at all for FAULT_FLAG_RETRY. That
>>>> makes a lot of sense to me.
>>>>
>>>> So a patch that does something like the appended (intentionally
>>>> whitespace-damaged) seems sensible.
>>>
>>> I tested your patch on our aarch64 box, with 128 online CPUs.
>> [...]
>>> There are two points to sum up.
>>>
>>> 1) The performance of page_fault3_process is restored, while the
>>> performance of page_fault3_thread is about ~80% of the vanilla, except
>>> in the case of 128 threads.
>>>
>>> 2) In the case of 128 threads, the test worker threads seem to get
>>> stuck, making no progress in the iterations of mmap-write-munmap until
>>> a period of time later. The test result is 0 because only the first 16
>>> samples are counted, and they are all 0. This situation is easy to
>>> reproduce with a large number of threads (not necessarily 128), and
>>> the stack of one stuck thread is shown below.
>>>
>>> [<0>] __switch_to+0xdc/0x150
>>> [<0>] wb_wait_for_completion+0x84/0xb0
>>> [<0>] __writeback_inodes_sb_nr+0x9c/0xe8
>>> [<0>] try_to_writeback_inodes_sb+0x6c/0x88
>>> [<0>] ext4_nonda_switch+0x90/0x98 [ext4]
>>> [<0>] ext4_page_mkwrite+0x248/0x4c0 [ext4]
>>> [<0>] do_page_mkwrite+0x4c/0x100
>>> [<0>] do_fault+0x2ac/0x3e0
>>> [<0>] handle_pte_fault+0xb4/0x258
>>> [<0>] __handle_mm_fault+0x1d8/0x3a8
>>> [<0>] handle_mm_fault+0x104/0x1d0
>>> [<0>] do_page_fault+0x16c/0x490
>>> [<0>] do_translation_fault+0x60/0x68
>>> [<0>] do_mem_abort+0x58/0x100
>>> [<0>] el0_da+0x24/0x28
>>> [<0>] 0xffffffffffffffff
>>>
>>> It seems quite normal, right? And I've run out of ideas.
>>
>> If threads get stuck here, it could be a stale TLB entry that's not
>> flushed with Linus' patch. Since that's a write fault, I think it hits
>> the FAULT_FLAG_TRIED case.
>
> There must be some change in my test box, because I find that even the
> vanilla kernel (89b15332af7c^) gets a result of 0 in the 128t testcase.
> And I just directly used the historical test data as the baseline. I
> will dig into this then. Thanks for doing the test.
>
> And do we still need to be concerned about the ~20% performance drop in
> thread mode?

I guess there might be more resource contention in thread mode, e.g. on
the page table lock, so the result might not be very stable. And retried
page faults may exacerbate such contention. Anyway, we got the process
mode back to normal and improved the thread mode a lot.

>
>>
>> Could you give my patch here a try as an alternative:
>>
>> https://lore.kernel.org/linux-mm/20200725155841.GA14490@gaia/
>
> I ran the same test on the same aarch64 box, with your patch, the result
> is as follows.
>
> test        vanilla kernel      patched kernel
> parameter   (89b15332af7c^)     (Catalin's patch)
> 1p          829299              787676    (96.36 %)
> 1t          998007              789284    (78.36 %)
> 32p         18916718            17921100  (94.68 %)
> 32t         2020918             1644146   (67.64 %)
> 64p         18965168            18983580  (100.0 %)
> 64t         1415404             1093750   (48.03 %)
> 96p         18949438            18963921  (100.1 %)
> 96t         1622876             1262878   (63.72 %)
> 128p        18926813            1680146   (8.89 %)
> 128t        1643109             0         (0.00 %)  #
> ignore this temporarily

It looks like Linus's patch has the better numbers. That seems sane to
me, since Catalin's patch still needs to flush the TLB in the shareable
domain.

>
> Thanks
> Yu
>
>>
>> It leaves the spurious flush in place but only local (though note that
>> in a guest under KVM, all local TLBIs are upgraded to inner-shareable,
>> so you'd not get the performance benefit).
>>