Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault
From: Yu Xu
To: Catalin Marinas
Cc: Linus Torvalds, Andrew Morton, Johannes Weiner, Hillf Danton, Hugh Dickins, Josef Bacik, "Kirill A . Shutemov", Linux-MM, mm-commits@vger.kernel.org, Will Deacon, Matthew Wilcox, yang.shi@linux.alibaba.com
Date: Tue, 28 Jul 2020 01:12:43 +0800
Message-ID: <39560818-463f-da3a-fc9e-3a4a0a082f61@linux.alibaba.com>
In-Reply-To: <20200727110512.GB25400@gaia>
References: <20200723211432.b31831a0df3bc2cbdae31b40@linux-foundation.org> <20200724041508.QlTbrHnfh%akpm@linux-foundation.org> <0323de82-cfbd-8506-fa9c-a702703dd654@linux.alibaba.com> <20200727110512.GB25400@gaia>

On 7/27/20 7:05 PM, Catalin Marinas wrote:
> On Mon, Jul 27, 2020 at 03:31:16PM +0800, Yu Xu wrote:
>> On 7/25/20 4:22 AM, Linus Torvalds wrote:
>>> On Fri, Jul 24, 2020 at 12:27 PM Linus Torvalds wrote:
>>>>
>>>> It *may* make sense to say "ok, don't bother flushing the TLB if this
>>>> is a retry, because we already did that originally". MAYBE.
> [...]
>>> We could say that we never need it at all for FAULT_FLAG_RETRY. That
>>> makes a lot of sense to me.
>>>
>>> So a patch that does something like the appended (intentionally
>>> whitespace-damaged) seems sensible.
>>
>> I tested your patch on our aarch64 box, with 128 online CPUs.
> [...]
>> There are two points to sum up.
>>
>> 1) the performance of page_fault3_process is restored, while the performance
>> of page_fault3_thread is about ~80% of the vanilla, except the case of 128
>> threads.
>>
>> 2) in the case of 128 threads, test worker threads seem to get stuck, making
>> no progress in the iterations of mmap-write-munmap until a period of time
>> later. the test result is 0 because only first 16 samples are counted, and
>> they are all 0. This situation is easy to re-produce with large number of
>> threads (not necessarily 128), and the stack of one stuck thread is shown
>> below.
>>
>> [<0>] __switch_to+0xdc/0x150
>> [<0>] wb_wait_for_completion+0x84/0xb0
>> [<0>] __writeback_inodes_sb_nr+0x9c/0xe8
>> [<0>] try_to_writeback_inodes_sb+0x6c/0x88
>> [<0>] ext4_nonda_switch+0x90/0x98 [ext4]
>> [<0>] ext4_page_mkwrite+0x248/0x4c0 [ext4]
>> [<0>] do_page_mkwrite+0x4c/0x100
>> [<0>] do_fault+0x2ac/0x3e0
>> [<0>] handle_pte_fault+0xb4/0x258
>> [<0>] __handle_mm_fault+0x1d8/0x3a8
>> [<0>] handle_mm_fault+0x104/0x1d0
>> [<0>] do_page_fault+0x16c/0x490
>> [<0>] do_translation_fault+0x60/0x68
>> [<0>] do_mem_abort+0x58/0x100
>> [<0>] el0_da+0x24/0x28
>> [<0>] 0xffffffffffffffff
>>
>> It seems quite normal, right? and I've run out of ideas.
>
> If threads get stuck here, it could be a stale TLB entry that's not
> flushed with Linus' patch. Since that's a write fault, I think it hits
> the FAULT_FLAG_TRIED case.

Something must have changed on my test box, because I find that even the
vanilla kernel (89b15332af7c^) gets a result of 0 in the 128t test case. I had
just used the historical test data directly as the baseline. I will dig into
this.

Do we still need to be concerned about the ~20% performance drop in thread
mode?

>
> Could you give my patch here a try as an alternative:
>
> https://lore.kernel.org/linux-mm/20200725155841.GA14490@gaia/

I ran the same test on the same aarch64 box with your patch. The results are
as follows.

test        vanilla kernel    patched kernel
parameter   (89b15332af7c^)   (Catalin's patch)
1p          829299            787676   (96.36 %)
1t          998007            789284   (78.36 %)
32p         18916718          17921100 (94.68 %)
32t         2020918           1644146  (67.64 %)
64p         18965168          18983580 (100.0 %)
64t         1415404           1093750  (48.03 %)
96p         18949438          18963921 (100.1 %)
96t         1622876           1262878  (63.72 %)
128p        18926813          1680146  (8.89 %)
128t        1643109           0        (0.00 %)   # ignore this temporarily

Thanks
Yu

>
> It leaves the spurious flush in place but only local (though note that
> in a guest under KVM, all local TLBIs are upgraded to inner-shareable,
> so you'd not get the performance benefit).
>
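
For readers following the thread, the two alternatives under discussion can be
sketched as a small user-space model. This is only an illustration of the
decision logic as described in the quoted text, not either of the actual
patches: the flag names, their bit values, the enum, and the assumed baseline
(a spurious write fault triggers a broadcast flush) are all hypothetical.

```c
/* Illustrative flag bits for this sketch only; not the kernel's real values. */
#define SKETCH_FAULT_WRITE (1u << 0) /* the fault was a write access */
#define SKETCH_FAULT_TRIED (1u << 1) /* the fault is being retried */

/* Hypothetical classification of what the fault path does to the TLB. */
enum sketch_flush {
    SKETCH_FLUSH_NONE,
    SKETCH_FLUSH_LOCAL,     /* invalidate on the faulting CPU only */
    SKETCH_FLUSH_BROADCAST  /* invalidate on all CPUs */
};

/* Assumed baseline: a spurious write fault causes a broadcast flush. */
static enum sketch_flush sketch_baseline(unsigned int flags)
{
    return (flags & SKETCH_FAULT_WRITE) ? SKETCH_FLUSH_BROADCAST
                                        : SKETCH_FLUSH_NONE;
}

/* Idea quoted from Linus: a retried fault skips the flush entirely. */
static enum sketch_flush sketch_skip_on_retry(unsigned int flags)
{
    if (flags & SKETCH_FAULT_TRIED)
        return SKETCH_FLUSH_NONE;
    return sketch_baseline(flags);
}

/* Idea quoted from Catalin: keep the flush, but make it local only. */
static enum sketch_flush sketch_local_only(unsigned int flags)
{
    return sketch_baseline(flags) == SKETCH_FLUSH_BROADCAST
               ? SKETCH_FLUSH_LOCAL
               : SKETCH_FLUSH_NONE;
}
```

In this model a retried write fault gets no flush under the first idea and a
local flush under the second, which is why a stale entry on the faulting CPU
could persist only in the first case.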