Date: Tue, 30 Mar 2021 22:44:06 -0700
From: Andrew Morton
To: qianjun.kernel@gmail.com
Cc: ast@kernel.org, daniel@iogearbox.net, kafai@fb.com,
 songliubraving@fb.com, yhs@fb.com, andriin@fb.com,
 john.fastabend@gmail.com, kpsingh@chromium.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH V2 1/1] mm:improve the performance during fork
Message-Id: <20210330224406.5e195f3b8b971ff2a56c657d@linux-foundation.org>
In-Reply-To: <20210329123635.56915-1-qianjun.kernel@gmail.com>
References: <20210329123635.56915-1-qianjun.kernel@gmail.com>

On Mon, 29 Mar 2021 20:36:35 +0800 qianjun.kernel@gmail.com wrote:

> From: jun qian
>
> In our project, many business delays come from fork(), so we started
> looking for the reason why fork() is time-consuming.
> I used ftrace with function_graph to trace fork and found that
> vm_normal_page() is called tens of thousands of times, and that each
> vm_normal_page() call takes only a few nanoseconds. vm_normal_page()
> is not an inline function, so if it were inlined, the call overhead
> might be reduced.
>
> I did the following experiment:
>
> Use the bpftrace tool to trace the fork time:
>
> bpftrace -e 'kprobe:_do_fork /comm=="redis-server"/ {@st=nsecs;} \
> kretprobe:_do_fork /comm=="redis-server"/ {printf("the fork time \
> is %d us\n", (nsecs-@st)/1000)}'
>
> non-inline vm_normal_page:
> result:
> the fork time is 40743 us
> the fork time is 41746 us
> the fork time is 41336 us
> the fork time is 42417 us
> the fork time is 40612 us
> the fork time is 40930 us
> the fork time is 41910 us
>
> inline vm_normal_page:
> result:
> the fork time is 39276 us
> the fork time is 38974 us
> the fork time is 39436 us
> the fork time is 38815 us
> the fork time is 39878 us
> the fork time is 39176 us
>
> In the same test environment, we get a 3% to 4% performance
> improvement.
>
> Note: the test data is from the 4.18.0-193.6.3.el8_2.v1.1.x86_64
> kernel, because our product uses this kernel version to test the
> redis server. If you need test data from the latest kernel, you can
> refer to the v1 patch.
>
> We also need to compare the change in vmlinux size:
>
>               inline         non-inline     diff
> vmlinux size  9709248 bytes  9709824 bytes  -576 bytes
>

I get very different results with gcc-7.2.0:

q:/usr/src/25> size mm/memory.o
   text    data     bss     dec     hex filename
  74898    3375      64   78337   13201 mm/memory.o-before
  75119    3363      64   78546   132d2 mm/memory.o-after

That's a somewhat significant increase in code size, and larger code
size has a worsened cache footprint.
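Worked out explicitly, the growth between those two `size' runs is:

```shell
# Growth between the two `size mm/memory.o' runs quoted above
# (before: text 74898, dec 78337; after: text 75119, dec 78546).
before_text=74898; after_text=75119
before_dec=78337;  after_dec=78546
echo "text grew by $((after_text - before_text)) bytes"
echo "total grew by $((after_dec - before_dec)) bytes"
```

i.e. roughly 220 bytes of extra text from inlining the extra call
sites, rather than the shrink reported above.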
Not that this is necessarily a bad thing for a function which is
called many times in tight succession, as vm_normal_page() is.

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -592,7 +592,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>   * PFNMAP mappings in order to support COWable mappings.
>   *
>   */
> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  			    pte_t pte)
>  {
>  	unsigned long pfn = pte_pfn(pte);

I'm a bit surprised this made any difference - rumour has it that
modern gcc just ignores `inline' and makes up its own mind. Which is
why we added __always_inline.