From: Alexander Duyck
Date: Fri, 10 Apr 2020 11:54:53 -0700
Subject: Re: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
To: Prathu Baronia
Cc: Andrew Morton, linux-mm, Greg KH, gthelen@google.com, jack@suse.cz, Michal Hocko, ken.lin@oneplus.com, gasine.xu@oneplus.com, chintan.pandya@oneplus.com

On Fri, Apr 3, 2020 at 1:18 AM Prathu Baronia wrote:
>
> THP allocation for anon memory requires zeroing of the huge page. To do so,
> we iterate over 2MB of memory in 4KB chunks. Each iteration calls
> kmap_atomic() and kunmap_atomic(). This routine makes sense where we need a
> temporary mapping of the user page. In !HIGHMEM cases, especially on 64-bit
> architectures, we don't need a temporary mapping. Hence, kmap_atomic() acts
> as nothing more than multiple barrier() calls.
>
> This calls for optimization. Simply getting the VADDR from the page does the
> job for us, so implement another (optimized) routine for clear_huge_page()
> which doesn't need a temporary mapping of the userspace page.
>
> While testing this patch on a Qualcomm SM8150 SoC (kernel v4.14.117), we see
> a 64% improvement in clear_huge_page().
>
> Ftrace results:
>
> Default profile:
> ------------------------------------------
>  6) ! 473.802 us  |  clear_huge_page();
> ------------------------------------------
>
> With this patch applied:
> ------------------------------------------
>  5) ! 170.156 us  |  clear_huge_page();
> ------------------------------------------

I suspect that if anything this is really pointing out how much overhead is
being added through process_huge_page. I know that on x86 most modern
processors initialize memory at somewhere between 16B/cycle and 32B/cycle,
with some fixed amount of overhead for making the rep movsb/stosb call.
One thing that might make sense to look at would be whether we could reduce
the number of calls we have to make within process_huge_page by taking the
caches into account. For example, I know that on x86 the L1 cache is 32K for
most processors, so we could look at bumping things up so that we process 8
pages at a time and then make one call to cond_resched(), instead of doing it
per 4K page.

> Signed-off-by: Prathu Baronia
> Reported-by: Chintan Pandya
> ---
>  mm/memory.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3ee073d..3e120e8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5119,6 +5119,7 @@ EXPORT_SYMBOL(__might_fault);
>  #endif
>
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> +#ifdef CONFIG_HIGHMEM
>  static void clear_gigantic_page(struct page *page,
>                                 unsigned long addr,
>                                 unsigned int pages_per_huge_page)
> @@ -5183,6 +5184,16 @@ void clear_huge_page(struct page *page,
>                      addr + right_idx * PAGE_SIZE);
>         }
>  }
> +#else
> +void clear_huge_page(struct page *page,
> +                    unsigned long addr_hint, unsigned int pages_per_huge_page)
> +{
> +       void *addr;
> +
> +       addr = page_address(page);
> +       memset(addr, 0, pages_per_huge_page * PAGE_SIZE);
> +}
> +#endif

This seems like a very simplistic solution to the problem, and I am worried
something like this would introduce latency issues when pages_per_huge_page
gets to be large. It might make more sense to wrap the process_huge_page call
in the original clear_huge_page and then add this code block as an #else case.
That way you avoid potentially stalling the system for extended periods of
time if you start trying to clear 1G pages with this function.

One interesting data point would be the cost of breaking this up into a loop
where you only process some fixed number of pages at a time, calling
cond_resched() between batches so you can avoid introducing latency spikes.