From: Alexander Duyck
Date: Mon, 13 Apr 2020 09:24:29 -0700
Subject: Re: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
To: Prathu Baronia
Cc: Chintan Pandya, "Huang, Ying", Michal Hocko, "akpm@linux-foundation.org",
 "linux-mm@kvack.org", "gregkh@linuxfoundation.org", "gthelen@google.com",
 "jack@suse.cz", Ken Lin, Gasine Xu
In-Reply-To: <20200413153351.GB13136@oneplus.com>
References: <20200403081812.GA14090@oneplus.com>
 <20200403085201.GX22681@dhcp22.suse.cz>
 <20200409152913.GA9878@oneplus.com>
 <20200409154538.GR18386@dhcp22.suse.cz>
 <87lfn390db.fsf@yhuang-dev.intel.com>
 <20200413153351.GB13136@oneplus.com>

On Mon, Apr 13, 2020 at 8:34 AM Prathu Baronia wrote:
>
> The 04/11/2020 13:47, Alexander Duyck wrote:
> >
> > This is an interesting data point. So running things in reverse seems
> > much more expensive than running them forward. As such, I would imagine
> > process_huge_page is going to be significantly more expensive on ARM64,
> > since it will wind through the pages in reverse order from the end of
> > the page all the way down to wherever the page was accessed.
> >
> > I wonder if we couldn't simply change process_huge_page to process
> > pages in two passes: the first from addr_hint + some offset to the end
> > of the page, then looping back around to the start of the page for the
> > second pass and processing up to where the first pass started. The idea
> > would be that the offset is large enough that the 4K that was accessed,
> > plus some range before and after the address, is hopefully still in the
> > L1 cache after we are done.
>
> That's a great idea; we were working on a similar approach for the v2
> patch, and your suggestion has reassured us about it. It will combine
> the benefits of the optimized memset with keeping the cache hot around
> the faulting address.
>
> Earlier we had taken this offset as 0.5MB, and after your response we
> have changed it to 32KB. Since we understand there is a trade-off in
> keeping this value too high, we would really appreciate it if you could
> suggest a method to derive an appropriate value for this offset from
> the L1 cache size.

I mentioned 32KB since that happens to be a common L1 cache size on both
the ARM64 processor mentioned and most modern x86 CPUs. As far as deriving
it, I don't know of a good way to go about doing that. I suspect it is
something that would need to be architecture specific. If nothing else,
you might be able to define it similarly to how L1_CACHE_SHIFT/BYTES is
defined in cache.h for most architectures. We probably also want to play
around with that value a bit, since I suspect there is some room to either
increase or decrease it depending on the cost of cold accesses versus the
benefit of processing memory initialization in larger batches.
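For illustration, a minimal userspace sketch of the two-pass clearing order
described above. The names HPAGE_SIZE, KEEP_WINDOW and
clear_huge_page_twopass() are invented for this sketch, the 32KB keep
window simply mirrors the L1-size heuristic discussed in the thread, and
the window is placed entirely after the faulting page for simplicity; this
is not the actual mm/memory.c code or the eventual v2 patch.

/*
 * Two-pass clear of a (simulated) 2MB huge page: pass 1 clears from just
 * past the keep window to the end, pass 2 wraps around and clears from
 * the start up to where pass 1 began, so the bytes around the faulting
 * address are written last and are still hot in L1 on return.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE	4096UL
#define HPAGE_SIZE	(2UL * 1024 * 1024)	/* one 2MB THP */
#define KEEP_WINDOW	(32UL * 1024)		/* ~L1 size, per the thread */

static void clear_huge_page_twopass(void *hpage, unsigned long addr_hint)
{
	unsigned long base = (unsigned long)hpage;
	/* offset of the faulting 4K page within the huge page */
	unsigned long fault_off = (addr_hint - base) & ~(PAGE_SIZE - 1);
	unsigned long split = fault_off + KEEP_WINDOW;

	if (split > HPAGE_SIZE)
		split = HPAGE_SIZE;

	/* pass 1: from the end of the keep window to the end of the page */
	memset((char *)hpage + split, 0, HPAGE_SIZE - split);

	/* pass 2: wrap around, finishing on the bytes nearest the fault */
	memset(hpage, 0, split);
}

int main(void)
{
	char *hpage = aligned_alloc(HPAGE_SIZE, HPAGE_SIZE);

	if (!hpage)
		return 1;

	/* pretend the fault landed 1.25MB into the huge page */
	clear_huge_page_twopass(hpage,
				(unsigned long)hpage + HPAGE_SIZE * 5 / 8);

	printf("cleared %lu bytes in two passes\n", HPAGE_SIZE);
	free(hpage);
	return 0;
}

In the kernel, each pass would presumably still be issued per subpage, or
as one large clear in the !HIGHMEM case this RFC targets; the two memsets
here only stand in for that to show the ordering.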
> > An additional thing I was just wondering is whether this impacts the
> > copy operations as well. Looking through the code, the two big users
> > of process_huge_page are clear_huge_page and copy_user_huge_page. One
> > thing that might make more sense than just splitting the code at a
> > high level would be to look at possibly refactoring process_huge_page
> > and the users for it.
>
> You are right, we didn't consider refactoring process_huge_page earlier.
> We will incorporate this in the soon-to-be-sent v2 patch.
>
> Thanks a lot for the interesting insights!

Sounds good. I'll look forward to v2.

Thanks.

- Alex
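As a companion to the refactoring idea discussed above, a rough userspace
sketch of one possible shape for it: a single hint-aware traversal helper
with per-subpage callbacks shared by the clear and copy paths. The names
here (process_huge_page_hint, subpage_op_t, the demo callbacks) are made
up for this sketch and do not reflect the actual mm/memory.c interface or
the v2 patch.

/*
 * One hint-aware traversal helper shared by clear and copy: subpages far
 * from the hint are processed first, the ones nearest the hint last, so
 * they stay cache-hot for the faulting task.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE	4096UL
#define HPAGE_PAGES	512UL			/* 2MB / 4K subpages */
#define KEEP_PAGES	8UL			/* ~32KB keep window */

typedef void (*subpage_op_t)(unsigned long idx, void *arg);

static void process_huge_page_hint(unsigned long hint_idx,
				   subpage_op_t op, void *arg)
{
	unsigned long split = hint_idx + KEEP_PAGES;
	unsigned long i;

	if (split > HPAGE_PAGES)
		split = HPAGE_PAGES;

	for (i = split; i < HPAGE_PAGES; i++)	/* pass 1: tail of the page */
		op(i, arg);
	for (i = 0; i < split; i++)		/* pass 2: head, ending near the hint */
		op(i, arg);
}

static void clear_subpage(unsigned long idx, void *arg)
{
	memset((char *)arg + idx * PAGE_SIZE, 0, PAGE_SIZE);
}

struct copy_args {
	char *dst;
	const char *src;
};

static void copy_subpage(unsigned long idx, void *arg)
{
	struct copy_args *a = arg;

	memcpy(a->dst + idx * PAGE_SIZE, a->src + idx * PAGE_SIZE, PAGE_SIZE);
}

int main(void)
{
	unsigned long hint_idx = 100;		/* fault hint: subpage 100 */
	char *src = aligned_alloc(PAGE_SIZE, HPAGE_PAGES * PAGE_SIZE);
	char *dst = aligned_alloc(PAGE_SIZE, HPAGE_PAGES * PAGE_SIZE);
	struct copy_args args = { .dst = dst, .src = src };

	if (!src || !dst)
		return 1;

	/* clear_huge_page-style user */
	process_huge_page_hint(hint_idx, clear_subpage, src);
	/* copy_user_huge_page-style user */
	process_huge_page_hint(hint_idx, copy_subpage, &args);

	printf("processed %lu subpages for clear and copy\n", HPAGE_PAGES);
	free(src);
	free(dst);
	return 0;
}

The point of the sketch is only that both users share one traversal
policy, so a change such as the two-pass ordering discussed earlier in the
thread would land in a single place rather than in each caller.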