Subject: Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Michal Hocko
Cc: Arun KS, arunks.linux@gmail.com, akpm@linux-foundation.org, vbabka@suse.cz,
    osalvador@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    getarunks@gmail.com
Date: Tue, 08 Jan 2019 13:53:03 -0800
In-Reply-To: <20190108200436.GK31793@dhcp22.suse.cz>
References: <1546578076-31716-1-git-send-email-arunks@codeaurora.org>
            <37498672d5b2345b1435477e78251282af42742b.camel@linux.intel.com>
            <20190108200436.GK31793@dhcp22.suse.cz>

On Tue, 2019-01-08 at 21:04 +0100, Michal Hocko wrote:
> On Tue 08-01-19 10:40:18, Alexander Duyck wrote:
> > On Fri, 2019-01-04 at 10:31 +0530, Arun KS wrote:
> > > When freeing pages is done with higher order, the time spent on
> > > coalescing pages by the buddy allocator can be reduced. With a section
> > > size of 256MB, hot add latency of a single section shows improvement
> > > from 50-60 ms to less than 1 ms, hence improving the hot add latency
> > > by 60 times. Modify external providers of the online callback to align
> > > with the change.
> > >
> > > Signed-off-by: Arun KS
> > > Acked-by: Michal Hocko
> > > Reviewed-by: Oscar Salvador
> >
> > After running into my initial issue I actually had a few more questions
> > about this patch.
> >
> > > [...]
> > > +static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
> > > +{
> > > +        unsigned long end = start + nr_pages;
> > > +        int order, ret, onlined_pages = 0;
> > > +
> > > +        while (start < end) {
> > > +                order = min(MAX_ORDER - 1,
> > > +                        get_order(PFN_PHYS(end) - PFN_PHYS(start)));
> > > +
> > > +                ret = (*online_page_callback)(pfn_to_page(start), order);
> > > +                if (!ret)
> > > +                        onlined_pages += (1UL << order);
> > > +                else if (ret > 0)
> > > +                        onlined_pages += ret;
> > > +
> > > +                start += (1UL << order);
> > > +        }
> > > +        return onlined_pages;
> > > }
> >
> > Should the limit for this really be MAX_ORDER - 1, or should it be
> > pageblock_order? In some cases this will be the same value, but I seem
> > to recall that for x86 MAX_ORDER can be several times larger than
> > pageblock_order.
>
> Does it make any difference when we are in fact trying to online nr_pages
> and we clamp to it properly?

I'm not entirely sure if it does or not. What I notice looking through
the code, though, is that there are a number of checks for the pageblock
migrate type. There end up being checks in __free_one_page,
free_one_page, and __free_pages_ok, all related to this. It might be moot
since we are starting with an offline section, but I just brought this up
because I know that in the case of deferred page init we were limiting
ourselves to pageblock_order, and I wasn't sure if there was some
specific reason for doing that.
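Purely as an aside, and only to make the difference concrete: below is a
small userspace toy (not kernel code) that mimics the carve-up loop above.
The 4K page size, MAX_ORDER of 11, and pageblock_order of 9 are just the
usual x86 defaults, assumed for the illustration; all it shows is how many
callback invocations a 256MB section turns into under either clamp.

/*
 * Toy model of the online_pages_blocks() loop: carve a page range into
 * power-of-two chunks, with the chunk order clamped to a given cap.
 * Constants are assumed x86 defaults, used only for illustration.
 */
#include <stdio.h>

#define PAGE_SHIFT      12
#define MAX_ORDER       11
#define PAGEBLOCK_ORDER 9

/* Smallest order whose block covers "pages" pages (mimics get_order()). */
static int order_for(unsigned long pages)
{
        int order = 0;

        while ((1UL << order) < pages)
                order++;
        return order;
}

static void carve(unsigned long nr_pages, int cap)
{
        unsigned long start = 0, end = nr_pages, chunks = 0;

        while (start < end) {
                int order = order_for(end - start);

                if (order > cap)
                        order = cap;
                chunks++;
                start += 1UL << order;
        }
        printf("cap order %2d -> %lu callback invocations\n", cap, chunks);
}

int main(void)
{
        /* One 256MB section worth of 4K pages. */
        unsigned long section_pages = (256UL << 20) >> PAGE_SHIFT;

        carve(section_pages, MAX_ORDER - 1);
        carve(section_pages, PAGEBLOCK_ORDER);
        return 0;
}

So the clamp only changes how many chunks the callback sees (64 vs 128 per
256MB section here); the open question above is whether the migratetype
checks care about the chunk size, not whether the range gets covered.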
> > >  static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > >                  void *arg)
> > >  {
> > > -        unsigned long i;
> > >          unsigned long onlined_pages = *(unsigned long *)arg;
> > > -        struct page *page;
> > >
> > >          if (PageReserved(pfn_to_page(start_pfn)))
> >
> > I'm not sure we even really need this check. Getting back to the
> > discussion I have been having with Michal in regards to the need for
> > the DAX pages to not have the reserved bit cleared, I was originally
> > wondering if we could replace this check with a call to
> > online_section_nr, since the section shouldn't be online until we set
> > the bit below in online_mem_sections.
> >
> > However, after doing some further digging, it looks like this could
> > probably be dropped entirely, since we only call this function from
> > online_pages, and that function is only called by memory_block_action
> > if pages_correctly_probed returns true. However, pages_correctly_probed
> > should return false if any of the sections contained in the page range
> > is already online.
>
> Yes, you are right, but I guess it would be better to address that in a
> separate patch that deals with PageReserved manipulation in general.
> I do not think we want to remove the check silently. People who might be
> interested in backporting this for whatever reason might scratch their
> heads over why the test is not needed anymore.

Yeah, I am already working on that; it is what led me to review this
patch. I just thought I would bring it up, since it would make it possible
to essentially reduce the size of, and/or the need for, the new function.
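To make the online_section_nr idea a little more concrete, this is roughly
the shape I was picturing (untested sketch only; the tail of the function
just follows the existing walker pattern, and the interesting part is the
check itself):

static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
                              void *arg)
{
        unsigned long onlined_pages = *(unsigned long *)arg;

        /*
         * Sketch: key off the section state rather than PageReserved.
         * The section is not marked online until online_mem_sections()
         * below, so an already-online section should never get here.
         */
        if (!online_section_nr(pfn_to_section_nr(start_pfn)))
                onlined_pages += online_pages_blocks(start_pfn, nr_pages);

        online_mem_sections(start_pfn, start_pfn + nr_pages);

        *(unsigned long *)arg = onlined_pages;
        return 0;
}

But as noted above, it may end up simpler to drop the check entirely and
rely on the pages_correctly_probed guarantee; that is the part the
PageReserved series will sort out.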