Date: Mon, 17 Aug 2020 08:27:06 -0700
From: Minchan Kim
To: David Hildenbrand
Cc: Andrew Morton, linux-mm, Joonsoo Kim, Vlastimil Babka, John Dias,
 Suren Baghdasaryan, pullip.cho@samsung.com
Subject: Re: [RFC 0/7] Support high-order page bulk allocation
Message-ID: <20200817152706.GB3852332@google.com>
In-Reply-To: <4e2bd095-b693-9fed-40e0-ab538ec09aaa@redhat.com>

On Sun, Aug 16, 2020 at 02:31:22PM +0200, David Hildenbrand wrote:
> On 14.08.20 19:31, Minchan Kim wrote:
> > There is special HW that requires bulk allocation of high-order
> > pages, for example, 4800 order-4 pages.
> >
> > To meet the requirement, one option is to use a CMA area, because
> > the page allocator with compaction easily fails to meet the
> > requirement under memory pressure and is too slow for 4800 calls.
> > However, CMA also has the following drawback:
> >
> > * 4800 order-4 cma_alloc() calls are too slow
> >
> > To avoid the slowness, we could try to allocate 300M of contiguous
> > memory once and then split it into order-4 chunks. The problem with
> > this approach is that the CMA allocation fails if even one page in
> > the range cannot be migrated out, which happens easily with fs
> > writes under memory pressure.
>
> Why not choose a value in between? Like try to allocate MAX_ORDER - 1
> chunks and split them. That would already heavily reduce the call
> frequency.

I think you meant this:

    alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER - 1)

It would work if the system had lots of non-fragmented free memory.
However, once memory is fragmented, it doesn't work. That's why we have
easily seen even order-4 allocation failures in the field, and that's
why CMA was introduced. CMA has extra logic to isolate the memory
during allocation/freeing, as well as fragmentation avoidance, so it
has less chance of being stolen by others and a higher success ratio.
That's why I want this API to be used with CMA or the movable zone.

A usecase: a device can set up an exclusive CMA area when the system
boots. When the device needs 4800 order-4 pages, it can call this bulk
API against that area so that it is effectively guaranteed to allocate
enough pages quickly.
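For reference, a rough sketch of the allocate-and-split idea being
discussed. split_page_by_order() is introduced by this series, but its
exact signature is assumed here for illustration, and the fill loop and
names are hypothetical:

    #define NEEDED_ORDER	4
    #define NR_NEEDED	4800

    /* fill pages[] with order-4 chunks carved from MAX_ORDER - 1 blocks */
    static int fill_order4_pool(struct page **pages)
    {
    	const unsigned int block_order = MAX_ORDER - 1;
    	/* order-4 chunks per block, e.g. 64 when block_order == 10 */
    	const unsigned int per_block = 1U << (block_order - NEEDED_ORDER);
    	int filled = 0;

    	while (filled < NR_NEEDED) {
    		struct page *page;
    		unsigned int i;

    		page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, block_order);
    		if (!page)
    			break;	/* fragmentation: no big block available */

    		/* assumed helper from this series: split the block
    		 * into independent order-4 pages */
    		split_page_by_order(page, block_order, NEEDED_ORDER);

    		for (i = 0; i < per_block && filled < NR_NEEDED; i++)
    			pages[filled++] = page + i * (1U << NEEDED_ORDER);
    	}

    	return filled;	/* may fall short once memory is fragmented */
    }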
> I don't see a real need for a completely new range allocator function
> for this special case yet.
>
> > To solve these issues, this patch introduces alloc_pages_bulk:
> >
> >     int alloc_pages_bulk(unsigned long start, unsigned long end,
> >                          unsigned int migratetype, gfp_t gfp_mask,
> >                          unsigned int order, unsigned int nr_elem,
> >                          struct page **pages);
> >
> > It investigates [start, end) and migrates movable pages out of the
> > range on a best-effort basis (in upcoming patches) to create free
> > pages of the requested order.
> >
> > The allocated pages are returned via the pages parameter. The return
> > value is how many pages of the requested order we got; it can be
> > less than the nr_elem the user requested.
> >
> > /**
> >  * alloc_pages_bulk() -- tries to allocate high order pages
> >  * by batch from given range [start, end)
> >  * @start: start PFN to allocate
> >  * @end: one-past-the-last PFN to allocate
> >  * @migratetype: migratetype of the underlying pageblocks (either
> >  *               #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
> >  *               in the range must have the same migratetype and it
> >  *               must be either of the two.
> >  * @gfp_mask: GFP mask to use during compaction
> >  * @order: page order requested
> >  * @nr_elem: the number of high-order pages to allocate
> >  * @pages: page array pointer to store allocated pages (must
> >  *         have space for at least nr_elem elements)
> >  *
> >  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> >  * aligned. The PFN range must belong to a single zone.
> >  *
> >  * Return: the number of pages allocated on success or negative error
> >  * code. The allocated pages should be freed using __free_pages().
> >  */
> >
> > The test does 4800 order-4 allocations (i.e., 300MB total) under a
> > kernel build workload. System RAM size is 1.5GB and the CMA area is
> > 500M.
> >
> > Using CMA to allocate the 300M directly, all 10 of 10 trials failed,
> > with high latency (up to several seconds).
> >
> > With this alloc_pages_bulk API, 7 of 10 trials allocated all 4800
> > pages; the remaining 3 allocated 4799, 4789 and 4799. All trials
> > completed within 300ms.
> >
> > This patchset is against next-20200813.
> >
> > Minchan Kim (7):
> >   mm: page_owner: split page by order
> >   mm: introduce split_page_by_order
> >   mm: compaction: deal with upcoming high-order page splitting
> >   mm: factor __alloc_contig_range out
> >   mm: introduce alloc_pages_bulk API
> >   mm: make alloc_pages_bulk best effort
> >   mm/page_isolation: avoid drain_all_pages for alloc_pages_bulk
> >
> >  include/linux/gfp.h            |   5 +
> >  include/linux/mm.h             |   2 +
> >  include/linux/page-isolation.h |   1 +
> >  include/linux/page_owner.h     |  10 +-
> >  mm/compaction.c                |  64 +++++++----
> >  mm/huge_memory.c               |   2 +-
> >  mm/internal.h                  |   5 +-
> >  mm/page_alloc.c                | 198 ++++++++++++++++++++++++++-------
> >  mm/page_isolation.c            |  10 +-
> >  mm/page_owner.c                |   7 +-
> >  10 files changed, 230 insertions(+), 74 deletions(-)
>
> --
> Thanks,
>
> David / dhildenb
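For illustration, a hypothetical caller built only from the signature
and kernel-doc quoted above. The device CMA area, pool names, and error
policy are assumptions; cma_get_base()/cma_get_size() are existing
helpers from <linux/cma.h>:

    #include <linux/cma.h>
    #include <linux/gfp.h>
    #include <linux/pfn.h>

    #define POOL_ORDER	4
    #define POOL_NR		4800

    static struct page *pool[POOL_NR];

    static int fill_pool_from_cma(struct cma *cma)
    {
    	unsigned long start = PFN_DOWN(cma_get_base(cma));
    	unsigned long end = start + (cma_get_size(cma) >> PAGE_SHIFT);
    	int got;

    	/* all pageblocks in a CMA area have migratetype MIGRATE_CMA */
    	got = alloc_pages_bulk(start, end, MIGRATE_CMA, GFP_KERNEL,
    			       POOL_ORDER, POOL_NR, pool);
    	if (got < 0)
    		return got;	/* negative error code, per the kernel-doc */

    	/* best effort: got may be less than POOL_NR */
    	return got;
    }

Each page in the pool would later be released with
__free_pages(page, POOL_ORDER), as the kernel-doc notes.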