From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=G0TP=B4=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-3.6 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS
	autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id CEF2AC433DF
	for <linux-mm@archiver.kernel.org>; Tue, 18 Aug 2020 07:43:05 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 8E662207FB
	for <linux-mm@archiver.kernel.org>; Tue, 18 Aug 2020 07:43:05 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Y1pRw+1R"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8E662207FB
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 202788D0005; Tue, 18 Aug 2020 03:43:05 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 1B3A18D0003; Tue, 18 Aug 2020 03:43:05 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 07B2B8D0005; Tue, 18 Aug 2020 03:43:05 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0050.hostedemail.com [216.40.44.50])
	by kanga.kvack.org (Postfix) with ESMTP id E27DA8D0003
	for <linux-mm@kvack.org>; Tue, 18 Aug 2020 03:43:04 -0400 (EDT)
Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay04.hostedemail.com (Postfix) with ESMTP id 91C391F06
	for <linux-mm@kvack.org>; Tue, 18 Aug 2020 07:43:04 +0000 (UTC)
X-FDA: 77162898288.03.bears20_30159e12701d
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin03.hostedemail.com (Postfix) with ESMTP id 54A9228A4EA
	for <linux-mm@kvack.org>; Tue, 18 Aug 2020 07:43:04 +0000 (UTC)
X-HE-Tag: bears20_30159e12701d
X-Filterd-Recvd-Size: 7561
Received: from mail-pg1-f194.google.com (mail-pg1-f194.google.com [209.85.215.194])
	by imf02.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Tue, 18 Aug 2020 07:43:03 +0000 (UTC)
Received: by mail-pg1-f194.google.com with SMTP id 189so8733199pgg.13
        for <linux-mm@kvack.org>; Tue, 18 Aug 2020 00:43:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=date:from:subject:to:cc:references:in-reply-to:mime-version
         :message-id:content-transfer-encoding;
        bh=I4249D4th+QXbHKs05P0EDC1yqk/TsdiMCXXbxiXOFA=;
        b=Y1pRw+1RtpKesUfJO+NaflCf8bP5tHW43KysILkpcwBd35N/QtevjM2xMHj5MdRIj9
         2ZN52fN95HYVzxeha7kANHuvpKWdcauNn4pxRavpbgHBem48cIb23/53QXhSckTRxN7S
         C7hVH5XOdqtRU84ZBPUBChwgbEPp9fOBTLibri65CVJBZAl/X5PpsDneFcpNiZEyUm/5
         rcj8sRp6U7nKWY/0m1K7PWkAfrTQJDM63RVgc+Mspus2g113SkgHuApQjyZOwn63sra9
         WMZGH7GLOhWj3zqr94hRsXIrdlWVadGvLxkMNlmqSRg7HPYii2W/pYKy74fzYxFGHH63
         VY5g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:date:from:subject:to:cc:references:in-reply-to
         :mime-version:message-id:content-transfer-encoding;
        bh=I4249D4th+QXbHKs05P0EDC1yqk/TsdiMCXXbxiXOFA=;
        b=EYE6/lKbBFmXMQPUe8LzuwwQg7GW+1ViSxHQL+qJ0rJkmug9SVGFcBCa7dNdEL+sZ9
         Mx9p6Bi+pEmVCZpaBfzYrJVKYz1sKj9oEtikmlnFCGSznavXBwlYHrNxtEGxYdO78j/r
         GaxCaDv02TGCADw+pkPNBrS9RKg2g8e3bXa6MYQ5VX+8JOMZk+R0njBlzjIEfKAxHs6M
         MagMapQ3ouoZ+F8H0SAXJa/Z4SBqLX1zha0XDa7kt1AoNj/XTGtQpVhpI/MHj1jGPSc2
         yoSvclcTRhyMhZZYFaDuv1SyY34SDav8rWozLTzOVpqNLuvQgvnsU73P9qi/jJAyZwEJ
         KehA==
X-Gm-Message-State: AOAM533D3Dhxa61/PYGox9AdgaUeuIq3HlhJ2QR9N2U4FgeQNMqRzxhw
	xwv3JYfP09e0mLyoAFXA/mc=
X-Google-Smtp-Source: ABdhPJy+UEIL8tLu/JLZvWKs+KxlRJ1qLFk0NzqVFJraC04G0Z1wlYMcqA/ZtpL+w4cEP62dFztWYw==
X-Received: by 2002:aa7:8182:: with SMTP id g2mr13884529pfi.261.1597736582748;
        Tue, 18 Aug 2020 00:43:02 -0700 (PDT)
Received: from localhost (193-116-193-175.tpgi.com.au. [193.116.193.175])
        by smtp.gmail.com with ESMTPSA id x7sm23688616pfc.209.2020.08.18.00.43.01
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 18 Aug 2020 00:43:02 -0700 (PDT)
Date: Tue, 18 Aug 2020 17:42:56 +1000
From: Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [RFC 0/7] Support high-order page bulk allocation
To: David Hildenbrand <david@redhat.com>, Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, Joonsoo Kim
	<iamjoonsoo.kim@lge.com>, John Dias <joaodias@google.com>, linux-mm
	<linux-mm@kvack.org>, pullip.cho@samsung.com, Suren Baghdasaryan
	<surenb@google.com>, Vlastimil Babka <vbabka@suse.cz>
References: <20200814173131.2803002-1-minchan@kernel.org>
	<4e2bd095-b693-9fed-40e0-ab538ec09aaa@redhat.com>
	<20200817152706.GB3852332@google.com>
	<aa96518d-94c9-8c28-5e67-59388587b3bd@redhat.com>
	<20200817163018.GC3852332@google.com>
	<f047f2b2-9f62-cbf4-3c6b-a0f3bf1e9406@redhat.com>
	<20200817233442.GD3852332@google.com>
In-Reply-To: <20200817233442.GD3852332@google.com>
MIME-Version: 1.0
Message-Id: <1597735668.d8431uavn3.astroid@bobo.none>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: 54A9228A4EA
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam03
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

Excerpts from Minchan Kim's message of August 18, 2020 9:34 am:
> On Mon, Aug 17, 2020 at 06:44:50PM +0200, David Hildenbrand wrote:
>> On 17.08.20 18:30, Minchan Kim wrote:
>> > On Mon, Aug 17, 2020 at 05:45:59PM +0200, David Hildenbrand wrote:
>> >> On 17.08.20 17:27, Minchan Kim wrote:
>> >>> On Sun, Aug 16, 2020 at 02:31:22PM +0200, David Hildenbrand wrote:
>> >>>> On 14.08.20 19:31, Minchan Kim wrote:
>> >>>>> Some special HW requires bulk allocation of high-order pages,
>> >>>>> for example 4800 * order-4 pages.
>> >>>>>
>> >>>>> To meet the requirement, one option is to use a CMA area, because
>> >>>>> the page allocator with compaction under memory pressure easily
>> >>>>> fails to meet the requirement and is too slow for 4800 calls.
>> >>>>> However, CMA also has the following drawbacks:
>> >>>>>
>> >>>>>  * 4800 of order-4 * cma_alloc is too slow
>> >>>>>
>> >>>>> To avoid the slowness, we could try to allocate 300M of contiguous
>> >>>>> memory at once and then split it into order-4 chunks.
>> >>>>> The problem with this approach is that the CMA allocation fails if
>> >>>>> one of the pages in that range cannot be migrated out, which
>> >>>>> happens easily with fs writes under memory pressure.
>> >>>>
>> >>>> Why not choose a value in between? Like try to allocate MAX_ORDER - 1
>> >>>> chunks and split them. That would already heavily reduce the call
>> >>>> frequency.
>> >>>
>> >>> I think you meant this:
>> >>>
>> >>>     alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER - 1)
>> >>>
>> >>> It would work if the system has lots of non-fragmented free memory.
>> >>> However, once memory is fragmented, it doesn't work. That's why we
>> >>> have easily seen even order-4 allocation failures in the field, and
>> >>> that's why CMA was there.
>> >>>
>> >>> CMA has more logic to isolate the memory during allocation/freeing,
>> >>> as well as fragmentation avoidance, so its memory has less chance of
>> >>> being stolen by others and the success ratio is higher. That's why I
>> >>> want this API to be used with CMA or the movable zone.
>> >>
>> >> I was talking about doing MAX_ORDER - 1 CMA allocations instead of
>> >> one big 300M allocation. As you correctly note, memory placed into
>> >> CMA should be movable, except for (short/long) term pinnings. In
>> >> these cases, doing allocations smaller than 300M and splitting them
>> >> up should be good enough to reduce the call frequency, no?
>> >
>> > I should have written that. The 300M I mentioned is really the minimum
>> > size. In some scenarios, we need far more than 300M, up to several GB.
>> > Furthermore, the demand will increase in the near future.
>>
>> And what will the driver do with that data besides providing it to the
>> device? Can it be mapped to user space? I think we really need more
>> information / the actual user.
>>
>> >>
>> >>>
>> >>> A use case is that a device can set up an exclusive CMA area when the
>> >>> system boots. When the device needs 4800 * order-4 pages, it could
>> >>> call this bulk allocation against that area so that it is effectively
>> >>> guaranteed to allocate them fast enough.
>> >>
>> >> Just wondering
>> >>
>> >> a) Why does it have to be fast?
>> >
>> > That's because it's related to application latency, which ends up
>> > making the user feel bad.
>>
>> Okay, but in theory, your device-needs are very similar to
>> application-needs, besides you requiring order-4 pages, correct? Similar
>> to an application that starts up and pins 300M (or more), just with
>> order-4 pages.
>
> Yes.

Linux has never seriously catered for broken devices that require
large contiguous physical ranges to perform well.

The problem with doing this is it allows hardware designers to get
progressively lazier and foist more of their work onto us, and then
we'd be stuck with it.

I think you need to provide a lot better justification than this, and
probably should just solve it with some hack like allocating larger
pages, pre-allocating some of that CMA space before the user opens
the device, or requiring the application to use hugetlbfs.
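
To put rough numbers on the "larger pages" option, here is a minimal
sketch of the arithmetic. It assumes 4 KiB base pages and a MAX_ORDER
of 11; neither is stated in this thread and both depend on architecture
and config. The helper names are made up for illustration and are not
kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumptions, not taken from the thread: 4 KiB base pages, MAX_ORDER of 11. */
#define ASSUMED_PAGE_SIZE 4096ULL
#define ASSUMED_MAX_ORDER 11U

/* Bytes covered by one physically contiguous block of the given order. */
uint64_t order_bytes(unsigned int order)
{
	assert(order < ASSUMED_MAX_ORDER);
	return ASSUMED_PAGE_SIZE << order;
}

/* Allocations of order `big` needed to carve out `count` chunks of order `small`. */
uint64_t calls_needed(uint64_t count, unsigned int small, unsigned int big)
{
	assert(small <= big && big < ASSUMED_MAX_ORDER);
	uint64_t per_big = 1ULL << (big - small);   /* small chunks per big block */
	return (count + per_big - 1) / per_big;     /* round up */
}

/* Print the figures for the 4800 * order-4 request. */
void print_summary(void)
{
	uint64_t total = 4800 * order_bytes(4);
	printf("total: %llu MiB\n", (unsigned long long)(total >> 20));
	printf("order-%u allocations: %llu\n", ASSUMED_MAX_ORDER - 1,
	       (unsigned long long)calls_needed(4800, 4, ASSUMED_MAX_ORDER - 1));
}
```

Under those assumptions, 4800 order-4 chunks come to 300 MiB, which
matches the 300M figure quoted above, and carving them out of order-10
blocks takes 75 allocations rather than 4800.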

Thanks,
Nick