Date: Wed, 2 Dec 2020 12:48:31 -0800
From: Minchan Kim
To: David Hildenbrand
Cc: Michal Hocko, Andrew Morton, LKML, linux-mm, hyesoo.yu@samsung.com,
    willy@infradead.org, iamjoonsoo.kim@lge.com, vbabka@suse.cz,
    surenb@google.com, pullip.cho@samsung.com, joaodias@google.com,
    hridya@google.com, sumit.semwal@linaro.org, john.stultz@linaro.org,
    Brian.Starkey@arm.com, linux-media@vger.kernel.org,
    devicetree@vger.kernel.org, robh@kernel.org, christian.koenig@amd.com,
    linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH v2 2/4] mm: introduce cma_alloc_bulk API
References: <20201201175144.3996569-1-minchan@kernel.org>
 <20201201175144.3996569-3-minchan@kernel.org>
 <8f006a4a-c21d-9db3-5493-fb1cc651b0cf@redhat.com>
 <20201202154915.GU17338@dhcp22.suse.cz>
 <20201202164834.GV17338@dhcp22.suse.cz>
 <20201202185107.GW17338@dhcp22.suse.cz>

On Wed, Dec 02, 2020 at 09:22:36PM +0100, David Hildenbrand wrote:
> On 02.12.20 20:26, Minchan Kim wrote:
> > On Wed, Dec 02, 2020 at 07:51:07PM +0100, Michal Hocko wrote:
> >> On Wed 02-12-20 09:54:29, Minchan Kim wrote:
> >>> On Wed, Dec 02, 2020 at 05:48:34PM +0100, Michal Hocko wrote:
> >>>> On Wed 02-12-20 08:15:49, Minchan Kim wrote:
> >>>>> On Wed, Dec 02, 2020 at 04:49:15PM +0100, Michal Hocko wrote:
> >>>> [...]
> >>>>>> Well, what I can see is that this new interface is an antipattern to
> >>>>>> our allocation routines. We tend to control allocations by gfp mask,
> >>>>>> yet you are introducing a bool parameter to make something faster...
> >>>>>> What that really means is rather arbitrary. Would it make more sense
> >>>>>> to teach cma_alloc resp. alloc_contig_range to recognize GFP_NOWAIT,
> >>>>>> GFP_NORETRY resp. GFP_RETRY_MAYFAIL instead?
> >>>>>
> >>>>> If we use cma_alloc, that interface requires "allocate one big memory
> >>>>> chunk". IOW, the return value is just a struct page and it is expected
> >>>>> that the page is one big contiguous block of memory. That means it
> >>>>> couldn't have a hole in the range.
> >>>>> However, the idea here is that we ask for much smaller chunks rather
> >>>>> than one big contiguous block, so we could skip pages if they are
> >>>>> randomly pinned (long-term/short-term, whatever) and search for other
> >>>>> pages in the CMA area to avoid a long stall. Thus, it couldn't work
> >>>>> with the existing cma_alloc API with a simple gfp_mask.
> >>>>
> >>>> I really do not see that as something alien to the cma_alloc interface.
> >>>> All you should care about, really, is what size of object you want and
> >>>> how hard the system should try. If you have a problem with the internal
> >>>> implementation of CMA and how it chooses a range and deals with pinned
> >>>> pages, then it should be addressed inside the CMA allocator.
> >>>> I suspect that you are effectively trying to work around those problems
> >>>> with a side implementation that has a slightly different API. Or maybe
> >>>> I still do not follow the actual problem.
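To make the contrast concrete, here is a rough sketch of the two contracts
being discussed. The prototypes and parameter names are illustrative only
(loosely modeled on the current cma_alloc and on what a bulk variant could
look like), not the actual patch:

#include <stdbool.h>
#include <stddef.h>

struct cma;     /* stands in for the kernel's CMA area descriptor */
struct page;    /* stands in for the kernel's struct page */

/*
 * Existing contract: one physically contiguous run of @count pages with
 * no holes, so a single unexpectedly pinned page inside the chosen range
 * can stall or fail the whole request.
 */
struct page *cma_alloc(struct cma *cma, size_t count,
                       unsigned int align, bool no_warn);

/*
 * Bulk-style contract: fill @pages with up to @nr_requested small chunks
 * taken from anywhere in the CMA area; ranges that turn out to be pinned
 * can simply be skipped, trading success ratio for bounded latency.
 * (Illustrative signature, not the one from the patch.)
 */
int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
                   unsigned int order, size_t nr_requested,
                   struct page **pages, size_t *nr_done);

The essential difference is only the return convention: one all-or-nothing
contiguous range versus an array that may be filled partially.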
> >>>>
> >>>>>> I am not deeply familiar with the cma allocator so sorry for a
> >>>>>> potentially stupid question. Why does a bulk interface perform better
> >>>>>> than repeated calls to cma_alloc? Is this because a failure would help
> >>>>>> to move on to the next pfn range while a repeated call would have to
> >>>>>> deal with the same range?
> >>>>>
> >>>>> Yup, true, plus other overheads (e.g., migration retries, waiting for
> >>>>> writeback, PCP/LRU draining IPIs).
> >>>>
> >>>> Why can't this be implemented in the cma_alloc layer? I mean you can
> >>>> cache failed cases and optimize the proper pfn range search.
> >>>
> >>> So do you suggest this?
> >>>
> >>> enum cma_alloc_mode {
> >>> 	CMA_ALLOC_NORMAL,
> >>> 	CMA_ALLOC_FAIL_FAST,
> >>> };
> >>>
> >>> struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> >>> 			enum cma_alloc_mode mode);
> >>>
> >>> From now on, cma_alloc will keep the last failed pfn and start the
> >>> search from the next pfn, for both CMA_ALLOC_NORMAL and
> >>> CMA_ALLOC_FAIL_FAST, as long as the requested size still fits between
> >>> the cached pfn and the end of the CMA area, wrapping around if it
> >>> couldn't find suitable pages from the cached pfn. Otherwise, the cached
> >>> pfn is reset to zero so that the search starts from 0 again. I like the
> >>> idea since it's a general improvement, I think.
> >>
> >> Yes, something like that. There are more options to be clever here -
> >> e.g. track ranges etc. - but I am not sure this is worth the complexity.
> >
> > Agreed. Just caching the last pfn would be good enough as a simple start.
> >
> >>
> >>> Furthermore, with CMA_ALLOC_FAIL_FAST, it could avoid several overheads
> >>> at the cost of a lower allocation success ratio, like GFP_NORETRY.
> >>
> >> I am still not sure a specific flag is a good interface. Really, can
> >> this be a gfp_mask instead?
> >
> > I don't feel strongly about it (I even did it with GFP_NORETRY), but
> > David wanted to have a special mode, and I agreed when he mentioned
> > ALLOC_CONTIG_HARD as one of the future options (it would be hard to
> > indicate that mode with gfp flags).
> 
> I can't tell regarding the CMA interface, but for the alloc_contig()
> interface I think modes make sense. Yes, it's different from other
> allocators, but the contig range allocator is different already. E.g.,
> the CMA allocator mostly hides "which exact PFNs you try to allocate".
> 
> In the contig range allocator, gfp flags are currently used to express
> how to allocate the pages used as migration targets. I don't think
> mangling in other gfp flags (or even overloading them) makes things a
> lot clearer. E.g., GFP_NORETRY: don't retry to allocate migration
> targets? Don't retry to migrate pages? Both?
> 
> As I said, other aspects might be harder to model (e.g., don't drain
> the LRU), and hiding them behind generic gfp flags (e.g., GFP_NORETRY)
> feels wrong.

I also support a special flag/bool parameter for cma_alloc rather than
mixing in the original gfp_flags, since it is clearer and prevents
unhandled gfp_flags from being passed into cma_alloc.

> 
> With the mode, we're expressing details for the necessary page
> migration. Suggestions on how to model that are welcome.
> 
> -- 
> Thanks,
> 
> David / dhildenb
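As a footnote to the pfn-caching idea above: the search/wraparound/reset
behaviour can be modelled in a few lines of user-space C. This is a sketch
under simplifying assumptions (a small array stands in for the CMA bitmap,
pinned[] stands in for pages that fail isolation or migration, and every
name is made up); it is not the kernel implementation:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define CMA_NR_PAGES 64

static bool pinned[CMA_NR_PAGES];   /* pages we must skip (pinned/unmovable) */
static size_t cached_pfn;           /* one past the pfn that last failed */

/*
 * Find 'count' consecutive usable pages. Start at the cached cursor if the
 * request still fits between it and the end of the area, otherwise reset to
 * 0; wrap around once before giving up.
 */
static long find_range(size_t count)
{
        size_t start = (cached_pfn + count <= CMA_NR_PAGES) ? cached_pfn : 0;
        bool wrapped = (start == 0);

        for (;;) {
                for (size_t base = start; base + count <= CMA_NR_PAGES; base++) {
                        size_t ok = 0;

                        while (ok < count && !pinned[base + ok])
                                ok++;
                        if (ok == count)
                                return (long)base;      /* usable range found */
                        cached_pfn = base + ok + 1;     /* remember the bad pfn */
                        base += ok;                     /* outer ++ skips past it */
                }
                if (wrapped)
                        break;
                start = 0;              /* nothing past the cursor: wrap around */
                wrapped = true;
        }
        cached_pfn = 0;
        return -1;
}

int main(void)
{
        pinned[3] = pinned[20] = true;          /* pretend two pages are pinned */

        long a = find_range(8);
        printf("first  request: pfn %ld, cursor %zu\n", a, cached_pfn);
        for (size_t i = 0; a >= 0 && i < 8; i++)
                pinned[(size_t)a + i] = true;   /* model those pages as now taken */

        long b = find_range(8);
        printf("second request: pfn %ld, cursor %zu\n", b, cached_pfn);
        return 0;
}

The only point of the cached cursor is that a follow-up request does not
rescan the range that just failed; resetting it to zero when the request no
longer fits mirrors the wraparound described in the thread.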