Date: Thu, 20 Aug 2020 21:17:28 -0700
From: cgoldswo@codeaurora.org
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-arm-msm@vger.kernel.org,
    linux-kernel@vger.kernel.org, pratikp@codeaurora.org,
    pdaly@codeaurora.org, sudraja@codeaurora.org, iamjoonsoo.kim@lge.com,
    linux-arm-msm-owner@vger.kernel.org
Subject: Re: cma_alloc(), add sleep-and-retry for temporary page pinning
In-Reply-To: <896f92e8c37936e7cb2914e79273e9e8@codeaurora.org>
References: <896f92e8c37936e7cb2914e79273e9e8@codeaurora.org>
Message-ID: <896458e8daf87a274ba1ce8ced30ac8e@codeaurora.org>

On 2020-08-11 15:20, cgoldswo@codeaurora.org wrote:
> On 2020-08-06 18:31, Andrew Morton wrote:
>> On Wed, 5 Aug 2020 19:56:21 -0700 Chris Goldsworthy wrote:
>>
>>> On mobile devices, failure to allocate from a CMA area constitutes a
>>> functional failure. Sometimes during CMA allocations, we have observed
>>> that pages in a CMA area allocated through alloc_pages(), that we're
>>> trying to migrate away to make room for a CMA allocation, are
>>> temporarily pinned. This temporary pinning can occur when a process
>>> that owns the pinned page is being forked (the example is explained
>>> further in the commit text). This patch addresses this issue by adding
>>> a sleep-and-retry loop in cma_alloc(). There's another example we know
>>> of, similar to the above, that occurs during exit_mmap() (in
>>> zap_pte_range() specifically), but I need to determine if this is
>>> still relevant today.
>>
>> Sounds fairly serious but boy, we're late for 5.9.
>>
>> I can queue it for 5.10 with a cc:stable so that it gets backported
>> into earlier kernels a couple of months from now, if we think the
>> seriousness justifies backporting(?).
>
> Queuing this seems like the best way to proceed, if we were to pick up
> this patch. I think we can forgo back-porting this, as this is something
> that will only be needed as vendors such as ourselves start using
> Google's Generic Kernel Image (we've carried this patch in our tree for
> over four years).
>
>> And... it really is a sad little patch, isn't it? Instead of fixing
>> the problem, it reduces the problem's probability by 5x. Can't we do
>> better than this?
>
> I have one alternative in mind. I have been able to review the
> exit_mmap() case, so before proceeding, let's do a breakdown of the
> problem: we can categorize the pinning issue we're trying to address
> here as being one of (1) incrementing _refcount and getting
> context-switched out before incrementing _mapcount (applies to forking
> a process / copy_one_pte()), and (2) decrementing _mapcount and getting
> context-switched out before decrementing _refcount (applies to tearing
> down a process / exit_mmap()). So, one alternative would be to insert
> preempt_disable/enable() calls at affected sites.
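For comparison, the sleep-and-retry being discussed would take roughly the
following shape inside cma_alloc()'s allocation loop. This is only a minimal
sketch: the retry count and delay below are placeholder names, not what the
actual patch uses.

    /*
     * Placeholder sketch: if alloc_contig_range() fails with -EBUSY
     * because a page in the range is transiently pinned (a fork or exit
     * teardown in flight), sleep briefly and retry the same range a
     * bounded number of times before moving on to the next free region
     * in the CMA bitmap.
     */
    int ret, tries;

    for (tries = 0; tries < CMA_PIN_RETRIES; tries++) {
            ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
                                     GFP_KERNEL);
            if (ret != -EBUSY)
                    break;
            msleep(CMA_PIN_RETRY_DELAY_MS);
    }

The retry only helps when the transient reference is dropped within the sleep
window, which is why it lowers the failure probability rather than
eliminating it.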
> So, for the copy_one_pte() pinning case, we could do the following
> inside of copy_one_pte():
>
>     if (page) {
> +           preempt_disable();
>             get_page(page);
>             page_dup_rmap(page, false);
> +           preempt_enable();
>             rss[mm_counter(page)]++;
>     }
>
> I'm not sure if this approach would be acceptable for the exit_mmap()
> pinning case (applicable when CONFIG_MMU_GATHER_NO_GATHER=y). For the
> purposes of this discussion, we can look at two function calls inside
> of exit_mmap(), in the order they're called in, to show how the pinning
> is occurring:
>
> 1. Calling unmap_vmas(): this unmaps the pages in each VMA for an
>    exiting task, using zap_pte_range() - zap_pte_range() reduces the
>    _mapcount for each page in a VMA, using page_remove_rmap(). After
>    calling page_remove_rmap(), the page is placed into a list in
>    __tlb_remove_page(). This list of pages will be used when flushing
>    TLB entries later on during the process teardown.
>
> 2. Calling tlb_finish_mmu(): this will flush the TLB entries associated
>    with the pages, before calling put_page() on them, using the pages
>    previously collected by __tlb_remove_page() - the call flow is
>    tlb_finish_mmu() -> tlb_flush_mmu() -> tlb_flush_mmu_free() ->
>    tlb_batch_pages_flush() -> free_pages_and_swap_cache() ->
>    release_pages(), where release_pages() is described as a "batched
>    put_page()".
>
> The preempt_disable/enable() approach would entail doing the following
> inside of exit_mmap():
>
> +   preempt_disable();
>     unmap_vmas(&tlb, vma, 0, -1);
>     free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
>     tlb_finish_mmu(&tlb, 0, -1);
> +   preempt_enable();
>
> I'm not sure doing this is feasible, given how long it could take to do
> the process teardown.
>
> The good thing about this patch is that it has been stable in our
> kernel for four years (though for some SoCs we increased the retry
> counts). One thing to stress is that there are other instances of CMA
> page pinning that this patch isn't attempting to address. Please let me
> know if you're okay with queuing this for the 5.10 merge window - if
> you are, I can add an option to configure the number of retries, and
> will resend the patch once the 5.9 merge window closes.
>
> Thanks,
>
> Chris.

Hi Andrew,

Have you been able to give the patch any further consideration?

Thanks,

Chris.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project