Date: Thu, 20 Aug 2020 21:17:28 -0700
From: cgoldswo@codeaurora.org
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-arm-msm@vger.kernel.org,
    linux-kernel@vger.kernel.org, pratikp@codeaurora.org,
    pdaly@codeaurora.org, sudraja@codeaurora.org, iamjoonsoo.kim@lge.com,
    linux-arm-msm-owner@vger.kernel.org
Subject: Re: cma_alloc(), add sleep-and-retry for temporary page pinning
In-Reply-To: <896f92e8c37936e7cb2914e79273e9e8@codeaurora.org>
References: <896f92e8c37936e7cb2914e79273e9e8@codeaurora.org>
Message-ID: <896458e8daf87a274ba1ce8ced30ac8e@codeaurora.org>

On 2020-08-11 15:20, cgoldswo@codeaurora.org wrote:
> On 2020-08-06 18:31, Andrew Morton wrote:
>> On Wed, 5 Aug 2020 19:56:21 -0700 Chris Goldsworthy wrote:
>>
>>> On mobile devices, failure to allocate from a CMA area constitutes a
>>> functional failure. Sometimes during CMA allocations, we have observed
>>> that pages in a CMA area allocated through alloc_pages(), that we're
>>> trying to migrate away to make room for a CMA allocation, are
>>> temporarily pinned. This temporary pinning can occur when a process
>>> that owns the pinned page is being forked (the example is explained
>>> further in the commit text). This patch addresses this issue by adding
>>> a sleep-and-retry loop in cma_alloc(). There's another example we know
>>> of, similar to the above, that occurs during exit_mmap() (in
>>> zap_pte_range() specifically), but I need to determine if this is
>>> still relevant today.
>>
>> Sounds fairly serious but boy, we're late for 5.9.
>>
>> I can queue it for 5.10 with a cc:stable so that it gets backported
>> into earlier kernels a couple of months from now, if we think the
>> seriousness justifies backporting(?).
>
> Queuing this seems like the best way to proceed, if we were to pick up
> this patch. I think we can forgo back-porting this, as this is something
> that will only be needed as vendors such as ourselves start using
> Google's Generic Kernel Image (we've carried this patch in our tree for
> over four years).
>
>> And... it really is a sad little patch, isn't it? Instead of fixing
>> the problem, it reduces the problem's probability by 5x. Can't we do
>> better than this?
>
> I have one alternative in mind. I have been able to review the
> exit_mmap() case, so before proceeding, let's do a breakdown of the
> problem: we can categorize the pinning issue we're trying to address
> here as being one of (1) incrementing _refcount and getting
> context-switched out before incrementing _mapcount (applies to forking
> a process / copy_one_pte()), and (2) decrementing _mapcount and getting
> context-switched out before decrementing _refcount (applies to tearing
> down a process / exit_mmap()). So, one alternative would be to insert
> preempt_disable/enable() calls at affected sites.
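For comparison, the sleep-and-retry being discussed would take roughly the
following shape inside cma_alloc()'s allocation loop. This is only a minimal
sketch: the retry count and delay below are placeholder names, not what the
actual patch uses.

    /*
     * Placeholder sketch: if alloc_contig_range() fails with -EBUSY
     * because a page in the range is transiently pinned (a fork or exit
     * teardown in flight), sleep briefly and retry the same range a
     * bounded number of times before moving on to the next free region
     * in the CMA bitmap.
     */
    int ret, tries;

    for (tries = 0; tries < CMA_PIN_RETRIES; tries++) {
            ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
                                     GFP_KERNEL);
            if (ret != -EBUSY)
                    break;
            msleep(CMA_PIN_RETRY_DELAY_MS);
    }

The retry only helps when the transient reference is dropped within the sleep
window, which is why it lowers the failure probability rather than
eliminating it.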
> So, for the copy_one_pte() pinning case, we could do the following
> inside of copy_one_pte():
>
>     if (page) {
> +           preempt_disable();
>             get_page(page);
>             page_dup_rmap(page, false);
> +           preempt_enable();
>             rss[mm_counter(page)]++;
>     }
>
> I'm not sure if this approach would be acceptable for the exit_mmap()
> pinning case (applicable when CONFIG_MMU_GATHER_NO_GATHER=y). For the
> purposes of this discussion, we can look at two function calls inside
> of exit_mmap(), in the order they're called in, to show how the pinning
> is occurring:
>
> 1. Calling unmap_vmas(): this unmaps the pages in each VMA for an
>    exiting task, using zap_pte_range() - zap_pte_range() reduces the
>    _mapcount for each page in a VMA, using page_remove_rmap(). After
>    calling page_remove_rmap(), the page is placed into a list in
>    __tlb_remove_page(). This list of pages will be used when flushing
>    TLB entries later on during the process teardown.
>
> 2. Calling tlb_finish_mmu(): this will flush the TLB entries associated
>    with the pages, before calling put_page() on them, using the pages
>    previously collected by __tlb_remove_page() - the call flow is
>    tlb_finish_mmu() -> tlb_flush_mmu() -> tlb_flush_mmu_free() ->
>    tlb_batch_pages_flush() -> free_pages_and_swap_cache() ->
>    release_pages(), where release_pages() is described as a "batched
>    put_page()".
>
> The preempt_disable/enable() approach would entail doing the following
> inside of exit_mmap():
>
> +   preempt_disable();
>     unmap_vmas(&tlb, vma, 0, -1);
>     free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
>     tlb_finish_mmu(&tlb, 0, -1);
> +   preempt_enable();
>
> I'm not sure doing this is feasible, given how long it could take to do
> the process teardown.
>
> The good thing about this patch is that it has been stable in our
> kernel for four years (though for some SoCs we increased the retry
> counts). One thing to stress is that there are other instances of CMA
> page pinning that this patch isn't attempting to address. Please let me
> know if you're okay with queuing this for the 5.10 merge window - if
> you are, I can add an option to configure the number of retries, and
> will resend the patch once the 5.9 merge window closes.
>
> Thanks,
>
> Chris.

Hi Andrew,

Have you been able to give the patch any further consideration?

Thanks,

Chris.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project