From: Michael Ellerman
To: "Aneesh Kumar K.V", akpm@linux-foundation.org, Michal Hocko, Alexey Kardashevskiy, David Gibson, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, "Aneesh Kumar K.V"
Subject: Re: [PATCH V7 3/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_do_alloc
In-Reply-To: <20190114095438.32470-5-aneesh.kumar@linux.ibm.com>
References: <20190114095438.32470-1-aneesh.kumar@linux.ibm.com> <20190114095438.32470-5-aneesh.kumar@linux.ibm.com>
Date: Wed, 30 Jan 2019 22:34:39 +1100
Message-ID: <874l9qqsz4.fsf@concordia.ellerman.id.au>
MIME-Version: 1.0
Content-Type: text/plain

"Aneesh Kumar K.V" writes:
> The current code doesn't do page migration if the page allocated is a compound page.
> With HugeTLB migration support, we can end up allocating hugetlb pages from
> CMA region. Also, THP pages can be allocated from CMA region. This patch updates
> the code to handle compound pages correctly. The patch also switches to a single
> get_user_pages with the right count, instead of doing one get_user_pages per page.
> That avoids reading page table multiple times.

It's not very obvious from the above description that the migration
logic is now being done by get_user_pages_longterm(), it just looks like
it's all being deleted in this patch. Would be good to mention that.

> Since these page reference updates are long term pin, switch to
> get_user_pages_longterm. That makes sure we fail correctly if the guest RAM
> is backed by DAX pages.

Can you explain that in more detail?

> The patch also converts the hpas member of mm_iommu_table_group_mem_t to a union.
> We use the same storage location to store pointers to struct page. We cannot
> update all the code path use struct page *, because we access hpas in real mode
> and we can't do that struct page * to pfn conversion in real mode.

That's a pain, it's asking for bugs mixing two different values in the
same array. But I guess it's the least worst option.

It sounds like that's a separate change you could do in a separate
patch. But it's not, because it's tied to the fact that we're doing a
single GUP call.
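
To make the "two different values in the same array" point concrete, here is a minimal userspace sketch of the pattern, not the kernel code. It assumes 64-bit pointers, and every fake_* name is hypothetical. A single bulk call fills the whole array with page pointers, then each slot is rewritten in place as a physical address, so the array is only valid as one type or the other at any given moment:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FAKE_PAGE_SHIFT 12

/* Hypothetical stand-in for the kernel's struct page. */
struct fake_page {
	uint64_t pfn;
};

/* Mirrors the shape of the union in the patch; names are illustrative. */
struct fake_mem {
	uint64_t entries;
	union {
		struct fake_page **hpages;	/* valid right after the bulk pin */
		uint64_t *hpas;			/* valid after the conversion */
	};
};

/* The in-place rewrite only works if both element types have the same size. */
_Static_assert(sizeof(struct fake_page *) == sizeof(uint64_t),
	       "sketch assumes 64-bit pointers");

/* Allocate the shared array; returns 0 on success, -1 on failure. */
static int fake_mem_init(struct fake_mem *mem, uint64_t entries)
{
	mem->entries = entries;
	mem->hpages = calloc(entries, sizeof(*mem->hpages));
	return mem->hpages ? 0 : -1;
}

/*
 * Convert every slot from a page pointer to a physical address.
 * Slot i is read as a pointer before it is overwritten as an address.
 */
static void fake_convert_in_place(struct fake_mem *mem)
{
	assert(mem != NULL && mem->hpages != NULL);

	for (uint64_t i = 0; i < mem->entries; i++) {
		struct fake_page *page = mem->hpages[i];

		mem->hpas[i] = page->pfn << FAKE_PAGE_SHIFT;
	}
}
```

With one call per page, each slot could be converted as soon as its page was pinned, so no pointer array was needed. With a single bulk call, the pointers need somewhere to live until the loop runs, which is why the union comes with the GUP change.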
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index a712a650a8b6..f11a2f15071f 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -21,6 +21,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  static DEFINE_MUTEX(mem_list_mutex);
>
> @@ -34,8 +35,18 @@ struct mm_iommu_table_group_mem_t {
>  	atomic64_t mapped;
>  	unsigned int pageshift;
>  	u64 ua; /* userspace address */
> -	u64 entries; /* number of entries in hpas[] */
> -	u64 *hpas; /* vmalloc'ed */
> +	u64 entries; /* number of entries in hpas/hpages[] */
> +	/*
> +	 * in mm_iommu_get we temporarily use this to store
> +	 * struct page address.
> +	 *
> +	 * We need to convert ua to hpa in real mode. Make it
> +	 * simpler by storing physical address.
> +	 */
> +	union {
> +		struct page **hpages; /* vmalloc'ed */
> +		phys_addr_t *hpas;
> +	};
>  #define MM_IOMMU_TABLE_INVALID_HPA ((uint64_t)-1)
>  	u64 dev_hpa; /* Device memory base address */
>  };
> @@ -80,64 +91,15 @@ bool mm_iommu_preregistered(struct mm_struct *mm)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>
> -/*
> - * Taken from alloc_migrate_target with changes to remove CMA allocations
> - */
> -struct page *new_iommu_non_cma_page(struct page *page, unsigned long private)
> -{
> -	gfp_t gfp_mask = GFP_USER;
> -	struct page *new_page;
> -
> -	if (PageCompound(page))
> -		return NULL;
> -
> -	if (PageHighMem(page))
> -		gfp_mask |= __GFP_HIGHMEM;
> -
> -	/*
> -	 * We don't want the allocation to force an OOM if possibe
> -	 */
> -	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> -	return new_page;
> -}
> -
> -static int mm_iommu_move_page_from_cma(struct page *page)
> -{
> -	int ret = 0;
> -	LIST_HEAD(cma_migrate_pages);
> -
> -	/* Ignore huge pages for now */
> -	if (PageCompound(page))
> -		return -EBUSY;
> -
> -	lru_add_drain();
> -	ret = isolate_lru_page(page);
> -	if (ret)
> -		return ret;
> -
> -	list_add(&page->lru, &cma_migrate_pages);
> -	put_page(page); /* Drop the gup reference */
> -
> -	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> -				NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE);
> -	if (ret) {
> -		if (!list_empty(&cma_migrate_pages))
> -			putback_movable_pages(&cma_migrate_pages);
> -	}
> -
> -	return 0;
> -}
> -
>  static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
> -		unsigned long entries, unsigned long dev_hpa,
> -		struct mm_iommu_table_group_mem_t **pmem)
> +			      unsigned long entries, unsigned long dev_hpa,
> +			      struct mm_iommu_table_group_mem_t **pmem)
>  {
>  	struct mm_iommu_table_group_mem_t *mem;
> -	long i, j, ret = 0, locked_entries = 0;
> +	long i, ret = 0, locked_entries = 0;

I'd prefer we didn't initialise ret here.

>  	unsigned int pageshift;
>  	unsigned long flags;
>  	unsigned long cur_ua;
> -	struct page *page = NULL;
>
>  	mutex_lock(&mem_list_mutex);
>
> @@ -187,41 +149,27 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
>  		goto unlock_exit;
>  	}
>
> +	down_read(&mm->mmap_sem);
> +	ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
> +	up_read(&mm->mmap_sem);
> +	if (ret != entries) {
> +		/* free the reference taken */
> +		for (i = 0; i < ret; i++)
> +			put_page(mem->hpages[i]);
> +
> +		vfree(mem->hpas);
> +		kfree(mem);
> +		ret = -EFAULT;
> +		goto unlock_exit;
> +	} else {
> +		ret = 0;

Or here. Instead it should be set to 0 at good_exit.

> +	}
> +
> +	pageshift = PAGE_SHIFT;
>  	for (i = 0; i < entries; ++i) {
> +		struct page *page = mem->hpages[i];
> +
>  		cur_ua = ua + (i << PAGE_SHIFT);
> -		if (1 != get_user_pages_fast(cur_ua,
> -					1/* pages */, 1/* iswrite */, &page)) {
> -			ret = -EFAULT;
> -			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(mem->hpas[j] >>
> -							PAGE_SHIFT));
> -			vfree(mem->hpas);
> -			kfree(mem);
> -			goto unlock_exit;
> -		}
> -		/*
> -		 * If we get a page from the CMA zone, since we are going to
> -		 * be pinning these entries, we might as well move them out
> -		 * of the CMA zone if possible. NOTE: faulting in + migration
> -		 * can be expensive. Batching can be considered later
> -		 */
> -		if (is_migrate_cma_page(page)) {
> -			if (mm_iommu_move_page_from_cma(page))
> -				goto populate;
> -			if (1 != get_user_pages_fast(cur_ua,
> -						1/* pages */, 1/* iswrite */,
> -						&page)) {
> -				ret = -EFAULT;
> -				for (j = 0; j < i; ++j)
> -					put_page(pfn_to_page(mem->hpas[j] >>
> -								PAGE_SHIFT));
> -				vfree(mem->hpas);
> -				kfree(mem);
> -				goto unlock_exit;
> -			}
> -		}
> -populate:
> -		pageshift = PAGE_SHIFT;
>  		if (mem->pageshift > PAGE_SHIFT && PageCompound(page)) {
>  			pte_t *pte;
>  			struct page *head = compound_head(page);
> @@ -239,6 +187,10 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
>  			local_irq_restore(flags);
>  		}
>  		mem->pageshift = min(mem->pageshift, pageshift);
> +		/*
> +		 * We don't need struct page reference any more, switch
> +		 * to physical address.
> +		 */
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}

I'm not any sort of expert on this code, but I don't see anything wrong.

Reviewed-by: Michael Ellerman

cheers
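
For illustration, the ret handling suggested in the comments above could be structured roughly as follows. This is a minimal userspace sketch under stated assumptions, not the kernel function: fake_pin_many(), fake_unpin() and fake_do_alloc() are hypothetical stand-ins, and the "limit" parameter exists only to simulate a short pin count. ret is left uninitialised at declaration, every failure path sets it explicitly, and success sets it to 0 in exactly one place:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bulk pin: marks at most `limit` of the `want` slots as pinned. */
static long fake_pin_many(int *pinned, long want, long limit)
{
	long n = want < limit ? want : limit;

	for (long i = 0; i < n; i++)
		pinned[i] = 1;
	return n;
}

/* Hypothetical release of one pinned slot. */
static void fake_unpin(int *slot)
{
	*slot = 0;
}

static long fake_do_alloc(int *pinned, long entries, long limit)
{
	long i, ret;	/* deliberately not initialised here */

	assert(pinned != NULL);

	/* Nothing to pin: take the shared success path. */
	if (entries == 0)
		goto good_exit;

	ret = fake_pin_many(pinned, entries, limit);
	if (ret != entries) {
		/* Short count: drop the references that were taken. */
		for (i = 0; i < ret; i++)
			fake_unpin(&pinned[i]);
		ret = -EFAULT;
		goto unlock_exit;
	}

	/* ... per-entry work would go here ... */

good_exit:
	ret = 0;	/* the only place success is recorded */
unlock_exit:
	return ret;
}
```

One benefit of this shape is that a path which forgets to set ret shows up as a compiler warning about an uninitialised variable instead of silently returning 0.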