From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82273C47423 for ; Fri, 2 Oct 2020 07:31:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 57583207F7 for ; Fri, 2 Oct 2020 07:31:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="lC6n1jSw" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 57583207F7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 32FF46B006C; Fri, 2 Oct 2020 03:31:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2BAF26B006E; Fri, 2 Oct 2020 03:31:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 181336B0070; Fri, 2 Oct 2020 03:31:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0036.hostedemail.com [216.40.44.36]) by kanga.kvack.org (Postfix) with ESMTP id D6B726B006C for ; Fri, 2 Oct 2020 03:31:21 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 63ED5181AE871 for ; Fri, 2 Oct 2020 07:31:21 +0000 (UTC) X-FDA: 77326164762.21.swing94_050889a271a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 3A2E4180442C0 for ; Fri, 2 Oct 2020 07:31:21 +0000 (UTC) X-HE-Tag: swing94_050889a271a2 X-Filterd-Recvd-Size: 8365 Received: from mail-oi1-f195.google.com (mail-oi1-f195.google.com [209.85.167.195]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Fri, 2 Oct 2020 07:31:20 +0000 (UTC) Received: by mail-oi1-f195.google.com with SMTP id z26so380678oih.12 for ; Fri, 02 Oct 2020 00:31:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=FV2kqyJj2HCS5dITMAB2uIW5WIzwJCxSsKqyYxhrIb8=; b=lC6n1jSw4vGEbBk+NzuHmRhWwgQz1I5GTOa2zuJv940Zx1wxEh97+0AE0wc3SK5OYg GRkx1QpyhQzUs7T9uEenrAEt1CqXAY7filBYvuEA4UVnDQfsCIepB9TlDFUZtJJt6th5 z+i698ITBVY0/dTSn4fqVlNnKIt8fSywkSnTE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=FV2kqyJj2HCS5dITMAB2uIW5WIzwJCxSsKqyYxhrIb8=; b=bMUoeX3rs1xPBpHfDGgpEmOjLdjXa32Xb2hF0BpTSUCkn+gmCe3rd2FNCU0oy8wIn8 C4i4uZxEYYqenkXt+v2Jx7dMxDdN7+ay+OoIoh56viRWb3kkS/SIIDfgNluqpb4OqvKp 4XaUwuNPcu+YCHmW5huBtafzSl0+wz6vpXfaV58x33UnIMxLm3ybEnL6MNhRnNvaz7i4 viWLIKRXwqhyStOUpzagvKVjL3823ElLXHGI/GosoraoMh6izWkjaBl213/t6Beee0d+ Vxla7d9izs7OaDpJXEVagiRii4Foxa6Nxkv1doX7QTXpjSuB5pt+3bjzj8D1/c0jDGps KL/Q== X-Gm-Message-State: AOAM532kEGdB1er8JXpBX5aoWCFuJ8V1WROmbh8ddRFgw6IvAJZpbiIG fkqWqJcjN4LztONQKTpOn9HhMcdXTYZbL6mBTZgYHw== X-Google-Smtp-Source: ABdhPJxSuPhsIR7ozBvleiZGtWRZp5/0o4EsEQMkcZj85SgObTBwKlJDPsHuSNersHwNZqgCIw8aPofYz6agkQnCgd8= X-Received: by 2002:aca:eb49:: with SMTP id j70mr477352oih.101.1601623879554; Fri, 02 Oct 2020 00:31:19 -0700 (PDT) MIME-Version: 1.0 References: <20200930221821.13719-1-agoins@nvidia.com> <4566cf03-1c9e-1626-6c92-7b5fa29d6b75@amd.com> In-Reply-To: From: Daniel Vetter Date: Fri, 2 Oct 2020 09:31:08 +0200 Message-ID: Subject: Re: [PATCH RFC 0/1] drm/ttm: Allocate transparent huge pages without clearing __GFP_COMP To: =?UTF-8?Q?Christian_K=C3=B6nig?= Cc: Alex Goins , amd-gfx list , dri-devel , Linux MM , John Hubbard , Zi Yan Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Oct 2, 2020 at 8:41 AM Christian K=C3=B6nig wrote: > > Hi Alex, > > adding Daniel as well. > > Am 01.10.20 um 20:45 schrieb Alex Goins: > > Hi Christian, > > > > On Thu, 1 Oct 2020, Christian K=C3=B6nig wrote: > > > >> Hi Alex, > >> > >> first of all accessing the underlying page of an exported DMA-buf is > >> illegal! So I'm not 100% sure what you're intentions are here, please > >> explain further. > > We have some mapping requirements that I was hoping I could address by = mapping > > these pages manually. > > > > Are you sure that it's illegal to access the underlying pages of an exp= orted > > DMA-BUF? > > yes, I'm 100% sure of that. This was discussed multiple times now on the > mailing list. > > > There appears to be quite a few usages of this already. See the usage > > of drm_prime_sg_to_page_addr_arrays() in vgem, vkms, msm, xen, and etna= viv. > > drm_gem_prime_import_dev() uses driver->gem_prime_import_sg_table() whe= n > > importing a DMA-BUF from another driver, and the listed drivers then ex= tract the > > pages from the given SGT using drm_prime_sg_to_page_addr_arrays(). Thes= e pages > > can then be mapped and faulted in. > > No, exactly that doesn't work correctly. > > You are corrupting internal state in struct page while doing so and risk > that userspace is accessing freed up memory. > > We really need to find a way to fix the few drivers already doing this. Yeah the drivers doing this were merged with everyone aware that it's a bad trick, but 10 years ago we had nothing, not even userspace for multi-gpu, so there needed to be something to get the thing off the ground. But it was a bad idea back then, and it's still a bad idea now (and now we do have the ecosystem off the ground, so there's really not excuse for shortcuts). -Daniel > > See commit af33a9190d02 ('drm/vgem: Enable dmabuf import interfaces'). = After > > importing the pages from the SGT, vgem can fault them in, taking a refc= ount with > > get_page() first. get_page() throws a BUG if the refcount is zero, whic= h it will > > hit on each of the 'tail' pages from TTM THP allocations. > > > > All of this currently works fine with TTM DMA-BUFs when the kernel is b= uilt with > > !CONFIG_TRANSPARENT_HUGEPAGE. However, 'echo never > > > /sys/kernel/mm/transparent_hugepage/enabled' doesn't change how TTM all= ocates > > pages. > > You need to redirect the mapping to dma_buf_mmap() instead. > > Regards, > Christian. > > > > >> Then the reason for TTM not using compound pages is that we can't > >> guarantee that they are mapped as a whole to userspace. > >> > >> The result is that the kernel sometimes tried to de-compound them whic= h > >> created a bunch of problems. > >> > >> So yes this is completely intentional. > > Understood, I figured something like that was the case, so I wanted to = get your > > input first. Do you know what the problems were, exactly? Practical iss= ues > > aside, it seems strange to call something a transparent huge page if it= 's > > non-compound. > > > > Besides making these pages compound, would it be reasonable to split th= em before > > sharing them, in e.g. amdgpu_dma_buf_map (and in other drivers that use= TTM)? > > That's where it's supposed to make sure that the shared DMA-BUF is acce= ssible by > > the target device. > > > > Thanks, > > Alex > > > >> Regards, > >> Christian. > >> > >> Am 01.10.20 um 00:18 schrieb Alex Goins: > >>> Hi Christian, > >>> > >>> I've been looking into the DMA-BUFs exported from AMDGPU / TTM. Would > >>> you mind giving some input on this? > >>> > >>> I noticed that your changes implementing transparent huge page suppor= t > >>> in TTM are allocating them as non-compound. I understand that using > >>> multiorder non-compound pages is common in device drivers, but I thin= k > >>> this can cause a problem when these pages are exported to other drive= rs. > >>> > >>> It's possible for other drivers to access the DMA-BUF's pages via > >>> gem_prime_import_sg_table(), but without context from TTM, it's > >>> impossible for the importing driver to make sense of them; they simpl= y > >>> appear as individual pages, with only the first page having a non-zer= o > >>> refcount. Making TTM's THP allocations compound puts them more in lin= e > >>> with the standard definition of a THP, and allows DMA-BUF-importing > >>> drivers to make sense of the pages within. > >>> > >>> I would like to propose making these allocations compound, but based = on > >>> patch history, it looks like the decision to make them non-compound w= as > >>> intentional, as there were difficulties figuring out how to map them > >>> into CPU page tables. I did some cursory testing with compound THPs, = and > >>> nothing seems obviously broken. I was also able to map compound THP > >>> DMA-BUFs into userspace without issue, and access their contents. Are > >>> you aware of any other potential consequences? > >>> > >>> Commit 5c42c64f7d54 ("drm/ttm: fix the fix for huge compound pages") = should > >>> probably also be reverted if this is applied. > >>> > >>> Thanks, > >>> Alex > >>> > >>> Alex Goins (1): > >>> drm-ttm: Allocate compound transparent huge pages > >>> > >>> drivers/gpu/drm/ttm/ttm_page_alloc.c | 5 ++--- > >>> 1 file changed, 2 insertions(+), 3 deletions(-) > >>> > --=20 Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch