From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF97EC41604 for ; Wed, 7 Oct 2020 13:06:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 187FF20789 for ; Wed, 7 Oct 2020 13:06:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="IVfaH4Or" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 187FF20789 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=chromium.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 58F946B006C; Wed, 7 Oct 2020 09:06:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 518896B006E; Wed, 7 Oct 2020 09:06:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3E1636B0070; Wed, 7 Oct 2020 09:06:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0018.hostedemail.com [216.40.44.18]) by kanga.kvack.org (Postfix) with ESMTP id 0C4C76B006C for ; Wed, 7 Oct 2020 09:06:38 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 928B5181AE86E for ; Wed, 7 Oct 2020 13:06:37 +0000 (UTC) X-FDA: 77345153634.12.push03_430752e271cf Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 5F88A180559ED for ; Wed, 7 Oct 2020 13:06:37 +0000 (UTC) X-HE-Tag: push03_430752e271cf X-Filterd-Recvd-Size: 10409 Received: from mail-ej1-f68.google.com (mail-ej1-f68.google.com [209.85.218.68]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Wed, 7 Oct 2020 13:06:36 +0000 (UTC) Received: by mail-ej1-f68.google.com with SMTP id e22so2857038ejr.4 for ; Wed, 07 Oct 2020 06:06:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=JtiTAC1lCqywnD4Z2Vg4zVbVRlofQtOWuMfZmgMPCws=; b=IVfaH4OrbWAN6uzg0UIk/strWAow/Q6FzTRKahgOrF9EF7kqJXUGjYDmctFaBHvyOo FYBn0VINGYPuQRP0p46AU6UBnVUkbrZOzWZ9Ys2I27aCChyOeG+vJ8llmAULxl7kC8wj mqztEpzuIt/NtFC/8GqwQM3qEJxAHpZi5HRdE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=JtiTAC1lCqywnD4Z2Vg4zVbVRlofQtOWuMfZmgMPCws=; b=tcCbv8uRtajpC77XigXWpCmrG1V9NJR4SKIsTkxVMwNfb6BerpJZhDKSriHIbHNKVL Pj4fRQcz7DtbGmsmAgn5xGTkiApqM9seNw/8UoC8vdTD6+NsZKBCi+kPrBBracwM85lh V3HwQzWZbQmAeZoPb4ecBoBlDp/RjDaD6UeUkiq8uv/LyhaABkowG2ZCKdPEyJpgMIpt ZpwubEogtiV8zhmYFphtCEo4LKbE+NRYgmDGzvxcqVsHVaHLHectEEHOsvf0DwRvH5Xr LT5JsFY2nrXfTyLaXMSQB3lmMTMSekIXV8kQpluuugZeV1GfB+MTXSgrSqhAtnhEbEUP zbuw== X-Gm-Message-State: AOAM531SjqqI4P4lOqzYGD8CxFRy1zTyofjwRprSePl+FprF8bbTy6Li N4zQc0O/Xwjy6XLP44xz49OiUURMNhnyyw== X-Google-Smtp-Source: ABdhPJwWxdl767yhMl5PENPk+4DYxbSJgFVV9VJb06m67fVvvHqjN3jYRWGp7kW6JNcYEx/bny+r7Q== X-Received: by 2002:a17:906:3716:: with SMTP id d22mr3226150ejc.267.1602075994225; Wed, 07 Oct 2020 06:06:34 -0700 (PDT) Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com. [209.85.128.50]) by smtp.gmail.com with ESMTPSA id k18sm1455666ejh.50.2020.10.07.06.06.30 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 07 Oct 2020 06:06:31 -0700 (PDT) Received: by mail-wm1-f50.google.com with SMTP id d4so2216399wmd.5 for ; Wed, 07 Oct 2020 06:06:30 -0700 (PDT) X-Received: by 2002:a1c:2d85:: with SMTP id t127mr3180500wmt.22.1602075989680; Wed, 07 Oct 2020 06:06:29 -0700 (PDT) MIME-Version: 1.0 References: <20201002175303.390363-1-daniel.vetter@ffwll.ch> <20201002175303.390363-2-daniel.vetter@ffwll.ch> <20201002180603.GL9916@ziepe.ca> <20201002233118.GM9916@ziepe.ca> In-Reply-To: <20201002233118.GM9916@ziepe.ca> From: Tomasz Figa Date: Wed, 7 Oct 2020 15:06:17 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM To: Jason Gunthorpe Cc: Daniel Vetter , DRI Development , LKML , Daniel Vetter , Andrew Morton , John Hubbard , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Jan Kara , Dan Williams , Linux MM , Linux ARM , Pawel Osciak , Marek Szyprowski , Kyungmin Park , Inki Dae , Joonyoung Shim , Seung-Woo Kim , linux-samsung-soc , "open list:DMA BUFFER SHARING FRAMEWORK" , Oded Gabbay Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Oct 3, 2020 at 1:31 AM Jason Gunthorpe wrote: > > On Fri, Oct 02, 2020 at 08:16:48PM +0200, Daniel Vetter wrote: > > On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe wrote: > > > On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote: > > > > For $reasons I've stumbled over this code and I'm not sure the chan= ge > > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector: > > > > convert get_user_pages() --> pin_user_pages()") was entirely correc= t. > > > > > > > > This here is used for long term buffers (not just quick I/O) like > > > > RDMA, and John notes this in his patch. But I thought the rule for > > > > these is that they need to add FOLL_LONGTERM, which John's patch > > > > didn't do. > > > > > > > > There is already a dax specific check (added in b7f0554a56f2 ("mm: > > > > fail get_vaddr_frames() for filesystem-dax mappings")), so this see= ms > > > > like the prudent thing to do. > > > > > > > > Signed-off-by: Daniel Vetter > > > > Cc: Andrew Morton > > > > Cc: John Hubbard > > > > Cc: J=C3=A9r=C3=B4me Glisse > > > > Cc: Jan Kara > > > > Cc: Dan Williams > > > > Cc: linux-mm@kvack.org > > > > Cc: linux-arm-kernel@lists.infradead.org > > > > Cc: linux-samsung-soc@vger.kernel.org > > > > Cc: linux-media@vger.kernel.org > > > > Hi all, > > > > > > > > I stumbled over this and figured typing this patch can't hurt. Real= ly > > > > just to maybe learn a few things about how gup/pup is supposed to b= e > > > > used (we have a bit of that in drivers/gpu), this here isn't really > > > > ralated to anything I'm doing. > > > > > > FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO > > > > Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib > > verb access mode indicates possible writes. I'm not really clear on > > why FOLL_WRITE isn't enough any why you need to be able to write > > through a vma that's write protected currently. > > Ah, FOLL_FORCE | FOLL_WRITE means *read* confusingly enough, and the > only reason you'd want this version for read is if you are doing > longterm stuff. I wrote about this recently: > > https://lore.kernel.org/linux-mm/20200928235739.GU9916@ziepe.ca/ > > > > Since every driver does this wrong anything that uses this is creatin= g > > > terrifying security issues. > > > > > > IMHO this whole API should be deleted :( > > > > Yeah that part I just tried to conveniently ignore. I guess this dates > > back to a time when ioremaps where at best fixed, and there wasn't > > anything like a gpu driver dynamically managing vram around, resulting > > in random entirely unrelated things possibly being mapped to that set > > of pfns. > > No, it was always wrong. Prior to GPU like cases the lifetime of the > PTE was tied to the vma and when the vma becomes free the driver can > move the things in the PTEs to 'free'. Easy to trigger use-after-free > issues and devices like RDMA have security contexts attached to these > PTEs so it becomes a serious security bug to do something like this. > > > The underlying follow_pfn is also used in other places within > > drivers/media, so this doesn't seem to be an accident, but actually > > intentional. > > Looking closely, there are very few users, most *seem* pointless, but > maybe there is a crazy reason? > > The sequence > get_vaddr_frames(); > frame_vector_to_pages(); > sg_alloc_table_from_pages(); > > Should be written > pin_user_pages_fast(FOLL_LONGTERM); > sg_alloc_table_from_pages() > > There is some 'special' code in frame_vector_to_pages() that tries to > get a struct page for things from a VM_IO or VM_PFNMAP... > > Oh snap, that is *completely* broken! If the first VMA is IO|PFNMAP > then get_vaddr_frames() iterates over all VMAs in the range, of any > kind and extracts the PTEs then blindly references them! This means it > can be used to use after free normal RAM struct pages!! Gah! > > Wow. Okay. That has to go. > > So, I *think* we can assume there is no sane cases where > frame_vector_to_pages() succeeds but pin_user_pages() wasn't called. > > That means the users here: > - habanalabs: Hey Oded can you fix this up? > > - gpu/exynos: Daniel can you get someone there to stop using it? > > - media/videobuf via vb2_dma_sg_get_userptr() > > Should all be switched to the standard pin_user_pages sequence > above. > > That leaves the only interesting places as vb2_dc_get_userptr() and > vb2_vmalloc_get_userptr() which both completely fail to follow the > REQUIRED behavior in the function's comment about checking PTEs. It > just DMA maps them. Badly broken. Note that vb2_vmalloc is only used for in-kernel CPU usage, e.g. the contents being copied by the driver between vb2 buffers and some hardware FIFO or other dedicated buffers. The memory does not go to any hardware DMA. Could you elaborate on what "the REQUIRED behavior is"? I can see that both follow the get_vaddr_frames() -> frame_vector_to_pages() flow, as you mentioned. Perhaps the only change needed is switching to pin_user_pages after all? > > Guessing this hackery is for some embedded P2P DMA transfer? > That's not the case. There are two usage scenarios: 1) Regular user pointer support, i.e. using malloc()ed memory as a video input or output buffer. This is deemed to be deprecated, but still used quite widely by various applications. Quite common for video decoder input and video encoder output, because the other side of the pipeline is CPU, e.g. network stack. 2) Sharing carveout buffers (without struct pages attached) between two devices. I believe this predates DMA-buf and CMA (which enabled carveouts with struct pages) and I'm not sure how commonly it is used these days. > After he three places above should use pin_user_pages_fast(), then > this whole broken API should be moved into videobuf2-memops.c and a > big fat "THIS DOESN'T WORK" stuck on it. > > videobuf2 should probably use P2P DMA buf for this instead. See above. Best regards, Tomasz