From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC1C6C41604 for ; Sat, 3 Oct 2020 08:34:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 384CF206DB for ; Sat, 3 Oct 2020 08:34:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="s2B2+iZ2" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 384CF206DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 48EB86B005C; Sat, 3 Oct 2020 04:34:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 43F296B005D; Sat, 3 Oct 2020 04:34:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 308846B0062; Sat, 3 Oct 2020 04:34:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id F063B6B005C for ; Sat, 3 Oct 2020 04:34:30 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 638E0181AE874 for ; Sat, 3 Oct 2020 08:34:30 +0000 (UTC) X-FDA: 77329952700.27.mint82_270d5a8271ab Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 494093D663 for ; Sat, 3 Oct 2020 08:34:30 +0000 (UTC) X-HE-Tag: mint82_270d5a8271ab X-Filterd-Recvd-Size: 8880 Received: from mail-oi1-f195.google.com (mail-oi1-f195.google.com [209.85.167.195]) by imf20.hostedemail.com (Postfix) with ESMTP for ; Sat, 3 Oct 2020 08:34:29 +0000 (UTC) Received: by mail-oi1-f195.google.com with SMTP id v20so3664048oiv.3 for ; Sat, 03 Oct 2020 01:34:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=MwiC6YyUZFKsR0OxsiL5dqCxbE9WNpSWpYmDHi3uZb4=; b=s2B2+iZ2DS5bAdFajuwTOZbSgIS0kvKB9eFuoSekzKVBfHj6VYFZcLuQc7DFDKFRCh 0CzYQdJqdyNjQ4+AaYAD+BcfFg6UYJwfzDTabb5ckgFbh9G5tbV/b3a8Tb8IaHYrXK9e YgjqWIYUIHDI9NisNeGL+94NPhn5X4ZU0t2nCZ97bNiFv5fdQPbyjjcVbsXDTTkWWBS0 dPHhV578Mo38EZCUj2h3XEEx/Q7Onc584hjVNjFimVQJrJaXv4PvgfPdCovKR3YG2ZqV bCyL8nTePjit6Q7hnm+hTCotVghPI21Ao+zOAy3SfUT52mcspFE34Tjxj+XjcX2EPsmH xyfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=MwiC6YyUZFKsR0OxsiL5dqCxbE9WNpSWpYmDHi3uZb4=; b=lXap7C7OEqkkznzvlj+5/W6N2Hff3J37AwVByo5/3sjAuwJ2a8LRcqHUNR7XB9bb8H Q5tvaNB45azOmwLB4cxNwx3cH6G+OqgZE7XBnAz6lgkors9C+I3BglzDzNnQx4nkJOca m/JwUSj7+MF+iKa4A+f7DanceEyF9BJGVXQhd3Y1Rf4M8oVJNgDcwxLG3Alde7haWnJ+ 2b88sfruXjuNlseOmLKDuTU/RxAIs6hnxQ+McnrESzBuxT0wsc68tsE/rCUlBu7cdUFh OZamzLWYlmc7bD5F42Zz111rd9EC6t1msfRF54mmrby3YKXXi58Y1KursK/GMGMtNLFY boSg== X-Gm-Message-State: AOAM530r2rUScRgIudcTsPm1c+5paaYuO6uTFKJGImQm8kdANwGDvYbM 9cC1UF1NZWp6C08UrVd0sLOpBnu3A7o0p+AiHcc= X-Google-Smtp-Source: ABdhPJwz6YDv9Y3Szh9uKYhc8PvVg5MEbm5BaRdTqy54FwkGbFwyabTSxzLcQSKtnQ+lvxanp34yvku05esUVnF4kVA= X-Received: by 2002:a05:6808:3bb:: with SMTP id n27mr3285560oie.130.1601714069143; Sat, 03 Oct 2020 01:34:29 -0700 (PDT) MIME-Version: 1.0 References: <20201002175303.390363-1-daniel.vetter@ffwll.ch> <20201002175303.390363-2-daniel.vetter@ffwll.ch> <20201002180603.GL9916@ziepe.ca> <20201002233118.GM9916@ziepe.ca> In-Reply-To: <20201002233118.GM9916@ziepe.ca> From: Oded Gabbay Date: Sat, 3 Oct 2020 11:34:01 +0300 Message-ID: Subject: Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM To: Jason Gunthorpe Cc: Daniel Vetter , DRI Development , LKML , Daniel Vetter , Andrew Morton , John Hubbard , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Jan Kara , Dan Williams , Linux MM , Linux ARM , Pawel Osciak , Marek Szyprowski , Kyungmin Park , Tomasz Figa , Inki Dae , Joonyoung Shim , Seung-Woo Kim , linux-samsung-soc , "open list:DMA BUFFER SHARING FRAMEWORK" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Oct 3, 2020 at 2:31 AM Jason Gunthorpe wrote: > > On Fri, Oct 02, 2020 at 08:16:48PM +0200, Daniel Vetter wrote: > > On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe wrote: > > > On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote: > > > > For $reasons I've stumbled over this code and I'm not sure the chan= ge > > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector: > > > > convert get_user_pages() --> pin_user_pages()") was entirely correc= t. > > > > > > > > This here is used for long term buffers (not just quick I/O) like > > > > RDMA, and John notes this in his patch. But I thought the rule for > > > > these is that they need to add FOLL_LONGTERM, which John's patch > > > > didn't do. > > > > > > > > There is already a dax specific check (added in b7f0554a56f2 ("mm: > > > > fail get_vaddr_frames() for filesystem-dax mappings")), so this see= ms > > > > like the prudent thing to do. > > > > > > > > Signed-off-by: Daniel Vetter > > > > Cc: Andrew Morton > > > > Cc: John Hubbard > > > > Cc: J=C3=A9r=C3=B4me Glisse > > > > Cc: Jan Kara > > > > Cc: Dan Williams > > > > Cc: linux-mm@kvack.org > > > > Cc: linux-arm-kernel@lists.infradead.org > > > > Cc: linux-samsung-soc@vger.kernel.org > > > > Cc: linux-media@vger.kernel.org > > > > Hi all, > > > > > > > > I stumbled over this and figured typing this patch can't hurt. Real= ly > > > > just to maybe learn a few things about how gup/pup is supposed to b= e > > > > used (we have a bit of that in drivers/gpu), this here isn't really > > > > ralated to anything I'm doing. > > > > > > FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO > > > > Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib > > verb access mode indicates possible writes. I'm not really clear on > > why FOLL_WRITE isn't enough any why you need to be able to write > > through a vma that's write protected currently. > > Ah, FOLL_FORCE | FOLL_WRITE means *read* confusingly enough, and the > only reason you'd want this version for read is if you are doing > longterm stuff. I wrote about this recently: > > https://lore.kernel.org/linux-mm/20200928235739.GU9916@ziepe.ca/ > > > > Since every driver does this wrong anything that uses this is creatin= g > > > terrifying security issues. > > > > > > IMHO this whole API should be deleted :( > > > > Yeah that part I just tried to conveniently ignore. I guess this dates > > back to a time when ioremaps where at best fixed, and there wasn't > > anything like a gpu driver dynamically managing vram around, resulting > > in random entirely unrelated things possibly being mapped to that set > > of pfns. > > No, it was always wrong. Prior to GPU like cases the lifetime of the > PTE was tied to the vma and when the vma becomes free the driver can > move the things in the PTEs to 'free'. Easy to trigger use-after-free > issues and devices like RDMA have security contexts attached to these > PTEs so it becomes a serious security bug to do something like this. > > > The underlying follow_pfn is also used in other places within > > drivers/media, so this doesn't seem to be an accident, but actually > > intentional. > > Looking closely, there are very few users, most *seem* pointless, but > maybe there is a crazy reason? > > The sequence > get_vaddr_frames(); > frame_vector_to_pages(); > sg_alloc_table_from_pages(); > > Should be written > pin_user_pages_fast(FOLL_LONGTERM); > sg_alloc_table_from_pages() > > There is some 'special' code in frame_vector_to_pages() that tries to > get a struct page for things from a VM_IO or VM_PFNMAP... > > Oh snap, that is *completely* broken! If the first VMA is IO|PFNMAP > then get_vaddr_frames() iterates over all VMAs in the range, of any > kind and extracts the PTEs then blindly references them! This means it > can be used to use after free normal RAM struct pages!! Gah! > > Wow. Okay. That has to go. > > So, I *think* we can assume there is no sane cases where > frame_vector_to_pages() succeeds but pin_user_pages() wasn't called. > > That means the users here: > - habanalabs: Hey Oded can you fix this up? Yes, no problem, I'll do it very soon. Thanks, Oded > > - gpu/exynos: Daniel can you get someone there to stop using it? > > - media/videobuf via vb2_dma_sg_get_userptr() > > Should all be switched to the standard pin_user_pages sequence > above. > > That leaves the only interesting places as vb2_dc_get_userptr() and > vb2_vmalloc_get_userptr() which both completely fail to follow the > REQUIRED behavior in the function's comment about checking PTEs. It > just DMA maps them. Badly broken. > > Guessing this hackery is for some embedded P2P DMA transfer? > > After he three places above should use pin_user_pages_fast(), then > this whole broken API should be moved into videobuf2-memops.c and a > big fat "THIS DOESN'T WORK" stuck on it. > > videobuf2 should probably use P2P DMA buf for this instead. > > Jason