Message-ID: <504d77b87c81b7027157e0c7b5286e17123c59d9.camel@linux.intel.com>
Subject: Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM
From: Thomas Hellström
To: Jason Gunthorpe, Daniel Vetter
Cc: John Hubbard, Christoph Hellwig, Jérôme Glisse, linux-samsung-soc,
 Jan Kara, Pawel Osciak, KVM list, Mauro Carvalho Chehab, LKML,
 DRI Development, Tomasz Figa, Linux MM, Kyungmin Park, Daniel Vetter,
 Andrew Morton, Marek Szyprowski, Dan Williams, Linux ARM,
 "open list:DMA BUFFER SHARING FRAMEWORK"
Date: Mon, 09 Nov 2020 09:44:02 +0100
In-Reply-To: <20201106125505.GO36674@ziepe.ca>
References: <20201104163758.GA17425@infradead.org>
 <20201104164119.GA18218@infradead.org>
 <20201104181708.GU36674@ziepe.ca>
 <20201105092524.GQ401619@phenom.ffwll.local>
 <20201105124950.GZ36674@ziepe.ca>
 <7ae3486d-095e-cf4e-6b0f-339d99709996@nvidia.com>
 <20201106125505.GO36674@ziepe.ca>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.36.5 (3.36.5-1.fc32)
MIME-Version: 1.0
On Fri, 2020-11-06 at 08:55 -0400, Jason Gunthorpe wrote:
> On Fri, Nov 06, 2020 at 11:27:59AM +0100, Daniel Vetter wrote:
> > On Fri, Nov 6, 2020 at 11:01 AM Daniel Vetter wrote:
> > > On Fri, Nov 6, 2020 at 5:08 AM John Hubbard wrote:
> > > > On 11/5/20 4:49 AM, Jason Gunthorpe wrote:
> > > > > On Thu, Nov 05, 2020 at 10:25:24AM +0100, Daniel Vetter wrote:
> > > > > > > /*
> > > > > > >  * If we can't determine whether or not a pte is special, then fail immediately
> > > > > > >  * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> > > > > > >  * to be special.
> > > > > > >  *
> > > > > > >  * For a futex to be placed on a THP tail page, get_futex_key requires a
> > > > > > >  * get_user_pages_fast_only implementation that can pin pages. Thus it's still
> > > > > > >  * useful to have gup_huge_pmd even if we can't operate on ptes.
> > > > > > >  */
> > > > > >
> > > > > > We support hugepage faults in gpu drivers since recently, and I'm not
> > > > > > seeing a pud_mkhugespecial anywhere. So not sure this works, but probably
> > > > > > just me missing something again.
> > > > >
> > > > > It means ioremap can't create an IO page PUD, it has to be broken up.
> > > > >
> > > > > Does ioremap even create anything larger than PTEs?
> > >
> > > gpu drivers also tend to use vmf_insert_pfn* directly, so we can do
> > > on-demand paging and move buffers around. From what I glanced, for the
> > > lowest level we do the pte_mkspecial correctly (I think I convinced
> > > myself that vm_insert_pfn does that), but for pud/pmd levels it seems
> > > just yolo.
> >
> > So I dug around a bit more and ttm sets PFN_DEV | PFN_MAP to get past
> > the various pfn_t_devmap checks (see e.g. vmf_insert_pfn_pmd_prot()).
> > x86-64 has ARCH_HAS_PTE_DEVMAP, and gup.c seems to handle these
> > specially, but frankly I got totally lost in what this does.
>
> The fact vmf_insert_pfn_pmd_prot() has all those BUG_ON's to prevent
> putting VM_PFNMAP pages into the page tables seems like a big red
> flag.
>
> The comment seems to confirm what we are talking about here:
>
> 	/*
> 	 * If we had pmd_special, we could avoid all these restrictions,
> 	 * but we need to be consistent with PTEs and architectures that
> 	 * can't support a 'special' bit.
> 	 */
>
> ie without the ability to mark special we can't block fast gup, and
> anyone who does O_DIRECT on these ranges will crash the kernel when it
> tries to convert an IO page into a struct page.
>
> Should be easy enough to directly test?
>
> Putting non-struct page PTEs into a VMA without setting VM_PFNMAP just
> seems horribly wrong to me.

Although core mm special huge-page support is currently quite limited,
some time ago I extended the pre-existing vma_is_dax() to
vma_is_special_huge():

/**
 * vma_is_special_huge - Are transhuge page-table entries considered special?
 * @vma: Pointer to the struct vm_area_struct to consider
 *
 * Whether transhuge page-table entries are considered "special" following
 * the definition in vm_normal_page().
 *
 * Return: true if transhuge page-table entries should be considered special,
 * false otherwise.
 */
static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
{
	return vma_is_dax(vma) || (vma->vm_file &&
				   (vma->vm_flags &
				    (VM_PFNMAP | VM_MIXEDMAP)));
}

meaning that currently all transhuge page-table entries in a PFNMAP or
MIXEDMAP vma are considered "special".
The number of calls to this function (mainly in the page-splitting code)
is quite limited, so replacing it with a more elaborate
per-page-table-entry scheme would, I guess, definitely be possible,
although all functions using it would then need a fallback path for
architectures that don't support it.

/Thomas

>
> Jason