Subject: Re: Regarding HMM
To: Ralph Campbell, Valmiki
CC: linux-mm@kvack.org
From: John Hubbard
Message-ID: <9af4d56c-61f5-9367-28bf-b6f1236e90fa@nvidia.com>
In-Reply-To: <3482c2c7-6827-77f7-a581-69af8adc73c3@nvidia.com>
Date: Tue, 18 Aug 2020 13:35:11 -0700

On 8/18/20 10:06 AM, Ralph Campbell wrote:
>
> On 8/18/20 12:15 AM, Valmiki wrote:
>> Hi All,
>>
>> I'm trying to understand heterogeneous memory management (HMM), and I
>> have a few questions.
>>
>> If HMM is being used, do we no longer need a DMA controller on the
>> device for memory transfers?

Hi,

Nothing about HMM either requires or prevents using DMA controllers.

>> Without DMA, if software is managing page faults and migrations, will
>> there be any performance impact?
>>
>> Is HMM targeted at specific use cases where there is no DMA controller
>> on the device?
>>
>> Regards,
>> Valmiki
>
> There are two APIs that are part of "HMM", and they are independent of
> each other.
>
> hmm_range_fault() is for getting the physical address of a
> system-resident memory page that a device can map, without pinning the
> page the way I/O usually does (by elevating the page reference count).
> The device driver has to handle invalidation callbacks and remove the
> device mapping when notified. This lets the device access the page
> without moving it.
>
> migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize()
> are used by the device driver to migrate data to device private memory.
> After migration, the system memory is freed and the CPU page table
> holds an invalid PTE that points to the device private struct page
> (similar to a swap PTE). If the CPU process faults on that address,
> there is a callback to the driver to migrate the data back to system
> memory. This is where device DMA engines can be used to copy data
> between system memory and device private memory.
>
> The use case for the above is to be able to run code such as OpenCL on
> GPUs and CPUs using the same virtual addresses, without having to call
> special memory allocators. In other words, just use mmap() and
> malloc(), not clSVMAlloc().
>
> There is a performance consideration here. If the GPU accesses data in
> system memory over PCIe, there is much less bandwidth available than
> when accessing local GPU memory. If the data will be accessed many
> times, it can be more efficient to migrate it to local GPU memory. If
> the data is only accessed a few times, it is probably more efficient to
> map system memory.

Ralph, that's a good write-up!

Valmiki, did you already read Documentation/vm/hmm.rst before posting
your question? It's OK to say "no"--I'm not asking in order to
criticize, but in order to calibrate the documentation. To make Ralph's
description concrete, I'll also sketch both calling patterns below.
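First, hmm_range_fault(). A driver typically calls it in a retry loop
paired with an mmu interval notifier, and only programs the device page
table after confirming that no invalidation raced with the fault. This
is a rough sketch modeled on the example in hmm.rst, for v5.8-era
kernels: struct driver_data, update_lock, and
driver_update_device_page_table() are hypothetical driver-side names,
and the exact struct fields have changed across kernel versions.

int driver_populate_range(struct driver_data *drv,
			  struct mmu_interval_notifier *notifier,
			  unsigned long start, unsigned long end)
{
	unsigned long npages = (end - start) >> PAGE_SHIFT;
	unsigned long *pfns;
	struct hmm_range range = {
		.notifier      = notifier,
		.start         = start,
		.end           = end,
		.default_flags = HMM_PFN_REQ_FAULT,
	};
	int ret;

	pfns = kcalloc(npages, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return -ENOMEM;
	range.hmm_pfns = pfns;

	if (!mmget_not_zero(notifier->mm)) {
		kfree(pfns);
		return -EFAULT;
	}

again:
	range.notifier_seq = mmu_interval_read_begin(notifier);
	mmap_read_lock(notifier->mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(notifier->mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;	/* raced with an invalidation */
		goto out;
	}

	/* The driver's own lock serializes against invalidation callbacks. */
	mutex_lock(&drv->update_lock);
	if (mmu_interval_read_retry(notifier, range.notifier_seq)) {
		mutex_unlock(&drv->update_lock);
		goto again;
	}
	/* range.hmm_pfns[] is now stable; program the device page table. */
	driver_update_device_page_table(drv, &range);
	mutex_unlock(&drv->update_lock);
out:
	mmput(notifier->mm);
	kfree(pfns);
	return ret;
}

Note that nothing is pinned here: if the mm changes later, the interval
notifier fires and the driver has to unmap and, if needed, repeat.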
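Second, the migration path. The three-call sequence unmaps and collects
the source pages, lets the driver allocate device private pages and copy
the data (this is where a device DMA engine naturally slots in), then
commits the result. Again a sketch, not a drop-in:
alloc_device_private_page() and driver_copy_to_device() are
hypothetical, the chunk size is arbitrary, and struct migrate_vma's
fields vary by kernel version.

#define MIG_CHUNK 64	/* arbitrary: migrate up to 64 pages per call */

static int driver_migrate_to_device(struct driver_data *drv,
				    struct vm_area_struct *vma,
				    unsigned long start, unsigned long end)
{
	unsigned long src[MIG_CHUNK] = {}, dst[MIG_CHUNK] = {};
	struct migrate_vma args = {
		.vma   = vma,
		.start = start,
		.end   = end,	/* caller guarantees <= MIG_CHUNK pages */
		.src   = src,
		.dst   = dst,
	};
	unsigned long i;
	int ret;

	/* Unmap the CPU pages and collect them into args.src[]. */
	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	for (i = 0; i < args.npages; i++) {
		struct page *dpage, *spage;

		if (!(args.src[i] & MIGRATE_PFN_MIGRATE))
			continue;	/* this page cannot be migrated */

		/* spage may be NULL (e.g. an unpopulated or zero page). */
		spage = migrate_pfn_to_page(args.src[i]);
		dpage = alloc_device_private_page(drv);	/* hypothetical */
		if (!dpage)
			continue;	/* leaving dst[i] empty skips the page */

		/* Typically done with the device's DMA engine. */
		driver_copy_to_device(drv, dpage, spage);

		args.dst[i] = migrate_pfn(page_to_pfn(dpage)) |
			      MIGRATE_PFN_LOCKED;
	}

	migrate_vma_pages(&args);	/* install the device pages */
	migrate_vma_finalize(&args);	/* free the old system pages */
	return 0;
}

As a rough rule of thumb for Ralph's bandwidth point: migration costs
about one extra copy of the data over the bus, so it starts to pay off
once the device touches the data more than a couple of times, and local
memory bandwidth (often an order of magnitude above PCIe) dominates
after that.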
We should consider merging Ralph's write-up above into hmm.rst,
depending on whether it helps (which I expect it does, but I've read
hmm.rst too many times by now to see what might be missing). Any time
someone new tries to understand the system, it's an opportunity to
"unit test" the documentation. Ideally, hmm.rst would answer most of a
first-time reader's questions; that's where we'd like to end up.

thanks,
--
John Hubbard
NVIDIA