References: <20220531200041.24904-1-alex.sierra@amd.com>
 <20220531200041.24904-2-alex.sierra@amd.com>
 <3ac89358-2ce0-7d0d-8b9c-8b0e5cc48945@redhat.com>
 <02ed2cb7-3ad3-8ffc-6032-04ae1853e234@amd.com>
 <87bkuo898d.fsf@nvdebian.thelocal>
In-Reply-To: <87bkuo898d.fsf@nvdebian.thelocal>
From: Oded Gabbay
Date: Mon, 20 Jun 2022 09:01:42 +0300
Subject: Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
To: Alistair Popple
Cc: "Sierra Guiza, Alejandro (Alex)", David Hildenbrand, Jason Gunthorpe,
 rcampbell@nvidia.com, Matthew Wilcox, "Kuehling, Felix", amd-gfx list,
 linux-xfs@vger.kernel.org, linux-mm, Jérôme Glisse,
 Maling list - DRI developers, Andrew Morton, linux-ext4@vger.kernel.org,
 Christoph Hellwig
Content-Type: text/plain; charset="UTF-8"

On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple wrote:
>
>
> Oded Gabbay writes:
>
> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
> > wrote:
> >>
> >>
> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
> >> > On 31.05.22 22:00, Alex Sierra wrote:
> >> >> Device memory that is cache coherent from device and CPU point of view.
> >> >> This is used on platforms that have an advanced system bus (like CAPI
> >> >> or CXL). Any page of a process can be migrated to such memory. However,
> >> >> no one should be allowed to pin such memory so that it can always be
> >> >> evicted.
> >> >>
> >> >> Signed-off-by: Alex Sierra
> >> >> Acked-by: Felix Kuehling
> >> >> Reviewed-by: Alistair Popple
> >> >> [hch: rebased ontop of the refcount changes,
> >> >>       removed is_dev_private_or_coherent_page]
> >> >> Signed-off-by: Christoph Hellwig
> >> >> ---
> >> >>  include/linux/memremap.h | 19 +++++++++++++++++++
> >> >>  mm/memcontrol.c          |  7 ++++---
> >> >>  mm/memory-failure.c      |  8 ++++++--
> >> >>  mm/memremap.c            | 10 ++++++++++
> >> >>  mm/migrate_device.c      | 16 +++++++---------
> >> >>  mm/rmap.c                |  5 +++--
> >> >>  6 files changed, 49 insertions(+), 16 deletions(-)
> >> >>
> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >> >> index 8af304f6b504..9f752ebed613 100644
> >> >> --- a/include/linux/memremap.h
> >> >> +++ b/include/linux/memremap.h
> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
> >> >>   * A more complete discussion of unaddressable memory may be found in
> >> >>   * include/linux/hmm.h and Documentation/vm/hmm.rst.
> >> >>   *
> >> >> + * MEMORY_DEVICE_COHERENT:
> >> >> + * Device memory that is cache coherent from device and CPU point of view. This
> >> >> + * is used on platforms that have an advanced system bus (like CAPI or CXL). A
> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with that memory
> >> >> + * type. Any page of a process can be migrated to such memory. However no one
> >> > Any page might not be right, I'm pretty sure. ... just thinking about special pages
> >> > like vdso, shared zeropage, ... pinned pages ...
> >>
> >> Hi David,
> >>
> >> Yes, I think you're right. This type does not cover all special pages.
> >> I need to correct that on the cover letter.
> >> Pinned pages are allowed as long as they're not long term pinned.
> >>
> >> Regards,
> >> Alex Sierra
> >
> > What if I want to hotplug this device's coherent memory, but I do
> > *not* want the OS to migrate any page to it ?
> > I want to fully-control what resides on this memory, as I can consider
> > this memory "expensive". i.e. I don't have a lot of it, I want to use
> > it for specific purposes and I don't want the OS to start using it
> > when there is some memory pressure in the system.
>
> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent
> pages are only allocated by a device driver and exposed to user-space by
> a driver migrating pages to them with migrate_vma. The OS can't just
> start using them due to memory pressure for example.
>
>  - Alistair

Thanks for the explanation.

I guess the commit message confused me a bit, especially these two sentences:

"Any page of a process can be migrated to such memory. However no one
should be allowed to pin such memory so that it can always be evicted."

I read them as saying that the OS is free to choose which pages are
migrated to this memory, and that anything is eligible for migration
there (and that this is why pinning memory there is not allowed).

If we are not allowed to pin anything there, can the device driver
decide to disable any option for oversubscription of this memory area?

Let's assume the user uses this memory area for doing p2p with other
CXL devices. In that case, I wouldn't want the driver/OS to migrate
pages in and out of that memory...

So either I should let the user pin those pages, or I should prevent
the user from oversubscribing this memory area (accidentally or not).

What do you think?

>
> > Oded
> >
> >>
> >> >
> >> >> + * should be allowed to pin such memory so that it can always be evicted.
> >> >> + *
> >> >>   * MEMORY_DEVICE_FS_DAX:
> >> >>   * Host memory that has similar access semantics as System RAM i.e. DMA
> >> >>   * coherent and supports page pinning. In support of coordinating page
> >> >> @@ -61,6 +68,7 @@ struct vmem_altmap {
> >> >>  enum memory_type {
> >> >>  	/* 0 is reserved to catch uninitialized type fields */
> >> >>  	MEMORY_DEVICE_PRIVATE = 1,
> >> >> +	MEMORY_DEVICE_COHERENT,
> >> >>  	MEMORY_DEVICE_FS_DAX,
> >> >>  	MEMORY_DEVICE_GENERIC,
> >> >>  	MEMORY_DEVICE_PCI_P2PDMA,
> >> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio)
> >> > In general, this LGTM, and it should be correct with PageAnonExclusive I think.
> >> >
> >> >
> >> > However, where exactly is pinning forbidden?
> >>
> >> Long-term pinning is forbidden since it would interfere with the device
> >> memory manager owning the device-coherent pages (e.g. evictions in TTM).
> >> However, normal pinning is allowed on this device type.
> >>
> >> Regards,
> >> Alex Sierra
> >>
> >> >
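
To summarize how I currently understand the rules described in this thread, here is a minimal userspace sketch. It is not kernel code and not taken from the patch: every name in it (pin_kind, populate_source, coherent_pin_allowed, coherent_populate_allowed) is hypothetical and exists only for illustration. It assumes, per the replies above, that device-coherent pages accept normal pins but refuse long-term pins, and that they are only populated by a driver-initiated migration, never by the OS reacting to memory pressure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch only; none of these are kernel APIs. */
enum pin_kind {
	PIN_SHORT_TERM,	/* "normal" pinning, in the wording of this thread */
	PIN_LONG_TERM,	/* long-term pinning */
};

enum populate_source {
	SRC_DRIVER_MIGRATION,	/* the driver migrates pages in (e.g. with migrate_vma) */
	SRC_OS_MEMORY_PRESSURE,	/* the OS deciding on its own to use the memory */
};

/*
 * Assumption taken from the replies above: only long-term pins are
 * refused on device-coherent pages, so the device memory manager can
 * still evict them.
 */
static bool coherent_pin_allowed(enum pin_kind kind)
{
	assert(kind == PIN_SHORT_TERM || kind == PIN_LONG_TERM);
	return kind == PIN_SHORT_TERM;
}

/*
 * Assumption taken from the replies above: device-coherent pages are
 * only filled by a driver migrating pages to them, not by the OS.
 */
static bool coherent_populate_allowed(enum populate_source src)
{
	assert(src == SRC_DRIVER_MIGRATION || src == SRC_OS_MEMORY_PRESSURE);
	return src == SRC_DRIVER_MIGRATION;
}
```

In this model, coherent_pin_allowed(PIN_LONG_TERM) and coherent_populate_allowed(SRC_OS_MEMORY_PRESSURE) are the two cases that return false. The oversubscription question raised above is deliberately left out, since it is still open.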