From: "T.J. Mercier"
Date: Wed, 10 Jul 2024 09:34:47 -0700
Subject: Re: [PATCH 0/2] Support direct I/O read and write for memory allocated by dmabuf
To: Lei Liu
Cc: Christian König, Sumit Semwal, Benjamin Gaignard, Brian Starkey,
    John Stultz, Andrew Morton, David Hildenbrand, Matthew Wilcox,
    Muhammad Usama Anjum, Andrei Vagin, Ryan Roberts, Kefeng Wang,
    linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Daniel Vetter,
    "Vetter, Daniel", opensource.kernel@vivo.com, quic_sukadev@quicinc.com,
    quic_cgoldswo@quicinc.com, Akilesh Kailash
References: <20240710135757.25786-1-liulei.rjpt@vivo.com>
    <5e5ee5d3-8a57-478a-9ce7-b40cab60b67d@amd.com>
    <0393cf47-3fa2-4e32-8b3d-d5d5bdece298@amd.com>

On Wed, Jul 10, 2024 at 8:08 AM Lei Liu wrote:
>
>
> on 2024/7/10 22:48, Christian König wrote:
> > On 10.07.24 at 16:35, Lei Liu wrote:
> >>
> >> on 2024/7/10 22:14, Christian König wrote:
> >>> On 10.07.24 at 15:57, Lei Liu wrote:
> >>>> Use vm_insert_page to establish a mapping for the memory allocated
> >>>> by dmabuf, thus supporting direct I/O read and write; and fix the
> >>>> issue of incorrect memory statistics after mapping dmabuf memory.
> >>>
> >>> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
> >>
> >> Hello! Could you explain why direct_io is disabled on DMABUF? Is
> >> there any historical reason for this?
> >
> > It's basically one of the most fundamental design decisions of DMA-Buf.
> > The attachment/map/fence model DMA-buf uses is not really compatible
> > with direct I/O on the underlying pages.
>
> Thank you! Is there any related documentation on this? I would like to
> understand and learn more about the fundamental reasons for the lack of
> support.

Hi Lei and Christian,

This is now the third request I've seen from three different companies
who are interested in this, but the others are not for reasons of read
performance that you mention in the commit message on your first
patch. Someone else at Google ran a comparison between a normal read()
and a direct I/O read() into a preallocated user buffer and found that
with large readahead (16 MB) the throughput can actually be slightly
higher than direct I/O.
If you have concerns about read performance, have you tried
increasing the readahead size?

The other motivation is to load a gajillion byte file from disk into a
dmabuf without evicting the entire contents of pagecache while doing
so. Something like this (which does not currently work because read()
tries to GUP on the dmabuf memory as you mention):

static int dmabuf_heap_alloc(int heap_fd, size_t len)
{
    struct dma_heap_allocation_data data = {
        .len = len,
        .fd = 0,
        .fd_flags = O_RDWR | O_CLOEXEC,
        .heap_flags = 0,
    };
    int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
    if (ret < 0)
        return ret;
    return data.fd;
}

int main(int, char **argv)
{
    const char *file_path = argv[1];
    printf("File: %s\n", file_path);
    int file_fd = open(file_path, O_RDONLY | O_DIRECT);

    struct stat st;
    stat(file_path, &st);
    ssize_t file_size = st.st_size;
    ssize_t aligned_size = (file_size + 4095) & ~4095;
    printf("File size: %zd Aligned size: %zd\n", file_size, aligned_size);

    int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
    int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);

    void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, dmabuf_fd, 0);
    printf("VM at 0x%lx\n", (unsigned long)vm);

    dma_buf_sync sync_flags { DMA_BUF_SYNC_START |
                              DMA_BUF_SYNC_READ |
                              DMA_BUF_SYNC_WRITE };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);

    ssize_t rc = read(file_fd, vm, file_size);
    printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");

    sync_flags.flags = DMA_BUF_SYNC_END |
                       DMA_BUF_SYNC_READ |
                       DMA_BUF_SYNC_WRITE;
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
}

Or replace the mmap() + read() with sendfile().

So I would also like to see the above code (or something else similar)
be able to work and I understand some of the reasons why it currently
does not, but I don't understand why we should actively prevent this
type of behavior entirely.

Best,
T.J.

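For reference, the buffered (non-O_DIRECT) read path that the readahead
comparison above refers to could look roughly like the sketch below. This
is only an illustration under stated assumptions, not code from the
thread: read_buffered(), its chunk size, and the drop_cache flag are
hypothetical names introduced here. The posix_fadvise(POSIX_FADV_DONTNEED)
call is a swapped-in, best-effort way to limit the page cache footprint
and is not something anyone in this discussion proposed. The readahead
window itself is a system setting (for example read_ahead_kb under
/sys/block/<dev>/queue/) and is not changed by this code.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper (not from the thread): buffered read of up to `size`
// bytes from the start of `fd` into `dst`, which could be a CPU mapping of
// a dmabuf. Returns the number of bytes read (short on EOF), or -1 with
// errno set on error.
ssize_t read_buffered(int fd, void *dst, size_t size,
                      size_t chunk = 1u << 20, bool drop_cache = false)
{
    assert(dst != nullptr && chunk > 0);

    // Advisory hint that access is sequential. Failure is ignored because
    // the hint only affects performance, never correctness.
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    size_t done = 0;
    while (done < size) {
        size_t want = std::min(chunk, size - done);
        ssize_t n = pread(fd, static_cast<char *>(dst) + done, want,
                          static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR)
                continue;   // interrupted before any data arrived: retry
            return -1;
        }
        if (n == 0)
            break;          // end of file

        // Assumption: the caller will not re-read this range soon, so ask
        // the kernel to drop the pages just consumed (best effort).
        if (drop_cache)
            (void)posix_fadvise(fd, static_cast<off_t>(done), n,
                                POSIX_FADV_DONTNEED);

        done += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(done);
}
```

In the program above, this would stand in for the single read() call,
with file_fd opened without O_DIRECT and the call still bracketed by the
DMA_BUF_IOCTL_SYNC start/end ioctls. It copies through the page cache, so
it does not address the zero-copy goal of the patch set.
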
>
> >
> >>>
> >>> We already discussed enforcing that in the DMA-buf framework and
> >>> this patch probably means that we should really do that.
> >>>
> >>> Regards,
> >>> Christian.
> >>
> >> Thank you for your response. With the application of AI large model
> >> edgeification, we urgently need support for direct_io on DMABUF to
> >> read some very large files. Do you have any new solutions or plans
> >> for this?
> >
> > We have seen similar projects over the years and all of those turned
> > out to be complete shipwrecks.
> >
> > There is currently a patch set under discussion to give the network
> > subsystem DMA-buf support. If you are interested in network direct I/O
> > that could help.
>
> Is there a related introduction link for this patch?
>
> >
> > Additional to that a lot of GPU drivers support userptr usages, e.g.
> > to import malloced memory into the GPU driver. You can then also do
> > direct I/O on that malloced memory and the kernel will enforce correct
> > handling with the GPU driver through MMU notifiers.
> >
> > But as far as I know a general DMA-buf based solution isn't possible.
>
> 1. The reason we need to use DMABUF memory here is that we need to share
> memory between the CPU and APU. Currently, only DMABUF memory is
> suitable for this purpose. Additionally, we need to read very large files.
>
> 2. Are there any other solutions for this? Also, do you have any plans
> to support direct_io for DMABUF memory in the future?
>
> >
> > Regards,
> > Christian.
> >
> >>
> >> Regards,
> >> Lei Liu.
> >>
> >>>
> >>>>
> >>>> Lei Liu (2):
> >>>>    mm: dmabuf_direct_io: Support direct_io for memory allocated by
> >>>>      dmabuf
> >>>>    mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
> >>>>      allocated memory with direct_io support
> >>>>
> >>>>   drivers/dma-buf/heaps/system_heap.c |  5 +++--
> >>>>   fs/proc/task_mmu.c                  |  8 +++++++-
> >>>>   include/linux/mm.h                  |  1 +
> >>>>   mm/memory.c                         | 15 ++++++++++-----
> >>>>   mm/rmap.c                           |  9 +++++----
> >>>>   5 files changed, 26 insertions(+), 12 deletions(-)
> >>>>
> >>>
> >