MIME-Version: 1.0
References: <20201222145221.711-1-xieyongji@bytedance.com>
 <2b24398c-e6d9-14ec-2c0d-c303d528e377@redhat.com>
 <1356137727.40748805.1609233068675.JavaMail.zimbra@redhat.com>
 <3fc6a132-9fc2-c4e2-7fb1-b5a8bfb771fa@redhat.com>
 <0885385c-ae46-158d-eabf-433ef8ecf27f@redhat.com>
 <79741d5d-0c35-ad1c-951a-41d8ab3b36a0@redhat.com>
In-Reply-To: <79741d5d-0c35-ad1c-951a-41d8ab3b36a0@redhat.com>
From: Yongji Xie
Date: Thu, 31 Dec 2020 16:00:38 +0800
Subject: Re: Re: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
To: Jason Wang
Cc: "Michael S. Tsirkin", Stefan Hajnoczi, sgarzare@redhat.com,
 Parav Pandit, akpm@linux-foundation.org, Randy Dunlap, Matthew Wilcox,
 viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net,
 virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
 kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"

On Thu, Dec 31, 2020 at 3:12 PM Jason Wang wrote:
>
>
> On 2020/12/31 2:52 PM, Yongji Xie wrote:
> > On Thu, Dec 31, 2020 at 1:50 PM Jason Wang wrote:
> >>
> >> On 2020/12/31 1:15 PM, Yongji Xie wrote:
> >>> On Thu, Dec 31, 2020 at 10:49 AM Jason Wang wrote:
> >>>> On 2020/12/30 6:12 PM, Yongji Xie wrote:
> >>>>> On Wed, Dec 30, 2020 at 4:41 PM Jason Wang wrote:
> >>>>>> On 2020/12/30 3:09 PM, Yongji Xie wrote:
> >>>>>>> On Wed, Dec 30, 2020 at 2:11 PM Jason Wang wrote:
> >>>>>>>> On 2020/12/29 6:26 PM, Yongji Xie wrote:
> >>>>>>>>> On Tue, Dec 29, 2020 at 5:11 PM Jason Wang wrote:
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> On Mon, Dec 28, 2020 at 4:43 PM Jason Wang wrote:
> >>>>>>>>>>>> On 2020/12/28 4:14 PM, Yongji Xie wrote:
> >>>>>>>>>>>>>> I see.
> >>>>>>>>>>>>>> So all the above two questions are because VHOST_IOTLB_INVALIDATE
> >>>>>>>>>>>>>> is expected to be synchronous. This needs to be solved by tweaking the
> >>>>>>>>>>>>>> current VDUSE API or we can re-visit to go with descriptors relaying
> >>>>>>>>>>>>>> first.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Actually all vdpa related operations are synchronous in current
> >>>>>>>>>>>>> implementation. The ops.set_map/dma_map/dma_unmap should not return
> >>>>>>>>>>>>> until the VDUSE_UPDATE_IOTLB/VDUSE_INVALIDATE_IOTLB message is replied
> >>>>>>>>>>>>> by userspace. Could it solve this problem?
> >>>>>>>>>>>> I was thinking whether or not we need to generate IOTLB_INVALIDATE
> >>>>>>>>>>>> message to VDUSE during dma_unmap (vduse_dev_unmap_page).
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we don't, we're probably fine.
> >>>>>>>>>>>>
> >>>>>>>>>>> It seems not feasible. This message will be also used in the
> >>>>>>>>>>> virtio-vdpa case to notify userspace to unmap some pages during
> >>>>>>>>>>> consistent dma unmapping. Maybe we can document it to make sure the
> >>>>>>>>>>> users can handle the message correctly.
> >>>>>>>>>> Just to make sure I understand your point.
> >>>>>>>>>>
> >>>>>>>>>> Do you mean you plan to notify the unmap of 1) streaming DMA or 2)
> >>>>>>>>>> coherent DMA?
> >>>>>>>>>>
> >>>>>>>>>> For 1) you probably need a workqueue to do that since dma unmap can
> >>>>>>>>>> be done in irq or bh context. And if userspace doesn't do the unmap, it
> >>>>>>>>>> can still access the bounce buffer (if you don't zap pte)?
> >>>>>>>>>>
> >>>>>>>>> I plan to do it in the coherent DMA case.
> >>>>>>>> Any reason for treating coherent DMA differently?
> >>>>>>>>
> >>>>>>> Now the memory of the bounce buffer is allocated page by page in the
> >>>>>>> page fault handler. So it can't be used in coherent DMA mapping case
> >>>>>>> which needs some memory with contiguous virtual addresses.
> >>>>>>> I can use vmalloc() to do allocation for the bounce buffer instead.
> >>>>>>> But it might cause some memory waste. Any suggestion?
> >>>>>> I may miss something. But I don't see a relationship between the
> >>>>>> IOTLB_UNMAP and vmalloc().
> >>>>>>
> >>>>> In the vmalloc() case, the coherent DMA page will be taken from the
> >>>>> memory allocated by vmalloc(). So IOTLB_UNMAP is not needed anymore
> >>>>> during coherent DMA unmapping because the vmalloc'ed memory, which
> >>>>> has been mapped into userspace address space during initialization, can
> >>>>> be reused. And userspace should not unmap the region until we destroy
> >>>>> the device.
> >>>> Just to make sure I understand. My understanding is that IOTLB_UNMAP is
> >>>> only needed when there's a change to the mapping from IOVA to page.
> >>>>
> >>> Yes, that's true.
> >>>
> >>>> So if we stick to the mapping, e.g. during dma_unmap, we just put the IOVA
> >>>> on the free list to be used by the next IOVA allocation. IOTLB_UNMAP could
> >>>> be avoided.
> >>>>
> >>>> So we are not limited by how the pages are actually allocated?
> >>>>
> >>> In coherent DMA cases, we need to return some memory with contiguous
> >>> kernel virtual addresses. That is the reason why we need vmalloc()
> >>> here. If we allocate the memory page by page, the corresponding kernel
> >>> virtual addresses in a contiguous IOVA range might not be contiguous.
> >>
> >> Yes, but we can do that as what has been done in the series
> >> (alloc_pages_exact()). Or do you mean it would be a little bit hard to
> >> recycle IOVA/pages here?
> >>
> > Yes, it might be hard to reuse the memory. For example, we first
> > allocate 1 IOVA/page during dma_map, then the IOVA is freed during
> > dma_unmap. Actually we can't reuse this single page if we need a
> > two-page area in the next IOVA allocation. So the best way is using
> > IOTLB_UNMAP to free this single page during dma_unmap too.
> >
> > Thanks,
> > Yongji
>
>
> I get you now. Then I agree that let's go with IOTLB_UNMAP.
>

Fine, will do it.

Thanks,
Yongji
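
The reuse problem from the last quoted example can be modeled in a few
lines of user-space C. This is only a sketch under assumed names, not
VDUSE code: struct cached_page, find_run() and SKETCH_PAGE_SIZE are
hypothetical, and a "kva" here is just a number standing in for a kernel
virtual address. It checks whether pages left over from earlier
page-by-page allocations can serve a request for several pages with
contiguous kernel virtual addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical record of a bounce page that stays mapped after dma_unmap. */
struct cached_page {
	unsigned long kva;	/* stand-in for a kernel virtual address */
	bool free;		/* true once dma_unmap has released it */
};

/*
 * Look for `need` free pages whose addresses are contiguous.
 * The array is assumed to be sorted by kva.
 * Returns the index of the first page of the run, or -1 if none exists.
 */
int find_run(const struct cached_page *c, size_t n, size_t need)
{
	size_t run = 0;

	assert(need >= 1);

	for (size_t i = 0; i < n; i++) {
		/* A page extends the run only if it directly follows its neighbour. */
		bool extends = c[i].free && run > 0 &&
			       c[i].kva == c[i - 1].kva + SKETCH_PAGE_SIZE;

		if (extends)
			run++;
		else
			run = c[i].free ? 1 : 0;

		if (run >= need)
			return (int)(i + 1 - need);
	}
	return -1;
}
```

For instance, with two cached free pages at the made-up addresses 0x10000
and 0x53000, find_run(cache, 2, 1) returns index 0, while
find_run(cache, 2, 2) returns -1 because the two pages are not adjacent.
In that situation keeping the single page mapped gains nothing for a
two-page coherent request, which matches the reasoning for freeing it
with IOTLB_UNMAP during dma_unmap.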