Message-ID: <6e429fbc-e0e6-53c0-c545-2e2cbbe757de@redhat.com>
Date: Thu, 22 Jun 2023 10:25:17 +0200
From: David Hildenbrand
Organization: Red Hat
To: Vivek Kasireddy, dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Mike Kravetz, Gerd Hoffmann, Dongwon Kim, Andrew Morton,
 James Houghton, Jerome Marchand, Junxiao Chang, "Kirill A . Shutemov",
 Michal Hocko, Muchun Song, Jason Gunthorpe, John Hubbard
Subject: Re: [PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages
References: <20230622072710.3707315-1-vivek.kasireddy@intel.com>
In-Reply-To: <20230622072710.3707315-1-vivek.kasireddy@intel.com>

On 22.06.23 09:27, Vivek Kasireddy wrote:
> The first patch ensures that the mappings needed for handling mmap
> operation would be managed by using the pfn instead of struct page.
> The second patch restores support for mapping hugetlb pages where
> subpages of a hugepage are not directly used anymore (main reason
> for revert) and instead the hugetlb pages and the relevant offsets
> are used to populate the scatterlist for dma-buf export and for
> mmap operation.
>
> Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
> were passed to the Host kernel and Qemu was launched with these
> relevant options: qemu-system-x86_64 -m 4096m....
> -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
> -display gtk,gl=on
> -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
> -machine memory-backend=mem1
>
> Replacing -display gtk,gl=on with -display gtk,gl=off above would
> exercise the mmap handler.
>

While I think the VM_PFNMAP approach is much better and should fix the
issue at hand, I thought more about missing memlock support and
realized that we might have to fix something else. So I'm going to
raise the issue here.

I think udmabuf chose the wrong interface to do what it's doing, which
makes it harder to fix eventually.
Instead of accepting a range in a memfd, it should just have accepted
a user space address range and then used
pin_user_pages(FOLL_WRITE|FOLL_LONGTERM) to longterm-pin the pages
"officially".

So what's the issue? udmabuf effectively pins pages longterm
("possibly forever") simply by grabbing a reference on them. These
pages might easily reside in ZONE_MOVABLE or in MIGRATE_CMA
pageblocks.

So what udmabuf does is break memory hotunplug and CMA, because it
turns pages that have to remain movable unmovable.

In the pin_user_pages(FOLL_LONGTERM) case we make sure to migrate
these pages. See mm/gup.c:check_and_migrate_movable_pages() and
especially folio_is_longterm_pinnable(). We'd probably have to
implement something similar for udmabuf, where we detect such
unpinnable pages and migrate them.

For example, pairing udmabuf with vfio (which pins pages using
pin_user_pages(FOLL_LONGTERM)) in QEMU will most probably not work in
all cases: if udmabuf longterm pinned the pages "the wrong way", vfio
will fail to migrate them during FOLL_LONGTERM and consequently fail
pin_user_pages(). As long as udmabuf holds a reference on these pages,
that will never succeed.

There are *probably* more issues on the QEMU side when udmabuf is
paired with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for
virtio-balloon, virtio-mem, postcopy live migration, ... For example,
in the vfio/vdpa case we make sure that we disallow most of these,
because otherwise there can be an accidental "disconnect" between the
pages mapped into the VM (guest view) and the pages mapped into the
IOMMU (device view), for example, after a reboot.

-- 
Cheers,

David / dhildenb
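
The interaction described above (a bare reference on a movable page
making a later FOLL_LONGTERM pin fail) can be illustrated with a small
userspace toy model in C. This is only a sketch under stated
assumptions, not kernel code: every toy_* name is hypothetical, the
pinnability check is reduced to "not ZONE_MOVABLE and not MIGRATE_CMA"
(the real folio_is_longterm_pinnable() covers more cases), and
migration is modelled as failing whenever someone holds an extra
reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are kernel types. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
    enum toy_zone zone; /* zone the page currently lives in */
    bool in_cma;        /* models residing in a MIGRATE_CMA pageblock */
    int refcount;       /* extra references held by users of the page */
};

/* Simplified model of the idea behind folio_is_longterm_pinnable(). */
static bool toy_longterm_pinnable(const struct toy_page *p)
{
    assert(p != NULL);
    return p->zone != TOY_ZONE_MOVABLE && !p->in_cma;
}

/*
 * Models migrating a page out of a movable area. Assumption of this
 * sketch: migration fails while anyone holds an extra reference.
 */
static bool toy_migrate(struct toy_page *p)
{
    assert(p != NULL);
    if (p->refcount > 0)
        return false;
    p->zone = TOY_ZONE_NORMAL;
    p->in_cma = false;
    return true;
}

/*
 * Models the FOLL_LONGTERM behaviour: an unpinnable page is migrated
 * first, and the pin fails if that migration fails.
 */
static bool toy_pin_longterm(struct toy_page *p)
{
    assert(p != NULL);
    if (!toy_longterm_pinnable(p) && !toy_migrate(p))
        return false;
    p->refcount++;
    return true;
}

/* Models "pinning" by just grabbing a reference, with no check at all. */
static void toy_grab_reference(struct toy_page *p)
{
    assert(p != NULL);
    p->refcount++;
}
```

In this model, calling toy_pin_longterm() on an unreferenced page in
TOY_ZONE_MOVABLE first moves it to TOY_ZONE_NORMAL and then pins it,
whereas calling toy_grab_reference() first leaves the page where it is
and makes every later toy_pin_longterm() on it fail.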