Date: Mon, 19 Jun 2023 14:27:37 +0200
From: Gerd Hoffmann
To: Mike Kravetz
Cc: James Houghton, Junxiao Chang, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, mhocko@suse.com, jmarchan@redhat.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev,
	Vivek Kasireddy, Dongwon Kim, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH] mm: fix hugetlb page unmap count balance issue
References: <20230512072036.1027784-1-junxiao.chang@intel.com>
	<20230512232947.GA3927@monkey> <20230515170259.GA3848@monkey>
In-Reply-To: <20230515170259.GA3848@monkey>

On Mon, May 15, 2023 at 10:04:42AM -0700, Mike Kravetz wrote:
> On 05/12/23 16:29, Mike Kravetz wrote:
> > On 05/12/23 14:26, James Houghton wrote:
> > > On Fri, May 12, 2023 at 12:20 AM Junxiao Chang wrote:
> > >
> > > This alone doesn't fix mapcounting for PTE-mapped HugeTLB pages. You
> > > need something like [1]. I can resend it if that's what we should be
> > > doing, but this mapcounting scheme doesn't work when the page structs
> > > have been freed.
> > >
> > > It seems like it was a mistake to include support for hugetlb memfds in udmabuf.
> > >
> > IIUC, it was added with commit 16c243e99d33 udmabuf: Add support for mapping
> > hugepages (v4). Looks like it was never sent to linux-mm? That is unfortunate
> > as hugetlb vmemmap freeing went in at about the same time. And, as you have
> > noted, udmabuf will not work if hugetlb vmemmap freeing is enabled.
> >
> > Sigh!
> >
> > Trying to think of a way forward.
> > --
> > Mike Kravetz
> >
> > >
> > > [1]: https://lore.kernel.org/linux-mm/20230306230004.1387007-2-jthoughton@google.com/
> > >
> > > - James
>
> Adding people and list on Cc: involved with commit 16c243e99d33.
>
> There are several issues with trying to map tail pages of hugetlb pages
> not taken into account with udmabuf. James spent quite a bit of time trying
> to understand and address all the issues with the HGM code. While using
> the scheme proposed by James may be an approach to the mapcount issue, there
> are also other issues that need attention. For example, I do not see how
> the fault code checks the state of the hugetlb page (such as poison) as none
> of that state is carried in tail pages.
>
> The more I think about it, the more I think udmabuf should treat hugetlb
> pages as hugetlb pages. They should be mapped at the appropriate level
> in the page table. Of course, this would impose new restrictions on the
> API (mmap and ioctl) that may break existing users. I have no idea how
> extensively udmabuf is being used with hugetlb mappings.

The user of this is qemu. It can use the udmabuf driver to create host
dma-bufs for guest resources (virtio-gpu buffers), to avoid copying data
when showing the guest display in a host window.

hugetlb support is needed in case qemu guest memory is backed by
hugetlbfs. That does not imply the virtio-gpu buffers are hugepage
aligned though; udmabuf would still need to operate on smaller chunks
of memory. So with additional restrictions this will not work any more
for qemu.
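
To illustrate the alignment point, here is a minimal userspace sketch. It is not udmabuf code: the helper name is hypothetical, and the 2 MiB huge page size is an assumption (other sizes exist). It only shows the arithmetic a "map whole huge pages only" restriction would imply for an (offset, size) pair inside a hugetlbfs-backed memfd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed huge page size, for illustration only; real systems may differ. */
#define HPAGE_SIZE (2ULL * 1024 * 1024)

/*
 * Hypothetical helper: returns true only if [offset, offset + size)
 * starts on a huge page boundary and covers whole huge pages.
 */
static bool range_is_hugepage_aligned(uint64_t offset, uint64_t size)
{
	return size != 0 &&
	       (offset % HPAGE_SIZE) == 0 &&
	       (size % HPAGE_SIZE) == 0;
}
```

A 64 KiB buffer that starts 4 KiB into a huge page (offset 0x201000) fails this check, while a buffer covering exactly one huge page passes. Guest-chosen virtio-gpu buffers would typically look like the first case.
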

I'd suggest just reverting hugetlb support instead and going back to the
drawing board.

Also not sure why hugetlbfs is used for guest memory in the first place.
It used to be a thing years ago, but with the arrival of transparent
hugepages there is, as far as I know, little reason to still use
hugetlbfs.

Vivek? Dongwon?

take care,
  Gerd