Date: Tue, 8 Aug 2023 18:17:19 +0200
From: Daniel Vetter
To: David Hildenbrand
Cc: Vivek Kasireddy, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	James Houghton, Jerome Marchand, Dongwon Kim, Junxiao Chang,
	Muchun Song, Michal Hocko, Gerd Hoffmann, Jason Gunthorpe,
	John Hubbard, Andrew Morton, "Kirill A. Shutemov", Mike Kravetz
Subject: Re: [PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages
References: <20230622072710.3707315-1-vivek.kasireddy@intel.com>
	<6e429fbc-e0e6-53c0-c545-2e2cbbe757de@redhat.com>
In-Reply-To: <6e429fbc-e0e6-53c0-c545-2e2cbbe757de@redhat.com>

On Thu, Jun
22, 2023 at 10:25:17AM +0200, David Hildenbrand wrote:
> On 22.06.23 09:27, Vivek Kasireddy wrote:
> > The first patch ensures that the mappings needed for handling mmap
> > operation would be managed by using the pfn instead of struct page.
> > The second patch restores support for mapping hugetlb pages where
> > subpages of a hugepage are not directly used anymore (main reason
> > for revert) and instead the hugetlb pages and the relevant offsets
> > are used to populate the scatterlist for dma-buf export and for
> > mmap operation.
> >
> > Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
> > were passed to the Host kernel and Qemu was launched with these
> > relevant options: qemu-system-x86_64 -m 4096m....
> > -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
> > -display gtk,gl=on
> > -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
> > -machine memory-backend=mem1
> >
> > Replacing -display gtk,gl=on with -display gtk,gl=off above would
> > exercise the mmap handler.
> >
>
> While I think the VM_PFNMAP approach is much better and should fix that
> issue at hand, I thought more about missing memlock support and realized
> that we might have to fix something else. SO I'm going to raise the issue
> here.
>
> I think udmabuf chose the wrong interface to do what it's doing, that makes
> it harder to fix it eventually.
>
> Instead of accepting a range in a memfd, it should just have accepted a user
> space address range and then used pin_user_pages(FOLL_WRITE|FOLL_LONGTERM)
> to longterm-pin the pages "officially".
>
> So what's the issue? Udma effectively pins pages longterm ("possibly
> forever") simply by grabbing a reference on them. These pages might easily
> reside in ZONE_MOVABLE or in MIGRATE_CMA pageblocks.
>
> So what udmabuf does is break memory hotunplug and CMA, because it turns
> pages that have to remain movable unmovable.
>
> In the pin_user_pages(FOLL_LONGTERM) case we make sure to migrate these
> pages. See mm/gup.c:check_and_migrate_movable_pages() and especially
> folio_is_longterm_pinnable(). We'd probably have to implement something
> similar for udmabuf, where we detect such unpinnable pages and migrate them.
>
> For example, pairing udmabuf with vfio (which pins pages using
> pin_user_pages(FOLL_LONGTERM)) in QEMU will most probably not work in all
> cases: if udmabuf longterm pinned the pages "the wrong way", vfio will fail
> to migrate them during FOLL_LONGTERM and consequently fail pin_user_pages().
> As long as udmabuf holds a reference on these pages, that will never
> succeed.

Uh, this is no good and I totally missed this, because the very first
version of udmabuf used pin_user_pages(FOLL_LONGTERM).

I think what we need here as a first fix is a
shmem_pin_mapping_page_longterm() that does the equivalent of
pin_user_pages(FOLL_LONGTERM), and to use it in udmabuf. From a quick
look, the folio conversions that already landed should help there.

It might also be good to convert all the gpu driver users of
shmem_read_mapping_page() over to that new
shmem_pin_mapping_page_longterm(), just for safety. gpu drivers use a
private shmem file and adjust the gfp mask to clear __GFP_MOVABLE, so the
biggest issues shouldn't be possible. But pin(LONGTERM), compared to just
getting a page ref, has gained quite a few other differences in the past
years, and it would be good to be consistent, I think.

Anything other than longterm pins won't work for udmabuf, because the
locking around struct page/gup.c/mmu_notifier is rather fundamentally
(and by design, due to gpu driver requirements) incompatible with the
dma_buf locking rules.

> There are *probably* more issues on the QEMU side when udmabuf is paired
> with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for virtio-balloon,
> virtio-mem, postcopy live migration, ...
> for example, in the vfio/vdpa case
> we make sure that we disallow most of these, because otherwise there can be
> an accidental "disconnect" between the pages mapped into the VM (guest view)
> and the pages mapped into the IOMMU (device view), for example, after a
> reboot.

I think once we have proper longterm pinning for udmabuf we need to look
into what coherency issues are left, and how best to fix them. udmabuf
already requires that the memfd is size sealed to avoid some issues; we
might need to require more. Or, on the other side, perhaps reject or
quietly ignore some of the hole punching for longterm pinned pages, to
maintain coherency.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch