Message-ID: <6227ba4c-9455-9652-7434-7842b2b3edcb@redhat.com>
Date: Mon, 17 Oct 2022 10:41:52 +0200
Subject: Re: [RFC PATCH] mm: Introduce new MADV_NOMOVABLE behavior
To: Baolin Wang, akpm@linux-foundation.org
Cc: arnd@arndb.de, jingshan@linux.alibaba.com, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
From: David Hildenbrand
Organization: Red Hat

On 17.10.22 09:32, Baolin Wang wrote:
> When creating a virtual machine, we will use memfd_create() to get
> a file descriptor which can be used to create share memory mappings
> using the mmap function, meanwhile the mmap() will set the MAP_POPULATE
> flag to allocate physical pages for the virtual machine.
>
> When allocating physical pages for the guest, the host can fallback to
> allocate some CMA pages for the guest when over half of the zone's free
> memory is in the CMA area.
>
> In guest os, when the application wants to do some data transaction with
> DMA, our QEMU will call VFIO_IOMMU_MAP_DMA ioctl to do longterm-pin and
> create IOMMU mappings for the DMA pages. However, when calling
> VFIO_IOMMU_MAP_DMA ioctl to pin the physical pages, we found it will be
> failed to longterm-pin sometimes.
>
> After some investigation, we found the pages used to do DMA mapping can
> contain some CMA pages, and these CMA pages will cause a possible
> failure of the longterm-pin, due to failed to migrate the CMA pages.
> The reason of migration failure may be temporary reference count or
> memory allocation failure. So that will cause the VFIO_IOMMU_MAP_DMA
> ioctl returns error, which makes the application failed to start.
>
> To fix this issue, this patch introduces a new madvise behavior, named
> as MADV_NOMOVABLE, to avoid allocating CMA pages and movable pages if
> the users want to do longterm-pin, which can remove the possible failure
> of movable or CMA pages migration.

Sorry to say, but that sounds like a hack to work around a kernel
implementation detail (how often we retry to migrate pages).

If there are CMA/ZONE_MOVABLE issues, please fix them instead, and avoid
leaking these details to user space.

Also, with MAP_POPULATE as you describe it, this madvise flag doesn't
make much sense, because it will get set after all memory was already
allocated ...

NAK

-- 
Thanks,

David / dhildenb
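
To illustrate the MAP_POPULATE ordering point: below is a minimal
user-space sketch, not taken from the patch. It assumes Linux with glibc
2.27 or later. The helper name populated_pages_after_mmap() and the memfd
name "guest-ram" are made up for illustration. It maps a memfd with
MAP_POPULATE and uses mincore() to count how many pages are already
resident when mmap() returns, which is the earliest point at which any
madvise() call on that range could be issued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a memfd with MAP_POPULATE and return the number
 * of pages that are resident right after mmap() returns, or -1 on error.
 */
long populated_pages_after_mmap(size_t npages)
{
    assert(npages > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = npages * (size_t)page;

    int fd = memfd_create("guest-ram", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* MAP_POPULATE asks the kernel to fault the pages in during mmap(). */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_POPULATE, fd, 0);
    close(fd); /* the mapping keeps the memfd alive */
    if (mem == MAP_FAILED)
        return -1;

    /*
     * A madvise() hint on this range could only be issued from here on,
     * i.e. after the pages counted below have already been allocated.
     */
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(mem, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1; /* bit 0 set: page is resident */
    }

    free(vec);
    munmap(mem, len);
    return resident;
}
```

Calling populated_pages_after_mmap(16) on a typical system should report
every page as resident, although MAP_POPULATE is best effort and the
kernel does not report populate failures. A madvise()-based placement
hint would therefore need the mapping to be created without MAP_POPULATE
and populated only after the hint is applied.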