Message-ID: <53dc6054-07eb-f97b-7b2f-558f02d1b90a@redhat.com>
Date: Fri, 17 Feb 2023 09:53:47 +0100
From: David Hildenbrand
Organization: Red Hat
To: Peter Xu
Cc: Muhammad Usama Anjum, Andrew Morton, kernel@collabora.com,
 Paul Gofman, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/2] mm/userfaultfd: Support WP on multiple VMAs
References: <20230216091656.2045471-1-usama.anjum@collabora.com>

On 16.02.23 21:25, Peter Xu wrote:
> On Thu, Feb 16, 2023 at 10:37:36AM +0100, David Hildenbrand wrote:
>> On 16.02.23 10:16, Muhammad Usama Anjum wrote:
>>> mwriteprotect_range() errors out if [start, end) doesn't fall in one
>>> VMA. We are facing a use case where multiple VMAs are present in one
>>> range of interest.
>>> For example, the following pseudocode reproduces the
>>> error which we are trying to fix:
>>> - Allocate memory of size 16 pages with PROT_NONE with mmap
>>> - Register userfaultfd
>>> - Change protection of the first half (1 to 8 pages) of memory to
>>>   PROT_READ | PROT_WRITE. This breaks the memory area in two VMAs.
>>> - Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
>>>   out.
>>
>> I think, in QEMU, with partial madvise()/mmap(MAP_FIXED) while handling
>> memory remapping during reboot to discard pages with memory errors, it would
>> be possible that we get multiple VMAs and could not enable uffd-wp for
>> background snapshots anymore. So this change makes sense to me.
>
> Any pointer for this one?

In QEMU, softmmu/physmem.c:qemu_ram_remap() is instructed on reboot to
remap VMAs due to MCE pages. We apply QEMU_MADV_MERGEABLE (if configured
for the machine) and QEMU_MADV_DONTDUMP (if configured for the machine),
so the kernel could merge the VMAs again.

(a) From experiments (~2 years ago), I recall that some VMAs won't get
merged again ever. I faintly remember that this was the case for
hugetlb. It might have changed in the meantime, haven't tried it again.
But looking at is_mergeable_vma(), we refuse to merge VMAs that have
vma->vm_ops->close set. I think that might be set for hugetlb
(hugetlb_vm_op_close).

(b) We don't consider memory-backend overrides, like toggling
QEMU_MADV_MERGEABLE or QEMU_MADV_DONTDUMP for a backend from
backends/hostmem.c, resulting in multiple unmergeable VMAs.

(c) We don't consider memory-backend mbind(): we don't re-apply the
mbind() policy, resulting in unmergeable VMAs.

The correct way to handle (b) and (c) would be to notify the memory
backend, to let it reapply the correct flags, and to reapply the mbind()
policy (I once had patches for that, have to look them up again).

So in these rare setups with MCEs, we would be getting more VMAs and
while the uffd-wp registration would succeed, uffd-wp protection would
fail.
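
To make the "more VMAs" part concrete, here is a minimal user-space
sketch of the first and third steps of the quoted reproducer only. It
does not touch userfaultfd at all: it maps PROT_NONE anonymous memory,
changes the protection of the first half with mprotect(), and counts how
many /proc/self/maps entries overlap the range before and after. The
helper names count_vmas() and split_demo() are made up for illustration,
and the sketch assumes Linux with /proc mounted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the /proc/self/maps entries (one per VMA)
 * that overlap [start, end). Returns -1 if the file cannot be opened.
 */
static int count_vmas(uintptr_t start, uintptr_t end)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[4096];
	int n = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long lo, hi;

		/* Each entry starts with "<lo>-<hi>" in hex. */
		if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
		    lo < end && hi > start)
			n++;
	}
	fclose(f);
	return n;
}

/*
 * Hypothetical helper: map npages of PROT_NONE anonymous memory, then
 * make the first half PROT_READ | PROT_WRITE. Stores the number of
 * overlapping VMAs before and after the mprotect() call.
 * Returns 0 on success, -1 on error.
 */
static int split_demo(size_t npages, int *before, int *after)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	char *mem;
	int ret = -1;

	assert(npages >= 2 && page > 0);
	len = npages * (size_t)page;

	mem = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	*before = count_vmas((uintptr_t)mem, (uintptr_t)mem + len);

	/* Differing protections force the kernel to split the VMA. */
	if (mprotect(mem, (npages / 2) * (size_t)page,
		     PROT_READ | PROT_WRITE) == 0) {
		*after = count_vmas((uintptr_t)mem, (uintptr_t)mem + len);
		ret = 0;
	}

	munmap(mem, len);
	return ret;
}
```

Calling split_demo(16, &before, &after) from a main() should report one
overlapping VMA before the mprotect() and two afterwards. A full
reproducer would additionally register the range with userfaultfd and
issue UFFDIO_WRITEPROTECT with UFFDIO_WRITEPROTECT_MODE_WP across all 16
pages, which is the call the quoted patch description says errors out.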

Note that this is purely theoretical: people don't heavily use
background snapshots yet, so I am not aware of any reports. Further, I
expect it to happen only very rarely (MCE+reboot+a/b/c).

So it's more of a "the app doesn't necessarily keep track of the exact
VMAs".

[I am not sure how helpful remapping !anon memory really is, we
should be getting the same messed-up MCE pages from the fd again, but
that's a different discussion I guess]

-- 
Thanks,

David / dhildenb