From: David Matlack
Date: Wed, 7 Dec 2022 17:56:43 -0800
Subject: Re: [RFC] Improving userfaultfd scalability for live migration
To: James Houghton
Cc: Sean Christopherson, Peter Xu, Andrea Arcangeli, Paolo Bonzini, Axel Rasmussen, Linux MM, kvm, chao.p.peng@linux.intel.com, Oliver Upton

On Tue, Dec 6, 2022 at 12:41 PM James Houghton wrote:
> On Tue, Dec 6, 2022 at 1:01 PM Sean Christopherson wrote:
> > Can you elaborate on what makes it better? Or maybe generate a list of pros and
> > cons? I can think of (dis)advantages for both approaches, but I haven't identified
> > anything that would be a blocking issue for either approach. Doesn't mean there
> > isn't one or more blocking issues, just that I haven't thought of any :-)
>
> Let's see.... so using no-slow-GUP over no UFFD waiting:
> - No need to take mmap_lock in mem fault path.
> - Change the relevant __gfn_to_pfn_memslot callers
> (kvm_faultin_pfn/user_mem_abort/others?) to set `atomic = true` if the
> new CAP is used.
> - No need for a new PF_NO_UFFD_WAIT (would be toggled somewhere
> in/near kvm_faultin_pfn/user_mem_abort).
> - Userspace has to indirectly figure out the state of the page tables
> to know what action to take (which introduces some weirdness, like if
> anyone MADV_DONTNEEDs some guest memory, we need to know).

I'm no expert, but I believe a guest access to a MADV_DONTNEED'd GFN
would just cause the kernel to allocate a new page. So I think
userspace can still blindly do MADV_POPULATE_WRITE in this case. Were
there any other scenarios you had in mind?

> - While userfaultfd is registered (so like during post-copy), any
> hva_to_pfn() calls that were resolvable with slow GUP before (without
> dropping into handle_userfault()) will now need to be resolved by
> userspace manually with a call to MADV_POPULATE_WRITE. This extra trip
> to userspace could slow things down.

Is there any way to let fast-gup identify, without taking the
mmap_lock, that a PTE is not present specifically because of
userfaultfd (e.g. by using an unused bit in the PTE)? Then we could
avoid the extra trips to userspace for MADV_POPULATE_WRITE.

>
> Both of these seem pretty simple to implement in the kernel; the most
> complicated part is just returning KVM_EXIT_MEMORY_FAULT in more
> places / for other architectures (I care about x86 and arm64).
>
> Right now both approaches seem fine to me. Not having to take the
> mmap_lock in the fault path, while being such a minor difference now,
> could be a huge benefit if we can later get around to making
> UFFDIO_CONTINUE not need the mmap lock. Disregarding that, not
> requiring userspace to guess the state of the page tables seems
> helpful (less bug-prone, I guess).
>
> >
> > > When KVM_RUN exits:
> > > - If we haven't UFFDIO_CONTINUE'd yet, do that now and restart KVM_RUN.
> > > - If we have, then something bad has happened. Slow GUP already ran
> > > and failed, so we need to treat this in the same way we treat a
> > > MADV_POPULATE_WRITE failure above: userspace might just want to crash
> > > (or inject a memory error or something).
> > >
> >
> > - James
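
For reference, the KVM_RUN exit handling quoted above can be modeled as a
toy state machine. This is only a sketch under stated assumptions:
FaultTracker, Action, and the idea of tracking state per GFN are
hypothetical names chosen for illustration, not KVM or userfaultfd APIs.
A real VMM would issue the UFFDIO_CONTINUE ioctl at the point marked in
the comment.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    """What userspace does after KVM_RUN exits with a memory fault."""
    CONTINUE_AND_RETRY = auto()  # resolve the fault, then restart KVM_RUN
    FATAL = auto()               # crash, inject a memory error, etc.


@dataclass
class FaultTracker:
    """Hypothetical per-VM bookkeeping of GFNs already UFFDIO_CONTINUE'd."""
    continued: set = field(default_factory=set)

    def on_memory_fault_exit(self, gfn: int) -> Action:
        if gfn not in self.continued:
            # First fault on this GFN: a real VMM would call
            # UFFDIO_CONTINUE here before re-entering the guest.
            self.continued.add(gfn)
            return Action.CONTINUE_AND_RETRY
        # Already continued, so slow GUP ran and still failed.
        return Action.FATAL
```

Calling on_memory_fault_exit() twice for the same GFN returns
CONTINUE_AND_RETRY and then FATAL, which matches the two bullets in the
quoted text.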