From: Kyle Huey
Date: Mon, 5 May 2025 09:37:01 -0700
Subject: Suppress pte soft-dirty bit with UFFDIO_COPY?
To: Andrew Morton, Peter Xu
Cc: open list, linux-mm@kvack.org, criu@lists.linux.dev, "Robert O'Callahan"

tl;dr I'd like to add UFFDIO_COPY_MODE_DONTSOFTDIRTY that does not add the _PAGE_SOFT_DIRTY bit to the relevant pte flags. Any thoughts/objections?
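
For context on the userspace side of the interface: per the kernel's soft-dirty and pagemap documentation, the soft-dirty state of each page is exported as bit 55 of its 64-bit /proc/pid/pagemap entry, and it is cleared for the whole process (there is no per-pte variant) by writing 4 to /proc/pid/clear_refs. The following is a minimal C sketch of both operations; the helper names are illustrative and not an existing API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of each 64-bit /proc/pid/pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY_BIT 55

/* Byte offset in /proc/pid/pagemap of the entry for virtual address addr:
 * the file holds one 8-byte entry per virtual page. */
off_t pagemap_offset(uintptr_t addr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(addr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* Returns 1 if a raw pagemap entry has the soft-dirty bit set, 0 otherwise. */
int entry_is_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Returns 1 if the page containing addr is soft-dirty, 0 if not, -1 on error.
 * pagemap_fd must be an open, readable /proc/pid/pagemap. */
int page_is_soft_dirty(int pagemap_fd, uintptr_t addr)
{
    uint64_t entry;
    off_t off = pagemap_offset(addr, sysconf(_SC_PAGESIZE));
    if (pread(pagemap_fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
        return -1;
    return entry_is_soft_dirty(entry);
}

/* Writing "4" to /proc/pid/clear_refs clears the soft-dirty bit on every pte
 * of the process. Returns 0 on success, -1 on error. */
int clear_soft_dirty(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

Because clear_soft_dirty() can only act on the whole process, any soft-dirty information on other pages has to be read out with something like page_is_soft_dirty() before it is called.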

The kernel has a "soft-dirty" bit on ptes that tracks whether they have been written to since the last time /proc/pid/clear_refs was used to clear the soft-dirty bits. CRIU uses this to track which pages have been modified since a previous checkpoint, which reduces the size of the checkpoints it takes. I would like to use this in my debugger[0] to track which pages a program function dirties when that function is invoked from the debugger.

However, the runtime environment for this function is rather unusual. In my debugger, the process being debugged doesn't actually exist while it's being debugged. Instead, we have a database of all program state (including registers and memory values) from when the process was executed. It's in some sense a giant core dump that spans multiple points in time. To execute a program function from the debugger, we rematerialize the program state at the desired point in time from our database.

For performance reasons, we fill in the memory lazily[1] via userfaultfd. This makes it difficult to use the soft-dirty bit to track the writes the function triggers, because UFFDIO_COPY (and friends) mark every page they touch as soft-dirty. Because we have the canonical source of truth for the pages we materialize via UFFDIO_COPY, we're only interested in what happens after the userfaultfd operation.

Clearing the soft-dirty bit is complicated by two things:

1. There's no way to clear the soft-dirty bit on a single pte, so instead we have to clear the soft-dirty bits for the entire process. That requires us to process the soft-dirty bits on every other pte immediately, before they are cleared, to avoid losing that information.
2. We need to clear the soft-dirty bits after the userfaultfd operation, but to avoid racing with the task that triggered the page fault we have to do a non-waking copy, then clear the bits, and then separately wake up the task.

To work around all of this, we currently have a 4-step process: 1. Read /proc/pid/pagemap and note all ptes that are soft-dirty. 2.
Do the UFFDIO_COPY with UFFDIO_COPY_MODE_DONTWAKE. 3. Write 4 to /proc/pid/clear_refs to clear the soft-dirty bits across the process. 4. Do a UFFDIO_WAKE.

The overhead of all of this (particularly step 1) is a millisecond or two *per page* that we lazily materialize, and while that's not crippling for our purposes, it is rather undesirable.

What I would like to have instead is a UFFDIO_COPY mode that leaves the soft-dirty bit unchanged, i.e. a UFFDIO_COPY_MODE_DONTSOFTDIRTY. Since we clear all the soft-dirty bits once after setting up all the mmaps in the process, the relevant ptes would then "just do the right thing" from our perspective.

But I do want to get some feedback on this before I spend time writing any code. Is there a reason not to do this? Or is there an alternate way to achieve the same goal?

If this is generally sensible, then a couple of questions:

1. Do I need a UFFD_FEATURE flag for this, or is it enough for a program to detect the existence of UFFDIO_COPY_MODE_DONTSOFTDIRTY by whether the ioctl accepts the flag or returns EINVAL? I would tend to think the latter.
2. Should I add this mode for the other UFFDIO variants (ZEROPAGE, MOVE, etc.) at the same time, even if I don't have any use for them?

- Kyle

[0] https://pernos.co/
[1] Conceptually this is similar to CRIU's `restore --lazy-pages`. We set up all the mappings at the beginning but we don't back them. Instead, we UFFDIO_REGISTER them all, and when they're touched for the first time we fetch the pages from our database and UFFDIO_COPY them into the address space.
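
To make the cost of the current 4-step workaround described above concrete, here is a minimal C sketch of one lazy materialization using only existing uAPI: pagemap bit 55, UFFDIO_COPY with UFFDIO_COPY_MODE_DONTWAKE, writing 4 to clear_refs, and UFFDIO_WAKE. It is an illustration, not the debugger's actual implementation. The names (materialize_page, save_soft_dirty, open_proc_files, struct dirty_list) are hypothetical, and as a simplifying assumption step 1 scans only a caller-supplied address range rather than every mapping listed in /proc/pid/maps.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_SOFT_DIRTY (1ULL << 55) /* soft-dirty flag in a pagemap entry */

/* Hypothetical bookkeeping: page addresses that were soft-dirty before
 * clear_refs wiped them. Fixed capacity; a real tool would grow it. */
struct dirty_list {
    uintptr_t *addrs;
    size_t count;
    size_t cap;
};

void record_dirty_page(uintptr_t addr, void *ctx)
{
    struct dirty_list *list = ctx;
    if (list->count < list->cap) /* silently drops pages once full */
        list->addrs[list->count++] = addr;
}

/* Opens /proc/pid/pagemap for reading and /proc/pid/clear_refs for writing. */
int open_proc_files(pid_t pid, int *pagemap_fd, int *clear_refs_fd)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
    *pagemap_fd = open(path, O_RDONLY);
    snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
    *clear_refs_fd = open(path, O_WRONLY);
    if (*pagemap_fd < 0 || *clear_refs_fd < 0) {
        if (*pagemap_fd >= 0)
            close(*pagemap_fd);
        if (*clear_refs_fd >= 0)
            close(*clear_refs_fd);
        return -1;
    }
    return 0;
}

/* Step 1: report every soft-dirty page in [start, end) to cb. */
int save_soft_dirty(int pagemap_fd, uintptr_t start, uintptr_t end, long page_size,
                    void (*cb)(uintptr_t addr, void *ctx), void *ctx)
{
    assert(page_size > 0);
    for (uintptr_t addr = start; addr < end; addr += (uintptr_t)page_size) {
        uint64_t entry;
        off_t off = (off_t)(addr / (uintptr_t)page_size) * (off_t)sizeof(entry);
        if (pread(pagemap_fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
            return -1;
        if (entry & PM_SOFT_DIRTY)
            cb(addr, ctx);
    }
    return 0;
}

/* Steps 1-4 for one lazily materialized, page-aligned page at dst. */
int materialize_page(int uffd, int pagemap_fd, int clear_refs_fd,
                     uintptr_t dst, const void *src, long page_size,
                     uintptr_t scan_start, uintptr_t scan_end, struct dirty_list *dirty)
{
    /* 1. Save the existing soft-dirty state before it is wiped. */
    if (save_soft_dirty(pagemap_fd, scan_start, scan_end, page_size,
                        record_dirty_page, dirty) < 0)
        return -1;

    /* 2. Install the page without waking the faulting thread. */
    struct uffdio_copy copy;
    memset(&copy, 0, sizeof(copy));
    copy.dst = dst;
    copy.src = (uintptr_t)src;
    copy.len = (uint64_t)page_size;
    copy.mode = UFFDIO_COPY_MODE_DONTWAKE;
    if (ioctl(uffd, UFFDIO_COPY, &copy) < 0)
        return -1;

    /* 3. Clear soft-dirty process-wide, including the bit UFFDIO_COPY just set. */
    if (write(clear_refs_fd, "4", 1) != 1)
        return -1;

    /* 4. Wake the thread blocked on the fault at dst. */
    struct uffdio_range range = { .start = dst, .len = (uint64_t)page_size };
    if (ioctl(uffd, UFFDIO_WAKE, &range) < 0)
        return -1;
    return 0;
}
```

Step 1 is a pread per page of the scanned range on every fault, which is consistent with it dominating the overhead. With the proposed UFFDIO_COPY_MODE_DONTSOFTDIRTY, steps 1, 3 and 4 would presumably reduce to a single waking UFFDIO_COPY; that flag does not exist today, so it is not shown.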