Subject: Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
From: "Michael Kerrisk (man-pages)"
To: David Hildenbrand, linux-man@vger.kernel.org
Cc: mtk.manpages@gmail.com, Pankaj Gupta, Alejandro Colomar, Andrew Morton,
    Michal Hocko, Oscar Salvador, Jann Horn, Mike Rapoport, Linux API,
    linux-mm@kvack.org
Date: Wed, 18 Aug 2021 22:58:13 +0200
References: <20210816081922.5155-1-david@redhat.com> <70792f9c-ace1-6876-378b-5388f7948a60@redhat.com>
In-Reply-To: <70792f9c-ace1-6876-378b-5388f7948a60@redhat.com>

Hello David,

On 8/18/21 10:35 AM, David Hildenbrand wrote:
> On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote:
>> Hello David,
>>
>> Thank you for writing this! Could you please take
>> a look at the comments below and revise?
>
> Hi Michael,
>
> thanks for your valuable input. Your feedback will certainly make this
> easier to understand for people that are not heavily involved in MM work :)
>
> [...]
>
>>> man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 107 insertions(+)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index f1f384c0c..f6cea9ad2 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
>>> storage.
>>> The advice might be ignored for some pages in the range when it is not
>>> applicable.
>>> +.TP
>>> +.BR MADV_POPULATE_READ " (since Linux 5.14)"
>>> +Populate (prefault) page tables readable for the whole range without actually
>>
>> I have trouble understanding "Populate (prefault) page tables readable".
>> Does it mean that just the page tables are being populated, and the
>> PTEs are marked to indicate that the pages are readable? If yes, I
>> think some rewording would help.
>
> I actually tried phrasing it similar to our MAP_POPULATE documentation:
> ("Populate (prefault) page tables for a mapping.")

Yeah, well that description is a bit thin too :-}.

> We will prefault all pages, faulting them in.
>>
>>> +reading memory.
>>
>> I don't understand "without actually reading memory"? Do you mean,
>> "without actually faulting in the pages"; or something else?
>
> "Populate (prefault) page tables readable, faulting in all pages in the
> range just as if manually reading one byte of each page; however, avoid
> the actual memory access that would have been performed after handling
> the fault."
>
> Does that make it clearer? (avoiding eventually touching the page at all
> can be beneficial, especially when dealing with DAX memory where memory
> access might be expensive)

That text is much better. But what's still not clear to me is the
difference between mmap(2) MAP_POPULATE, and MADV_POPULATE_READ and
MADV_POPULATE_WRITE. What is the difference, and in what situations
would one prefer one or the other approach?
I think it would be helpful if the manual page said something about
these details.

>>> +Depending on the underlying mapping,
>>> +map the shared zeropage,
>>> +preallocate memory or read the underlying file;
>>> +files with holes might or might not preallocate blocks.
>>> +Do not generate
>>> +.B SIGBUS
>>> +when populating fails,
>>> +return an error instead.
>>
>> Better:
>>
>> [[
>> If populating fails, a
>> .B SIGBUS
>> signal is not generated; instead, an error is returned.
>> ]]
>>
>
> Sure, thanks.
>
>>> +.IP
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +succeeds,
>>> +all page tables have been populated (prefaulted) readable once.
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +fails,
>>> +some page tables might have been populated.
>>> +.IP
>>> +.B MADV_POPULATE_READ
>>> +cannot be applied to mappings without read permissions
>>> +and special mappings,
>>> +for example,
>>> +marked with the kernel-internal
>>
>> s/marked/mappings marked/
>>
>>> +.B VM_PFNMAP
>>> +and
>>
>> Just checking: should it be "and" or "or" here?
>>
>> Looking at the EINVAL error below, I guess "or", and a better
>> wording would be:
>>
>> [[
>> ...for example, mappings marked with kernel-internal flags such as
>> .B VM_PFNMAP
>> or
>> .BR VM_IO .
>> ]]
>
> Much better. Note that there might be more types of mappings that won't
> work (e.g., initially also secretmem IIRC).

Ahh nice. Since there's about to be a memfd_secret() manual page,
I suggest also adding "or secret memory regions created using
memfd_secret(2)".

>>> +.BR VM_IO .
>>> +.IP
>>> +Note that with
>>> +.BR MADV_POPULATE_READ ,
>>> +the process can be killed at any moment when the system runs out of memory.
>>> +.TP
>>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
>>> +Populate (prefault) page tables writable for the whole range without actually
>>
>> I have trouble understanding "Populate (prefault) page tables writable".
>> Does it mean that just the page tables are being populated, and the
>> PTEs are marked to indicate that the pages are writable? If yes, I
>> think some rewording would help.
>>
>>> +writing memory.
>>
>> I don't understand "without actually writing memory"? Do you mean,
>> "without actually faulting in the pages"; or something else?
>>
>
> Similar to the other wording:
>
> "Populate (prefault) page tables writable, faulting in all pages in the
> range just as if manually writing one byte of each page; however, avoid
> the actual memory access that would have been performed after handling
> the fault."

Much better, but see also my comments above re MADV_POPULATE_READ.

[...]

>>> +.B EFAULT
>>> +.I advice
>>> +is
>>> +.B MADV_POPULATE_READ
>>> +or
>>> +.BR MADV_POPULATE_WRITE ,
>>> +and populating (prefaulting) page tables failed because a
>>> +.B SIGBUS
>>> +would have been generated on actual memory access and the reason is not a
>>> +HW poisoned page.
>>
>> Maybe:
>> s/.$/(see the description of MADV_HWPOISON in this page)./
>> ?
>>
>
> Sure, we can add that. But note that MADV_HWPOISON is just one of many
> ways to HWpoison a page.

Then maybe something like:

"(HW poisoned pages can, for example, be created using the
MADV_HWPOISON flag described elsewhere in this page.)"

[...]

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
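
For readers following the thread, the behaviour under discussion
(prefault a range, and get an error return rather than a SIGBUS when
populating fails) might be exercised roughly as in the C sketch below.
This is only an illustration, not text from the patch: the helper name
populate_range() is hypothetical, and the fallback #defines assume the
flag values from the Linux 5.14 UAPI headers for the case where the
installed libc headers predate them.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Fallbacks for libc headers that predate the new flags.
 * Assumption: these match the values in the Linux 5.14 UAPI headers.
 */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper (not part of the patch): prefault the range
 * [addr, addr + len) with MADV_POPULATE_READ or MADV_POPULATE_WRITE.
 *
 * Returns 0 on success, or the errno value set by madvise(2) on
 * failure. No SIGBUS is expected here; per the thread, a failure to
 * populate is reported as an error instead. On failure, part of the
 * range may already have been populated.
 */
int populate_range(void *addr, size_t len, int advice)
{
    /* Only the two populate flags make sense for this helper. */
    assert(advice == MADV_POPULATE_READ || advice == MADV_POPULATE_WRITE);

    if (madvise(addr, len, advice) == 0)
        return 0;

    /*
     * EINVAL is what a kernel that does not know the flag returns,
     * and the thread also mentions it for special mappings.
     */
    return errno;
}
```

A caller might invoke populate_range(buf, len, MADV_POPULATE_WRITE)
right after mmap(), and on EINVAL fall back to touching one byte per
page by hand, which is the behaviour the proposed wording compares
against.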