From: David Hildenbrand
Organization: Red Hat
To: "Michael Kerrisk (man-pages)", linux-man@vger.kernel.org
Cc: Pankaj Gupta, Alejandro Colomar, Andrew Morton, Michal Hocko,
 Oscar Salvador, Jann Horn, Mike Rapoport, Linux API, linux-mm@kvack.org
Subject: Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
Date: Wed, 18 Aug 2021 10:35:24 +0200
Message-ID: <70792f9c-ace1-6876-378b-5388f7948a60@redhat.com>
References: <20210816081922.5155-1-david@redhat.com>

On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote:
> Hello David,
>
> Thank you for writing this!
> Could you please take a look at the comments below and revise?

Hi Michael,

thanks for your valuable input. Your feedback will certainly make this
easier to understand for people that are not heavily involved in MM work :)

[...]

>>  man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 107 insertions(+)
>>
>> diff --git a/man2/madvise.2 b/man2/madvise.2
>> index f1f384c0c..f6cea9ad2 100644
>> --- a/man2/madvise.2
>> +++ b/man2/madvise.2
>> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
>>  storage.
>>  The advice might be ignored for some pages in the range when it is not
>>  applicable.
>> +.TP
>> +.BR MADV_POPULATE_READ " (since Linux 5.14)"
>> +Populate (prefault) page tables readable for the whole range without actually
>
> I have trouble to understand "Populate (prefault) page tables readable".
> Does it mean that it is just the page tables are being populated, and the
> PTEs are marked to indicate that the pages are readable? If yes, I
> think some rewording would help.

I actually tried phrasing it similar to our MAP_POPULATE documentation:
("Populate (prefault) page tables for a mapping.")

We will prefault all pages, faulting them in.

>
>> +reading memory.
>
> I don't understand "without actually reading memory"? Do you mean,
> "without actually faulting in the pages"; or something else?

"Populate (prefault) page tables readable, faulting in all pages in the
range just as if manually reading one byte of each page; however, avoid
the actual memory access that would have been performed after handling
the fault."

Does that make it clearer?
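
To illustrate the "just as if manually touching one byte of each page"
part from user space, here is a rough sketch for the write variant. It is
not part of the patch. populate_write() is a hypothetical helper name, the
fallback value 23 for MADV_POPULATE_WRITE is what I believe the Linux UAPI
headers define when the libc headers lack it, and treating EINVAL as "kernel
older than 5.14" is a simplifying assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: older libc headers may lack the constant; 23 is the value
 * used by the Linux UAPI headers (asm-generic/mman-common.h). */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) writable.
 * Returns 0 on success, -1 with errno set on failure.
 */
int populate_write(void *addr, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	volatile char *p = addr;
	size_t off;

	assert(page_size > 0);

	/* Preferred path: the kernel faults in every page of the range
	 * without an actual memory access afterwards. */
	if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
		return 0;

	/* Failures such as ENOMEM, EFAULT or EHWPOISON are reported as an
	 * error here; no SIGBUS is generated. */
	if (errno != EINVAL)
		return -1;

	/*
	 * Assumption: EINVAL means the kernel does not know the advice.
	 * It can also mean insufficient permissions or an incompatible
	 * mapping; a real caller would have to tell these cases apart.
	 * Manual equivalent: write one byte of each page. Unlike the
	 * madvise() path, this does access memory and may raise a signal.
	 */
	for (off = 0; off < len; off += (size_t)page_size)
		p[off] = p[off];
	return 0;
}
```

A caller would mmap() a writable range and then call
populate_write(buf, len), checking for a return value of 0. The read
variant would look the same with MADV_POPULATE_READ and a one-byte read per
page in the fallback.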
(Avoiding touching the page at all can be beneficial, especially when
dealing with DAX memory, where memory access might be expensive.)

>
>> +Depending on the underlying mapping,
>> +map the shared zeropage,
>> +preallocate memory or read the underlying file;
>> +files with holes might or might not preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails,
>> +return an error instead.
>
> Better:
>
> [[
> If populating fails, a
> .B SIGBUS
> signal is not generated; instead, an error is returned.
> ]]
>

Sure, thanks.

>> +.IP
>> +If
>> +.B MADV_POPULATE_READ
>> +succeeds,
>> +all page tables have been populated (prefaulted) readable once.
>> +If
>> +.B MADV_POPULATE_READ
>> +fails,
>> +some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_READ
>> +cannot be applied to mappings without read permissions
>> +and special mappings,
>> +for example,
>> +marked with the kernel-internal
>
> s/marked/mappings marked/
>
>> +.B VM_PFNMAP
>> +and
>
> Just checking: should it be "and" or "or" here?
>
> Looking at the EINVAL error below, I guess "or", and a better
> wording would be:
>
> [[
> ...for example, mappings marked with kernel-internal flags such as
> .B VM_PFNMAP
> or
> .BR VM_IO .
> ]]

Much better. Note that there might be more types of mappings that won't
work (e.g., initially also secretmem IIRC).

>
>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_READ ,
>> +the process can be killed at any moment when the system runs out of memory.
>> +.TP
>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
>> +Populate (prefault) page tables writable for the whole range without actually
>
> I have trouble to understand "Populate (prefault) page tables writable".
> Does it mean that it is just the page tables are being populated, and the
> PTEs are marked to indicate that the pages are writable? If yes, I
> think some rewording would help.
>
>> +writing memory.
>
> I don't understand "without actually writing memory"? Do you mean,
> "without actually faulting in the pages"; or something else?
>

Similar to the other wording:

"Populate (prefault) page tables writable, faulting in all pages in the
range just as if manually writing one byte of each page; however, avoid
the actual memory access that would have been performed after handling
the fault."

>> +Depending on the underlying mapping,
>> +preallocate memory or read the underlying file;
>> +files with holes will preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails,
>> +return an error instead.
>
> Better:
>
> [[
> If populating fails, a
> .B SIGBUS
> signal is not generated; instead, an error is returned.
> ]]
>

Ack.

>> +.IP
>> +If
>> +.B MADV_POPULATE_WRITE
>> +succeeds,
>> +all page tables have been populated (prefaulted) writable once.
>> +If
>> +.B MADV_POPULATE_WRITE
>> +fails, some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_WRITE
>> +cannot be applied to mappings without write permissions
>> +and special mappings,
>> +for example,
>> +marked with the kernel-internal
>
> s/marked/mappings marked/
>
>> +.B VM_PFNMAP
>> +and
>
> Just checking: should it be "and" or "or" here?
>
> Looking at the EINVAL error below, I guess "or", and a better
> wording would be:
>
> [[
> ...for example, mappings marked with kernel-internal flags such as
> .B VM_PFNMAP
> or
> .BR VM_IO .
> ]]
>

Ack.

>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_WRITE ,
>> +the process can be killed at any moment when the system runs out of memory.
>>  .SH RETURN VALUE
>>  On success,
>>  .BR madvise ()
>> @@ -490,6 +556,17 @@ A kernel resource was temporarily unavailable.
>>  .B EBADF
>>  The map exists, but the area maps something that isn't a file.
>>  .TP
>> +.B EFAULT
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and populating (prefaulting) page tables failed because a
>> +.B SIGBUS
>> +would have been generated on actual memory access and the reason is not a
>> +HW poisoned page.
>
> Maybe:
> s/.$/(see the description of MADV_HWPOISON in this page)./
> ?
>

Sure, we can add that. But note that MADV_HWPOISON is just one of many
ways to HWpoison a page.

>> +.TP
>>  .B EINVAL
>>  .I addr
>>  is not page-aligned or
>> @@ -533,6 +610,18 @@ or
>>  .BR VM_PFNMAP
>>  ranges.
>>  .TP
>> +.B EINVAL
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +but the specified address range includes ranges with insufficient permissions
>> +or incompatible mappings such as
>> +.B VM_IO
>> +or
>> +.BR VM_PFNMAP.
>
> s/.BR VM_PFNMAP./.BR VM_PFNMAP ./
>

Agreed.

>> +.TP
>>  .B EIO
>>  (for
>>  .BR MADV_WILLNEED )
>> @@ -548,6 +637,15 @@ Not enough memory: paging in failed.
>>  Addresses in the specified range are not currently
>>  mapped, or are outside the address space of the process.
>>  .TP
>> +.B ENOMEM
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and populating (prefaulting) page tables failed because there was not enough
>> +memory.
>> +.TP
>>  .B EPERM
>>  .I advice
>>  is
>> @@ -555,6 +653,15 @@ is
>>  but the caller does not have the
>>  .B CAP_SYS_ADMIN
>>  capability.
>> +.TP
>> +.B EHWPOISON
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and populating (prefaulting) page tables failed because a HW poisoned page
>> +was encountered.
>>  .SH VERSIONS
>>  Since Linux 3.18,
>>  .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
>
> Thanks,
>

Thanks a lot Michael!

-- 
Thanks,

David / dhildenb