Date: Wed, 25 May 2022 14:38:46 -0400
From: Peter Xu
To: Mike Kravetz
Cc: linux-man@vger.kernel.org, linux-mm@kvack.org, Alejandro Colomar,
 Michael Kerrisk, David Hildenbrand, Axel Rasmussen
Subject: Re: [PATCH] madvise.2: Clarify addr/length and update hugetlb support
References: <20220524232844.169332-1-mike.kravetz@oracle.com>
In-Reply-To: <20220524232844.169332-1-mike.kravetz@oracle.com>

Hi, Mike,

Some minor nitpicks below.

On Tue, May 24, 2022 at 04:28:44PM -0700, Mike Kravetz wrote:
> Clarify that madvise only works on full pages, and remove references
> to 'bytes'.
>
> Update MADV_DONTNEED and MADV_REMOVE sections to remove notes that
> HugeTLB mappings are not supported. They now are supported.
>
> Under 'Linux notes' describe addr requirements and length handling
> for ranges in HugeTLB mappings.
>
> Signed-off-by: Mike Kravetz
> ---
>  man2/madvise.2 | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index f1f384c0c..c3b0615cb 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -61,9 +61,13 @@ system call is used to give advice or directions to the kernel
>  about the address range beginning at address
>  .I addr
>  and with size
> +.IR length .
> +.BR madvise ()
> +only operates on whole pages, therefore
> +.I addr
> +must be page-aligned. The value of
>  .I length
> -bytes
> -In most cases,
> +is rounded up to a multiple of page size. In most cases,
>  the goal of such advice is to improve system or application performance.
>  .PP
>  Initially, the system call supported a set of "conventional"
> @@ -143,7 +147,7 @@ The resident set size (RSS) of the calling process will be immediately
>  reduced however.
>  .IP
>  .B MADV_DONTNEED
> -cannot be applied to locked pages, Huge TLB pages, or
> +cannot be applied to locked pages, or
>  .BR VM_PFNMAP
>  pages.

This looks good, but since this is a behavior change and we can't change
old kernels, the man page normally handles such cases with wording like:

  Since Linux 5.18, this madvise supports hugetlbfs pages.

That is, it states from which version this works and, by implication,
where it does not.

>  (Pages marked with the kernel-internal
> @@ -170,24 +174,24 @@ Note that some of these operations change the semantics of memory accesses.
>  .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349
>  Free up a given range of pages
>  and its associated backing store.
> -This is equivalent to punching a hole in the corresponding byte
> +This is equivalent to punching a hole in the corresponding
>  range of the backing store (see
>  .BR fallocate (2)).
>  Subsequent accesses in the specified address range will see
> -bytes containing zero.
> +pages containing zero.
>  .\" Databases want to use this feature to drop a section of their
>  .\" bufferpool (shared memory segments) - without writing back to
>  .\" disk/swap space. This feature is also useful for supporting
>  .\" hot-plug memory on UML.
>  .IP
>  The specified address range must be mapped shared and writable.
> -This flag cannot be applied to locked pages, Huge TLB pages, or
> +This flag cannot be applied to locked pages, or
>  .BR VM_PFNMAP
>  pages.
>  .IP
>  In the initial implementation, only
>  .BR tmpfs (5)
> -was supported
> +supported
>  .BR MADV_REMOVE ;
>  but since Linux 3.5,
>  .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17
> @@ -196,9 +200,9 @@ any filesystem which supports the
>  .BR FALLOC_FL_PUNCH_HOLE
>  mode also supports
>  .BR MADV_REMOVE .
> -Hugetlbfs fails with the error
> -.BR EINVAL
> -and other filesystems fail with the error
> +Filesystems which do not support
> +.BR MADV_REMOVE
> +fail with the error
>  .BR EOPNOTSUPP .
>  .TP
>  .BR MADV_DONTFORK " (since Linux 2.6.16)"
> @@ -596,6 +600,18 @@ that are not mapped, the Linux version of
>  ignores them and applies the call to the rest (but returns
>  .B ENOMEM
>  from the system call, as it should).
> +.PP
> +If the specified address
> +.I addr
> +is within a mapping backed by Huge TLB pages, then
> +.I addr
> +must be aligned to the underlying Huge TLB page size. If the range
> +specified by
> +.I addr
> +and
> +.I length
> +ends in a mapping backed by Huge TLB pages, then the end of the range
> +will be rounded up to a multiple of the underlying Huge TLB page size.

I'm slightly worried this is buried too deep; it also duplicates part of
the earlier sentence on how start/end are treated.

How about adding a short paragraph to each of the MADV_DONTNEED and
MADV_REMOVE sections (right after the new sentences about hugetlbfs), with:

  For hugetlbfs, the start/end alignments on page sizes will be based on
  huge page size.

No strong opinions on any of these. Anyway:

Acked-by: Peter Xu

Thanks,

-- 
Peter Xu
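
For readers following the thread, the addr/length rules the patch documents
can be checked with a small sketch. It is only an illustration, not part of
the patch: the helper names (round_up_to, is_aligned_to, dontneed_demo) are
made up here, and the 2 MiB huge page size mentioned in the usage note is an
assumption, since the real size depends on the system. The demo uses an
ordinary private anonymous mapping, not a hugetlbfs one.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round len up to a multiple of unit (a power of two). */
size_t round_up_to(size_t len, size_t unit)
{
    assert(unit != 0 && (unit & (unit - 1)) == 0);
    return (len + unit - 1) & ~(unit - 1);
}

/* Hypothetical helper: nonzero if addr is a multiple of unit (a power of two). */
int is_aligned_to(const void *addr, size_t unit)
{
    return ((uintptr_t)addr & (unit - 1)) == 0;
}

/*
 * Exercise madvise(MADV_DONTNEED) on a private anonymous mapping.
 * Returns 0 if the observed behavior matches the documented rules,
 * -1 otherwise.
 */
int dontneed_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 4 * page;
    int ok;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, len);

    /* addr must be page-aligned: an unaligned start fails with EINVAL. */
    ok = madvise(p + 1, page, MADV_DONTNEED) == -1 && errno == EINVAL;

    /* length is rounded up: asking for 1 byte drops the whole first page. */
    ok = ok && madvise(p, 1, MADV_DONTNEED) == 0;

    /* The first page now reads as zeros; the second page is untouched. */
    ok = ok && p[0] == 0 && p[page - 1] == 0 && p[page] == 0xAB;

    munmap(p, len);
    return ok ? 0 : -1;
}
```

With an assumed 2 MiB huge page size, round_up_to(4097, 2UL << 20) gives
2 MiB, which is the kind of end-of-range rounding the new 'Linux notes'
paragraph describes for HugeTLB mappings, and is_aligned_to() is the
corresponding check on addr.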