On Fri, 2022-10-21 at 13:48 -0700, Mike Kravetz wrote:
> On 10/21/22 15:45, Rik van Riel wrote:
> > A common use case for hugetlbfs is for the application to create
> > memory pools backed by huge pages, which then get handed over to
> > some malloc library (e.g. jemalloc) for further management.
> >
> > That malloc library may be doing MADV_DONTNEED calls on memory
> > that is no longer needed, expecting those calls to happen on
> > PAGE_SIZE boundaries.
> >
>
> Thanks Rik.  I tend to agree with this direction as it is 'breaking'
> current code.  David and I discussed this in this thread,
> https://lore.kernel.org/linux-mm/356a4b9a-1f56-ae06-b211-bd32fc93ecda@redhat.com/
>
> One thing to note is that there was not any documentation saying
> madvise would happen on page boundaries.  The system call takes a
> length and rounds up to page size.  However, the man page explicitly
> said it operates on a byte range.  Certainly mm people and others
> know we only operate on pages.  But, that is not what was documented.
>
> When the change was made to add hugetlb support, the decision was made
> to round up the range to hugetlb page boundaries in hugetlb vmas.  This
> was to be consistent with how madvise operated on base pages.  At the
> same time, the madvise documentation was updated to say it operates on
> page boundaries, and to describe the behavior for hugetlb mappings.  If
> we move forward with this change, we will need to update the man page.

I'll send in a patch for the man page after the patch gets merged.
I'll change the text to clarify that the system may round up the
specified length to PAGE_SIZE granularity, which is a quantity
programs can query through (IIRC) getconf.

Andrew, I split out the part of the patch intended for stable.

-- 
All Rights Reversed.
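As a rough illustration of the length rounding discussed above, here is a minimal userspace sketch in C. It only models the arithmetic; it does not call madvise() and is not the kernel's code. The helper names `round_up_len` and `base_page_size` are hypothetical, and the 2 MiB huge page size used in the note below is an assumption (the common x86-64 default), not something stated in the thread. `sysconf(_SC_PAGESIZE)` is the C-level counterpart of `getconf PAGESIZE` at the shell.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round len up to the next multiple of align.
 * align must be a nonzero power of two (true for base and huge page sizes).
 * Overflow near SIZE_MAX is ignored in this sketch. */
static size_t round_up_len(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Hypothetical helper: the base page size as reported by the C library,
 * i.e. the PAGE_SIZE granularity a program can query at run time. */
static size_t base_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    assert(sz > 0);
    return (size_t)sz;
}
```

Assuming 4 KiB base pages and 2 MiB huge pages, a 4096-byte MADV_DONTNEED length stays 4096 bytes under base-page rounding (`round_up_len(4096, base_page_size())`), but becomes 2097152 bytes under huge-page rounding (`round_up_len(4096, (size_t)2 << 20)`). That gap is why a malloc library that expects PAGE_SIZE granularity can be surprised by rounding to hugetlb page boundaries in hugetlb vmas.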