From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3AB9C433DB for ; Tue, 9 Mar 2021 11:11:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 385B165253 for ; Tue, 9 Mar 2021 11:11:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 385B165253 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ska.ac.za Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A04D18D00DF; Tue, 9 Mar 2021 06:11:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9B4838D007F; Tue, 9 Mar 2021 06:11:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 854FD8D00DF; Tue, 9 Mar 2021 06:11:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0078.hostedemail.com [216.40.44.78]) by kanga.kvack.org (Postfix) with ESMTP id 6AA618D007F for ; Tue, 9 Mar 2021 06:11:11 -0500 (EST) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 257D03A82 for ; Tue, 9 Mar 2021 11:11:11 +0000 (UTC) X-FDA: 77900069142.09.3D9C5ED Received: from mail-yb1-f171.google.com (mail-yb1-f171.google.com [209.85.219.171]) by imf14.hostedemail.com (Postfix) with ESMTP id 3C56DC0007C7 for ; Tue, 9 Mar 2021 11:11:06 +0000 (UTC) Received: by mail-yb1-f171.google.com with SMTP id 
b10so13548936ybn.3 for ; Tue, 09 Mar 2021 03:11:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ska-ac-za.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=kcgVhqt7ei4neoJ4B0LudRIj1bg7WmhStgxCw/LBMic=; b=yBlXK0f83RnelyQc2U8n6lA5DuT4Gfls9QK8F37rtAgejad8xxUYCaRS+QNbH8h4gi MEO5rxoFNJj9T27Z3fb1ZcM5am0ujEk4fHfRzLDAT2JUls7RL5Ktcn3rWGK3SKo2D8ws ZJhqnhiGe2QDcLBxQsJmq2WqQJC3stK3QUF6s+8nNVk8H7ySQ+81c4QPqkai6VsOqR0v Fvsfs1C25BJgQA/0yVNE1ovttCD5VG3USl1q3LIfShxo99adBZ6spxBeAydMJIPlzQpw uWZhWzBZkE21KDb/MfwF4yr5L29d5Agfc9uFiM9venwnM/xYeMcS6uswnCy9p73cg0D/ aFpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=kcgVhqt7ei4neoJ4B0LudRIj1bg7WmhStgxCw/LBMic=; b=ZmlIZFZgUzWvmhbMNUKwH0Gsi0bDJKFttTzjnJs+RCYn0N4D96WQPMYmKkiDwi00y9 jXndU3LUfZcJKdzvKf4x8HuYRYYrNck93pnWDmhDFzy0rAbAIgEqAFNVVpmxl8qOrfd8 qBiKwDppBVO3Oa2qj0zujTxYQ5C252sWERGUky3Ms6W7a1K40wZVo1c/rUb1PLbR+fiG 5GE7DQJ/1olPPa9Az4SHr9YfflusCvd6ACvV47yBYYUnbEHNsuhzxnB8FkoM6QsBJlNE D8Q0TRGXVKFhFHIt/V68FBWEI8FbMiDq8H2bEpyv+7AVSHhYSfufvai2jeqbyDatiUS8 THkw== X-Gm-Message-State: AOAM533CqvQCVuzaHbES7avTPLMZ2qh9UtX5krjAdGKQpB7/UGlW2Udl LG61G2u5GtrLfX5ZcNo4qGjdikKozeuWJ7xrHakp+A== X-Google-Smtp-Source: ABdhPJwFuJWqJ5sBFVVl3RgBOqiOoTdSRvukJ9NjwBQ9wwT5nesIVSGWM8xno9ovf+b/s8SFQz5/PaEmbch92xf4Skk= X-Received: by 2002:a25:7645:: with SMTP id r66mr43680318ybc.36.1615288269432; Tue, 09 Mar 2021 03:11:09 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Bruce Merry Date: Tue, 9 Mar 2021 13:10:58 +0200 Message-ID: Subject: Re: Is MAP_POPULATE supposed to fail silently? 
To: David Hildenbrand
Cc: Linux MM, Mike Kravetz
Content-Type: multipart/alternative; boundary="000000000000827ff305bd189baf"
X-Rspamd-Server: rspam03
X-Rspamd-Queue-Id: 3C56DC0007C7
X-Stat-Signature: udoqr7bgcg4xdys1a768sqsgqfzgpei9
Received-SPF: none (ska.ac.za>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from=""; helo=mail-yb1-f171.google.com; client-ip=209.85.219.171
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1615288266-917412
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

--000000000000827ff305bd189baf
Content-Type: text/plain; charset="UTF-8"

Thanks. I'm hoping that the *.rsvd.limit_in_bytes cgroup settings will
suffice if we can upgrade the production systems to a kernel new enough to
have them, and mlock() is a possibility as well, but it's useful to know
about the other options. Mostly I wanted to know whether this was a kernel
problem or a documentation problem. I've filed
https://bugzilla.kernel.org/show_bug.cgi?id=212153 for the man page.

On Tue, 9 Mar 2021 at 12:32, David Hildenbrand wrote:

> On 09.03.21 10:33, Bruce Merry wrote:
> > Hi
> >
> > I've run into a problem with using mmap(..., MAP_ANONYMOUS |
> > MAP_POPULATE | MAP_HUGETLB). If there are no huge pages available due to
> > vm.nr_hugepages (or hugetlb.2MB.rsvd.limit_in_bytes cgroup setting) then
> > the mmap call fails and I can gracefully fall back to 4KB pages.
> > However, if neither of the above apply but hugetlb.2MB.limit_in_bytes
> > prevents pages being mapped, then it appears that MAP_POPULATE is
> > silently ignored (according to mincore), and rather than being able to
> > gracefully fall back, attempting to use the memory results in SIGBUS.
>
> I would have imagined that the hugepage reservation would fail. But
> looks like they might get reserved, however, actual population is
> restricted using cgroups later.
>
> Huge page reservation is actually pretty weird in some special cases
> (including NUMA bindings).
>
> >
> > Is that expected behaviour? I don't see anything in the mmap(2) man page
> > about it being best-effort (in contrast to MAP_LOCKED, which explicitly
> > says the call won't fail if it can't lock the memory).
>
> I think it has been best-effort forever, just like MAP_LOCKED.
>
> You could use memfd_create() to create an anonymous file backed by huge
> pages, then try allocating backend storage using fallocate() - which
> fails in a safe way. You just have to make sure to map it MAP_SHARED
> later to avoid nasty side effects with private mappings + fallocate().
>
> >
> > This is on Linux 5.8 on Ubuntu 20.04. I can provide sample code if it's
> > of interest, or test on a newer kernel if it'll help.
> >
>
> Note that I'm working on a reliable populate mechanism that can also
> work on parts of a mapping only, especially relevant in combination with
> MAP_NORESERVE. Not sure if that applies to your use case, sounds like
> memfd_create() + fallocate() could be good enough - unless you also
> really want to have all page tables properly populated already or really
> need MAP_PRIVATE.
>
> https://lkml.kernel.org/r/20210308164520.18323-1-david@redhat.com
>
> --
> Thanks,
>
> David / dhildenb
>
>

--
Bruce Merry
Senior Science Processing Developer
SARAO

--000000000000827ff305bd189baf
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Thanks. I'm hoping that the *.rsvd.limit_in_bytes cgroup settings will suffice if we can upgrade the production systems to a kernel new enough to have them, and mlock() is a possibility as well, but it's useful to know about the other options. Mostly I wanted to know whether this was a kernel problem or a documentation problem. I've filed https://bugzilla.kernel.org/show_bug.cgi?id=212153 for the man page.
On Tue, 9 Mar 2021 at 12:32, David Hildenbrand <david@redhat.com> wrote:
On 09.03.21 10:33, Bruce Merry wrote:
> Hi
>
> I've run into a problem with using mmap(..., MAP_ANONYMOUS |
> MAP_POPULATE | MAP_HUGETLB). If there are no huge pages available due to
> vm.nr_hugepages (or hugetlb.2MB.rsvd.limit_in_bytes cgroup setting) then
> the mmap call fails and I can gracefully fall back to 4KB pages.
> However, if neither of the above apply but hugetlb.2MB.limit_in_bytes
> prevents pages being mapped, then it appears that MAP_POPULATE is
> silently ignored (according to mincore), and rather than being able to
> gracefully fall back, attempting to use the memory results in SIGBUS.
I would have imagined that the hugepage reservation would fail. But
looks like they might get reserved, however, actual population is
restricted using cgroups later.

Huge page reservation is actually pretty weird in some special cases
(including NUMA bindings).

>
> Is that expected behaviour? I don't see anything in the mmap(2) man page
> about it being best-effort (in contrast to MAP_LOCKED, which explicitly
> says the call won't fail if it can't lock the memory).

I think it has been best-effort forever, just like MAP_LOCKED.
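
Because MAP_POPULATE is best-effort, a caller that wants to fall back gracefully has to verify the result itself. The following is a minimal C sketch of the mincore() check mentioned above, assuming Linux and glibc. The helper names count_resident_pages() and map_populated() are hypothetical and do not come from the thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>   /* mmap, munmap, mincore, MAP_POPULATE */
#include <unistd.h>     /* sysconf */

/* Hypothetical helper: number of pages in [addr, addr + len) that
 * mincore() reports as resident, or -1 on error. addr must be
 * page-aligned. mincore() reports one byte per system page. */
long count_resident_pages(void *addr, size_t len)
{
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = -1;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* bit 0 set means "resident" */
    }
    free(vec);
    return resident;
}

/* Hypothetical helper: anonymous private mapping with MAP_POPULATE plus
 * extra_flags (for example MAP_HUGETLB). Returns NULL if mmap() fails or
 * if mincore() shows that population was silently skipped, so the caller
 * can fall back instead of hitting SIGBUS on first touch. */
void *map_populated(size_t len, int extra_flags)
{
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long want = (long)((len + page - 1) / page);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | extra_flags,
                   -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    if (count_resident_pages(p, len) != want) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

With extra_flags set to MAP_HUGETLB, len has to be a multiple of the huge page size, and a NULL return is the signal to retry with 4KB pages.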

You could use memfd_create() to create an anonymous file backed by huge pages, then try allocating backend storage using fallocate() - which
fails in a safe way. You just have to make sure to map it MAP_SHARED
later to avoid nasty side effects with private mappings + fallocate().
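
The memfd_create() + fallocate() approach described above might look roughly like the following C sketch. It assumes Linux with glibc 2.27 or later (for the memfd_create() wrapper and MFD_HUGETLB) and the default huge page size. The function alloc_huge_or_fallback() and the file name "hugebuf" are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* fallocate */
#include <stddef.h>
#include <sys/mman.h>   /* memfd_create, mmap, MFD_HUGETLB */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* close */

/* Hypothetical helper: returns a read/write mapping of len bytes, or NULL.
 * *used_huge is set to 1 if the mapping is backed by huge pages and to 0
 * if the ordinary anonymous fallback was used. len should be a multiple
 * of the huge page size (assumption: 2 MiB on the systems in the thread). */
void *alloc_huge_or_fallback(size_t len, int *used_huge)
{
    assert(len > 0 && used_huge != NULL);
    *used_huge = 0;

    /* Anonymous file backed by huge pages. */
    int fd = memfd_create("hugebuf", MFD_CLOEXEC | MFD_HUGETLB);
    if (fd >= 0) {
        /* fallocate() allocates the backing pages now and reports
         * failure through its return value, rather than leaving the
         * failure to surface as SIGBUS on first touch. */
        if (fallocate(fd, 0, 0, (off_t)len) == 0) {
            /* MAP_SHARED, so the mapping uses the pages allocated above. */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) {
                close(fd);          /* the mapping keeps the file alive */
                *used_huge = 1;
                return p;
            }
        }
        close(fd);                  /* releases any pages allocated so far */
    }

    /* Graceful fallback to ordinary pages. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would request, say, 2 MiB, check used_huge to see which path was taken, and release the buffer with munmap() in either case.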

>
> This is on Linux 5.8 on Ubuntu 20.04. I can provide sample code if it's
> of interest, or test on a newer kernel if it'll help.
>

Note that I'm working on a reliable populate mechanism that can also
work on parts of a mapping only, especially relevant in combination with
MAP_NORESERVE. Not sure if that applies to your use case, sounds like
memfd_create() + fallocate() could be good enough - unless you also
really want to have all page tables properly populated already or really
need MAP_PRIVATE.

https://lkml.kernel.org/r/20210308164520.18323-1-david@redhat.com

--
Thanks,

David / dhildenb



--
Bruce Merry
Senior Science Processing Developer
SARAO
--000000000000827ff305bd189baf--