From: Axel Rasmussen
Date: Thu, 27 Jan 2022 09:52:32 -0800
Subject: Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support
To: David Hildenbrand
Cc: Mike Kravetz, LKML, Linux MM, Michal Hocko, Naoya Horiguchi, Peter Xu, Andrea Arcangeli, Mina Almasry, Shuah Khan, Andrew Morton
References: <20220113180308.15610-1-mike.kravetz@oracle.com>

On Thu, Jan 27, 2022 at 3:57 AM David Hildenbrand wrote:
>
> On 13.01.22 19:03, Mike Kravetz wrote:
> > Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
> > testing.  However, mremap support was recently added in commit
> > 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
> > vma").  While attempting to enable mremap support in the test, it was
> > discovered that the mremap test indirectly depends on MADV_DONTNEED.
> >
> > hugetlb does not support MADV_DONTNEED.  However, the only thing
> > preventing support is a check in can_madv_lru_vma().  Simply removing
> > the check will enable support.
> >
> > This is sent as a RFC because there is no existing use case calling
> > for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
> > However, adding support makes sense as it is fairly trivial and brings
> > hugetlb functionality more in line with 'normal' memory.
> >
>
> Just a note:
>
> QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
> but instead always goes either via hugetlbfs or via memfd.
>
> For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
> to get the job done (IOW: also discards private anon pages). See the
> comments in the QEMU code below. I remember that that is somewhat
> inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
> always need fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
> to make sure
>
> a) All file pages are removed
> b) All private anon pages are removed
>
> IIRC hugetlbfs really is different in that regard, but maybe other fs
> behave similarly.
>
> That's why QEMU was able to live for now without MADV_DONTNEED support
> for hugetlbfs and most probably won't ever need it.

Agreed, all of the production use cases I'm aware of use hugetlbfs,
not MAP_HUGE... But, I would say this is convenient for testing purposes.
It's slightly more convenient to not have to mount hugetlbfs / perform
the associated setup for tests. Perhaps that's only a small motivation
for enabling this, but then again Mike's patch to do so is likewise
very small. :)

>
>
> ...
>     /* The logic here is messy;
>      *    madvise DONTNEED fails for hugepages
>      *    fallocate works on hugepages and shmem
>      *    shared anonymous memory requires madvise REMOVE
>      */
>     need_madvise = (rb->page_size == qemu_host_page_size);
>     need_fallocate = rb->fd != -1;
>     if (need_fallocate) {
>         /* For a file, this causes the area of the file to be zero'd
>          * if read, and for hugetlbfs also causes it to be unmapped
>          * so a userfault will trigger.
>          */
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>         ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                         start, length);
>         if (ret) {
>             ret = -errno;
>             error_report("ram_block_discard_range: Failed to fallocate "
>                          "%s:%" PRIx64 " +%zx (%d)",
>                          rb->idstr, start, length, ret);
>             goto err;
>         }
> #else
>         ret = -ENOSYS;
>         error_report("ram_block_discard_range: fallocate not available/file"
>                      "%s:%" PRIx64 " +%zx (%d)",
>                      rb->idstr, start, length, ret);
>         goto err;
> #endif
>     }
>     if (need_madvise) {
>         /* For normal RAM this causes it to be unmapped,
>          * for shared memory it causes the local mapping to disappear
>          * and to fall back on the file contents (which we just
>          * fallocate'd away).
>          */
> #if defined(CONFIG_MADVISE)
>         if (qemu_ram_is_shared(rb) && rb->fd < 0) {
>             ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
>         } else {
>             ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>         }
>         if (ret) {
>             ret = -errno;
>             error_report("ram_block_discard_range: Failed to discard range "
>                          "%s:%" PRIx64 " +%zx (%d)",
>                          rb->idstr, start, length, ret);
>             goto err;
>         }
> #else
> ...
>
> --
> Thanks,
>
> David / dhildenb
>
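
For reference, the two-step discard David describes for ordinary
MAP_PRIVATE mapped files (punch a hole to drop the file pages, then
MADV_DONTNEED to drop the private anon pages) could be sketched roughly
as below. This is only an illustration, not QEMU code: the helper names
discard_private_file_range() and demo_discard() are made up, and the
self-check uses a plain memfd rather than hugetlbfs so that it can run
without any hugepages reserved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): discard [offset, offset + length)
 * of a file that is mapped MAP_PRIVATE at addr.
 */
static int discard_private_file_range(int fd, void *addr, off_t offset,
                                      size_t length)
{
    /* a) Remove the file pages backing the range. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, (off_t)length) != 0)
        return -errno;

    /* b) Remove any private anon (CoW) pages in the mapping. */
    if (madvise(addr, length, MADV_DONTNEED) != 0)
        return -errno;

    return 0;
}

/*
 * Self-check on an ordinary memfd (assumption: no hugetlb involved).
 * Returns 0 if both the file byte and the privately written byte read
 * back as zero after the discard, a negative errno if a syscall fails,
 * and 1 if stale data is still visible.
 */
static int demo_discard(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd, ret;
    char *p;

    assert(page > 0);

    fd = memfd_create("discard-demo", 0);
    if (fd < 0)
        return -errno;

    /* Size the file to one page and store one byte in the file itself. */
    if (ftruncate(fd, page) != 0 || pwrite(fd, "F", 1, 0) != 1) {
        ret = errno ? -errno : -EIO;
        close(fd);
        return ret;
    }

    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        ret = -errno;
        close(fd);
        return ret;
    }

    /* Writing triggers CoW: the page is now a private copy holding "FP". */
    p[1] = 'P';

    ret = discard_private_file_range(fd, p, 0, (size_t)page);
    if (ret == 0)
        ret = (p[0] == 0 && p[1] == 0) ? 0 : 1;

    munmap(p, (size_t)page);
    close(fd);
    return ret;
}
```

Calling demo_discard() from a small test program should return 0 on a
kernel where memfd supports hole punching. Skipping either step in the
helper would be expected to leave 'F' or 'P' visible through the
mapping, which is the inconsistency with hugetlbfs noted above.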