From: David Hildenbrand
Subject: Re: [PATCH] mm: huge_memory: a new debugfs interface for splitting THP tests.
Date: Mon, 8 Mar 2021 21:35:58 +0100
Message-Id: <44C62A78-4B37-445D-A9F7-25D1A412A802@redhat.com>
To: Yang Shi
Cc: David Hildenbrand, Zi Yan, Linux MM, Linux Kernel Mailing List, linux-kselftest@vger.kernel.org, "Kirill A . Shutemov", Andrew Morton, Shuah Khan, John Hubbard, Sandipan Das, David Rientjes, Alex Shi

> On 08.03.2021 at 21:18, Yang Shi wrote:
>
> On Mon, Mar 8, 2021 at 11:30 AM David Hildenbrand wrote:
>>
>>> On 08.03.21 20:11, Yang Shi wrote:
>>> On Mon, Mar 8, 2021 at 11:01 AM Zi Yan wrote:
>>>>
>>>> On 8 Mar 2021, at 13:11, David Hildenbrand wrote:
>>>>
>>>>> On 08.03.21 18:49, Zi Yan wrote:
>>>>>> On 8 Mar 2021, at 11:17, David Hildenbrand wrote:
>>>>>>
>>>>>>> On 08.03.21 16:22, Zi Yan wrote:
>>>>>>>> From: Zi Yan
>>>>>>>>
>>>>>>>> By writing ",," to
>>>>>>>> /split_huge_pages_in_range_pid, THPs in the process with the
>>>>>>>> given pid and virtual address range are split. It is used to test
>>>>>>>> the split_huge_page function. In addition, a selftest program is added to
>>>>>>>> tools/testing/selftests/vm to utilize the interface by splitting
>>>>>>>> PMD THPs and PTE-mapped THPs.
>>>>>>>
>>>>>>> Won't something like
>>>>>>>
>>>>>>> 1. MADV_HUGEPAGE
>>>>>>>
>>>>>>> 2. Access memory
>>>>>>>
>>>>>>> 3. MADV_NOHUGEPAGE
>>>>>>>
>>>>>>> Have a similar effect? What's the benefit of this?
>>>>>>
>>>>>> Thanks for checking the patch.
>>>>>>
>>>>>> No, MADV_NOHUGEPAGE just replaces VM_HUGEPAGE with VM_NOHUGEPAGE,
>>>>>> nothing else will be done.
>>>>>
>>>>> Ah, okay - maybe my memory was tricking me. There is some s390x KVM code that forces MADV_NOHUGEPAGE and force-splits everything.
>>>>>
>>>>> I do wonder, though, if this functionality would be worth a proper user interface (e.g., madvise). There might be actual benefit in having this as a !debug interface.
>>>>>
>>>>> I think you are aware of the discussion in https://lkml.kernel.org/r/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com
>>>>
>>>> Yes. Thanks for bringing this up.
>>>>
>>>>>
>>>>> If there will be an interface to collapse a THP -- "this memory area is worth extra performance now by collapsing a THP if possible" -- it might also be helpful to have the opposite functionality -- "this memory area is not worth a THP, rather use that somewhere else".
>>>>>
>>>>> MADV_HUGE_COLLAPSE vs. MADV_HUGE_SPLIT
>>>>
>>>> I agree that MADV_HUGE_SPLIT would be useful as the opposite of COLLAPSE when a user might just want PAGESIZE mappings.
>>>> Right now, HUGE_SPLIT is implicit from mapping changes like mprotect or MADV_DONTNEED.
>>>
>>> IMHO, it sounds not very useful. MADV_DONTNEED would split PMD for any
>>> partial THP.
>>> If the range covers the whole THP, the whole THP is going
>>> to be freed anyway. All other places in the kernel which need to split THP
>>> have been covered. So I didn't realize any use case from userspace for
>>> just splitting PMD to PTEs.
>>
>> THP are a limited resource. So indicating which virtual memory regions
>> are not performance sensitive right now (e.g., cold pages in a database)
>> and not worth a THP might be quite valuable, no?
>
> Such functionality could be achieved by MADV_COLD or MADV_PAGEOUT,
> right? Then a subsequent call to MADV_NOHUGEPAGE would prevent
> collapsing or allocating THP for that area.
>

I remember these deal with optimizing swapping. Not sure how they interact with THP, especially on systems without swap - I would guess they don't as of now.

>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>
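
As a rough illustration of the userspace sequence discussed above (request THP with MADV_HUGEPAGE, touch the memory, then MADV_COLD followed by MADV_NOHUGEPAGE), here is a minimal Python sketch. It is not from the thread: `demote_region` and `try_advise` are hypothetical helper names, the numeric fallbacks are assumed Linux uapi values, and the code is Linux-only. It only reports whether the kernel accepted each madvise() hint. It does not show whether any THP was actually split or freed, which is exactly the open question in this discussion.

```python
import mmap

# Assumed Linux uapi values (asm-generic/mman-common.h), used only as
# fallbacks when this Python build does not export the constant.
MADV_HUGEPAGE = getattr(mmap, "MADV_HUGEPAGE", 14)
MADV_NOHUGEPAGE = getattr(mmap, "MADV_NOHUGEPAGE", 15)
MADV_COLD = getattr(mmap, "MADV_COLD", 20)

# 4 MiB: large enough to contain one naturally aligned 2 MiB range.
REGION_SIZE = 4 * 1024 * 1024


def try_advise(region, advice):
    """Apply one madvise() hint to the whole mapping; return True if accepted."""
    try:
        region.madvise(advice)
        return True
    except (OSError, AttributeError):
        # OSError: the kernel rejected the hint (e.g. THP not configured,
        # or MADV_COLD on a kernel older than 5.4).
        # AttributeError: mmap.madvise() is unavailable in this Python build.
        return False


def demote_region(size=REGION_SIZE):
    """Hypothetical helper: ask for THP, touch memory, then hint cold + no THP."""
    # Private anonymous mapping, like ordinary heap memory.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    results = {}
    try:
        results["hugepage"] = try_advise(region, MADV_HUGEPAGE)
        # Touch every base page so the range is actually populated.
        for offset in range(0, size, mmap.PAGESIZE):
            region[offset] = 1
        results["cold"] = try_advise(region, MADV_COLD)
        results["nohugepage"] = try_advise(region, MADV_NOHUGEPAGE)
        # The hints must not change the contents of the mapping.
        results["first_byte"] = region[0]
    finally:
        region.close()
    return results


if __name__ == "__main__":
    print(demote_region())
```

To see what the kernel really did with the range, one would have to inspect the AnonHugePages field in /proc/self/smaps before and after the hints; the sketch deliberately leaves that out.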