From: David Hildenbrand
Subject: Re: [PATCH] mm: huge_memory: a new debugfs interface for splitting THP tests.
Date: Mon, 8 Mar 2021 22:58:54 +0100
To: Yang Shi
Cc: David Hildenbrand, Zi Yan, Linux MM, Linux Kernel Mailing List, linux-kselftest@vger.kernel.org, "Kirill A. Shutemov", Andrew Morton, Shuah Khan, John Hubbard, Sandipan Das, David Rientjes, Alex Shi

> On 08.03.2021 at 22:25, Yang Shi wrote:
>
> On Mon, Mar 8, 2021 at 12:36 PM David Hildenbrand wrote:
>>
>>
>>>> On 08.03.2021 at 21:18, Yang Shi wrote:
>>>
>>> On Mon, Mar 8, 2021 at 11:30 AM David Hildenbrand wrote:
>>>>
>>>>> On 08.03.21 20:11, Yang Shi wrote:
>>>>> On Mon, Mar 8, 2021 at 11:01 AM Zi Yan wrote:
>>>>>>
>>>>>> On 8 Mar 2021, at 13:11, David Hildenbrand wrote:
>>>>>>
>>>>>>> On 08.03.21 18:49, Zi Yan wrote:
>>>>>>>> On 8 Mar 2021, at 11:17, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>>> On 08.03.21 16:22, Zi Yan wrote:
>>>>>>>>>> From: Zi Yan
>>>>>>>>>>
>>>>>>>>>> By writing ",," to
>>>>>>>>>> /split_huge_pages_in_range_pid, THPs in the process with the
>>>>>>>>>> given pid and virtual address range are split. It is used to test
>>>>>>>>>> split_huge_page function. In addition, a selftest program is added to
>>>>>>>>>> tools/testing/selftests/vm to utilize the interface by splitting
>>>>>>>>>> PMD THPs and PTE-mapped THPs.
>>>>>>>>>
>>>>>>>>> Won't something like
>>>>>>>>>
>>>>>>>>> 1. MADV_HUGEPAGE
>>>>>>>>>
>>>>>>>>> 2. Access memory
>>>>>>>>>
>>>>>>>>> 3. MADV_NOHUGEPAGE
>>>>>>>>>
>>>>>>>>> Have a similar effect? What's the benefit of this?
>>>>>>>>
>>>>>>>> Thanks for checking the patch.
>>>>>>>>
>>>>>>>> No, MADV_NOHUGEPAGE just replaces VM_HUGEPAGE with VM_NOHUGEPAGE,
>>>>>>>> nothing else will be done.
>>>>>>>
>>>>>>> Ah, okay - maybe my memory was tricking me. There is some s390x KVM code that forces MADV_NOHUGEPAGE and force-splits everything.
>>>>>>>
>>>>>>> I do wonder, though, if this functionality would be worth a proper user interface (e.g., madvise). There might be actual benefit in having this as a !debug interface.
>>>>>>>
>>>>>>> I think you are aware of the discussion in https://lkml.kernel.org/r/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com
>>>>>>
>>>>>> Yes. Thanks for bringing this up.
>>>>>>
>>>>>>>
>>>>>>> If there will be an interface to collapse a THP -- "this memory area is worth extra performance now by collapsing a THP if possible" -- it might also be helpful to have the opposite functionality -- "this memory area is not worth a THP, rather use that somewhere else".
>>>>>>>
>>>>>>> MADV_HUGE_COLLAPSE vs. MADV_HUGE_SPLIT
>>>>>>
>>>>>> I agree that MADV_HUGE_SPLIT would be useful as the opposite of COLLAPSE when user might just want PAGESIZE mappings.
>>>>>> Right now, HUGE_SPLIT is implicit from mapping changes like mprotect or MADV_DONTNEED.
>>>>>
>>>>> IMHO, it sounds not very useful. MADV_DONTNEED would split PMD for any
>>>>> partial THP. If the range covers the whole THP, the whole THP is going
>>>>> to be freed anyway. All other places in kernel which need split THP
>>>>> have been covered. So I didn't realize any usecase from userspace for
>>>>> just splitting PMD to PTEs.
>>>>
>>>> THP are a limited resource. So indicating which virtual memory regions
>>>> are not performance sensitive right now (e.g., cold pages in a database)
>>>> and not worth a THP might be quite valuable, no?
>>>
>>> Such functionality could be achieved by MADV_COLD or MADV_PAGEOUT,
>>> right? Then a subsequent call to MADV_NOHUGEPAGE would prevent from
>>> collapsing or allocating THP for that area.
>>>
>>
>> I remember these deal with optimizing swapping. Not sure how they interact with THP, especially on systems without swap - I would guess they don't as of now.
>
> Yes, MADV_PAGEOUT would just swap the THP or sub pages out. I think I
> just forgot to mention MADV_FREE which would be more suitable for this
> usecase.
>

Can you elaborate? MADV_FREE is destructive, just like a delayed MADV_DONTNEED. How would that help here?

>>>> --
>>>> Thanks,
>>>>
>>>> David / dhildenb
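
For readers following the thread, the three-step sequence asked about earlier (MADV_HUGEPAGE, access memory, MADV_NOHUGEPAGE) might look roughly like the userspace sketch below. It is illustrative only: the helper names align_up() and thp_advise_roundtrip() are hypothetical, and the 2 MiB PMD size is an assumption (it holds on x86-64 but not everywhere). As Zi Yan points out in the thread, step 3 only swaps the VMA flag, so this sequence does not by itself split a THP that was already allocated.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed PMD-level THP size: 2 MiB (true on x86-64, not universal). */
#define HPAGE_SIZE (2UL << 20)

/* Round addr up to the next multiple of align; align must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: run the three steps on one THP-aligned window.
 * Returns 0 if every call succeeds, -1 otherwise (for example when the
 * kernel was built without THP support and madvise() rejects the advice).
 */
static int thp_advise_roundtrip(void)
{
	/* Over-allocate so that an aligned 2 MiB window always fits. */
	size_t len = 2 * HPAGE_SIZE;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *buf;
	int ret = -1;

	if (raw == MAP_FAILED)
		return -1;
	buf = (char *)align_up((uintptr_t)raw, HPAGE_SIZE);

	/* 1. Ask for THP backing on this range. */
	if (madvise(buf, HPAGE_SIZE, MADV_HUGEPAGE) == 0) {
		/* 2. Touch the memory so it is actually faulted in. */
		memset(buf, 1, HPAGE_SIZE);
		/* 3. Flip the advice; per the thread this only changes the flag. */
		if (madvise(buf, HPAGE_SIZE, MADV_NOHUGEPAGE) == 0)
			ret = 0;
	}

	munmap(raw, len);
	return ret;
}
```

Whether step 2 really ends up backed by a THP depends on the system's THP settings and on memory availability, so a return value of 0 only means that the calls were accepted.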