Date: Fri, 26 Nov 2021 12:12:45 +0800
From: Peter Xu
To: Shakeel Butt
Cc: David Hildenbrand, "Kirill A . Shutemov", Yang Shi, Zi Yan, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Rientjes
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED

On Thu, Nov 25, 2021 at 07:21:54PM -0800, Shakeel Butt wrote:
> On Thu, Nov 25, 2021 at 2:24 AM Peter Xu wrote:
> >
> > On Mon, Nov 22, 2021 at 10:40:54AM -0800, Shakeel Butt wrote:
> > > > Do we have a performance evaluation how much overhead is added e.g., for
> > > > a single 4k MADV_DONTNEED call on a THP or on a MADV_DONTNEED call that
> > > > covers the whole THP?
> > >
> > > I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on
> > > x86 for both settings you suggested. I don't see any statistically
> > > significant difference with and without the patch. Let me know if you
> > > want me to try something else.
> >
> > I'm a bit surprised that sync split thp didn't bring any extra overhead.
> >
> > "unmap whole thp" is understandable from that pov, because afaict that won't
> > even trigger any thp split anyway even delayed, if this is the simplest case
> > that only this process mapped this thp, and it mapped once.
> >
> > For "unmap 4k upon thp" IIUC that's the worst case and zapping 4k should be
> > fast; while what I don't understand since thp split requires all hand-made work
> > for copying thp flags into small pages and so on, so I thought there should at
> > least be some overhead measured. Shakeel, could there be something overlooked
> > in the test, or maybe it's me that overlooked?
> >
>
> Thanks for making me rerun this and yes indeed I had a very silly bug in the
> benchmark code (i.e. madvise the same page for the whole loop) and this is
> indeed several times slower than without the patch (sorry David for misleading
> you).
>
> To better understand what is happening, I profiled the benchmark:
>
> -   31.27%     0.01%  dontneed  [kernel.kallsyms]  [k] zap_page_range_sync
>    - 31.27% zap_page_range_sync
>       - 30.25% split_local_deferred_list
>          - 30.16% split_huge_page_to_list
>             - 21.05% try_to_migrate
>                + rmap_walk_anon
>             - 7.47% remove_migration_ptes
>                + 7.34% rmap_walk_locked
>       + 1.02% zap_page_range_details

Makes sense, thanks for verifying it, Shakeel. I forgot it'll also walk itself.

I believe this effect will be exaggerated when the mapping is shared, e.g. a
shmem file thp shared between processes. What's worse is that when one process
DONTNEEDs one 4k page, all the other mms will need to split the huge pmd
without even knowing it, so they directly suffer from the perf degradation.

>
> The overhead is not due to copying page flags but rather due to two rmap walks.
> I don't think this much overhead is justified for current users of MADV_DONTNEED
> and munmap. I have to rethink the approach.

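For reference, the two benchmark settings discussed above ("unmap 4k upon thp"
vs. "unmap whole thp") could be sketched roughly as below. This is not
Shakeel's actual benchmark code, only a minimal illustration: the helper names
map_thp_area() and dontneed_each_thp() are made up here, and the 2 MiB THP size
and 4 KiB base page size are assumptions that match x86-64. Note that the
madvise() address advances on every iteration; the bug mentioned in the quote
was madvising the same page for the whole loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define THP_SIZE   (2UL << 20)  /* assumption: 2 MiB PMD-sized THP (x86-64) */
#define SMALL_PAGE 4096UL       /* assumption: 4 KiB base page */

/*
 * Hypothetical helper: map nr_thp THP-aligned slots and fault them in.
 * *raw and *raw_len describe the underlying mapping for a later munmap().
 * Returns the aligned start address, or NULL if mmap() fails.
 */
static char *map_thp_area(size_t nr_thp, char **raw, size_t *raw_len)
{
    size_t len = nr_thp * THP_SIZE;
    void *p;
    char *area;

    *raw_len = len + THP_SIZE;  /* slack so the start can be aligned */
    p = mmap(NULL, *raw_len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *raw = p;

    area = (char *)(((uintptr_t)p + THP_SIZE - 1) & ~(THP_SIZE - 1));
    /* Best effort only: the kernel may still back this with small pages. */
    (void)madvise(area, len, MADV_HUGEPAGE);
    memset(area, 1, len);       /* touch every page so it is populated */
    return area;
}

/*
 * Hypothetical helper: zap `chunk` bytes at the start of every THP slot.
 * chunk == SMALL_PAGE is the "unmap 4k upon thp" case,
 * chunk == THP_SIZE   is the "unmap whole thp" case.
 * Returns the elapsed time in nanoseconds, or -1 if madvise() fails.
 */
static long long dontneed_each_thp(char *area, size_t nr_thp, size_t chunk)
{
    struct timespec t0, t1;
    size_t i;

    assert(chunk > 0 && chunk <= THP_SIZE);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < nr_thp; i++) {
        /* Move to a different THP slot on every iteration. */
        if (madvise(area + i * THP_SIZE, chunk, MADV_DONTNEED))
            return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
           (t1.tv_nsec - t0.tv_nsec);
}
```

A caller would map e.g. 10000 slots with map_thp_area(), run
dontneed_each_thp() once with SMALL_PAGE and once (on a freshly populated
area) with THP_SIZE, and compare the timings with and without the patch.
Whether the area is really backed by THPs depends on the system's THP
settings, so that should be checked separately (e.g. via AnonHugePages in
/proc/self/smaps) before trusting any numbers.
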
Some side notes: I dug out the old MADV_COLLAPSE proposal right after I thought
about MADV_SPLIT (or any of its variants):

https://lore.kernel.org/all/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/

My memory was that there was some issue to be solved, so it was blocked;
however, when I read the thread it sounds like the list was mostly reaching a
consensus that MADV_COLLAPSE would be beneficial. Still copying DavidR in case
I missed something important.

If we think MADV_COLLAPSE can help to implement a userspace (and more
importantly, data-aware) khugepaged, then MADV_SPLIT can be the other side of
kcompactd, perhaps.

That's probably a bit off topic for this specific discussion of the specific
use case, but so far it all seems reasonable and discussable.

-- 
Peter Xu