From: David Hildenbrand
Organization: Red Hat
To: Stefan Roesch
Cc: Johannes Weiner, Andrew Morton, kernel-team@fb.com, linux-mm@kvack.org,
    riel@surriel.com, mhocko@suse.com, linux-kselftest@vger.kernel.org,
    linux-doc@vger.kernel.org, Hugh Dickins
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support
Date: Thu, 6 Apr 2023 19:10:59 +0200
Message-ID: <10dd1fd4-4d10-c25d-174b-de37f01bef48@redhat.com>
References: <20230310182851.2579138-1-shr@devkernel.io>
    <20230328160914.5b6b66e4a5ad39e41fd63710@linux-foundation.org>
    <37dcd52a-2e32-c01d-b805-45d862721fbc@redhat.com>

On 06.04.23 18:59, Stefan Roesch wrote:
>
> Stefan Roesch writes:
>
>> David Hildenbrand writes:
>>
>>>>>> Obviously we could spend months analysing which exact allocations are
>>>>>> identical, and then more months or years reworking the architecture to
>>>>>> deduplicate them by hand and in userspace.
>>>>>> But this isn't practical,
>>>>>> and KSM is specifically for cases where this isn't practical.
>>>>>> Based on your request in the previous thread, we investigated whether
>>>>>> the boost was coming from the unintended side effects of KSM splitting
>>>>>> THPs. This wasn't the case.
>>>>>> If you have other theories on how the results could be bogus, we'd be
>>>>>> happy to investigate those as well. But you have to let us know what
>>>>>> you're looking for.
>>>>>>
>>>>>
>>>>> Maybe I'm bad at making such requests but
>>>>>
>>>>> "Stefan, can you do me a favor and investigate which pages we end up
>>>>> deduplicating -- especially if it's mostly only the zeropage and if it's
>>>>> still that significant when disabling THP?"
>>>>>
>>>>> "In any case, it would be nice to get a feeling for how much variety in
>>>>> these 20% of deduplicated pages are."
>>>>>
>>>>> is pretty clear to me. And shouldn't take months.
>>>>>
>>>
>>> Just to clarify: the details I requested are not meant to decide whether to
>>> reject the patch set (I understand that it can be beneficial to have); I
>>> primarily want to understand if we're really dealing with a workload where KSM
>>> is able to deduplicate pages that are non-trivial, to maybe figure out if there
>>> are other workloads that could similarly benefit -- or if we could optimize KSM
>>> for these specific cases or avoid the memory deduplication altogether.
>>>
>>> In contrast to e.g.:
>>>
>>> 1) THP resulted in many zeropages we end up deduplicating again. The THP
>>>    placement was unfortunate.
>>>
>>> 2) Unoptimized memory allocators that leave many identical pages mapped
>>>    after freeing up memory (e.g., zeroed pages, pages all filled with
>>>    poison values) instead of e.g., using MADV_DONTNEED to free up that
>>>    memory.
>>>
>>>
>>
>> I repeated an experiment with and without KSM. In terms of THP there is
>> no huge difference between the two. On a 64GB main memory machine I see
>> between 100 - 400MB in AnonHugePages.
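To make case 2) quoted above a bit more concrete, here is a minimal
sketch, assuming Linux and Python 3.8+ (for mmap.madvise). It mimics an
allocator that poisons a buffer on free, which leaves many identical
pages mapped, and then hands the memory back with MADV_DONTNEED instead.
The buffer size and the poison byte are arbitrary choices for
illustration and are not taken from the workload discussed here.

```python
import mmap

PAGE = mmap.PAGESIZE
NPAGES = 64      # arbitrary buffer size for the illustration
POISON = 0xAA    # arbitrary poison byte, not from any real allocator

# Private anonymous mapping, similar to what an allocator gets from the kernel.
buf = mmap.mmap(-1, NPAGES * PAGE,
                flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                prot=mmap.PROT_READ | mmap.PROT_WRITE)

# "Free" by poisoning: every page stays mapped and has identical content,
# which is exactly the kind of trivial duplicate KSM could merge.
buf[:] = bytes([POISON]) * len(buf)
distinct_pages = {buf[i * PAGE:(i + 1) * PAGE] for i in range(NPAGES)}
pages_identical_after_poison = len(distinct_pages) == 1

# Hand the memory back instead: the pages are dropped, and on Linux a later
# read of a private anonymous mapping faults in zero-filled memory on demand.
buf.madvise(mmap.MADV_DONTNEED)
first_page_after_dontneed = buf[:PAGE]

print("identical poisoned pages:", pages_identical_after_poison)
print("zero-filled after MADV_DONTNEED:", first_page_after_dontneed == bytes(PAGE))
```

With the second approach there are no resident duplicate pages left for
KSM to find in the first place.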
>>
>>>> /sys/kernel/mm/ksm/pages_shared is over 10000 when we run this on an
>>>> Instagram workload. The workload consists of 36 processes plus a few
>>>> sidecar processes.
>>>
>>> Thanks! To which value is /sys/kernel/mm/ksm/max_page_sharing set in that
>>> environment?
>>>
>>
>> It's set to the standard value of 256.
>>
>> In the meantime I have run experiments with different settings for
>> pages_to_scan. With the default value of 100, we only get a relatively
>> small benefit from KSM. If I increase the value to, for instance, 2000 or
>> 3000, the savings are substantial. (The workload is memory bound, not
>> CPU bound.)
>>
>> Here are some stats for setting pages_to_scan to 3000:
>>
>> full_scans: 560
>> general_profit: 20620539008
>> max_page_sharing: 256
>> merge_across_nodes: 1
>> pages_shared: 125446
>> pages_sharing: 5259506
>> pages_to_scan: 3000
>> pages_unshared: 1897537
>> pages_volatile: 12389223
>> run: 1
>> sleep_millisecs: 20
>> stable_node_chains: 176
>> stable_node_chains_prune_millisecs: 2000
>> stable_node_dups: 2604
>> use_zero_pages: 0
>> zero_pages_sharing: 0
>>
>>
>>> What would be interesting is pages_shared after max_page_sharing was set to a
>>> very high number such that pages_shared does not include duplicates. Then
>>> pages_shared actually expresses how many different pages we deduplicate. No need
>>> to run without THP in that case.
>>>
>>
>> That's on my list for the next set of experiments.
>>
>
> In the new experiment I increased the max_page_sharing value to 16384.
> This reduced the number of stable_node_dups considerably (it's around 3%
> of the previous value). However pages_sharing is still very high for
> this workload.
>
> full_scans: 138
> general_profit: 24442268608
> max_page_sharing: 16384
> merge_across_nodes: 1
> pages_shared: 144590
> pages_sharing: 6230983
> pages_to_scan: 3000
> pages_unshared: 2120307
> pages_volatile: 14590780
> run: 1
> sleep_millisecs: 20
> stable_node_chains: 23
> stable_node_chains_prune_millisecs: 2000
> stable_node_dups: 78
> use_zero_pages: 0
> zero_pages_sharing: 0

Interesting, thanks!

I wonder if it's really many interpreters producing (and caching?)
essentially the same blobs (for example, for a JIT, the IR and/or target
executable code). So maybe in general, such multi-instance interpreters
are a good candidate for KSM. (I recall there were some setups where
a server would perform and cache the translations instead.)

But that's pure speculation :)

-- 
Thanks,

David / dhildenb
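
For reference, the two sets of counters quoted in this thread can be
compared with a small script. This is only a sketch: the helper names
(parse_ksm_stats, summarize) are made up for illustration, and the
numbers are copied from the two runs above (max_page_sharing 256 vs.
16384). It reads pages_sharing / pages_shared as the average number of
additional mappings per deduplicated page, and assumes general_profit is
reported in bytes.

```python
# Counters copied from the two runs quoted in this thread (subset only).
RUN_MAX_SHARING_256 = """\
general_profit: 20620539008
max_page_sharing: 256
pages_shared: 125446
pages_sharing: 5259506
stable_node_dups: 2604
"""

RUN_MAX_SHARING_16384 = """\
general_profit: 24442268608
max_page_sharing: 16384
pages_shared: 144590
pages_sharing: 6230983
stable_node_dups: 78
"""


def parse_ksm_stats(text):
    """Parse 'name: value' lines into a dict of ints, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def summarize(stats):
    """Derive a few ratios from the raw counters."""
    return {
        # Additional mappings per KSM page; higher means better sharing.
        "sharing_ratio": stats["pages_sharing"] / stats["pages_shared"],
        # general_profit converted from bytes to GiB.
        "profit_gib": stats["general_profit"] / 2**30,
    }


before = parse_ksm_stats(RUN_MAX_SHARING_256)
after = parse_ksm_stats(RUN_MAX_SHARING_16384)

# Fraction of stable_node_dups left after raising max_page_sharing.
dups_remaining = after["stable_node_dups"] / before["stable_node_dups"]

for label, stats in (("256", before), ("16384", after)):
    s = summarize(stats)
    print(f"max_page_sharing={label}: "
          f"ratio={s['sharing_ratio']:.1f}, profit={s['profit_gib']:.2f} GiB")
print(f"stable_node_dups remaining: {dups_remaining:.1%}")
```

With these inputs the sharing ratio stays at roughly 42 to 43 in both
runs, while stable_node_dups drops to about 3% of its previous value,
which matches the description above.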