From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1fc6984b-bcc7-123d-1ea3-9e04d5b26529@redhat.com>
Date: Thu, 29 Sep 2022 19:53:34 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.0
To: Claudio Imbrenda
Cc: xu.xin.sc@gmail.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xu xin
References: <20220929025206.280970-1-xu.xin16@zte.com.cn>
 <4a3daba6-18f9-d252-697c-197f65578c44@redhat.com>
 <20220929123630.0951b199@p-imbrenda>
 <745f75a4-6a2a-630f-8228-0c5e081588e7@redhat.com>
 <20220929140548.1945dccf@p-imbrenda>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH 0/3] ksm: fix incorrect count of merged pages when enabling use_zero_pages
In-Reply-To: <20220929140548.1945dccf@p-imbrenda>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 29.09.22 14:05, Claudio Imbrenda wrote:
> On Thu, 29 Sep 2022 13:12:44 +0200
> David Hildenbrand wrote:
>
>> On 29.09.22 12:36, Claudio Imbrenda wrote:
>>> On Thu, 29 Sep 2022 11:21:44 +0200
>>> David Hildenbrand wrote:
>>>
>>>> On 29.09.22 04:52, xu.xin.sc@gmail.com wrote:
>>>>> From: xu xin
>>>>>
>>>>> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
>>>>> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
>>>>> after enabling use_zero_pages, all empty pages that are merged with
>>>>> kernel zero page are not counted in pages_sharing or pages_shared.
>>>>> That is because the rmap_items of these ksm zero pages are not
>>>>> appended to The Stable Tree of KSM.
>>>>>
>>>>> We need to add the count of empty pages to let users know how many empty
>>>>> pages are merged with kernel zero page(s).
>>>>>
>>>>> Please see the subsequent patches for details.
>>>>
>>>> Just raising the topic here because it's related to the KSM usage of the
>>>> shared zero-page:
>>>>
>>>> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
>>>> the shared zeropage as placed by KSM (which is against the
>>>> MADV_UNMERGEABLE documentation at least). It will only unshare actual
>>>> KSM pages. We might not want to blindly unshare all shared
>>>> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
>>>> page -- instead of the generic zero page -- might be one way to handle
>>>> this cleaner.
>>>
>>> I don't understand why do you need this.
>>>
>>> first of all, one zero page would not be enough (depending on the
>>> architecture, e.g. on s390x you need many). the whole point of zero
>>> page merging is that one zero page is not enough.
>>
>> I don't follow. Having multiple ones is a pure optimization on s390x (I
>> recall something about cache coloring), no? So why should we blindly
>> care in the special KSM use case here?
>
> because merging pages full of zeroes with only one page will have
> negative performance on those architectures that need cache colouring
> (and s390 is not even the only architecture that needs it)
>
> the whole point of merging pages full of zeroes with zero pages is to
> not lose the cache colouring.
>
> otherwise you could just let KSM merge all pages full of zeroes with
> one page (which is what happens without use_zero_pages), and all the
> numbers are correct.
>
> if you are not on s390 or MIPS, you have no use for use_zero_pages

Ah, I see now that use_zero_pages is really only (mostly) s390x
specific. I already wondered why on earth we would really need that,
thanks for pointing that out.

One question I'd have is: why is the shared zero page treated specially
in KSM *at all*, then? The cache coloring problem should apply to *each
and every* deduplicated page. Why is a page filled with 0xff any
different from a page filled with 0x0?

Yes, I read e86c59b1b12d.
It doesn't mention any actual performance numbers, or whether the
performance difference only applies to some microbenchmarks nobody
cares about. Did you post some benchmark results back then? That would
be interesting.

I assume that the shared zeropage was simply the low-hanging fruit.

>
>>
>>>
>>> second, once a page is merged with a zero page, it's not really handled
>>> by KSM anymore. if you have a big allocation, of which you only touch a
>>> few pages, would the rest be considered "merged"? no, it's just zero
>>> pages, right?
>>
>> If you haven't touched memory, there is nothing populated -- no shared
>> zeropage.
>>
>> We only populate shared zeropages in private anonymous mappings on read
>> access without prior write.
>
> that's what I meant. if you read without writing, you get zero pages.
> you don't consider those to be "shared" from a KSM point of view
>
> does it make a difference if some pages that have been written to but
> now only contain zeroes are discarded and mapped back to the zero pages?

That's a good question. When it comes to unmerging, you might expect
that whatever was deduplicated will get duplicated again -- and your
memory consumption will adjust accordingly. The stats might give an
admin an idea of how much memory is actually overcommitted. See below
on the important case where we essentially never see the shared
zeropage.

It would be great to know the motivation behind these patches -- what
is the KSM user and what does it want to achieve with these numbers?

>
>>
>>> this is the same, except that we take present pages with zeroes in it
>>> and we discard them and map them to zero pages. it's kinda like if we
>>> had never touched them.
>>
>> MADV_UNMERGEABLE
>>
>> "Undo the effect of an earlier MADV_MERGEABLE operation on the
>> specified address range; KSM unmerges whatever pages it had merged in
>> the address range specified by addr and length."
>>
>> Now please explain to me how not undoing a zeropage merging is correct
>> according to this documentation.
>>
>
> because once it's discarded and replaced with a zero page, the page is
> not handled by KSM anymore.
>
> I understand what you mean, that KSM did an action that now cannot be
> undone, but how would you differentiate between zero pages that were
> never written to and pages that had been written to and then discarded
> and mapped back to a zero page because they only contained zeroes?

An application that always properly initializes all its memory (writes
at least some part once) will never have the shared zeropage mapped. VM
guest memory comes to mind, probably still the most important KSM use
case.

There are currently some remaining issues when taking a GUP R/O
longterm pin on such a page (e.g., vfio). In contrast to KSM pages,
such pins are not reliable for the shared zeropage, but I have fixes
for them pending. However, that is rather a corner case (it didn't work
correctly at all a while ago) and will be sorted out soon.

So the question is whether the documentation of MADV_UNMERGEABLE etc.
(and the stats) should be adjusted to describe the behavior with
use_zero_pages accordingly.

-- 
Thanks,

David / dhildenb
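
For anyone who wants to look at the counters discussed in this thread on their own machine, here is a minimal sketch in Python. It only reads the existing sysfs files pages_shared, pages_sharing and use_zero_pages under /sys/kernel/mm/ksm/. The helper names read_ksm_counters() and describe() are made up for illustration, and the note it prints simply restates the behavior described in the cover letter (zero-filled pages merged with the kernel zero page are not counted), which is assumed to hold on a kernel without the proposed patches.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")  # standard KSM sysfs directory
COUNTERS = ("pages_shared", "pages_sharing", "use_zero_pages")


def read_ksm_counters(ksm_dir=KSM_DIR):
    """Return the integer value of each counter file that can be read."""
    values = {}
    for name in COUNTERS:
        try:
            values[name] = int((Path(ksm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # File missing (e.g. KSM not available) or not a number: skip it.
            continue
    return values


def describe(values):
    """Summarise the counters, flagging the case discussed in this thread."""
    shared = values.get("pages_shared", 0)
    sharing = values.get("pages_sharing", 0)
    note = ""
    if values.get("use_zero_pages") == 1:
        # Assumption taken from the cover letter: these merges are uncounted.
        note = " (zero-filled pages mapped to the shared zeropage are not included)"
    return f"pages_shared={shared} pages_sharing={sharing}{note}"


if __name__ == "__main__":
    print(describe(read_ksm_counters()))
```

Passing the directory as a parameter keeps the sketch usable on systems without KSM: point it at any directory containing files with the same names.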