From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <734ab5c8-5791-45d4-b3e5-6ee4d7cd61f4@redhat.com>
Date: Thu, 4 Jul 2024 15:55:29 +0200
Subject: Re: [Question] performance regression after VM migration due to anon THP split in CoW
To: Jinjiang Tu
Cc: Kefeng Wang, Nanyong Sun, aarcange@redhat.com,
 akpm@linux-foundation.org, baohua@kernel.org,
 baolin.wang@linux.alibaba.com, jhubbard@nvidia.com,
 kirill.shutemov@linux.intel.com, linux-mm@kvack.org,
 mike.kravetz@oracle.com, rcampbell@nvidia.com,
 william.kucharski@oracle.com, yang.shi@linux.alibaba.com, ziy@nvidia.com
References: <740d7379-3e3d-4c8c-4350-6c496969db1f@huawei.com>
 <4881e19d-d556-4b54-a788-bf1e111ff24a@huawei.com>
In-Reply-To: <4881e19d-d556-4b54-a788-bf1e111ff24a@huawei.com>
From: David Hildenbrand
Organization: Red Hat

On 04.07.24 15:31, Jinjiang Tu wrote:
>
> On 2024/6/29 17:45, David Hildenbrand wrote:
>> Hi,
>>
>> Likely the mailing lists won't like my mail from this Google Mail
>> client ;)
>>
>> Jinjiang Tu wrote on Sat, 29 June 2024 at 11:18:
>>
>>> Hi,
>>>
>>> We noticed a performance regression in the benchmark memtester[1]
>>> after upgrading the kernel. THP is enabled by default
>>> (/sys/kernel/mm/transparent_hugepage/enabled is set to "always").
>>> The issue arises when we migrate a virtual machine that has 125G
>>> total memory and 124G free memory to another host, and then run the
>>> command `memtester 120G` in the VM. The benchmark takes about
>>> 20 seconds to consume 120G memory in v4.18, but takes about
>>> 160 seconds in v5.10. This issue exists in the mainline kernel too.
>>
>> Simple: use preallocation in QEMU. "prealloc=on" for host memory
>> backends, for example.
>>
>>> We find that commit 3917c80280c9 ("thp: change CoW semantics for
>>> anon-THP") leads to the performance regression. Since this commit,
>>> when we trigger a write fault on an anon THP, we split the PMD and
>>> allocate a 4K page, instead of allocating the full anon THP.
>>> When a VM is migrating (based on qemu[2]), if a page is marked as a
>>> zero page in the source VM, the destination VM will call mmap and
>>> read the region to allocate memory, leaving the region mapped by the
>>> zero THP. When we run memtester in the destination VM after the
>>> migration finishes, memtester (in the VM) allocates large amounts of
>>> free memory and writes to it, causing CoW of the anon THP and a THP
>>> split, which in turn causes the performance regression. After
>>> reverting this commit, the performance regression disappears.
>>
>> You talk about COW of anon THP, whereby your scenario really only
>> relied on COW of the huge zeropage.
>>
>> Wouldn't you get a similar result when disabling the huge zeropage?
>>
>>> This commit optimises some scenarios such as Redis, but may lead to
>>> performance regressions in some other scenarios, such as VM migration.
>>> How could we solve this issue? Maybe we could add a new sysctl to let
>>> users decide whether to CoW the full anon THP or not?
>>
>> I'm not convinced the use case you present really warrants a toggle
>> for that. In your case you only want to change semantics on COW fault
>> to the huge zeropage. But …
>>
>> Using preallocation in QEMU will give you all anon THP right from the
>> start, avoiding any COW. Sure, you consume all memory right away, but
>> after all that's what your use case triggers either way. And it might
>> all be even faster. :)
>>
>> Cheers!
>>
> Thanks for the reply. The two methods both work. But they both lead to large
> memory consumption even though the VM doesn't need so much memory right now.

Please see

https://lkml.kernel.org/r/1cfae0c0-96a2-4308-9c62-f7a640520242@arm.com

for a related discussion.

-- 
Cheers,

David / dhildenb
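
For anyone reproducing the setup discussed above, it can help to record the THP state on the host before and after each experiment. The following is only a minimal sketch, assuming the usual sysfs layout: the "enabled" file is the one named in the thread, and "use_zero_page" is the kernel's documented knob for the huge zeropage. The helper names parse_selected and read_knob are made up for this illustration and do not come from the thread.

```python
from pathlib import Path

# Standard sysfs directory for THP settings (assumed present on the host).
THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def parse_selected(text):
    """Return the bracketed choice from a line like 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Knobs such as use_zero_page hold a bare value ("0" or "1"), no brackets.
    return text.strip()


def read_knob(name, base=THP_DIR):
    """Read one THP knob; return None if the file cannot be read."""
    try:
        return parse_selected((Path(base) / name).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # "enabled" is the policy from the report; "use_zero_page" controls
    # whether read faults may map the huge zeropage.
    for knob in ("enabled", "use_zero_page"):
        print(f"{knob}: {read_knob(knob)}")
```

Running it once with the huge zeropage enabled and once with it disabled, alongside the memtester timing, would show which configuration each measurement belongs to.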