From: David Hildenbrand <david@redhat.com>
Date: Tue, 30 Jul 2024 10:47:17 +0200
Subject: Re: [PATCH v5 4/4] mm: Introduce per-thpsize swapin control policy
To: Ryan Roberts, Matthew Wilcox, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, ying.huang@intel.com,
 baolin.wang@linux.alibaba.com, chrisl@kernel.org, hannes@cmpxchg.org,
 hughd@google.com, kaleshsingh@google.com, kasong@tencent.com,
 linux-kernel@vger.kernel.org, mhocko@suse.com, minchan@kernel.org,
 nphamcs@gmail.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 xiang@kernel.org, yosryahmed@google.com
References: <20240726094618.401593-1-21cnbao@gmail.com>
 <20240726094618.401593-5-21cnbao@gmail.com>
Organization: Red Hat

On 30.07.24 10:36, Ryan Roberts wrote:
> On 29/07/2024 04:52, Matthew Wilcox wrote:
>> On Fri, Jul 26, 2024 at 09:46:18PM +1200, Barry Song wrote:
>>> A user space interface can be implemented to select different swap-in
>>> order policies, similar to the mTHP allocation order policy. We need
>>> a distinct policy because the performance characteristics of memory
>>> allocation differ significantly from those of swap-in. For example,
>>> SSD read speeds can be much slower than memory allocation. With
>>> policy selection, I believe we can implement mTHP swap-in for
>>> non-SWAP_SYNCHRONOUS scenarios as well. However, users need to understand
>>> the implications of their choices. I think that it's better to start
>>> with at least always never. I believe that we will add auto in the
>>> future to tune automatically, which can be used as default finally.
>>
>> I strongly disagree. Use the same sysctl as the other anonymous memory
>> allocations.
>
> I vaguely recall arguing in the past that just because the user has requested 2M
> THP that doesn't mean its the right thing to do for performance to swap-in the
> whole 2M in one go. That's potentially a pretty huge latency, depending on where
> the backend is, and it could be a waste of IO if the application never touches
> most of the 2M. Although the fact that the application hinted for a 2M THP in
> the first place hopefully means that they are storing objects that need to be
> accessed at similar times. Today it will be swapped in page-by-page then
> eventually collapsed by khugepaged.
>
> But I think those arguments become weaker as the THP size gets smaller. 16K/64K
> swap-in will likely yield significant performance improvements, and I think
> Barry has numbers for this?
>
> So I guess we have a few options:
>
> - Just use the same sysfs interface as for anon allocation, And see if anyone
>   reports performance regressions. Investigate one of the options below if an
>   issue is raised. That's the simplest and cleanest approach, I think.
>
> - New sysfs interface as Barry has implemented; nobody really wants more
>   controls if it can be helped.
>
> - Hardcode a size limit (e.g. 64K); I've tried this in a few different contexts
>   and never got any traction.
>
> - Secret option 4: Can we allocate a full-size folio but only choose to swap-in
>   to it bit-by-bit? You would need a way to mark which pages of the folio are
>   valid (e.g. per-page flag) but guess that's a non-starter given the strategy to
>   remove per-page flags?

Maybe we could allocate for folios in the swapcache a bitmap to store
that information (folio->private).

But I am not convinced that is the right thing to do.

If we know some basic properties of the backend, can't we automatically
make a pretty good decision regarding the folio size to use? E.g., slow
disk, avoid 2M ...

Avoiding sysctls if possible here would really be preferable...

-- 
Cheers,

David / dhildenb
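
For illustration only, the bitmap idea (a full-size folio that is swapped in bit-by-bit, with a bitmap recording which pages are valid) could be modelled roughly as below. This is a userspace toy in Python, not kernel code: the class `PartialFolio` and its methods are hypothetical names invented for this sketch, and nothing here reflects how folio->private is actually used.

```python
class PartialFolio:
    """Toy model of a large folio whose subpages are swapped in lazily.

    Hypothetical illustration: a single integer stands in for the
    per-folio bitmap that the mail suggests hanging off folio->private.
    """

    def __init__(self, order: int) -> None:
        self.nr_pages = 1 << order  # e.g. order 4 -> 16 subpages
        self.valid = 0              # bit i set => subpage i holds valid data

    def _check(self, index: int) -> None:
        if not 0 <= index < self.nr_pages:
            raise IndexError(f"subpage {index} out of range")

    def is_valid(self, index: int) -> bool:
        self._check(index)
        return bool((self.valid >> index) & 1)

    def fault(self, index: int) -> bool:
        """Simulate a fault on one subpage.

        Returns True if the subpage had to be read from swap (I/O needed),
        False if it was already valid.
        """
        if self.is_valid(index):
            return False
        self.valid |= 1 << index    # pretend the read completed
        return True

    def fully_valid(self) -> bool:
        """True once every subpage has been swapped in."""
        return self.valid == (1 << self.nr_pages) - 1


folio = PartialFolio(order=4)
needed_io = folio.fault(3)        # first touch: subpage must be read
needed_io_again = folio.fault(3)  # second touch: already valid
```

The model only shows the bookkeeping: one bit per subpage, set when that subpage is read in. It says nothing about the cost of allocating or maintaining such a bitmap, which is part of why the idea is questioned above.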