Message-ID: <470be5fa-97d6-4045-a855-5332d3a46443@redhat.com>
Date: Tue, 7 Jan 2025 17:41:23 +0100
Subject: Re: Swap Min Odrer
From: David Hildenbrand
Organization: Red Hat
To: Daniel Gomez
Cc: Ryan Roberts, Barry Song, Andrew Morton, linux-mm@kvack.org,
 Luis Chamberlain, Pankaj Raghav, chrisl@kernel.org
References: <20250107094347.l37isnk3w2nmpx2i@AALNPWDAGOMEZ1.aal.scsc.local>
 <20250107122931.qpkn43yvs4kq3twi@AALNPWDAGOMEZ1.aal.scsc.local>
In-Reply-To: <20250107122931.qpkn43yvs4kq3twi@AALNPWDAGOMEZ1.aal.scsc.local>

On 07.01.25 13:29, Daniel Gomez wrote:
> On Tue, Jan 07, 2025 at 11:31:05AM +0100, David Hildenbrand wrote:
>> On 07.01.25 10:43, Daniel Gomez wrote:
>>> Hi,
>>
>> Hi,
>>
>>>
>>> High-capacity SSDs require writes to be aligned with the drive's
>>> indirection unit (IU), which is typically >4 KiB, to avoid RMW. To
>>> support swap on these devices, we need to ensure that writes do not
>>> cross IU boundaries. So, I think this may require increasing the minimum
>>> allocation size for swap users.
>>
>> How would we handle swapout/swapin when we have smaller pages (just imagine
>> someone does a mmap(4KiB))?
>
> Swapout would require to be aligned to the IU. An mmap of 4 KiB would
> have to perform an IU KiB write, e.g. 16 KiB or 32 KiB, to avoid any
> potential RMW penalty. So, I think aligning the mmap allocation to the
> IU would guarantee a write of the required granularity and alignment.

We must be prepared to handle any VMA layout with single-page VMAs,
single-page holes etc ... :/ IMHO we should try to handle this
transparently to the application.

> But let's also look at your suggestion below with swapcache.
>
> Swapin can still be performed at LBA format levels (e.g. 4 KiB) without
> the same write penalty implications, and only affecting performance
> if I/Os are not conformant to these boundaries. So, reading at IU
> boundaries is preferred to get optimal performance, not a 'requirement'.
>
>>
>> Could this be something that gets abstracted/handled by the swap
>> implementation? (i.e., multiple small folios get added to the swapcache but
>> get written out / read in as a single unit?).
>
> Do you mean merging like in the block layer? I'm not entirely sure if
> this could guarantee deterministically the I/O boundaries the same way
> it does min order large folio allocations in the page cache. But I guess
> is worth exploring as optimization.

Maybe the swapcache could somehow abstract that? We currently have the
swap slot allocator, that assigns slots to pages.

Assuming we have a 16 KiB BS but a 4 KiB page, we might have various
options to explore.

For example, we could size swap slots 16 KiB, and assign even 4 KiB
pages a single slot. This would waste swap space with small folios; that
waste would go away with large folios.

If we stick to 4 KiB swap slots, maybe pageout() could be taught to
effectively writeback "everything" residing in the relevant swap slots
that span a BS?

I recall there was a discussion about atomic writes involving multiple
pages, and how it is hard. Maybe with swapping it is "easier"? Absolutely
no expert on that, unfortunately. Hoping Chris has some ideas.

>
>>
>> I recall that we have been talking about a better swap abstraction for years
>> :)
>
> Adding Chris Li to the cc list in case he has more input.
>
>>
>> Might be a good topic for LSF/MM (might or might not be a better place than
>> the MM alignment session).
>
> Both options work for me. LSF/MM is in 12 weeks so, having a previous
> session would be great.

Both work for me.

-- 
Cheers,

David / dhildenb
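
To make the two slot-sizing options above a bit more concrete, here is a rough arithmetic sketch. It is not kernel code: the 4 KiB page size, the 16 KiB BS, and the helper names block_aligned_slots() and wasted_bytes() are all assumptions picked purely for illustration. The first helper shows which 4 KiB slots would have to be written back together to cover one BS; the second shows how much swap space is left unused when a folio is stored in whole BS-sized slots.

```python
# Illustrative sketch only: sizes and helper names are assumptions,
# not taken from the kernel's swap implementation.
PAGE_SIZE = 4 * 1024     # assumed base page / small swap slot size
BLOCK_SIZE = 16 * 1024   # assumed device block size (BS) / IU

SLOTS_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE


def block_aligned_slots(slot: int, slots_per_block: int = SLOTS_PER_BLOCK) -> range:
    """Return the 4 KiB slot indices that share one device block with `slot`.

    Models the "stick to 4 KiB slots" option: writing back any one of
    these slots would mean writing back all of them as a single unit.
    """
    start = (slot // slots_per_block) * slots_per_block
    return range(start, start + slots_per_block)


def wasted_bytes(folio_bytes: int, slot_bytes: int = BLOCK_SIZE) -> int:
    """Return the swap space left unused when a folio occupies whole slots.

    Models the "size swap slots 16 KiB" option.
    """
    slots_needed = -(-folio_bytes // slot_bytes)  # ceiling division
    return slots_needed * slot_bytes - folio_bytes
```

With these assumed sizes, slot 5 shares a block with slots 4 to 7, a single 4 KiB page in a 16 KiB slot leaves 12 KiB unused, and a 16 KiB or 64 KiB folio leaves nothing unused.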