Message-ID: <2848b566-3cae-4e89-916c-241508054402@redhat.com>
Date: Mon, 13 Jan 2025 16:27:34 +0100
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
From: David Hildenbrand
Organization: Red Hat
To: Shakeel Butt
Cc: Jeff Layton, Miklos Szeredi, Joanne Koong, Bernd Schubert, Zi Yan,
 linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
 josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com,
 Matthew Wilcox, Oscar Salvador, Michal Hocko
References: <791d4056-cac1-4477-a8e3-3a2392ed34db@redhat.com>
 <1fdc9d50-584c-45f4-9acd-3041d0b4b804@redhat.com>
 <54ebdef4205781d3351e4a38e5551046482dbba0.camel@kernel.org>

On 10.01.25 23:00, Shakeel Butt wrote:
> On Fri, Jan 10, 2025 at 10:13:17PM +0100, David Hildenbrand wrote:
>> On 10.01.25 21:28, Jeff Layton wrote:
>>> On Thu, 2025-01-09 at 12:22 +0100, David Hildenbrand wrote:
>>>> On 07.01.25 19:07, Shakeel Butt wrote:
>>>>> On Tue, Jan 07, 2025 at 09:34:49AM +0100, David Hildenbrand wrote:
>>>>>> On 06.01.25 19:17, Shakeel Butt wrote:
>>>>>>> On Mon, Jan 06, 2025 at 11:19:42AM +0100, Miklos Szeredi wrote:
>>>>>>>> On Fri, 3 Jan 2025 at 21:31, David Hildenbrand wrote:
>>>>>>>>> In any case, having movable pages be turned unmovable due to persistent
>>>>>>>>> writeback is something that must be fixed, not worked around. Likely a
>>>>>>>>> good topic for LSF/MM.
>>>>>>>>
>>>>>>>> Yes, this seems a good cross fs-mm topic.
>>>>>>>>
>>>>>>>> So the issue discussed here is that movable pages used for fuse
>>>>>>>> page-cache cause problems when memory needs to be compacted.
>>>>>>>> The problem is either that
>>>>>>>>
>>>>>>>> - the page is skipped, leaving the physical memory block unmovable
>>>>>>>>
>>>>>>>> - the compaction is blocked for an unbounded time
>>>>>>>>
>>>>>>>> While the new AS_WRITEBACK_INDETERMINATE could potentially make things
>>>>>>>> worse, the same thing happens on readahead, since the new page can be
>>>>>>>> locked for an indeterminate amount of time, which can also block
>>>>>>>> compaction, right?
>>>>>>
>>>>>> Yes, as memory hotplug + virtio-mem maintainer my bigger concern is these
>>>>>> pages residing in ZONE_MOVABLE / MIGRATE_CMA areas where there *must not be
>>>>>> unmovable pages ever*. Not triggered by an untrusted source, not triggered
>>>>>> by a trusted source.
>>>>>>
>>>>>> It's a violation of core-mm principles.
>>>>>
>>>>> The "must not be unmovable pages ever" is a very strong statement and we
>>>>> are violating it today and will keep violating it in future. Any
>>>>> page/folio under lock or writeback or have reference taken or have been
>>>>> isolated from their LRU is unmovable (most of the time for small period
>>>>> of time).
>>>>
>>>> ^ this: "small period of time" is what I meant.
>>>>
>>>> Most of these things are known to not be problematic: retrying a couple
>>>> of times makes it work, that's why migration keeps retrying.
>>>>
>>>> Again, as an example, we allow short-term O_DIRECT but disallow
>>>> long-term page pinning. I think there were concerns at some point if
>>>> O_DIRECT might also be problematic (I/O might take a while), but so far
>>>> it was not a problem in practice that would make CMA allocations easily
>>>> fail.
>>>>
>>>> vmsplice() is a known problem, because it behaves like O_DIRECT but
>>>> actually triggers long-term pinning; IIRC David Howells has this on his
>>>> todo list to fix. [I recall that seccomp disallows vmsplice by default
>>>> right now]
>>>>
>>>> These operations are being done all over the place in kernel.
>>>>> Miklos gave an example of readahead.
>>>>
>>>> I assume you mean "unmovable for a short time", correct, or can you
>>>> point me at that specific example; I think I missed that.
>
> Please see https://lore.kernel.org/all/CAJfpegthP2enc9o1hV-izyAG9nHcD_tT8dKFxxzhdQws6pcyhQ@mail.gmail.com/
>
>>>>
>>>>
>>>>> The per-CPU LRU caches are another
>>>>> case where folios can get stuck for long period of time.
>>>>
>>>> Which is why memory offlining disables the lru cache. See
>>>> lru_cache_disable(). Other users that care about that drain the LRU on
>>>> all cpus.
>>>>
>>>>> Reclaim and
>>>>> compaction can isolate a lot of folios that they need to have
>>>>> too_many_isolated() checks. So, "must not be unmovable pages ever" is
>>>>> impractical.
>>>>
>>>> "must only be short-term unmovable", better?
>
> Yes and you have clarified further below of the actual amount.
>
>>>>
>>>
>>> Still a little ambiguous.
>>>
>>> How short is "short-term"? Are we talking milliseconds or minutes?
>>
>> Usually a couple of seconds, max. For memory offlining, slightly longer
>> times are acceptable; other things (in particular compaction or CMA
>> allocations) will give up much faster.
>>
>>>
>>> Imposing a hard timeout on writeback requests to unprivileged FUSE
>>> servers might give us a better guarantee of forward-progress, but it
>>> would probably have to be on the order of at least a minute or so to be
>>> workable.
>>
>> Yes, and that might already be a bit too much, especially if stuck on
>> waiting for folio writeback ... so ideally we could find a way to migrate
>> these folios that are under writeback and it's not your ordinary disk driver
>> that responds rather quickly.
>>
>> Right now we do it via these temp pages, and I can see how that's
>> undesirable.
>>
>> For NFS etc.
>> we probably never ran into this, because it's all used in
>> fairly well managed environments and, well, I assume NFS easily outdates CMA
>> and ZONE_MOVABLE :)
>>
>>>>>>
>>>>> The point is that, yes we should aim to improve things but in iterations
>>>>> and "must not be unmovable pages ever" is not something we can achieve
>>>>> in one step.
>>>>
>>>> I agree with the "improve things in iterations", but as
>>>> AS_WRITEBACK_INDETERMINATE has the FOLL_LONGTERM smell to it, I think we
>>>> are making things worse.
>
> AS_WRITEBACK_INDETERMINATE is really a bad name we picked as it is still
> causing confusion. It is a simple flag to avoid deadlock in the reclaim
> code path and does not say anything about movability.
>
>>>>
>>>> And as this discussion has been going on for too long, to summarize my
>>>> point: there exist conditions where pages are short-term unmovable, and
>>>> possibly some to be fixed that turn pages long-term unmovable (e.g.,
>>>> vmsplice); that does not mean that we can freely add new conditions that
>>>> turn movable pages unmovable long-term or even forever.
>>>>
>>>> Again, this might be a good LSF/MM topic. If I would have the capacity I
>>>> would suggest a topic around which things are known to cause pages to be
>>>> short-term or long-term unmovable/unsplittable, and which can be
>>>> handled, which not. Maybe I'll find the time to propose that as a topic.
>>>>
>>>
>>>
>>> This does sound like great LSF/MM fodder! I predict that this session
>>> will run long! ;)
>>
>> Heh, fully agreed! :)
>
> I would like more targeted topic and for that I want us to at least
> agree where we are disagreeing. Let me write down two statements and
> please tell me where you disagree:

I think we're mostly in agreement!

>
> 1. For a normal running FUSE server (without tmp pages), the lifetime of
> writeback state of fuse folios falls under "short-term unmovable" bucket
> as it does not differ in any way from any other filesystems handling
> writeback folios.

That's the expectation, yes. As long as the FUSE server is able to make
progress, the expectation is that it's just like NFS etc. If it isn't
able to make progress (i.e., it crashed), the expectation is that
everything will get cleaned up either way.

I wonder if there could be a valid scenario where the FUSE server is no
longer able to make progress (ignoring network outages), or where
progress becomes so slow that it turns into a problem. In contrast to
in-kernel FSs, one can do some fancy stuff with fuse where writing a
page could possibly consume a lot of memory in user space. Likely, in
this case we might just blame it on the admin that agreed to running
this (trusted) fuse server.

>
> 2. For a buggy or untrusted FUSE server (without tmp pages), the
> lifetime of writeback state of fuse folios can be arbitrarily long and
> we need some mechanism to limit it.

Yes.

Especially in 1), we really want to wait for writeback to finish, just
like for any other filesystem. For 2), we want a way so that writeback
will not get stuck for a long time, and we are instead able to make
progress and migrate these pages.

-- 
Cheers,

David / dhildenb
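
The two cases above (a server that makes progress versus one that is stuck) can be illustrated with a toy user-space model. This is only a sketch and not kernel code: `Folio`, `writeback_ticks_left`, `try_migrate()` and the retry budget are hypothetical names and values invented for the illustration. They model the idea from the thread that migration keeps retrying, so a short-term unmovable folio eventually moves while a long-term unmovable one exhausts the retries.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    """Toy stand-in for a folio under writeback (not a kernel structure)."""
    name: str
    writeback_ticks_left: int  # ticks until the (hypothetical) server completes writeback


def try_migrate(folio: Folio, max_retries: int) -> bool:
    """Retry migration up to max_retries times, waiting one tick per attempt.

    Returns True if writeback completed within the retry budget.
    """
    for _ in range(max_retries):
        if folio.writeback_ticks_left == 0:
            return True  # writeback finished, the folio can be moved
        folio.writeback_ticks_left -= 1  # wait one tick, then retry
    return folio.writeback_ticks_left == 0


# Case 1: the server makes progress; writeback ends after 3 ticks.
healthy = Folio("healthy-server", writeback_ticks_left=3)
# Case 2: the server is stuck; writeback effectively never ends.
stuck = Folio("stuck-server", writeback_ticks_left=10**9)

results = {f.name: try_migrate(f, max_retries=10) for f in (healthy, stuck)}
print(results)
```

In this model the first folio is only short-term unmovable and migrates within the retry budget. The second never does, which is the situation that a mechanism to limit writeback lifetime for buggy or untrusted servers would have to address.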