From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: kvm@vger.kernel.org, cohuck@redhat.com, borntraeger@de.ibm.com,
 frankja@linux.ibm.com, thuth@redhat.com, pasic@linux.ibm.com,
 linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
 Ulrich.Weigand@de.ibm.com, linux-mm@kvack.org, Michal Hocko
Subject: Re: [PATCH v3 00/14] KVM: s390: pv: implement lazy destroy
Date: Fri, 6 Aug 2021 13:30:21 +0200
In-Reply-To: <20210806113005.0259d53c@p-imbrenda>
References: <20210804154046.88552-1-imbrenda@linux.ibm.com>
 <86b114ef-41ea-04b6-327c-4a036f784fad@redhat.com>
 <20210806113005.0259d53c@p-imbrenda>

>>> This means that the same address space can have memory belonging to
>>> more than one protected guest, although only one will be running;
>>> the others will in fact not even have any CPUs.
>>
>> ... this ...
>
> this ^ is exactly the reboot case.

Ah, right, we're having more than one protected guest per process, so
it's all handled within the same process.

>>> When a guest is destroyed, its memory still counts towards its
>>> memory control group until it's actually freed (I tested this
>>> experimentally).
>>>
>>> When the system runs out of memory, if a guest has terminated and
>>> its memory is being cleaned asynchronously, the OOM killer will
>>> wait a little and then see if memory has been freed. This has the
>>> practical effect of slowing down memory allocations when the system
>>> is out of memory, to give the cleanup thread time to clean up and
>>> free memory, and avoid an actual OOM situation.
>>
>> ... and this sounds like the kind of arch MM hacks that will bite us
>> in the long run. Of course, I might be wrong, but already doing
>> excessive GFP_ATOMIC allocations or messing with the OOM killer that
>
> they are GFP_ATOMIC, but they should not put too much weight on the
> memory and can also fail without consequences. I used:
>
> GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN
>
> also notice that after every page allocation a page gets freed, so
> this is only temporary.
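(To make this concrete for the linux-mm folks, such a failure-tolerant
temporary allocation might look roughly like the sketch below. The
struct and function names are made up for illustration; this is not
code from the actual series.)

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/slab.h>

struct destroy_page_node {
	struct list_head list;
	struct page *page;
};

/*
 * Track a page for deferred destruction (hypothetical helper). The
 * tracking node comes from atomic reserves, stays out of the emergency
 * pools (__GFP_NOMEMALLOC) and does not warn on failure (__GFP_NOWARN).
 */
static int lazy_destroy_track_page(struct list_head *destroy_list,
				   struct page *page)
{
	struct destroy_page_node *node;

	node = kmalloc(sizeof(*node),
		       GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN);
	if (!node)
		return -ENOMEM;	/* harmless: caller destroys synchronously */

	get_page(page);		/* extra reference, dropped after cleanup */
	node->page = page;
	list_add_tail(&node->list, destroy_list);
	return 0;
}

A failed allocation here really is without consequences as long as the
caller can fall back to destroying the page synchronously.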
Correct me if I'm wrong: you're allocating unmovable pages for tracking
(e.g., ZONE_DMA, ZONE_NORMAL) from atomic reserves and will free a
movable process page, correct? Or which page will you be freeing?

> I would not call it "messing with the OOM killer", I'm using the same
> interface used by virtio-balloon

Right, and for virtio-balloon it's actually a workaround to restore the
original behavior of a rarely used feature: deflate-on-oom. Commit
da10329cb057 ("virtio-balloon: switch back to OOM handler for
VIRTIO_BALLOON_F_DEFLATE_ON_OOM") tried to document why we switched
back from a shrinker to VIRTIO_BALLOON_F_DEFLATE_ON_OOM:

"The name "deflate on OOM" makes it pretty clear when deflation should
 happen - after other approaches to reclaim memory failed, not while
 reclaiming. This allows to minimize the footprint of a guest - memory
 will only be taken out of the balloon when really needed."

Note some subtle differences:

a) IIRC, before running into the OOM killer, the kernel will first try
   reclaiming anything else. This is what we want for deflate-on-oom,
   but it might not be what you want for your feature (e.g., flushing
   other processes/VMs to disk/swap instead of waiting for a single
   process to stop).

b) Migration of movable balloon-inflated pages continues working
   because we are dealing with non-LRU page migration.
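(For readers unfamiliar with that interface: the hook in question is
register_oom_notifier(). A bare-bones sketch of how such a notifier is
wired up -- using hypothetical lazy-destroy names, not the actual patch
code -- could look like this:)

#include <linux/notifier.h>
#include <linux/oom.h>

/*
 * Hypothetical helper: hand back up to @nr deferred pages and return
 * how many were actually freed.
 */
static unsigned long lazy_destroy_free_deferred_pages(unsigned long nr);

static int lazy_destroy_oom_notify(struct notifier_block *nb,
				   unsigned long dummy, void *parm)
{
	unsigned long *freed = parm;	/* summed up by the OOM code */

	*freed += lazy_destroy_free_deferred_pages(256);
	return NOTIFY_OK;
}

static struct notifier_block lazy_destroy_oom_nb = {
	.notifier_call = lazy_destroy_oom_notify,
};

/* register_oom_notifier(&lazy_destroy_oom_nb) during setup,
 * unregister_oom_notifier(&lazy_destroy_oom_nb) during teardown. */

The notifier chain is only invoked from out_of_memory(), i.e., after
reclaim has already failed; if the callback reports progress via
*freed, the OOM killer backs off for that invocation, which matches
the "wait a little and then see if memory has been freed" behavior
described in the cover letter.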
Will page reclaim, page migration, compaction, ... of these movable LRU
pages still continue working while they are sitting around waiting to
be cleaned up? I can see that we're grabbing an extra reference when we
put them onto the list, which might be a problem: for example, we most
certainly cannot swap out these pages or write them back to disk on
memory pressure.

>> way for a pure (shutdown) optimization is an alarm signal. Of course,
>> I might be wrong.
>>
>> You should at least CC linux-mm. I'll do that right now and also CC
>> Michal. He might have time to have a quick glimpse at patch #11 and
>> #13.
>>
>> https://lkml.kernel.org/r/20210804154046.88552-12-imbrenda@linux.ibm.com
>> https://lkml.kernel.org/r/20210804154046.88552-14-imbrenda@linux.ibm.com
>>
>> IMHO, we should proceed with patches 1-10, as they solve a really
>> important problem ("slow reboots") in a nice way, while patch 11
>> handles a case that can be worked around comparatively easily by
>> management tools -- my 2 cents.
>
> how would management tools work around the issue that a shutdown can
> take very long?

The traditional approach is to wait with starting a new VM until memory
has been freed up, or to start it on another hypervisor instead. That
raises the question about the target use case.

What I don't get is why we have to pay the price for freeing up that
memory at all. Why isn't it sufficient to keep the process running and
let ordinary MM do its thing?

Maybe you should clearly spell out what the target use case for the
fast shutdown (fast quitting of the process?) is. I assume it is
starting a new VM / process / whatsoever on the same host immediately,
and then

a) eventually slowing down other processes due to heavy reclaim;

b) slowing down the new process because you have to pay the price of
   cleaning up memory.

I think I am missing why we need the lazy destroy at all when killing a
process. Couldn't you instead teach the OOM killer "hey, we're
currently quitting a heavy process that is just *very* slow to free up
memory, please wait for that before starting shooting around"?

-- 
Thanks,

David / dhildenb