Date: Wed, 29 Apr 2020 10:03:57 -0400
From: Dan Schatzberg
To: Dave Chinner
Cc: Jens Axboe, Alexander Viro, Jan Kara, Amir Goldstein, Tejun Heo,
    Li Zefan, Johannes Weiner, Michal Hocko, Vladimir Davydov,
    Andrew Morton, Hugh Dickins, Roman Gushchin, Shakeel Butt,
    Chris Down, Yang Shi, Ingo Molnar, "Peter Zijlstra (Intel)",
    Mathieu Desnoyers, "Kirill A. Shutemov", Andrea Arcangeli,
    Thomas Gleixner, "open list:BLOCK LAYER", open list,
    "open list:FILESYSTEMS (VFS and infrastructure)",
    "open list:CONTROL GROUP (CGROUP)",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
Subject: Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup
Message-ID: <20200429140357.GB18499@dschatzberg-fedora-PC0Y6AEN>
References: <20200428161355.6377-1-schatzberg.dan@gmail.com>
    <20200428214653.GD2005@dread.disaster.area>
In-Reply-To: <20200428214653.GD2005@dread.disaster.area>

On Wed, Apr 29, 2020 at 07:47:34AM +1000, Dave Chinner wrote:
> On Tue, Apr 28, 2020 at 12:13:46PM -0400, Dan Schatzberg wrote:
> > The loop device runs all i/o to the backing file on a separate kworker
> > thread which results in all i/o being charged to the root cgroup. This
> > allows a loop device to be used to trivially bypass resource limits
> > and other policy. This patch series fixes this gap in accounting.
>
> How is this specific to the loop device? Isn't every block device
> that offloads work to a kthread or single worker thread susceptible
> to the same "exploit"?

I believe this is fairly loop device specific. The issue is that the
loop driver issues I/O by re-entering the VFS layer (resulting in
tmpfs like in my example or entering the block layer).
Normally, I/O through the VFS layer is accounted for and controlled
(e.g. you can OOM if writing to tmpfs, or get throttled by the I/O
controller) but the loop device completely side-steps the accounting.

>
> Or is the problem simply that the loop worker thread is simply not
> taking the IO's associated cgroup and submitting the IO with that
> cgroup associated with it? That seems kinda simple to fix....
>
> > Naively charging cgroups could result in priority inversions through
> > the single kworker thread in the case where multiple cgroups are
> > reading/writing to the same loop device.
>
> And that's where all the complexity and serialisation comes from,
> right?
>
> So, again: how is this unique to the loop device? Other block
> devices also offload IO to kthreads to do blocking work and IO
> submission to lower layers. Hence this seems to me like a generic
> "block device does IO submission from different task" issue that
> should be handled by generic infrastructure and not need to be
> reimplemented multiple times in every block device driver that
> offloads work to other threads...

I'm not familiar with other block device drivers that behave like
this. Could you point me at a few?

>
> > This patch series does some
> > minor modification to the loop driver so that each cgroup can make
> > forward progress independently to avoid this inversion.
> >
> > With this patch series applied, the above script triggers OOM kills
> > when writing through the loop device as expected.
>
> NACK!
>
> The IO that is disallowed should fail with ENOMEM or some similar
> error, not trigger an OOM kill that shoots some innocent bystander
> in the head. That's worse than using BUG() to report errors...

The OOM behavior is due to the cgroup limit. It mirrors the behavior
one sees when writing to a too-large tmpfs.
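
To make the accounting gap discussed above concrete, here is a toy
model in Python. It is only a sketch and not kernel code: every name in
it (Task, Accounting, LoopDevice, the propagate_cgroup flag, the
"limited" cgroup) is hypothetical and chosen for illustration. It
assumes the simplest possible rule, namely that a write to the backing
file is charged to the cgroup of whichever task performs it unless an
explicit cgroup is passed along with the request.

```python
from collections import defaultdict

ROOT = "root"  # stand-in for the root cgroup


class Task:
    """A task that belongs to exactly one cgroup (toy model)."""

    def __init__(self, name, cgroup):
        self.name = name
        self.cgroup = cgroup


class Accounting:
    """Tracks how many bytes have been charged to each cgroup."""

    def __init__(self):
        self.charged = defaultdict(int)

    def charge(self, cgroup, nbytes):
        self.charged[cgroup] += nbytes


def backing_file_write(acct, current_task, nbytes, charge_to=None):
    """Charge a backing-file write to charge_to if given, otherwise to
    the cgroup of the task that is actually performing the write."""
    acct.charge(charge_to or current_task.cgroup, nbytes)


class LoopDevice:
    """Toy loop device: requests are queued by the issuer and later
    executed by a single worker task that lives in the root cgroup."""

    def __init__(self, acct, propagate_cgroup):
        self.acct = acct
        self.propagate_cgroup = propagate_cgroup
        self.worker = Task("kworker", ROOT)
        self.queue = []

    def submit(self, issuer, nbytes):
        # Remember which cgroup issued the request.
        self.queue.append((issuer.cgroup, nbytes))

    def run_worker(self):
        # The worker, not the issuer, performs the backing-file write.
        while self.queue:
            issuer_cgroup, nbytes = self.queue.pop(0)
            target = issuer_cgroup if self.propagate_cgroup else None
            backing_file_write(self.acct, self.worker, nbytes, target)


def simulate(propagate_cgroup):
    """Issue one 4096-byte write from a task in the 'limited' cgroup
    and return the resulting per-cgroup charges as a plain dict."""
    acct = Accounting()
    dev = LoopDevice(acct, propagate_cgroup)
    dev.submit(Task("writer", "limited"), 4096)
    dev.run_worker()
    return dict(acct.charged)


if __name__ == "__main__":
    print("without propagation:", simulate(False))
    print("with propagation:   ", simulate(True))
```

Without propagation the whole write lands on the root cgroup, so the
limit configured on "limited" never sees it. With propagation the same
write is charged to the issuer, at which point the usual limit
enforcement applies. The model deliberately ignores the single-queue
priority inversion mentioned in the cover letter.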