Re: [TECH TOPIC] re-think of richACLs in AI/LLM era

From: "Coly Li" <colyli@fnnas.com>
To: "Randy Dunlap" <rdunlap@infradead.org>
Cc: "Paul Moore" <paul@paul-moore.com>,
	 "Steven Rostedt" <rostedt@goodmis.org>,
	"Jan Kara" <jack@suse.cz>,  <ksummit@lists.linux.dev>
Subject: Re: [TECH TOPIC] re-think of richACLs in AI/LLM era
Date: Wed, 17 Sep 2025 10:38:03 +0800
Message-ID: <d36bb4fwzhasdhs2z6wvilckuuqte7b7ivpvghmse2txscrzqr@xpdquqqg6hwf>

On Tue, Sep 16, 2025 at 11:07:27AM +0800, Randy Dunlap wrote:
> Hi Coly,
> 
> On 9/16/25 10:12 AM, Coly Li wrote:
> > On Wed, Sep 10, 2025 at 03:11:55PM +0800, Paul Moore wrote:
> >> On Wed, Sep 10, 2025 at 9:32 AM Coly Li <colyli@fnnas.com> wrote:
> >>> On Mon, Sep 08, 2025 at 09:03:24PM +0800, Paul Moore wrote:
> >>
> >> ...
> >>
> >>>> I can't say I'm familiar with the RichACL concept, but generally
> >>>> speaking yes, the LSM framework exists as a way to implement access
> >>>> control mechanisms beyond the traditional Linux access controls (other
> >>>> things too, but those aren't really relevant here).
> >>>
> >>> Is it convenient for normal users or non-root processes (including the
> >>> policy agent) to set up the LSM rules? We need to allow normal users to set
> >>> their own access control policy for the data they own.
> >>
> >> Management of an individual LSM's configuration is generally left to
> >> the individual LSM.  Some LSMs restrict their configuration knobs
> >> behind capabilities or their own access controls, while others allow
> >> unprivileged access to the configuration; it depends on the LSM's
> >> security model.  As an unprivileged example, Landlock allows
> >> applications, run by arbitrary users, to set their own Landlock
> >> security policy via the Landlock API.
> >>
> > 
> > I forwarded the suggestion to our application team; they evaluated it and
> > replied with the result.
> > 
> > Currently they use BPF to check the access control rules, and an LSM access
> > control method such as Landlock is quite similar to a rule-based control
> > method. They would still need to persist all the rules on disk and load them
> > at system initialization time. As the number of rules increases, maintenance
> > becomes complicated and slow.
> > 
> > The application team also gave me a use case and asked how to implement its
> > access control efficiently. Let me describe it below.
> > 
> > Users store their photos on the system, and the compact AI module processes
> > all of them and groups them into categories such as pizza, dogs, cats, food
> > or group photos. Once processing is done, users see their photos in the
> > categories the AI module thinks they belong in. Users may then share
> > categories, together with their photos, with others. If different users
> > share identical categories, the shared photos can be combined. The AI module
> > may continue to process the shared photos and generate new categories from
> > them, e.g. pizza in the same city, cats and dogs in nearby locations, group
> > photos containing the most common people, etc. Currently the different
> > categories are implemented as different directories in the publicly shared
> > directory.
> > 
> > In each category directory, photos with that category (or attribute) are
> > accessed through hard links to the original photos, so they share the same
> > inodes. All these category directories are created by the AI module, although
> > the photos are shared by individual users. Suppose a user is identified in a
> > group photo and notices that the photo is publicly shared. If this user
> > doesn't want his or her face to be shared in public, then as an optional
> > privacy protection right he or she can remove the hard link (dentry) of that
> > photo from the publicly shared directory, even though he or she has no write
> > permission on that directory. Because this user can be identified as the
> > owner of the face in the photo, he or she should have write permission to
> > delete that photo's link, but no write permission on the other photos in the
> > same category directory that do not contain his or her face.
> 
> 
> What permissions/rights does the AI module have such that it can create
> a file in shared/photos/faces from my personal files?

By default the AI module is not installed. If an admin user installs the AI
module and a user permits it to process his or her personal information, the AI
module has read-only permission on the original data.

> The shared file with my photo is still owned by the sharing user, correct?

Yes, the original files are owned by the users who share them publicly.

> What are the permissions/owner of the parent directory?

If a user shares selected photos with a specific category/attribute, neither
the parent directory of the shared photos nor their category directories are
owned by the owner of the original files. All the shared photos will be
processed further to build more connections and categories.

> What permissions are required in my personal photos directory to allow
> files there to be shared?
>

The original files are not visible/accessible to other users by default.
The AI process has read access granted by a POSIX ACL. But removing a
dentry/hard link from a shared directory on which the user has no write
permission is not supported by POSIX ACLs, and currently it is implemented by
an ugly and inefficient eBPF hack. We don't like such a hack, and it won't work
in the future for a huge number of files with more complicated AI
processing/sharing scenarios.
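
A minimal Python sketch of the hard-link sharing model discussed in this
thread, assuming a filesystem that supports hard links. The directory and file
names (alice, shared/group-photos, group.jpg) and the helper
share_by_hardlink() are made up for illustration and are not taken from the
product.

```python
import os
import tempfile


def share_by_hardlink(original, category_dir):
    """Publish `original` in `category_dir` as a hard link; return the link path."""
    os.makedirs(category_dir, exist_ok=True)
    link = os.path.join(category_dir, os.path.basename(original))
    os.link(original, link)  # new dentry, same inode as the original
    return link


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: a private per-user directory and a shared category.
    private_dir = os.path.join(root, "alice")
    os.makedirs(private_dir)
    original = os.path.join(private_dir, "group.jpg")
    with open(original, "wb") as f:
        f.write(b"photo bytes")

    link = share_by_hardlink(original, os.path.join(root, "shared", "group-photos"))

    same_inode = os.stat(original).st_ino == os.stat(link).st_ino
    links_before = os.stat(original).st_nlink

    # Removing the shared dentry only drops one link; the original stays.
    os.unlink(link)
    links_after = os.stat(original).st_nlink
    original_survives = os.path.exists(original)
```

The os.unlink() call succeeds here only because the caller can write to the
category directory. With standard permission bits or POSIX ACLs, unlinking a
name requires write and search permission on the containing directory, so an
unprivileged user without it gets EACCES no matter which photo the link points
to.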

> It sounds to me like the AI was aggressive in sharing; now you (the user)
> want to correct/fix that. The AI could be trained better, but that's too
> difficult. (just playing devil's advocate there)

It is expected behavior; we have no motivation to correct/fix it. By default
the AI-processed photos are accessible only to the owner who uploaded them to
the system. People do want to share AI-processed photos with others, e.g. all
photos categorized as 'fine food' are selected and shared with other users on
the same system. Currently those users are family members, very close friends
or relatives. What I described in the above case is a policy/procedure for
resolving privacy protection conflicts.

Unlike public cloud AI facilities, on our product the AI module only processes
data locally and has nowhere to upload or publicly share the processed
information, and users can use the compact AI capability without registering a
global, unique online account. The AI-processed information can be shared very
conveniently among all users on the system.

> 
> I haven't looked at this in any detail, but I'm wondering if an
> intermediate directory level with "my" permissions/ownership would
> allow a fix for this issue. Then I (the user) could remove the file
> (the hardlink) from the intermediate directory. However, that might

It is almost impossible to use intermediate directories, because the photos in
a category directory may come from different users. The AI module processes the
shared photos again and again, tries to find interesting connections or
attributes among them, and creates different categories every several days or
weeks. This is not initiated by the owner of the photos. Our system is free of
charge to end users, and people may deploy the software on very ordinary
hardware, e.g. an old laptop bought 15 years ago. So the AI processing is slow,
and new photo categories will probably show up only after several weeks.

> leave some dangling link in the final shared directory. I don't know
> enough about it to answer that.

Neither do I :-) Our product is Debian Linux plus many user space applications.
The AI applications are not open sourced yet. I maintain the kernel part and
try to post all the kernel changes back upstream. The access control list
requirement comes from our application team; the kernel team tried to implement
it with existing kernel infrastructure and failed, which is why I proposed this
topic for the kernel summit.

> 
> > The above example is one of the simple cases, just for photo processing and
> > sharing in the AI context. The access control rules are created and destroyed
> > dynamically and may exist only for a short period. And the number of rules is
> > quite large.
> > 
> > Current rule-based access control is inefficient and complicated to implement
> > for the above simple case, and the application team replied that they don't
> > see how the rule-based LSM method can make it simpler.
> 
> [snip]
> 
> >> One of the important parts of the LSM framework as a whole is that
> >> LSMs can not grant access that would otherwise be blocked by the
> >> standard/discretionary access controls built into the Linux kernel; in
> >> other words, LSMs can only say "no" to an access, they can not grant
> >> access by themselves.  Yes, this is by design, and no, I see no reason
> >> to change that design decision at this point in time (doing so would
> >> require a tremendous amount of work and likely introduce a fair number
> >> of security regressions for quite some time).
> > 
> > I understand and agree with the security concern. But the reality is that
> > more and more similar or related access control requirements will spring up
> > from AI/LLM applications and use cases. We just want to solve the access
> > control challenge, and the rule-based LSM methods are not easy for
> > application developers.
> 
> Do you have control over the AI that does this photo directory magic?

If you mean the shared photo category directories: once photos are explicitly
shared, new category directories for the shared photos are created and decided
by the AI module. Users still have write or delete permission if they pass some
permission check. For example, if a user's face ID is matched in a shared group
photo, this user has permission to delete the dentry of this photo from the
shared photo category.

So users don't have full control over the shared photo directories, but they
still have control over photos related to them.
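
The per-photo check described above can be sketched as a small policy
function. This is only an illustration under stated assumptions: SharedPhoto,
may_remove_link() and the user names are hypothetical, the face-ID match is
modelled as a plain set of user names, and this is not how the current eBPF
implementation works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPhoto:
    """Hypothetical record for one hard link in a shared category directory."""
    name: str
    faces: frozenset  # users whose face ID was matched in this photo


def may_remove_link(user, photo, dir_writers):
    """Decide whether `user` may remove the shared dentry for `photo`.

    dir_writers: users with ordinary write permission on the category
    directory (assumed here to be just the AI module).
    """
    if user in dir_writers:
        return True  # the normal directory write permission applies
    # Extra per-photo right: a user identified in the photo may unlink it,
    # but gains nothing on other photos in the same directory.
    return user in photo.faces


party = SharedPhoto("party.jpg", frozenset({"bob"}))
pizza = SharedPhoto("pizza.jpg", frozenset())
writers = {"ai-module"}
```

With these inputs, bob may remove party.jpg but not pizza.jpg, while carol,
who appears in neither photo, may remove nothing. The point of the sketch is
that the decision depends on an attribute of the individual photo rather than
on the permissions of the directory.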

> Anyway, it sounds like a reasonable topic for the kernel summit (as a
> Tech Topic) instead of the maintainer's summit (process stuff).

Yes, I changed the email subject to TECH TOPIC. I also sent the proposal again
as a TECH TOPIC to the kernel summit mailing list, and submitted it on the web
page as well.

Thank you for the discussion.

Coly Li
