[PATCHv3 0/8] CGroup Namespaces

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Richard Weinberger
On 06.01.2015 at 01:10, Aditya Kali wrote:
> Since the old/default behavior is on its way out, I didn't invest time
> in fixing that. Also, some of the properties that make
> cgroup-namespace simpler are only provided by unified hierarchy (for
> example: a single root-cgroup per container).

Does the new sane cgroupfs behavior even have a single real world user?
I always thought it isn't stable yet.

Linux distros currently use systemd v210. They don't dare to use a newer one.
Even *if* systemd supported the sane cgroupfs behavior in its most recent
version, it would take 1-2 years until it hit a recent distro.

So please also support the old and nasty behavior, so that one day we can run current
systemd distros in Linux containers.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Aditya Kali
I understand your point. But it will add some complexity to the code.

Before trying to make it work for non-unified hierarchy cases, I would
like to get a clearer idea.
What do you expect to be mounted when you run:
  container:/ # mount -t cgroup none /sys/fs/cgroup/
from inside the container?

Note that cgroup-namespace won't be able to change the way cgroups are
mounted, i.e., if, say, the cpu and cpuacct subsystems are mounted
together at a single mount-point, then we cannot mount them any other
way (inside a container or outside). This restriction exists today and
cgroup-namespaces won't change that.

So, if on the host we have:
root@adityakali-vm2:/sys/fs/cgroup# cat /proc/mounts | grep cgroup
tmpfs /sys/fs/cgroup tmpfs rw,relatime 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpuset,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/mem cgroup rw,relatime,memory,hugetlb 0 0
cgroup /sys/fs/cgroup/rest cgroup rw,relatime,devices,freezer,net_cls,blkio,perf_event,net_prio 0 0

And inside the container we want each subsystem to be on its own
mount-point, then it will fail. Do you think even then it's useful to
support virtualizing paths for non-unified hierarchies?
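
As an illustration (not part of the original message), the restriction described above can be sketched in Python. The sample text is the /proc/mounts listing quoted above. The names parse_cgroup_mounts and can_mount are hypothetical, and the rule encoded here is only a simplified model of the legacy behavior: a requested set of controllers is assumed to be mountable only if it exactly matches an existing hierarchy, or if none of its controllers is attached anywhere yet.

```python
# Controller names used to tell controllers apart from generic mount
# options such as "rw" or "relatime" (assumed list, not exhaustive).
KNOWN_CONTROLLERS = {
    "cpuset", "cpu", "cpuacct", "memory", "hugetlb", "devices",
    "freezer", "net_cls", "blkio", "perf_event", "net_prio",
}

# The host listing from the message above, with the wrapped line joined.
HOST_MOUNTS = """\
tmpfs /sys/fs/cgroup tmpfs rw,relatime 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpuset,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/mem cgroup rw,relatime,memory,hugetlb 0 0
cgroup /sys/fs/cgroup/rest cgroup rw,relatime,devices,freezer,net_cls,blkio,perf_event,net_prio 0 0
"""


def parse_cgroup_mounts(text):
    """Map each cgroup mount point to the frozenset of controllers on it."""
    hierarchies = {}
    for line in text.splitlines():
        fields = line.split()
        # Fields: source, mount point, filesystem type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        options = fields[3].split(",")
        hierarchies[fields[1]] = frozenset(
            opt for opt in options if opt in KNOWN_CONTROLLERS
        )
    return hierarchies


def can_mount(requested, hierarchies):
    """Simplified model of the co-mounting restriction (an assumption)."""
    requested = frozenset(requested)
    if requested in hierarchies.values():
        return True  # same grouping as an existing hierarchy
    attached = frozenset().union(*hierarchies.values())
    # A new grouping is only possible with controllers not attached yet.
    return not (requested & attached)
```

Under this model, asking for cpuset, cpu and cpuacct together is accepted because it matches the host grouping, while asking for cpu alone is rejected because cpu is already bound to a hierarchy with a different controller set.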

Thanks,


On Mon, Jan 5, 2015 at 4:17 PM, Richard Weinberger <[hidden email]> wrote:

> Am 06.01.2015 um 01:10 schrieb Aditya Kali:
>> Since the old/default behavior is on its way out, I didn't invest time
>> in fixing that. Also, some of the properties that make
>> cgroup-namespace simpler are only provided by unified hierarchy (for
>> example: a single root-cgroup per container).
>
> Does the new sane cgroupfs behavior even have a single real world user?
> I always thought it isn't stable yet.
>
> Linux distros currently use systemd v210. They don't dare to use a newer one.
> Even *if* systemd would support the sane sane cgroupfs behavior in the most recent
> version it will take 1-2 years until it would hit a recent distro.
>
> So please support also the old and nasty behavior such that one day we can run current
> systemd distros in Linux containers.
>
> Thanks,
> //richard



--
Aditya

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Richard Weinberger
On 07.01.2015 at 00:20, Aditya Kali wrote:
> I understand your point. But it will add some complexity to the code.
>
> Before trying to make it work for non-unified hierarchy cases, I would
> like to get a clearer idea.
> What do you expect to be mounted when you run:
>   container:/ # mount -t cgroup none /sys/fs/cgroup/
> from inside the container?

I expect cgroupfs to behave exactly as it would in the initial namespace.
Such that the container can do with it whatever it wants.
systemd mounts and manages cgroups on its own.
Like for CONFIG_DEVPTS_MULTIPLE_INSTANCES.

If a new cgroup namespace cannot provide a clean and autonomous cgroupfs
instance it is fundamentally flawed.
You cannot provide a namespace mechanism which depends on the host side
that much.
This will also horribly break container migrations between hosts,
e.g. migrating a container from an Ubuntu host to a Fedora (systemd!) host.

> Note that cgroup-namespace wont be able to change the way cgroups are
> mounted .. i.e., if say cpu and cpuacct subsystems are mounted
> together at a single mount-point, then we cannot mount them any other
> way (inside a container or outside). This restriction exists today and
> cgroup-namespaces won't change that.

Why can't cgroup namespace change this?
I think of cgroup namespace as a new and clean cgroupfs instance which inherits
all limits from the outside.

> So, If on the host we have:
> root@adityakali-vm2:/sys/fs/cgroup# cat /proc/mounts | grep cgroup
> tmpfs /sys/fs/cgroup tmpfs rw,relatime 0 0
> cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpuset,cpu,cpuacct 0 0
> cgroup /sys/fs/cgroup/mem cgroup rw,relatime,memory,hugetlb 0 0
> cgroup /sys/fs/cgroup/rest cgroup
> rw,relatime,devices,freezer,net_cls,blkio,perf_event,net_prio 0 0
>
> And inside the container we want each subsystem to be on its own
> mount-point, then it will fail. Do you think even then its useful to
> support virtualizing paths for non-unified hierarchies?

As I've stated above I expect from cgroup namespaces a clean and sane
cgroupfs instance no matter how the outer mounts are.

Thanks,
//richard

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Richard Weinberger
In reply to this post by Aditya Kali
On 07.01.2015 at 00:20, Aditya Kali wrote:

> I understand your point. But it will add some complexity to the code.
>
> Before trying to make it work for non-unified hierarchy cases, I would
> like to get a clearer idea.
> What do you expect to be mounted when you run:
>   container:/ # mount -t cgroup none /sys/fs/cgroup/
> from inside the container?
>
> Note that cgroup-namespace wont be able to change the way cgroups are
> mounted .. i.e., if say cpu and cpuacct subsystems are mounted
> together at a single mount-point, then we cannot mount them any other
> way (inside a container or outside). This restriction exists today and
> cgroup-namespaces won't change that.

I wondered why cgroup namespaces won't change that and looked at your patches
in more detail.
What you propose as cgroup namespace is much more a cgroup chroot() than
a namespace.
As you pass relative paths into the namespace you depend on the mount structure
of the host side.
Hence, the abstraction between namespaces happens on the mount paths of the initial
cgroupfs. But we really want a new cgroupfs instance within a container and not just
a cut out of the initial cgroupfs mount.

I fear your approach is oversimplified and won't work for all cases. It may work
for your specific use case at Google but we really want something generic.
Eric, what do you think?

Thanks,
//richard

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Eric W. Biederman
Richard Weinberger <[hidden email]> writes:

> Am 07.01.2015 um 00:20 schrieb Aditya Kali:
>> I understand your point. But it will add some complexity to the code.
>>
>> Before trying to make it work for non-unified hierarchy cases, I would
>> like to get a clearer idea.
>> What do you expect to be mounted when you run:
>>   container:/ # mount -t cgroup none /sys/fs/cgroup/
>> from inside the container?
>>
>> Note that cgroup-namespace wont be able to change the way cgroups are
>> mounted .. i.e., if say cpu and cpuacct subsystems are mounted
>> together at a single mount-point, then we cannot mount them any other
>> way (inside a container or outside). This restriction exists today and
>> cgroup-namespaces won't change that.
>
> I wondered why cgroup namespaces won't change that and looked at your patches
> in more detail.
> What you propose as cgroup namespace is much more a cgroup chroot() than
> a namespace.
> As you pass relative paths into the namespace you depend on the mount structure
> of the host side.
> Hence, the abstraction between namespaces happens on the mount paths of the initial
> cgroupfs. But we really want a new cgroupfs instance within a container and not just
> a cut out of the initial cgroupfs mount.
>
> I fear you approach is over simplified and won't work for all cases. It may work
> for your specific use case at Google but we really want something generic.
> Eric, what do you think?

I think I probably need to go back upthread and read the patches.

I think it is a reasonable practical requirement that a widely used long
term supported distribution like RHEL 7 needs to be able to run in a linux
container, bizarre init system and all.  And that the abstractions
should be such that we are able to migrate such a beast.

There are a couple of issues in play and I think we need actual testing
rather than reports that something shouldn't work before we reject a set
of patches.    Aditya in one of his replies to me has reported a
configuration that he expects will work.  So I think that configuration
needs to be tested.

cgroups is a weird beast and the problems tend not to lie where a person
would first expect.

I suspect no one strongly cares if the cgroup hierarchy is unified or
not.  By unified hierarchy I mean that  every mount of cgroupfs has the
same directories with the same processes in each directory.

I do think people will care which controllers show up in different
mounts of cgroupfs, and I think that is relevant to process migration.




I am going to segue into the scope of what is achievable with a cgroup namespace.

- If there are files in cgroupfs that are not safe to delegate we can
  not support those files in a container.

  Last I looked there were such files and systemd used them.

- Which controllers share hierarchies of processes to track resources is
  a core cgroup issue and not a cgroup namespace issue.

  If we find problems with unified hierarchy support we need to
  go fix cgroups in general, not cgroupfs.

Eric

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Aditya Kali
In reply to this post by Richard Weinberger
On Wed, Jan 7, 2015 at 1:28 AM, Richard Weinberger <[hidden email]> wrote:

> Am 07.01.2015 um 00:20 schrieb Aditya Kali:
>> I understand your point. But it will add some complexity to the code.
>>
>> Before trying to make it work for non-unified hierarchy cases, I would
>> like to get a clearer idea.
>> What do you expect to be mounted when you run:
>>   container:/ # mount -t cgroup none /sys/fs/cgroup/
>> from inside the container?
>>
>> Note that cgroup-namespace wont be able to change the way cgroups are
>> mounted .. i.e., if say cpu and cpuacct subsystems are mounted
>> together at a single mount-point, then we cannot mount them any other
>> way (inside a container or outside). This restriction exists today and
>> cgroup-namespaces won't change that.
>
> I wondered why cgroup namespaces won't change that and looked at your patches
> in more detail.
> What you propose as cgroup namespace is much more a cgroup chroot() than
> a namespace.
> As you pass relative paths into the namespace you depend on the mount structure
> of the host side.
> Hence, the abstraction between namespaces happens on the mount paths of the initial
> cgroupfs. But we really want a new cgroupfs instance within a container and not just
> a cut out of the initial cgroupfs mount.
>

What you describe will be useful at Google too, just that I found it
difficult/infeasible to include it in the scope of cgroup namespaces.
The scope of cgroup namespace was deliberately limited to virtualizing
the /proc/<pid>/cgroup file, and to doing so in a way that doesn't need
major changes to the cgroup code itself. (It was also limited to unified
hierarchy to keep things simple, but that can be changed).
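
As an illustration (not part of the original message, and not the code from the patch set), virtualizing the path shown in /proc/<pid>/cgroup can be thought of as reporting each cgroup path relative to the cgroup the namespace is rooted at. The function name virtualize_path and the sample paths in the usage note are hypothetical.

```python
import posixpath


def virtualize_path(host_path, ns_root):
    """Return host_path as it would appear from a namespace rooted at ns_root.

    Both arguments are absolute cgroup paths as seen from the initial
    namespace. Cgroups outside the namespace root come out with leading
    ".." components.
    """
    rel = posixpath.relpath(host_path, ns_root)
    if rel == ".":
        return "/"  # the namespace root itself
    return "/" + rel
```

For example, with a namespace rooted at /batch/job1, a task in /batch/job1/web would be reported as /web, and a task in the sibling /batch/job2 would show up with a ".." component. This is the sense in which the approach resembles a chroot over the existing hierarchy rather than a new cgroupfs instance.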

Many of the cgroup subsystems (memory, cpu, etc) rely on the fact that
they can see the entire cgroup view. For example, in a memcg-OOM scenario,
the memory controller would need to look at all sub-cgroups inside the
OOMing cgroup. A per namespace cgroupfs instance (if I understand
correctly) would mean that sub-cgroups created inside the namespace
won't be visible outside. I expect this will break the functionality
of the subsystem.

Illustration: memcg A is under OOM; [B] and [C] are cgroup namespace
roots with possibly namespace-private sub-cgroups.

        +------ [B]
A ------|
        +------ [C]
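
The same point as a toy model (not part of the original message and not kernel code): anything that accounts for A has to walk every descendant, including cgroups created from inside the namespaces rooted at [B] and [C]. The Cgroup class, the usage numbers and the B/private name are made up for the illustration.

```python
class Cgroup:
    """Toy cgroup node; usage is memory charged directly to it (arbitrary units)."""

    def __init__(self, name, usage=0):
        self.name = name
        self.usage = usage
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def walk(cgroup):
    """Yield the cgroup and every descendant, depth first."""
    yield cgroup
    for child in cgroup.children:
        yield from walk(child)


def subtree_usage(cgroup):
    """Total usage of the cgroup and all of its descendants."""
    return sum(node.usage for node in walk(cgroup))


a = Cgroup("A", usage=10)
b = a.add(Cgroup("B", usage=20))
c = a.add(Cgroup("C", usage=30))
b.add(Cgroup("B/private", usage=5))  # created from inside namespace [B]
```

If B/private were invisible outside the namespace, subtree_usage(a) would undercount, which is the kind of breakage the message is worried about.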

Cgroups are heavily used inside the kernel for various purposes which
need a namespace-agnostic view. An inherent limitation of containers
running on a machine is that they share the same kernel.
Perhaps what you need is something like kexec to be supported inside a
container.

> I fear you approach is over simplified and won't work for all cases. It may work
> for your specific use case at Google but we really want something generic.
> Eric, what do you think?
>
> Thanks,
> //richard


--
Aditya

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Serge E. Hallyn
In reply to this post by Eric W. Biederman
Quoting Eric W. Biederman ([hidden email]):

> Richard Weinberger <[hidden email]> writes:
>
> > Am 07.01.2015 um 00:20 schrieb Aditya Kali:
> >> I understand your point. But it will add some complexity to the code.
> >>
> >> Before trying to make it work for non-unified hierarchy cases, I would
> >> like to get a clearer idea.
> >> What do you expect to be mounted when you run:
> >>   container:/ # mount -t cgroup none /sys/fs/cgroup/
> >> from inside the container?
> >>
> >> Note that cgroup-namespace wont be able to change the way cgroups are
> >> mounted .. i.e., if say cpu and cpuacct subsystems are mounted
> >> together at a single mount-point, then we cannot mount them any other
> >> way (inside a container or outside). This restriction exists today and
> >> cgroup-namespaces won't change that.
> >
> > I wondered why cgroup namespaces won't change that and looked at your patches
> > in more detail.
> > What you propose as cgroup namespace is much more a cgroup chroot() than
> > a namespace.
> > As you pass relative paths into the namespace you depend on the mount structure
> > of the host side.
> > Hence, the abstraction between namespaces happens on the mount paths of the initial
> > cgroupfs. But we really want a new cgroupfs instance within a container and not just
> > a cut out of the initial cgroupfs mount.
> >
> > I fear you approach is over simplified and won't work for all cases. It may work
> > for your specific use case at Google but we really want something generic.
> > Eric, what do you think?
>
> I think I probably need to go back upthread and read the patches.
>
> I think it is a reasonable practical requirement that a widely used long
> term supported distribution like RHEL 7 needs to be able to run in a linux
> container bizarre init system and all.  And that we the abstractions
> should be that that we should be able to migrate such a beast.

Userspace should be able to deal with however cgroups are mounted for
it.  The only case I've heard of where it really made a meaningful
difference was google's advanced grid usage.  In fact, the whole
justification of the unified cgroup stuff was that it was claimed (and
argued against by google) that that sufficed for any users.

Now yes, until now userspace could cache its info on how cgroups were
mounted and assume that wouldn't change (because the kernel wouldn't
let it), and migration will break that.  But if the cgroup roadmap
is to obsolete anything but unified hierarchy, then this was going
to happen regardless of what the cgroupns patchset did.

I agree with Aditya.  So long as the proclaimed direction of cgroups is
to only support unified cgroup hierarchy, there's no point in having
cgroupns do anything more than the chrooting.

> There are a couple of issues in play and I think we need actual testing
> rather than reports that something shouldn't work before we reject a set
> of patches.    Aditya in one of his replies to me has reported a
> configuration that he expects will work.  So I think that configuration
> needs to be tested.
>
> cgroups is a weird beast and the problems tend not to lie where a person
> would first expect.
>
> I suspect no one strongly cares if the cgroup hierarchy is unified or
> not.

Well, google does.  There are cases that were either much more complicated
or impossible to represent with unified hierarchy.  But complicating cgroupns
to support something which Tejun has said is explicitly not going to be
supported in the future would be ill-advised.

>   By unified hierarchy I mean that  every mount of cgroupfs has the
> same directories with the same processes in each directory.

No, my reading of Documentation/cgroups/unified-hierarchy.txt is that
unified hierarchy means that all (sane) controllers are co-mounted into
one hierarchy.

> I do think people will care which controllers will show up in differ
> mounts of cgroupfs, and I think that is relevant to process migration.
>
>
>
>
> I am going to segway into scope of what is achievable with a cgroup namespace.
>
> - If there are files in cgroupfs that are not safe to delegate we can
>   not support those files in a container.
>
>   Last I looked there were such files and systemd used them.
>
> - Which controllers share hierarchies of processes to track resources is
>   a core cgroup issue and not a cgroup namespace issue.
>
>   If we find problems with using a unified hierarchy support we need to
>   go fix cgroups in general not cgroupfs.
>
> Eric
> _______________________________________________
> Containers mailing list
> [hidden email]
> https://lists.linuxfoundation.org/mailman/listinfo/containers

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Eric W. Biederman
"Serge E. Hallyn" <[hidden email]> writes:

>>   By unified hierarchy I mean that  every mount of cgroupfs has the
>> same directories with the same processes in each directory.
>
> No, my reading of Documentation/cgroups/unified-hierarchy.txt is that
> unified hierarchy means that all (sane) controllers are co-mounted into
> one hierarchy.

I see what you mean.  If it is indeed the case that a mount of cgroupfs
using the unified hierarchy can not specify which controllers are
present under that mount, that is a very significant bug and presents a very
significant regression in user space flexibility.

I think you can still mount the unified hierarchy and select which
controllers you want to see.  If you can not, that is a change significantly
past what was agreed to and a regression fix needs to be applied.

With a unified hierarchy and separate controllers per mount many cgroup
using applications will continue to work as before without changes, or
with minimal changes.  That is what was agreed to and what I expect has
been actually implemented and it is what needs to be implemented in any
case.

I will see about making time to see where things are really at.

Eric

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Tejun Heo
On Wed, Jan 07, 2015 at 04:14:40PM -0600, Eric W. Biederman wrote:
> I see what you mean.  If it is indeed the case than a mount of cgroupfs
> using the unified hiearchy and can not specify which controllers are
> present under that mount that very significant bug and presents a very
> significant regression in user space flexibility.

The parent always controls which controllers are made available at the
children level.  Only if the parent enables a controller, its
children, whether they're namespaces or not, can choose to further
distribute resources using that controller.  It's a straight-forward
top-down thing.
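
As an illustration (not part of the original message), the top-down rule can be modelled in a few lines of Python. The class below is a toy: controllers stands in for the cgroup.controllers file and enable() for writing "+name" to cgroup.subtree_control. The class name, the method names and the choice of ValueError are assumptions of the sketch, not kernel behavior.

```python
class CgroupNode:
    """Toy model of controller distribution in a unified hierarchy."""

    def __init__(self, name, parent=None, root_controllers=()):
        self.name = name
        self.parent = parent
        # Only meaningful for the root: what the system offers at the top.
        self.root_controllers = frozenset(root_controllers)
        # Controllers this cgroup has enabled for its children.
        self.subtree_control = set()

    @property
    def controllers(self):
        """Controllers available here: whatever the parent enabled."""
        if self.parent is None:
            return self.root_controllers
        return frozenset(self.parent.subtree_control)

    def enable(self, controller):
        """Make a controller available to the children of this cgroup."""
        if controller not in self.controllers:
            raise ValueError(
                f"{controller} is not available in {self.name}"
            )
        self.subtree_control.add(controller)
```

With a root offering cpu and memory that enables only memory, a child cgroup sees memory alone. It can pass memory further down but cannot enable cpu, whether or not it is the root of a namespace.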

--
tejun

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Eric W. Biederman
Tejun Heo <[hidden email]> writes:

> On Wed, Jan 07, 2015 at 04:14:40PM -0600, Eric W. Biederman wrote:
>> I see what you mean.  If it is indeed the case than a mount of cgroupfs
>> using the unified hiearchy and can not specify which controllers are
>> present under that mount that very significant bug and presents a very
>> significant regression in user space flexibility.
>
> The parent always controls which controllers are made available at the
> children level.  Only if the parent enables a controller, its
> children, whether they're namespaces or not, can choose to further
> distribute resources using that controller.  It's a straight-forward
> top-down thing.

Ignoring namespace details for a moment. The following should be
possible with a unified hierarchy.  If it is not it is a show stopper
of a regression.

mount -t tmpfs none /sys/fs/cgroup
(cd /sys/fs/cgroup ; mkdir cpu cpuacct devices memory)
mount -t cgroupfs -o cpu /sys/fs/cgroup/cpu
mount -t cgroupfs -o cpuacct /sys/fs/cgroup/cpuacct
mount -t cgroupfs -o devices /sys/fs/cgroup/devices
mount -t cgroupfs -o memory /sys/fs/cgroup/memory

With the expectation that only the control files for the specified
controllers show up in those mounts.
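
For comparison (not part of the original message): the mount command quoted earlier in this thread uses the filesystem type "cgroup" and passes "none" as the source argument. A per-controller setup written in that form for legacy hierarchies can be sketched as below. The helper only builds argument lists and runs nothing; its name is hypothetical, and whether such mounts succeed depends on how the controllers are already attached on the host.

```python
def legacy_mount_commands(controllers, base="/sys/fs/cgroup"):
    """Build argv lists for one legacy cgroup mount per controller.

    Nothing is executed here; the caller decides whether and how to run
    the commands (root privileges would be required).
    """
    commands = [["mount", "-t", "tmpfs", "none", base]]
    for name in controllers:
        target = f"{base}/{name}"
        commands.append(["mkdir", "-p", target])
        # "-o <controller>" selects the controller for this hierarchy;
        # "none" is a placeholder source, as in the earlier command.
        commands.append(["mount", "-t", "cgroup", "-o", name, "none", target])
    return commands
```

Calling legacy_mount_commands(["cpu", "cpuacct", "devices", "memory"]) yields the tmpfs mount followed by a mkdir and a mount per controller.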

That is, a unified hierarchy is fine.  Requiring that there only be one
mount point and that everyone use it is not ok and is actively a problem.

It is absolutely required to be able to avoid b0rked controllers, and
to my knowledge the only way to do that is to have multiple mounts where
we pick the controller on each mount.   Even if there is now a way that
doesn't require multiple mounts to keep b0rked controllers from being
enabled, multiple mounts still need to work to support the existing
userspace programs.

This discussion is happening because Documentation/cgroups/unified-hierarchy.txt
implies the configuration I have just described will not work with
unified hierarchies enabled.

Eric

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Tejun Heo
On Wed, Jan 07, 2015 at 05:02:17PM -0600, Eric W. Biederman wrote:
> Ignoring namespace details for a moment. The following should be
> possible with a unified hierarchy.  If it is not it is a show stopper
> of a regression.

The -o SUBSYS option doesn't exist.  Jesus, at least get yourself
familiar with the basics before claiming random stuff.

--
tejun

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Eric W. Biederman
Tejun Heo <[hidden email]> writes:

> On Wed, Jan 07, 2015 at 05:02:17PM -0600, Eric W. Biederman wrote:
>> Ignoring namespace details for a moment. The following should be
>> possible with a unified hierarchy.  If it is not it is a show stopper
>> of a regression.
>
> The -o SUBSYS option doesn't exist.  Jesus, at least get yourself
> familiar with the basics before claiming random stuff.

Not random, and I am familiar, thank you very much.

I may have mistyped the manual command line configuration for specifying
which controllers appear on a mount point, but that does not alter my point.

The old options to enable selecting controllers need to continue and
need to continue to work with a unified hierarchy.

Anything else is a gratuitous regression.

Eric

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Tejun Heo
On Wed, Jan 07, 2015 at 05:09:53PM -0600, Eric W. Biederman wrote:
> I may have mistyped the manual command line configuration for specifying
> which controllers appear on a mount point does not alter my point.

Hmmm?  You were talking about the old hierarchies?

> The old options to enable selecting controllers need to continue and
> need to continue to work with a unified hierarchy.
>
> Anything else is a gratuitious regression.

I have no idea what you're on about.  If the outer system uses unified
hierarchy, the inner system should use that too.  If the outer system
doesn't use unified hierarchy, namespace support has never existed,
and even if it did, the inside could never pick and choose controllers
independent from the outside.  If the outside is co-mounting cpu and
cpuacct, the inside is either also doing that or not mounting either.

--
tejun

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Eric W. Biederman
In reply to this post by Eric W. Biederman
[hidden email] (Eric W. Biederman) writes:

> Tejun Heo <[hidden email]> writes:
>
>> On Wed, Jan 07, 2015 at 05:02:17PM -0600, Eric W. Biederman wrote:
>>> Ignoring namespace details for a moment. The following should be
>>> possible with a unified hierarchy.  If it is not it is a show stopper
>>> of a regression.
>>
>> The -o SUBSYS option doesn't exist.  Jesus, at least get yourself
>> familiar with the basics before claiming random stuff.

Oh let's see, I got that command line option out of /proc/mounts and yes
it works.  Perhaps it doesn't if I invoke unified hierarchies but the
option does in fact exist and work.

Now I really do need to test, report regressions, and probably send
regression fixes.  If I understand your strange ranting I think you just
told me that the -o SUBSYS option does work with unified hierarchies.

Tejun.  I asked you specifically about this case 2 years ago at plumbers
and you personally told me this would continue to work.  I am going to
hold you to that.

Fixing bugs is one thing.  Gratuitous regressions that make supporting
existing user space applications insane is another.

Eric




Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Tejun Heo
On Wed, Jan 07, 2015 at 05:27:38PM -0600, Eric W. Biederman wrote:
> >> The -o SUBSYS option doesn't exist.  Jesus, at least get yourself
> >> familiar with the basics before claiming random stuff.
>
> Oh let's see I got that command line option out of /proc/mounts and yes
> it works.  Perhaps it doesn't if I invoke unified hiearchies but the
> option does in fact exist and work.

I meant the -o SUBSYS doesn't exist for unified hierarchy.

> Now I really do need to test report regressions, and send probably send
> regression fixes.  If I understand your strange ranting I think you just
> told me that option that -o SUBSYS does work with unified hierarchies.

What?  Why would -o SUBSYS exist for unified hierarchy?  It's unified
for all controllers.

> Tejun.  I asked you specifically about this case 2 years ago at plumbers
> and you personally told me this would continue to work.  I am going to
> hold you to that.

I have no idea what you're talking about in *THIS* thread.  I'm fully
aware of what was discussed *THEN*.

> Fixing bugs is one thing.  Gratuitious regressions that make supporting
> existing user space applications insane is another.

Can you explain what problem you're actually trying to talk about
without spouting random claims about regressions?

--
tejun

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Serge E. Hallyn
Quoting Tejun Heo ([hidden email]):

> On Wed, Jan 07, 2015 at 05:27:38PM -0600, Eric W. Biederman wrote:
> > >> The -o SUBSYS option doesn't exist.  Jesus, at least get yourself
> > >> familiar with the basics before claiming random stuff.
> >
> > Oh let's see I got that command line option out of /proc/mounts and yes
> > it works.  Perhaps it doesn't if I invoke unified hiearchies but the
> > option does in fact exist and work.
>
> I meant the -o SUBSYS doesn't exist for unified hierarchy.
>
> > Now I really do need to test report regressions, and send probably send
> > regression fixes.  If I understand your strange ranting I think you just
> > told me that option that -o SUBSYS does work with unified hierarchies.
>
> What?  Why would -O SUBSYS exist for unified hierarchy?  It's unified
> for all controllers.
>
> > Tejun.  I asked you specifically about this case 2 years ago at plumbers
> > and you personally told me this would continue to work.  I am going to
> > hold you to that.
>
> I have no idea what you're talking about in *THIS* thread.  I'm fully
> aware of what was discussed *THEN*.
>
> > Fixing bugs is one thing.  Gratuitous regressions that make supporting
> > existing user space applications insane is another.
>
> Can you explain what problem you're actually trying to talk about
> without spouting random claims about regressions?

A few weeks ago, in order to test the cgroup namespace patchset with lxc,
I went through the motions of getting lxc to work with unified hierarchy.
A few of the things I had to change:

1. Hierarchy_num in /proc/cgroups and /proc/self/cgroup start at 0.  Used
to start with 1.  I expect many userspace parsers to be broken by this.

2. After creating every non-leaf cgroup, we must fill in the
cgroup.subtree_control file.  This is extra work which userspace
doesn't have to do right now.

3. Let's say we want to create a freezer cgroup /foo/bar for some set of
tasks, which they will administer.  In fact let's assume we are going to
use cgroup namespaces.  We have to put the tasks into /foo/bar, unshare
the cgroup ns, then create /foo/bar/leaf, move the tasks into /foo/bar/leaf,
and then write 'freezer' into /foo/bar.  (If we're not using cgroup
namespaces, then we have to do a similar thing to let the tasks administer
/foo/bar while placing them under /foo/bar/leaf).  The oddness I'm pointing
to is where the tasks have to know that they can create cgroups in "..".

For containers this becomes odd.  We tend to group containers by the
tasks in and under a cgroup.  We now will have to assume a convention
where we know to check for tasks in and under "..", since by definition
pid 1's cgroup (in a container) cannot have children.

4. The per-cgroup "tasks" file not existing seems odd, although certainly
unexpected by much current software.

So, if the unified hierarchy is going to not cause undue pain, existing
software really needs to start working now to use it.  It's going to be
a sizeable task for lxc.

-serge

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Tejun Heo-2
On Wed, Feb 11, 2015 at 04:46:16AM +0100, Serge E. Hallyn wrote:
> 1. Hierarchy_num in /proc/cgroups and /proc/self/cgroup start at 0.  Used
> to start with 1.  I expect many userspace parsers to be broken by this.

This is intentional.  The unified hierarchy will always have the
hierarchy number zero.  Userland needs to be updated anyway and the
unified hierarchy won't show up unless explicitly enabled.
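
For userspace parsers, the practical consequence is that a hierarchy
number of zero has to be accepted as valid.  A minimal sketch in Python,
assuming the colon-separated "ID:controllers:path" line format of
/proc/self/cgroup and assuming the unified hierarchy entry may carry an
empty controller field; the names CgroupEntry, parse_cgroup_line,
is_unified and read_self_cgroups are made up for illustration:

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    """Parse one 'ID:controllers:path' line without assuming ID >= 1."""
    # Split at most twice so any colon inside the cgroup path is preserved.
    hier, ctrl, path = line.rstrip("\n").split(":", 2)
    # An empty controller field (assumed possible for ID 0) gives [].
    controllers = [c for c in ctrl.split(",") if c]
    return CgroupEntry(int(hier), controllers, path)


def is_unified(entry: CgroupEntry) -> bool:
    # Per the message above, the unified hierarchy always has number zero.
    return entry.hierarchy_id == 0


def read_self_cgroups(proc_file: str = "/proc/self/cgroup") -> List[CgroupEntry]:
    """Read and parse every non-empty line of the given cgroup file."""
    with open(proc_file) as f:
        return [parse_cgroup_line(line) for line in f if line.strip()]
```

A parser that treats zero as "no hierarchy" or indexes an array from one
would be the kind of code that breaks here.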

> 2. After creating every non-leaf cgroup, we must fill in the
> cgroup.subtree_control file.  This is extra work which userspace
> doesn't have to do right now.

Again, by design.  This is how organization and control are separated
and the differing levels of granularity are achieved.

> 3. Let's say we want to create a freezer cgroup /foo/bar for some set of

There shouldn't be a "freezer" cgroup.  The processes are categorized
according to their logical structure and controllers are applied to
the hierarchy as necessary.

> tasks, which they will administer.  In fact let's assume we are going to
> use cgroup namespaces.  We have to put the tasks into /foo/bar, unshare
> the cgroup ns, then create /foo/bar/leaf, move the tasks into /foo/bar/leaf,
> and then write 'freezer' into /foo/bar.  (If we're not using cgroup
> namespaces, then we have to do a similar thing to let the tasks administer
> /foo/bar while placing them under /foo/bar/leaf).  The oddness I'm pointing
> to is where the tasks have to know that they can create cgroups in "..".
>
> For containers this becomes odd.  We tend to group containers by the
> tasks in and under a cgroup.  We now will have to assume a convention
> where we know to check for tasks in and under "..", since by definition
> pid 1's cgroup (in a container) cannot have children.

The semantics is that the parent enables distribution of its given
type of resource by enabling the controller in its subtree_control.
This scoping isn't necessary for freezer and I'm debating whether to
enable controllers which don't need granularity control to be enabled
unconditionally.  Right now, I'm leaning against it mostly for
consistency.
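
The sequence described in the quoted item 3 can be sketched roughly as
follows in Python.  This is only an illustration under assumptions: it
uses the cgroup.procs and cgroup.subtree_control files and the
"+controller" write syntax from unified-hierarchy.txt, it leaves out the
cgroup namespace unshare step, and the function name delegate_with_leaf
and the default "leaf" name are hypothetical:

```python
import os
from typing import Iterable


def delegate_with_leaf(cgroup_dir: str, pids: Iterable[int],
                       controller: str = "freezer",
                       leaf_name: str = "leaf") -> str:
    """Move pids into a child cgroup, then enable controller in the parent."""
    # 1. Create the child cgroup that will actually hold the processes.
    leaf = os.path.join(cgroup_dir, leaf_name)
    os.makedirs(leaf, exist_ok=True)

    # 2. Move each process into the leaf, one PID per write.
    procs = os.path.join(leaf, "cgroup.procs")
    for pid in pids:
        with open(procs, "w") as f:
            f.write(str(pid))

    # 3. With the parent now empty of processes, enable the controller
    #    for its children through subtree_control.
    control = os.path.join(cgroup_dir, "cgroup.subtree_control")
    with open(control, "w") as f:
        f.write("+" + controller)
    return leaf
```

The ordering matters in this sketch: the processes leave the parent
before the controller is enabled in its subtree_control, which is the
step that forces the extra "leaf" level Serge is pointing at.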

> 4. The per-cgroup "tasks" file not existing seems odd, although certainly
> unexpected by much current software.

And, yes, everything is per-process for reasons described in
unified-hierarchy.txt.

> So, if the unified hierarchy is going to not cause undue pain, existing
> software really needs to start working now to use it.  It's going to be
> a sizeable task for lxc.

Yes, this isn't gonna be a trivial conversion.  The usage model
changes and so will a lot of controller knobs and behaviors.

Thanks.

--
tejun

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Serge E. Hallyn-3
Quoting Tejun Heo ([hidden email]):

> On Wed, Feb 11, 2015 at 04:46:16AM +0100, Serge E. Hallyn wrote:
> > 1. Hierarchy_num in /proc/cgroups and /proc/self/cgroup start at 0.  Used
> > to start with 1.  I expect many userspace parsers to be broken by this.
>
> This is intentional.  The unified hierarchy will always have the
> hierarchy number zero.  Userland needs to be updated anyway and the
> unified hierarchy won't show up unless explicitly enabled.
>
> > 2. After creating every non-leaf cgroup, we must fill in the
> > cgroup.subtree_control file.  This is extra work which userspace
> > doesn't have to do right now.
>
> Again, by design.  This is how organization and control are separated
> and the differing levels of granularity are achieved.
>
> > 3. Let's say we want to create a freezer cgroup /foo/bar for some set of
>
> There shouldn't be a "freezer" cgroup.  The processes are categorized
> according to their logical structure and controllers are applied to
> the hierarchy as necessary.

But there can well be cgroups for which only freezer is enabled.  If
I'm wrong about that, then I am suffering a fundamental misunderstanding.

> > tasks, which they will administer.  In fact let's assume we are going to
> > use cgroup namespaces.  We have to put the tasks into /foo/bar, unshare
> > the cgroup ns, then create /foo/bar/leaf, move the tasks into /foo/bar/leaf,
> > and then write 'freezer' into /foo/bar.  (If we're not using cgroup
> > namespaces, then we have to do a similar thing to let the tasks administer
> > /foo/bar while placing them under /foo/bar/leaf).  The oddness I'm pointing
> > to is where the tasks have to know that they can create cgroups in "..".
> >
> > For containers this becomes odd.  We tend to group containers by the
> > tasks in and under a cgroup.  We now will have to assume a convention
> > where we know to check for tasks in and under "..", since by definition
> > pid 1's cgroup (in a container) cannot have children.
>
> The semantics is that the parent enables distribution of its given
> type of resource by enabling the controller in its subtree_control.
> This scoping isn't necessary for freezer and I'm debating whether to
> enable controllers which don't need granularity control to be enabled
> unconditionally.  Right now, I'm leaning against it mostly for
> consistency.

Yeah, IIUC (i.e. freezer would always be enabled?) that would be
even-more-confusing.

> > 4. The per-cgroup "tasks" file not existing seems odd, although certainly
> > unexpected by much current software.
>
> And, yes, everything is per-process for reasons described in
> unified-hierarchy.txt.
>
> > So, if the unified hierarchy is going to not cause undue pain, existing
> > software really needs to start working now to use it.  It's going to be
> > a sizeable task for lxc.
>
> Yes, this isn't gonna be a trivial conversion.  The usage model
> changes and so will a lot of controller knobs and behaviors.
>
> Thanks.
>
> --
> tejun

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Eric W. Biederman

A slightly off-topic comment, given where this thread has gone, but
relevant if we are talking about cgroup namespaces.

If cgroup namespaces don't implement compatibility with existing
userspace, they get a nack.  A backwards-incompatible change should
figure out how to remove the need for any namespaces.

Because that is what namespaces are about: backwards compatibility.

Eric

Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces

Tejun Heo-2
In reply to this post by Serge E. Hallyn-3
Hello,

On Wed, Feb 11, 2015 at 05:29:42AM +0100, Serge E. Hallyn wrote:
> > There shouldn't be a "freezer" cgroup.  The processes are categorized
> > according to their logical structure and controllers are applied to
> > the hierarchy as necessary.
>
> But there can well be cgroups for which only freezer is enabled.  If
> I'm wrong about that, then I am suffering a fundamental misunderstanding.

Ah, sure, I was mostly arguing semantics.  It's just weird to call it
"freezer" cgroup.

> > The semantics is that the parent enables distribution of its given
> > type of resource by enabling the controller in its subtree_control.
> > This scoping isn't necessary for freezer and I'm debating whether to
> > enable controllers which don't need granularity control to be enabled
> > unconditionally.  Right now, I'm leaning against it mostly for
> > consistency.
>
> Yeah, IIUC (i.e. freezer would always be enabled?) that would be
> even-more-confusing.

Right, freezer is kinda weird tho.  Its feature can almost be
considered a utility feature of cgroups core rather than a separate
controller.  That said, it's most likely that it'll remain in its
current form although how it blocks tasks should definitely be
reimplemented.

Thanks.

--
tejun