Linux-SRT User's Manual (2) - System Calls

Synopsis

#include <sched.h>
#include <linux/qos.h>

int sched_inherit(int pid, int inherit);

int sched_query_reserve(int pid, char *name);
int join_reserve(int pid, char *name);
int leave_reserve(int pid);

int get_reserve(char *name, struct rsv_param *rsv);
int set_reserve(char *name, struct rsv_param *rsv,
                bool create, bool modify);
void delete_reserve(char *name);

int sched_yield(void);

Data Structure

From <linux/qos.h>

/* Scheduling policies: */
#define SCHED_OTHER     0
#define SCHED_FIFO      1
#define SCHED_RR        2
#define SCHED_QOS       3
#define SCHED_IDLE      4
#define SCHED_PAUSE     5
#define SCHED_RSV       6

struct rsv_param
{
   int policy, overrun_policy;
   
   int period;       /* ms */
   int cpu;          /* ms */

   int rt_priority;  /* Only for FIFO or RR */
   int priority;     /* Only for OTHER or IDLE */
};

The period and cpu fields are optional; set both to zero when they are not used. If period is non-zero, cpu must be non-zero as well.

For a QOS policy, think of the cpu value as a slice; for any other policy, think of it as a ceiling.

The "priority" field refers to the nice value. It would have been less confusing to call this "nice", and rename "rt_priority" to "static_priority". The reason for the strange names is that this is what they are called by struct task_struct inside the kernel.
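As an illustration of the field rules above, the following is a minimal sketch that fills in a struct rsv_param for a QOS reserve and checks the period/cpu rule before the structure would be handed to set_reserve. The helper names rsv_timing_ok and make_qos_reserve are hypothetical, the structure is mirrored locally only so that the sketch compiles without <linux/qos.h>, and rejecting negative values is an assumption rather than documented behaviour.

```c
#include <stdbool.h>

/* Local mirror of the definitions above so the sketch is self-contained.
 * On a Linux-SRT system, include <linux/qos.h> instead. */
#define SCHED_QOS 3

struct rsv_param {
    int policy, overrun_policy;
    int period;       /* ms */
    int cpu;          /* ms */
    int rt_priority;  /* Only for FIFO or RR */
    int priority;     /* Only for OTHER or IDLE */
};

/* Hypothetical helper: true if period and cpu obey the documented rule
 * (both may be zero; a non-zero period requires a non-zero cpu). */
static bool rsv_timing_ok(const struct rsv_param *p)
{
    if (p->period < 0 || p->cpu < 0)
        return false;   /* assumption: negative values are never valid */
    if (p->period != 0 && p->cpu == 0)
        return false;
    return true;
}

/* Hypothetical helper: describe a QOS reserve of cpu_ms out of every
 * period_ms. All other fields are left at zero; what a zero
 * overrun_policy means is not covered by this sketch. */
static struct rsv_param make_qos_reserve(int period_ms, int cpu_ms)
{
    struct rsv_param p = {0};
    p.policy = SCHED_QOS;
    p.period = period_ms;
    p.cpu = cpu_ms;
    return p;
}
```

For example, make_qos_reserve(40, 10) describes a slice of 10 ms in every 40 ms period.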

Arguments

A pid of 0 can be used anywhere to indicate the current process.

Reserve names may contain spaces. A name of "" should be used to indicate an automatic reserve (which will actually be named auto <pid>). The auto prefix is reserved for automatic reserves (you cannot use it for a named reserve).
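The naming rules above can be sketched as two small user-space helpers. Both function names are hypothetical, and the check for the reserved prefix is an assumption: it treats any name starting with the four characters "auto" as reserved, which may be stricter or looser than the test the kernel actually applies.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: may this string be used for a named reserve? */
static bool is_valid_named_reserve(const char *name)
{
    if (name[0] == '\0')
        return false;   /* "" selects an automatic reserve instead */
    if (strncmp(name, "auto", 4) == 0)
        return false;   /* assumption: simple prefix match on "auto" */
    return true;        /* spaces are allowed in names */
}

/* Hypothetical helper: write the name an automatic reserve receives,
 * "auto <pid>", into buf. Returns the snprintf result. */
static int format_auto_name(char *buf, size_t size, int pid)
{
    return snprintf(buf, size, "auto %d", pid);
}
```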

Description

sched_inherit

Sets the sched_inherit flag for process pid. inherit can be 1 (set), 0 (clear) or -1 (don't change). In all cases the previous value of the flag is returned.

sched_query_reserve

Fill in name with the reserve used by process pid. If no reserve is in use, return -ENOENT.

get_reserve

Fill in rsv with settings for the reserve indicated. If it does not exist, return -ENOENT.

join_reserve

Process pid joins the reserve given by name. An implied leave_reserve() is performed first if necessary. If this is the first member to join a QOS reserve and allocation fails, the reserve remains uninstantiated (but still in the namespace), the call fails and -ENOSPC is returned.

leave_reserve

Process pid leaves the reserve it is currently using. Returns -ENOENT if the process is not using a reserve.

set_reserve

This call defines a reserve with the given name (the empty string specifies an automatic reserve).

Note that a subsequent and explicit join_reserve is necessary if the same process wishes to be a member of the new reserve.

If the reserve does not exist, the create flag specifies whether to create it. If the reserve does exist, the modify flag specifies whether to modify it.

An attempt to modify an existing, instantiated reserve may fail if the QOS allocation needs to be increased. In this case the reserve retains its current values and members, but the call returns -ENOSPC. An existing reserve which has no members can always be modified without risk of this error.
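The interaction of the create and modify flags can be summarised as a small decision function. This is only a model of the rules described above, not kernel code: the enum and the function name are hypothetical, and the value set_reserve returns when nothing is done is not stated in this manual, so the sketch does not model it.

```c
#include <stdbool.h>

/* Hypothetical labels for what a set_reserve call ends up doing. */
enum rsv_action { RSV_NO_CHANGE, RSV_CREATE, RSV_MODIFY };

/* Model of the flag rules: create applies only when the reserve is
 * absent, modify applies only when it is already present. */
static enum rsv_action set_reserve_action(bool exists, bool create, bool modify)
{
    if (!exists)
        return create ? RSV_CREATE : RSV_NO_CHANGE;
    return modify ? RSV_MODIFY : RSV_NO_CHANGE;
}
```

Passing create set and modify clear therefore means "define this reserve only if nobody has defined it yet", while setting both means "create or update".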

delete_reserve

The reserve must have no members; if that is the case, it is removed from the namespace.

sched_yield

This is an existing system call which we have extended. If called by a process using a reserve, the effect is to immediately drain the current allocation of CPU time to zero. The reserve will be refreshed at its next epoch as usual.

This can be used by programmers to synchronize an application with its reserve, avoiding the need for low resolution and unpredictable interval timers.
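A minimal sketch of this pattern follows: the process does one unit of work and then calls sched_yield, so that under a reserve the next unit should start after the allocation is refreshed. The names run_frames, render_frame and frames_done are hypothetical placeholders for an application's own code. On a stock kernel without the extension the same program still compiles and runs, but sched_yield merely gives up the processor and provides no such synchronization.

```c
#include <sched.h>

static int frames_done;   /* counter used by the placeholder workload */

/* Placeholder for one period's worth of application work. */
static void render_frame(int frame)
{
    (void)frame;
    frames_done++;
}

/* Run `count` units of work, yielding after each one. Under a reserve,
 * each yield drains the remaining allocation as described above.
 * Returns 0 on success, -1 if sched_yield reports an error. */
static int run_frames(int count, void (*work)(int))
{
    int i;

    for (i = 0; i < count; i++) {
        work(i);
        if (sched_yield() != 0)
            return -1;
    }
    return 0;
}
```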

Return codes

Specific return codes are noted above for each function. In addition the following values can be returned in several cases:

EFAULT if there's a memory error with the supplied structure pointers
ESRCH if a process does not exist
EINVAL if the arguments take invalid values
EPERM if not permitted
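The codes mentioned in this manual can be turned into messages with a helper along the following lines. The function name is hypothetical, and the sketch assumes the calls hand back negative errno values directly, as the "-ENOENT" and "-ENOSPC" notation in the descriptions suggests; if a C library wrapper is used that returns -1 and sets errno, pass -errno instead.

```c
#include <errno.h>

/* Hypothetical helper: describe a return value from one of the reserve
 * calls. Non-negative values are not errors (sched_inherit, for example,
 * returns the previous flag value). */
static const char *reserve_error_text(int rc)
{
    if (rc >= 0)
        return "success";
    switch (-rc) {
    case ENOENT: return "no such reserve, or process is not using one";
    case ENOSPC: return "QOS allocation failed";
    case EFAULT: return "bad pointer argument";
    case ESRCH:  return "no such process";
    case EINVAL: return "invalid argument";
    case EPERM:  return "operation not permitted";
    default:     return "unknown error";
    }
}
```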

Entry and exit from reserves

When a process exits for any reason it calls leave_reserve automatically.

Admission control for QOS reserves occurs when the number of members rises above 0, and the resources are freed when it drops back to 0.

Named reserves are never removed from the namespace until an explicit delete_reserve call.

Automatic reserves are also deleted when their owner dies if they have no remaining members at that time.

If an automatic reserve does have other members when its owner dies, it is instead renamed according to the pid of the first remaining member. If that process has already created its own automatic reserve (but not, of course, joined it, since a process can belong to only one reserve), the rename is an error and the reserve is deleted instead, forcing the remaining members to leave it.

Note 1: The transfer of ownership maintains the relationship between automatic reserve names and live process pids, so no collisions occur after pid wrapping.

Note 2: Not deleting the reserve when the owner dies ensures that daemons which fork and exit to put themselves into the background will work with AutoQOS.
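The rules for what happens to an automatic reserve when its owner dies can be modelled as a three-way decision. This is a sketch of the behaviour described in this section, not the kernel's implementation; the enum and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes described above. */
enum auto_rsv_fate {
    AUTO_RSV_DELETE,        /* no members left: reserve is deleted */
    AUTO_RSV_RENAME,        /* renamed after the first remaining member */
    AUTO_RSV_FORCE_DELETE   /* name clash: deleted, members forced out */
};

/* remaining_members: members left once the owner has gone.
 * heir_has_auto_reserve: whether the first remaining member has already
 * created its own automatic reserve. */
static enum auto_rsv_fate
auto_reserve_on_owner_exit(int remaining_members, bool heir_has_auto_reserve)
{
    if (remaining_members == 0)
        return AUTO_RSV_DELETE;
    if (heir_has_auto_reserve)
        return AUTO_RSV_FORCE_DELETE;
    return AUTO_RSV_RENAME;
}
```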

Author

David Ingram <dmi@uk.research.att.com>