API Reference
Packages
inference.networking.x-k8s.io/v1alpha2
Package v1alpha2 contains API Schema definitions for the inference.networking.x-k8s.io API group.
Resource Types
Extension
Extension specifies how to configure an extension that runs the endpoint picker.
Appears in:
- InferencePoolSpec
Field | Description | Default | Validation |
---|---|---|---|
group Group | Group is the group of the referent. The default value is "", representing the Core API group. | | MaxLength: 253 Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
kind Kind | Kind is the Kubernetes resource kind of the referent. Defaults to "Service" when not specified. ExternalName services can refer to CNAME DNS records that may live outside of the cluster and as such are difficult to reason about in terms of conformance. They also may not be safe to forward to (see CVE-2021-25740 for more information). Implementations MUST NOT support ExternalName Services. | Service | MaxLength: 63 MinLength: 1 Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
name ObjectName | Name is the name of the referent. | | MaxLength: 253 MinLength: 1 Required: {} |
portNumber PortNumber | The port number on the service running the extension. When unspecified, implementations SHOULD infer a default value of 9002 when the Kind is Service. | | Maximum: 65535 Minimum: 1 |
failureMode ExtensionFailureMode | Configures how the gateway handles the case when the extension is not responsive. Defaults to FailClose. | FailClose | Enum: [FailOpen FailClose] |
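To make the defaulting rules in the table concrete, here is a minimal Python sketch that fills in an Extension reference the way the field descriptions suggest. The `Extension` dataclass, `resolve_extension`, and the example name `vllm-llama3-epp` are hypothetical illustrations, not generated client types or controller code; the sketch assumes the consumer applies the documented defaults (Group "", Kind "Service", port 9002 for Services, FailClose) itself.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults taken from the Extension field table.
DEFAULT_GROUP = ""  # the core API group
DEFAULT_KIND = "Service"
DEFAULT_SERVICE_PORT = 9002  # inferred only when the kind is Service
DEFAULT_FAILURE_MODE = "FailClose"
FAILURE_MODES = ("FailOpen", "FailClose")


@dataclass
class Extension:
    """Hypothetical Python model of the Extension fields."""

    name: str
    group: Optional[str] = None
    kind: Optional[str] = None
    port_number: Optional[int] = None
    failure_mode: Optional[str] = None


def resolve_extension(ext: Extension) -> Extension:
    """Return a new Extension with the documented defaults filled in."""
    kind = DEFAULT_KIND if ext.kind is None else ext.kind
    port = ext.port_number
    if port is None and kind == "Service":
        port = DEFAULT_SERVICE_PORT
    if port is not None and not 1 <= port <= 65535:
        raise ValueError(f"portNumber {port} is outside 1-65535")
    mode = DEFAULT_FAILURE_MODE if ext.failure_mode is None else ext.failure_mode
    if mode not in FAILURE_MODES:
        raise ValueError(f"failureMode must be one of {FAILURE_MODES}, got {mode!r}")
    return Extension(
        name=ext.name,
        group=DEFAULT_GROUP if ext.group is None else ext.group,
        kind=kind,
        port_number=port,
        failure_mode=mode,
    )


resolved = resolve_extension(Extension(name="vllm-llama3-epp"))
print(resolved)
```

Keeping the port default conditional on the kind mirrors the SHOULD in the portNumber description: for kinds other than Service, the sketch leaves the port unset rather than guessing.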
ExtensionFailureMode
Underlying type: string
ExtensionFailureMode defines the options for how the gateway handles the case when the extension is not responsive.
Validation:
- Enum: [FailOpen FailClose]
Appears in:
- Extension
Field | Description |
---|---|
FailOpen | FailOpen specifies that the proxy should forward the request to an endpoint of its picking when the Endpoint Picker fails. |
FailClose | FailClose specifies that the proxy should drop the request when the Endpoint Picker fails. |
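The two failure modes amount to a single decision point in the proxy. This minimal Python sketch assumes a hypothetical `pick` callable that raises a hypothetical `EndpointPickerError` when the Endpoint Picker is unresponsive; choosing a random candidate under FailOpen is only one possible way for the proxy to pick an endpoint on its own.

```python
import random
from typing import Callable, List, Optional


class EndpointPickerError(Exception):
    """Hypothetical error signalling an unresponsive Endpoint Picker."""


def route_request(
    pick: Callable[[], str],
    candidates: List[str],
    failure_mode: str = "FailClose",
) -> Optional[str]:
    """Return the endpoint to forward to, or None if the request is dropped."""
    try:
        return pick()
    except EndpointPickerError:
        if failure_mode == "FailOpen" and candidates:
            # FailOpen: forward to an endpoint of the proxy's own choosing.
            return random.choice(candidates)
        # FailClose (or no candidates at all): drop the request.
        return None


def unresponsive_picker() -> str:
    raise EndpointPickerError("timed out")


print(route_request(unresponsive_picker, ["10.0.0.7:8000"], "FailOpen"))
```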
Group
Underlying type: string
Group refers to a Kubernetes Group. It must be either an empty string or an RFC 1123 subdomain.
This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208
Valid values include:
- "" - the empty string implies the core Kubernetes API group
- "gateway.networking.k8s.io"
- "foo.example.com"
Invalid values include:
- "example.com/bar" - "/" is an invalid character
Validation:
- MaxLength: 253
- Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
Appears in:
- Extension
- ParentGatewayReference
- PoolObjectReference
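Candidate Group values can be checked locally by applying the documented length limit and pattern directly. The helper name `is_valid_group` is hypothetical; the regular expression is the Validation pattern copied verbatim.

```python
import re

# Length limit and pattern copied from the Group validation rules.
GROUP_MAX_LENGTH = 253
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)


def is_valid_group(value: str) -> bool:
    """Return True if value satisfies the documented Group constraints."""
    if len(value) > GROUP_MAX_LENGTH:
        return False
    # fullmatch ensures the whole string (not a prefix) matches the pattern.
    return GROUP_PATTERN.fullmatch(value) is not None


for candidate in ["", "gateway.networking.k8s.io", "foo.example.com", "example.com/bar"]:
    print(repr(candidate), is_valid_group(candidate))
```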
InferenceObjective
InferenceObjective is the Schema for the InferenceObjectives API.
Field | Description | Default | Validation |
---|---|---|---|
apiVersion string | inference.networking.x-k8s.io/v1alpha2 | | |
kind string | InferenceObjective | | |
metadata ObjectMeta | Refer to the Kubernetes API documentation for the fields of metadata. | | |
spec InferenceObjectiveSpec | | | |
status InferenceObjectiveStatus | | | |
InferenceObjectiveSpec
InferenceObjectiveSpec represents the desired state of a specific model use case. This resource is managed by the "Inference Workload Owner" persona.
The Inference Workload Owner persona is someone who trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool, defined by the Inference Platform Admin, sharing compute capacity with other InferenceObjectives.
Appears in:
- InferenceObjective
Field | Description | Default | Validation |
---|---|---|---|
priority integer | Priority defines how important it is to serve the request compared to other requests in the same pool. The higher the value, the more critical the request; negative values are allowed. No default value is set for this field, allowing for future additions of new fields that may be mutually exclusive ('one of') with this field. However, implementations that consume this field (such as the Endpoint Picker) treat an unset value as 0. Priority is used in flow control, primarily when resources are scarce and requests need to be queued. All requests are queued, and flow control always serves requests of higher priority first. Fairness is only enforced and tracked between requests of the same priority. For example, requests with Priority 10 are always served before requests with Priority 0 (the value used if Priority is unset or no InferenceObjective is specified). Similarly, requests with Priority -10 are always served after requests with Priority 0. | | |
poolRef PoolObjectReference | PoolRef is a reference to the inference pool. The pool must exist in the same namespace. | | Required: {} |
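The Priority description states ordering guarantees without prescribing a queue structure. This minimal Python sketch models the strictest reading: requests are served in descending Priority, and an unset Priority is treated as 0. Within one priority level it simply uses first-in, first-out order; that is an assumption standing in for whatever fairness policy an implementation actually applies. `PriorityRequestQueue` is a hypothetical name, not an Endpoint Picker API.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class PriorityRequestQueue:
    """Strict-priority request queue; FIFO within a priority level (assumed)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()

    def push(self, request_id: str, priority: Optional[int] = None) -> None:
        # An unset Priority (or no InferenceObjective) is treated as 0.
        effective = 0 if priority is None else priority
        # heapq is a min-heap, so negate the priority to pop the highest first;
        # the sequence number preserves arrival order among equal priorities.
        heapq.heappush(self._heap, (-effective, next(self._seq), request_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityRequestQueue()
queue.push("batch-job", -10)
queue.push("unset")
queue.push("interactive", 10)
queue.push("explicit-zero", 0)
print([queue.pop() for _ in range(len(queue))])
```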
InferenceObjectiveStatus
InferenceObjectiveStatus defines the observed state of InferenceObjective.
Appears in:
- InferenceObjective
Field | Description | Default | Validation |
---|---|---|---|
conditions Condition array | Conditions track the state of the InferenceObjective. Known condition types are: "Accepted". | [map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Ready]] | MaxItems: 8 |
InferencePool
InferencePool is the Schema for the InferencePools API.
Field | Description | Default | Validation |
---|---|---|---|
apiVersion string | inference.networking.x-k8s.io/v1alpha2 | | |
kind string | InferencePool | | |
metadata ObjectMeta | Refer to the Kubernetes API documentation for the fields of metadata. | | |
spec InferencePoolSpec | | | |
status InferencePoolStatus | Status defines the observed state of InferencePool. | { parent:[map[conditions:[map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Accepted]] parentRef:map[kind:Status name:default]]] } | |
InferencePoolSpec
InferencePoolSpec defines the desired state of InferencePool.
Appears in:
- InferencePool
Field | Description | Default | Validation |
---|---|---|---|
selector object (keys:LabelKey, values:LabelValue) | Selector defines a map of labels used to watch the model server Pods that should be included in the InferencePool. In some cases, implementations may translate this field to a Service selector, so it matches the simple map used for Service selectors instead of the full Kubernetes LabelSelector type. If specified, it is applied to match the model server Pods in the same namespace as the InferencePool. Cross-namespace selection is not supported. | | Required: {} |
targetPortNumber integer | TargetPortNumber defines the port number used to access the selected model server Pods. The number must be in the range 1 to 65535. | | Maximum: 65535 Minimum: 1 Required: {} |
extensionRef Extension | Extension configures an endpoint picker as an extension service. | | |
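Because the selector is a plain Service-style map rather than a full LabelSelector, matching reduces to a subset check restricted to the pool's namespace. This minimal Python sketch is an illustration under that reading; `pod_in_pool` and the example pod names and labels are hypothetical, and the sketch ignores edge cases such as an empty selector.

```python
from typing import Dict


def pod_in_pool(
    selector: Dict[str, str],
    pool_namespace: str,
    pod_namespace: str,
    pod_labels: Dict[str, str],
) -> bool:
    """Return True if a pod would be selected by the InferencePool."""
    # Only pods in the InferencePool's own namespace are considered.
    if pod_namespace != pool_namespace:
        return False
    # Service-style matching: every selector key must be present on the pod
    # with exactly the same (case-sensitive) value.
    return all(pod_labels.get(key) == value for key, value in selector.items())


pods = {
    "vllm-0": ("team-a", {"app": "vllm-llama3", "gpu": "h100"}),
    "vllm-1": ("team-b", {"app": "vllm-llama3"}),
    "web-0": ("team-a", {"app": "frontend"}),
}
selected = [
    name
    for name, (namespace, labels) in pods.items()
    if pod_in_pool({"app": "vllm-llama3"}, "team-a", namespace, labels)
]
print(selected)
```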
InferencePoolStatus
InferencePoolStatus defines the observed state of InferencePool.
Appears in:
- InferencePool
Field | Description | Default | Validation |
---|---|---|---|
parent PoolStatus array | Parents is a list of parent resources (usually Gateways) that are associated with the InferencePool, and the status of the InferencePool with respect to each parent. A maximum of 32 Gateways will be represented in this list. When the list contains kind: Status, name: default, it indicates that the InferencePool is not associated with any Gateway, and a controller must do the following: (1) remove that parent when setting the "Accepted" condition; (2) add that parent back when the controller will no longer manage the InferencePool and no other parents exist. | | MaxItems: 32 |
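The placeholder-parent rules can be read as two status transitions. This minimal Python sketch operates on plain dictionaries shaped like the InferencePool status default; the function names `set_accepted` and `release` and the example Gateway are hypothetical, and a real controller would also merge conditions and handle update conflicts, which the sketch omits.

```python
from typing import Dict, List

# Placeholder parent and pending condition, as shown in the InferencePool status default.
PLACEHOLDER_REF = {"kind": "Status", "name": "default"}
PENDING_CONDITION = {
    "type": "Accepted",
    "status": "Unknown",
    "reason": "Pending",
    "message": "Waiting for controller",
    "lastTransitionTime": "1970-01-01T00:00:00Z",
}
MAX_PARENTS = 32


def is_placeholder(entry: Dict) -> bool:
    ref = entry["parentRef"]
    return ref.get("kind") == "Status" and ref.get("name") == "default"


def set_accepted(parents: List[Dict], gateway_ref: Dict, condition: Dict) -> List[Dict]:
    """Record an Accepted condition for gateway_ref and drop the placeholder."""
    updated = [
        p for p in parents if not is_placeholder(p) and p["parentRef"] != gateway_ref
    ]
    if len(updated) >= MAX_PARENTS:
        raise ValueError("the parent list can hold at most 32 entries")
    updated.append({"parentRef": gateway_ref, "conditions": [condition]})
    return updated


def release(parents: List[Dict], gateway_ref: Dict) -> List[Dict]:
    """Stop managing the pool for gateway_ref; restore the placeholder if none remain."""
    updated = [p for p in parents if p["parentRef"] != gateway_ref]
    if not updated:
        updated.append(
            {"parentRef": dict(PLACEHOLDER_REF), "conditions": [dict(PENDING_CONDITION)]}
        )
    return updated


gateway = {"group": "gateway.networking.k8s.io", "kind": "Gateway", "name": "inference-gw"}
status = [{"parentRef": dict(PLACEHOLDER_REF), "conditions": [dict(PENDING_CONDITION)]}]
status = set_accepted(status, gateway, {"type": "Accepted", "status": "True"})
print(status)
```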
Kind
Underlying type: string
Kind refers to a Kubernetes Kind.
Valid values include:
- "Service"
- "HTTPRoute"
Invalid values include:
- "invalid/kind" - "/" is an invalid character
Validation:
- MaxLength: 63
- MinLength: 1
- Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
Appears in:
- Extension
- ParentGatewayReference
- PoolObjectReference
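The Kind rules combine a 1 to 63 character length bound with a pattern that requires a leading letter. A minimal Python check, using a hypothetical helper name and the documented pattern verbatim:

```python
import re

# Pattern copied from the Kind validation rules; length must be 1-63.
KIND_PATTERN = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")


def is_valid_kind(value: str) -> bool:
    """Return True if value satisfies the documented Kind constraints."""
    return 1 <= len(value) <= 63 and KIND_PATTERN.fullmatch(value) is not None


for candidate in ["Service", "HTTPRoute", "invalid/kind"]:
    print(candidate, is_valid_kind(candidate))
```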
LabelKey
Underlying type: string
LabelKey was originally copied from https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731 and is duplicated here to avoid taking an unexpected dependency on the Gateway API.
LabelKey is the key of a label. It is used for validating map keys and matches the Kubernetes "qualified name" validation that is used for labels. Labels are case sensitive, so my-label and My-Label are considered distinct.
Valid values include:
- example
- example.com
- example.com/path
- example.com/path.html
Invalid values include:
- example~ - "~" is an invalid character
- example.com. - cannot start or end with "."
Validation:
- MaxLength: 253
- MinLength: 1
- Pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$
Appears in:
- InferencePoolSpec
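A LabelKey has an optional DNS-subdomain prefix followed by "/" and a name of at most 63 characters. The minimal Python check below assumes the doubled backslash (`\\.`) in the rendered pattern is an escaping artifact for `\.` (a literal dot); read literally, `\\.` would reject the documented valid value example.com/path. The helper name is hypothetical.

```python
import re

# Assumption: the rendered "\\." is an escaped "\." (literal dot) in the real pattern.
LABEL_KEY_PATTERN = re.compile(
    r"^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?"
    r"([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$"
)


def is_valid_label_key(value: str) -> bool:
    """Return True if value satisfies the documented LabelKey constraints."""
    return 1 <= len(value) <= 253 and LABEL_KEY_PATTERN.fullmatch(value) is not None


for candidate in ["example", "example.com/path.html", "example~", "example.com."]:
    print(candidate, is_valid_label_key(candidate))
```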
LabelValue
Underlying type: string
LabelValue is the value of a label. It is used for validating map values and matches the Kubernetes label value validation rules:
- must be 63 characters or less (can be empty),
- unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
- may contain dashes (-), underscores (_), dots (.), and alphanumerics between.
Valid values include:
- MyValue
- my.name
- 123-my-value
Validation:
- MaxLength: 63
- MinLength: 0
- Pattern: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$
Appears in:
- InferencePoolSpec
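Unlike a LabelKey, a LabelValue may be empty, which the pattern expresses by making the entire body optional. A minimal Python check with a hypothetical helper name and the documented pattern verbatim:

```python
import re

# Pattern copied from the LabelValue validation rules; length must be 0-63.
LABEL_VALUE_PATTERN = re.compile(r"^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$")


def is_valid_label_value(value: str) -> bool:
    """Return True if value satisfies the documented LabelValue constraints."""
    return len(value) <= 63 and LABEL_VALUE_PATTERN.fullmatch(value) is not None


for candidate in ["MyValue", "my.name", "123-my-value", "", "-bad"]:
    print(repr(candidate), is_valid_label_value(candidate))
```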
Namespace
Underlying type: string
Namespace refers to a Kubernetes namespace. It must be an RFC 1123 label.
This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L187
It is used for Namespace name validation here: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/api/validation/generic.go#L63
Valid values include:
- "example"
Invalid values include:
- "example.com" - "." is an invalid character
Validation:
- MaxLength: 63
- MinLength: 1
- Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$
Appears in:
- ParentGatewayReference
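An RFC 1123 label allows only lowercase alphanumerics and dashes, so dots and uppercase letters are rejected. A minimal Python check with a hypothetical helper name and the documented pattern verbatim:

```python
import re

# Pattern copied from the Namespace validation rules; length must be 1-63.
NAMESPACE_PATTERN = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def is_valid_namespace(value: str) -> bool:
    """Return True if value satisfies the documented Namespace constraints."""
    return 1 <= len(value) <= 63 and NAMESPACE_PATTERN.fullmatch(value) is not None


for candidate in ["example", "example.com"]:
    print(candidate, is_valid_namespace(candidate))
```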
ObjectName
Underlying type: string
ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.
Validation:
- MaxLength: 253
- MinLength: 1
Appears in:
- Extension
- ParentGatewayReference
- PoolObjectReference
ParentGatewayReference
ParentGatewayReference identifies an API object, including its namespace, and defaults to Gateway.
Appears in:
- PoolStatus
Field | Description | Default | Validation |
---|---|---|---|
group Group | Group is the group of the referent. | gateway.networking.k8s.io | MaxLength: 253 Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
kind Kind | Kind is the kind of the referent, for example "Gateway". | Gateway | MaxLength: 63 MinLength: 1 Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
name ObjectName | Name is the name of the referent. | | MaxLength: 253 MinLength: 1 |
namespace Namespace | Namespace is the namespace of the referent. If not present, the namespace of the referent is assumed to be the same as the namespace of the referring object. | | MaxLength: 63 MinLength: 1 Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ |
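The defaults in this table, together with the namespace fallback, can be resolved in one step. This minimal Python sketch uses a hypothetical `ParentGatewayReference` dataclass and `resolve_parent_ref` helper, and assumes the caller knows the namespace of the referring object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentGatewayReference:
    """Hypothetical Python model of the ParentGatewayReference fields."""

    name: str
    group: Optional[str] = None
    kind: Optional[str] = None
    namespace: Optional[str] = None


def resolve_parent_ref(
    ref: ParentGatewayReference, referrer_namespace: str
) -> ParentGatewayReference:
    """Fill in the documented defaults for a ParentGatewayReference."""
    return ParentGatewayReference(
        name=ref.name,
        group="gateway.networking.k8s.io" if ref.group is None else ref.group,
        kind="Gateway" if ref.kind is None else ref.kind,
        # An absent namespace means the namespace of the referring object.
        namespace=referrer_namespace if ref.namespace is None else ref.namespace,
    )


print(resolve_parent_ref(ParentGatewayReference(name="inference-gw"), "team-a"))
```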
PoolObjectReference
PoolObjectReference identifies an API object within the namespace of the referrer.
Appears in:
- InferenceObjectiveSpec
Field | Description | Default | Validation |
---|---|---|---|
group Group | Group is the group of the referent. | inference.networking.k8s.io | MaxLength: 253 Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
kind Kind | Kind is the kind of the referent, for example "InferencePool". | InferencePool | MaxLength: 63 MinLength: 1 Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
name ObjectName | Name is the name of the referent. | | MaxLength: 253 MinLength: 1 Required: {} |
PoolStatus
PoolStatus defines the observed state of InferencePool from a Gateway.
Appears in:
- InferencePoolStatus
Field | Description | Default | Validation |
---|---|---|---|
parentRef ParentGatewayReference | ParentRef indicates the Gateway that observed the state of the InferencePool. | | |
conditions Condition array | Conditions track the state of the InferencePool. Known condition types are: "Accepted" and "ResolvedRefs". | [map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Accepted]] | MaxItems: 8 |
PortNumber
Underlying type: integer
PortNumber defines a network port.
Validation:
- Maximum: 65535
- Minimum: 1
Appears in:
- Extension