1. Preface
Please credit the original source when reposting; respect the author's work!
This article analyzes custom schedulers in kube-scheduler, specifically the ability to choose your own predicate and priority functions.
Source code: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)
2. Without extender methods
2.1 Example
For cluster installation, refer to k8s源碼編譯以及二進制安裝(用于源碼開發調試版).
2.1.1 Preparing the configuration files
Since we want to customize the predicate and priority functions (and, later, extender methods), a configuration file (schedulerConfig.yaml) is required:
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
schedulerName: my-scheduler
algorithmSource:
  policy:
    file:
      path: policy.yaml
leaderElection:
  leaderElect: true
  lockObjectName: my-scheduler
  lockObjectNamespace: kube-system
Below is the policy file (policy.yaml):
{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [
    {"name" : "PodFitsHostPorts"},
    {"name" : "PodFitsResources"},
    {"name" : "NoDiskConflict"},
    {"name" : "MatchNodeSelector"},
    {"name" : "HostName"}
  ],
  "priorities" : [
    {"name" : "LeastRequestedPriority", "weight" : 1},
    {"name" : "BalancedResourceAllocation", "weight" : 1},
    {"name" : "ServiceSpreadingPriority", "weight" : 1},
    {"name" : "EqualPriority", "weight" : 1}
  ],
  "hardPodAffinitySymmetricWeight" : 10
}
2.1.2 Running and testing
Next, restart kube-scheduler with the following command:
./kube-scheduler --master=http://localhost:8080 --config=schedulerConfig.yaml
(Adjust the paths in schedulerConfig.yaml and policy.yaml for your own environment.)
Then deploy one pod that sets schedulerName and one pod that does not.
[root@master kubectl]# cat pod-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-schduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: podtest-scheduler
    image: nginx
    ports:
    - containerPort: 80
[root@master kubectl]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: podtest
    image: nginx
    ports:
    - containerPort: 80
[root@master kubectl]# ./kubectl apply -f pod-scheduler.yaml
[root@master kubectl]# ./kubectl apply -f pod.yaml
[root@master kubectl]# ./kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
test            0/1     Pending   0          83s
test-schduler   1/1     Running   0          13m
As you can see, the pod that sets schedulerName, i.e. test-schduler, is running successfully.
[root@master kubectl]# ./kubectl describe pod test-schduler
...
Events:
  Type     Reason             Age                 From                  Message
  ----     ------             ----                ----                  -------
  Normal   Scheduled          12m                 my-scheduler          Successfully assigned default/test-schduler to 172.21.0.12
  Normal   Pulling            12m                 kubelet, 172.21.0.12  pulling image "nginx"
  Normal   Pulled             11m                 kubelet, 172.21.0.12  Successfully pulled image "nginx"
  Normal   Created            11m                 kubelet, 172.21.0.12  Created container
  Normal   Started            11m                 kubelet, 172.21.0.12  Started container
  Warning  MissingClusterDNS  62s (x12 over 12m)  kubelet, 172.21.0.12  pod: "test-schduler_default(213933b8-efda-11e9-9434-525400d54f7e)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
The pod without schedulerName, namely test, stays in Pending: when schedulerName is not set, the pod falls to the default k8s scheduler, which is not currently running, so the pod cannot be scheduled. If you now start another scheduler without the --config flag, this pod will get scheduled. For the default scheduler, see [k8s源碼分析][kube-scheduler]scheduler/algorithmprovider之注冊default-scheduler.
2.2 Source code analysis
2.2.1 Parsing the config file
Most of this code was already analyzed in [k8s源碼分析][kube-scheduler]scheduler之啟動run(1), so this section keeps things brief. The call chain that parses the config in kube-scheduler is:
NewSchedulerCommand -> runCommand -> opts.Config() -> o.ApplyTo(c)
So it is ultimately here that the contents of the config file get loaded:
// cmd/kube-scheduler/app/options/options.go
func (o *Options) ApplyTo(c *schedulerappconfig.Config) error {
    // if kube-scheduler was started without --config, use the default
    // configuration (o.ComponentConfig)
    if len(o.ConfigFile) == 0 {
        ...
    } else {
        // if kube-scheduler was started with --config, load from the config file
        cfg, err := loadConfigFromFile(o.ConfigFile)
        if err != nil {
            return err
        }
        // use the loaded config file only, with the exception of --address and --port. This means that
        // none of the deprecated flags in o.Deprecated are taken into consideration. This is the old
        // behaviour of the flags we have to keep.
        c.ComponentConfig = *cfg
        if err := o.CombinedInsecureServing.ApplyToFromLoadedConfig(c, &c.ComponentConfig); err != nil {
            return err
        }
    }
    ...
}
// cmd/kube-scheduler/app/options/configfile.go
func loadConfigFromFile(file string) (*kubeschedulerconfig.KubeSchedulerConfiguration, error) {
    data, err := ioutil.ReadFile(file)
    if err != nil {
        return nil, err
    }
    return loadConfig(data)
}

func loadConfig(data []byte) (*kubeschedulerconfig.KubeSchedulerConfiguration, error) {
    configObj := &kubeschedulerconfig.KubeSchedulerConfiguration{}
    if err := runtime.DecodeInto(kubeschedulerscheme.Codecs.UniversalDecoder(), data, configObj); err != nil {
        return nil, err
    }
    return configObj, nil
}
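If you want to try this decode step in isolation, a minimal sketch looks roughly like the following (my own illustration, not part of the article's source tree; it assumes a vendored v1.13 tree so the internal packages are importable, and the file name is arbitrary):

// decode_sketch.go -- exercises the same decode path as loadConfig above.
package main

import (
    "fmt"
    "io/ioutil"

    "k8s.io/apimachinery/pkg/runtime"
    kubeschedulerconfig "k8s.io/kubernetes/pkg/scheduler/apis/config"
    kubeschedulerscheme "k8s.io/kubernetes/pkg/scheduler/apis/config/scheme"
)

func main() {
    data, err := ioutil.ReadFile("schedulerConfig.yaml")
    if err != nil {
        panic(err)
    }
    cfg := &kubeschedulerconfig.KubeSchedulerConfiguration{}
    // DecodeInto converts the versioned (v1alpha1) YAML into the internal
    // KubeSchedulerConfiguration type and applies registered defaults.
    if err := runtime.DecodeInto(kubeschedulerscheme.Codecs.UniversalDecoder(), data, cfg); err != nil {
        panic(err)
    }
    fmt.Println(cfg.SchedulerName) // with the YAML above: my-scheduler
}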
In short, loadConfig converts the contents of the config file (schedulerConfig.yaml) into a kubeschedulerconfig.KubeSchedulerConfiguration object:
// pkg/scheduler/apis/config/types.go
type KubeSchedulerConfiguration struct {
    metav1.TypeMeta
    // SchedulerName is name of the scheduler, used to select which pods
    // will be processed by this scheduler, based on pod's "spec.SchedulerName".
    SchedulerName string
    // AlgorithmSource specifies the scheduler algorithm source.
    AlgorithmSource SchedulerAlgorithmSource
    // RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule
    // corresponding to every RequiredDuringScheduling affinity rule.
    // HardPodAffinitySymmetricWeight represents the weight of implicit PreferredDuringScheduling affinity rule, in the range 0-100.
    HardPodAffinitySymmetricWeight int32
    // analyzed later when we cover high availability
    LeaderElection KubeSchedulerLeaderElectionConfiguration
    ClientConnection apimachineryconfig.ClientConnectionConfiguration
    // defaulting to 0.0.0.0:10251
    HealthzBindAddress string
    // serve on, defaulting to 0.0.0.0:10251.
    MetricsBindAddress string
    apiserverconfig.DebuggingConfiguration
    // whether preemption is disabled
    DisablePreemption bool
    PercentageOfNodesToScore int32
    FailureDomains string
    BindTimeoutSeconds *int64
}

type SchedulerAlgorithmSource struct {
    // Policy is a policy-based algorithm source
    Policy *SchedulerPolicySource
    // Provider is the name of a scheduling algorithm provider to use.
    Provider *string
}

type SchedulerPolicySource struct {
    // read from a file
    File *SchedulerPolicyFileSource
    // read from a ConfigMap
    ConfigMap *SchedulerPolicyConfigMapSource
}

// high availability
type KubeSchedulerLeaderElectionConfiguration struct {
    apiserverconfig.LeaderElectionConfiguration
    // LockObjectNamespace defines the namespace of the lock object
    LockObjectNamespace string
    // LockObjectName defines the lock object name
    LockObjectName string
}

type LeaderElectionConfiguration struct {
    LeaderElect   bool
    LeaseDuration metav1.Duration
    RenewDeadline metav1.Duration
    RetryPeriod   metav1.Duration
    ResourceLock  string
}

// k8s.io/apimachinery/pkg/apis/meta/v1/types.go
type TypeMeta struct {
    Kind       string `json:"kind,omitempty" protobuf:"bytes,1,opt,name=kind"`
    APIVersion string `json:"apiVersion,omitempty" protobuf:"bytes,2,opt,name=apiVersion"`
}
As you can see, c.ComponentConfig = *cfg is exactly the kubeschedulerconfig.KubeSchedulerConfiguration that schedulerConfig.yaml was converted into.
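To make the mapping concrete, the schedulerConfig.yaml from section 2.1 decodes into roughly the following value (a hand-written illustration based on the structs above; SchedulerPolicyFileSource's Path field comes from the v1.13 tree and is not listed above, and fields not set in the YAML take their defaults):

// Roughly what c.ComponentConfig holds after decoding schedulerConfig.yaml;
// assumes the same imports as loadConfig above.
cfg := kubeschedulerconfig.KubeSchedulerConfiguration{
    SchedulerName: "my-scheduler",
    AlgorithmSource: kubeschedulerconfig.SchedulerAlgorithmSource{
        Policy: &kubeschedulerconfig.SchedulerPolicySource{
            File: &kubeschedulerconfig.SchedulerPolicyFileSource{
                Path: "policy.yaml",
            },
        },
    },
    LeaderElection: kubeschedulerconfig.KubeSchedulerLeaderElectionConfiguration{
        // leaderElect: true lives in the embedded LeaderElectionConfiguration
        LockObjectName:      "my-scheduler",
        LockObjectNamespace: "kube-system",
    },
}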
2.2.2 Parsing algorithmSource
Next comes runCommand -> Run(cc, stopCh) -> scheduler.New.
Note: the kubeschedulerconfig.SchedulerAlgorithmSource passed into scheduler.New is cc.ComponentConfig.AlgorithmSource, i.e. the algorithmSource from schedulerConfig.yaml.
// New returns a Scheduler
func New(client clientset.Interface,
    ...
    schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
    stopCh <-chan struct{},
    opts ...func(o *schedulerOptions)) (*Scheduler, error) {
    ...
    var config *factory.Config
    source := schedulerAlgorithmSource
    switch {
    case source.Provider != nil:
        // the default scheduler takes this branch (*source.Provider == DefaultProvider)
        ...
    case source.Policy != nil:
        // a custom scheduler takes this branch
        // Create the config from a user specified policy source.
        policy := &schedulerapi.Policy{}
        switch {
        case source.Policy.File != nil:
            if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
                return nil, err
            }
        case source.Policy.ConfigMap != nil:
            if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
                return nil, err
            }
        }
        sc, err := configurator.CreateFromConfig(*policy)
        if err != nil {
            return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
        }
        config = sc
    default:
        return nil, fmt.Errorf("unsupported algorithm source: %v", source)
    }
    ...
}
You can see that initPolicyFromFile reads the file at source.Policy.File.Path (the path to policy.yaml), parses its contents, and converts them into a schedulerapi.Policy object.
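For completeness, here is a condensed paraphrase of initPolicyFromFile (based on the v1.13 tree; existence check and error messages abridged, so treat it as a sketch rather than a verbatim copy):

// initPolicyFromFile, paraphrased: read the policy file and decode it with
// the legacy scheduler API codec into a schedulerapi.Policy.
func initPolicyFromFile(policyFile string, policy *schedulerapi.Policy) error {
    data, err := ioutil.ReadFile(policyFile)
    if err != nil {
        return fmt.Errorf("couldn't read policy config: %v", err)
    }
    if err := runtime.DecodeInto(latestschedulerapi.Codec, data, policy); err != nil {
        return fmt.Errorf("invalid policy: %v", err)
    }
    return nil
}

The schedulerapi.Policy type it decodes into: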
type Policy struct {
    metav1.TypeMeta
    Predicates                     []PredicatePolicy
    Priorities                     []PriorityPolicy
    ExtenderConfigs                []ExtenderConfig
    HardPodAffinitySymmetricWeight int32
    AlwaysCheckAllPredicates       bool
}

type PredicatePolicy struct {
    Name     string
    Argument *PredicateArgument
}

type PriorityPolicy struct {
    Name     string
    Weight   int
    Argument *PriorityArgument
}

type PredicateArgument struct {
    ServiceAffinity *ServiceAffinity
    LabelsPresence  *LabelsPresence
}

type PriorityArgument struct {
    ServiceAntiAffinity               *ServiceAntiAffinity
    LabelPreference                   *LabelPreference
    RequestedToCapacityRatioArguments *RequestedToCapacityRatioArguments
}

type ExtenderConfig struct {
    URLPrefix        string
    FilterVerb       string
    PreemptVerb      string
    PrioritizeVerb   string
    Weight           int
    BindVerb         string
    EnableHTTPS      bool
    TLSConfig        *restclient.TLSClientConfig
    HTTPTimeout      time.Duration
    NodeCacheCapable bool
    ManagedResources []ExtenderManagedResource
    Ignorable        bool
}
The extender configuration type (ExtenderConfig) is listed above as well; its fields map one-to-one to the corresponding keys in the policy file.
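As an illustration (hand-written, not taken from the source tree), the policy.yaml from section 2.1 decodes into roughly this value:

// Roughly the schedulerapi.Policy produced from the policy file in section 2.1.
policy := schedulerapi.Policy{
    Predicates: []schedulerapi.PredicatePolicy{
        {Name: "PodFitsHostPorts"},
        {Name: "PodFitsResources"},
        {Name: "NoDiskConflict"},
        {Name: "MatchNodeSelector"},
        {Name: "HostName"},
    },
    Priorities: []schedulerapi.PriorityPolicy{
        {Name: "LeastRequestedPriority", Weight: 1},
        {Name: "BalancedResourceAllocation", Weight: 1},
        {Name: "ServiceSpreadingPriority", Weight: 1},
        {Name: "EqualPriority", Weight: 1},
    },
    HardPodAffinitySymmetricWeight: 10,
    // ExtenderConfigs stays empty here; section 2 is the no-extender case.
}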
2.2.3 Generating factory.Config from the policy
func (c *configFactory) CreateFromConfig(policy schedulerapi.Policy) (*Config, error) {
    klog.V(2).Infof("Creating scheduler from configuration: %v", policy)
    // validate the policy configuration
    if err := validation.ValidatePolicy(policy); err != nil {
        return nil, err
    }
    // collect the predicate keys;
    // if none are specified, fall back to the default provider's predicate keys
    predicateKeys := sets.NewString()
    if policy.Predicates == nil {
        klog.V(2).Infof("Using predicates from algorithm provider '%v'", DefaultProvider)
        provider, err := GetAlgorithmProvider(DefaultProvider)
        if err != nil {
            return nil, err
        }
        predicateKeys = provider.FitPredicateKeys
    } else {
        for _, predicate := range policy.Predicates {
            klog.V(2).Infof("Registering predicate: %s", predicate.Name)
            predicateKeys.Insert(RegisterCustomFitPredicate(predicate))
        }
    }
    // collect the priority keys;
    // if none are specified, fall back to the default provider's priority keys
    priorityKeys := sets.NewString()
    if policy.Priorities == nil {
        klog.V(2).Infof("Using priorities from algorithm provider '%v'", DefaultProvider)
        provider, err := GetAlgorithmProvider(DefaultProvider)
        if err != nil {
            return nil, err
        }
        priorityKeys = provider.PriorityFunctionKeys
    } else {
        for _, priority := range policy.Priorities {
            klog.V(2).Infof("Registering priority: %s", priority.Name)
            priorityKeys.Insert(RegisterCustomPriorityFunction(priority))
        }
    }
    // build the extenders
    var extenders []algorithm.SchedulerExtender
    if len(policy.ExtenderConfigs) != 0 {
        ignoredExtendedResources := sets.NewString()
        for ii := range policy.ExtenderConfigs {
            klog.V(2).Infof("Creating extender with config %+v", policy.ExtenderConfigs[ii])
            extender, err := core.NewHTTPExtender(&policy.ExtenderConfigs[ii])
            if err != nil {
                return nil, err
            }
            extenders = append(extenders, extender)
            for _, r := range policy.ExtenderConfigs[ii].ManagedResources {
                if r.IgnoredByScheduler {
                    ignoredExtendedResources.Insert(string(r.Name))
                }
            }
        }
        predicates.RegisterPredicateMetadataProducerWithExtendedResourceOptions(ignoredExtendedResources)
    }
    // Providing HardPodAffinitySymmetricWeight in the policy config is the new and preferred way of providing the value.
    // Give it higher precedence than scheduler CLI configuration when it is provided.
    if policy.HardPodAffinitySymmetricWeight != 0 {
        c.hardPodAffinitySymmetricWeight = policy.HardPodAffinitySymmetricWeight
    }
    // When AlwaysCheckAllPredicates is set to true, scheduler checks all the configured
    // predicates even after one or more of them fails.
    if policy.AlwaysCheckAllPredicates {
        c.alwaysCheckAllPredicates = policy.AlwaysCheckAllPredicates
    }
    // build the Config from the predicate keys, priority keys, and extenders
    return c.CreateFromKeys(predicateKeys, priorityKeys, extenders)
}
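RegisterCustomFitPredicate deserves a quick look: it either reuses an already-registered predicate by name, or builds one from the policy's Argument. The following is a condensed paraphrase (from pkg/scheduler/factory/plugins.go in the v1.13 tree; validation, logging, and the ServiceAffinity branch trimmed), not a verbatim copy:

// Condensed paraphrase of RegisterCustomFitPredicate.
func RegisterCustomFitPredicate(policy schedulerapi.PredicatePolicy) string {
    var predicateFactory FitPredicateFactory
    var ok bool
    if policy.Argument != nil {
        // build a predicate from the argument, e.g. a labels-presence check
        if policy.Argument.LabelsPresence != nil {
            predicateFactory = func(args PluginFactoryArgs) algorithm.FitPredicate {
                return predicates.NewNodeLabelPredicate(
                    policy.Argument.LabelsPresence.Labels,
                    policy.Argument.LabelsPresence.Presence,
                )
            }
        }
        // (the ServiceAffinity argument is handled analogously)
    } else if predicateFactory, ok = fitPredicateMap[policy.Name]; ok {
        // a pre-defined predicate such as PodFitsResources is simply reused
        return policy.Name
    }
    if predicateFactory == nil {
        klog.Fatalf("Invalid configuration: Predicate type not found for %s", policy.Name)
    }
    return RegisterFitPredicateFactory(policy.Name, predicateFactory)
}

All five predicate names in our policy.yaml are pre-registered, so each takes the reuse branch and its key is returned unchanged; RegisterCustomPriorityFunction works along the same lines for priorities.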
CreateFromKeys was already analyzed in [k8s源碼分析][kube-scheduler]scheduler之啟動run(1); here we mainly pay attention to the extenders.
func (c *configFactory) CreateFromKeys(predicateKeys, priorityKeys sets.String, extenders []algorithm.SchedulerExtender) (*Config, error) {
    klog.V(2).Infof("Creating scheduler with fit predicates '%v' and priority functions '%v'", predicateKeys, priorityKeys)
    if c.GetHardPodAffinitySymmetricWeight() < 1 || c.GetHardPodAffinitySymmetricWeight() > 100 {
        return nil, fmt.Errorf("invalid hardPodAffinitySymmetricWeight: %d, must be in the range 1-100", c.GetHardPodAffinitySymmetricWeight())
    }
    // resolve the predicate keys to the actual predicate functions
    predicateFuncs, err := c.GetPredicates(predicateKeys)
    if err != nil {
        return nil, err
    }
    // resolve the priority keys to the actual priority function configs
    priorityConfigs, err := c.GetPriorityFunctionConfigs(priorityKeys)
    if err != nil {
        return nil, err
    }
    // priorityMetaProducer is used later when scoring nodes
    priorityMetaProducer, err := c.GetPriorityMetadataProducer()
    if err != nil {
        return nil, err
    }
    // predicateMetaProducer is used later when actually running the predicates
    predicateMetaProducer, err := c.GetPredicateMetadataProducer()
    if err != nil {
        return nil, err
    }
    // whether the equivalence class cache (which speeds up predicates) is enabled
    // Init equivalence class cache
    if c.enableEquivalenceClassCache {
        c.equivalencePodCache = equivalence.NewCache(predicates.Ordering())
        klog.Info("Created equivalence class cache")
    }
    // build the algorithm.ScheduleAlgorithm that performs the actual scheduling
    algo := core.NewGenericScheduler(
        c.schedulerCache,
        c.equivalencePodCache,
        c.podQueue,
        predicateFuncs,
        predicateMetaProducer,
        priorityConfigs,
        priorityMetaProducer,
        extenders,
        c.volumeBinder,
        c.pVCLister,
        c.pdbLister,
        c.alwaysCheckAllPredicates,
        c.disablePreemption,
        c.percentageOfNodesToScore,
    )
    ...
}
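The extenders handed to NewGenericScheduler are consulted after the built-in predicates during node filtering. A paraphrased excerpt (from findNodesThatFit in pkg/scheduler/core/generic_scheduler.go, v1.13; logging and error handling abridged, so treat it as a sketch) shows the idea:

// After the built-in predicates have produced `filtered`, each registered
// extender may further filter the remaining nodes over HTTP.
if len(filtered) > 0 && len(g.extenders) != 0 {
    for _, extender := range g.extenders {
        if !extender.IsInterested(pod) {
            continue // this extender does not care about this pod
        }
        filteredList, failedMap, err := extender.Filter(pod, filtered, g.cachedNodeInfoMap)
        if err != nil {
            if extender.IsIgnorable() {
                continue // a failing but ignorable extender is skipped
            }
            return []*v1.Node{}, FailedPredicateMap{}, err
        }
        // record why each node was rejected, then continue with the survivors
        for failedNodeName, failedMsg := range failedMap {
            failedPredicateMap[failedNodeName] = append(
                failedPredicateMap[failedNodeName], predicates.NewFailureReason(failedMsg))
        }
        filtered = filteredList
        if len(filtered) == 0 {
            break
        }
    }
}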
3. Summary
This article used an example to show how a custom scheduler is put to use, e.g. deciding for yourself which predicate and priority functions to apply. A follow-up article will analyze how to use your own extender predicate and priority methods.