[k8s source code analysis][kube-scheduler] Custom schedulers in the scheduler (1)

1. Introduction

If you repost this article, please credit the original source and respect the author's work!

This article analyzes custom schedulers in kube-scheduler, focusing on how you can choose your own predicate (filtering) and priority (scoring) methods.
Source code: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)

2. Without extenders

[Figure: architecture.png]

2.1 Example

For cluster setup, refer to my earlier article on compiling k8s from source and installing the binaries (development/debug edition).

2.1.1 Prepare the configuration files

Since we want to customize the predicates, priorities, and extenders, we need a configuration file (schedulerConfig.yaml):

apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
schedulerName: my-scheduler
algorithmSource:
  policy:
    file:
      path: policy.yaml
leaderElection:
  leaderElect: true
  lockObjectName: my-scheduler
  lockObjectNamespace: kube-system
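
As an aside, the policy does not have to come from a file: as the SchedulerPolicySource struct later in this article shows, it can also be read from a ConfigMap. A hypothetical variant of the algorithmSource section (the ConfigMap name here is made up) would look like:

algorithmSource:
  policy:
    configMap:
      name: my-scheduler-policy      # hypothetical ConfigMap name
      namespace: kube-system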

Here is the policy file (policy.yaml):

{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [
  {"name" : "PodFitsHostPorts"},
  {"name" : "PodFitsResources"},
  {"name" : "NoDiskConflict"},
  {"name" : "MatchNodeSelector"},
  {"name" : "HostName"}
  ],
  "priorities" : [
  {"name" : "LeastRequestedPriority", "weight" : 1},
  {"name" : "BalancedResourceAllocation", "weight" : 1},
  {"name" : "ServiceSpreadingPriority", "weight" : 1},
  {"name" : "EqualPriority", "weight" : 1}
  ],
  "hardPodAffinitySymmetricWeight" : 10
}
2.1.2 Run and test

Next, restart kube-scheduler with the following command: ./kube-scheduler --master=http://localhost:8080 --config=schedulerConfig.yaml. (Adjust the paths of schedulerConfig.yaml and policy.yaml to your setup.)

Then deploy one pod with schedulerName set and one pod without it.

[root@master kubectl]# cat pod-scheduler.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: test-schduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: podtest-scheduler
    image: nginx
    ports:
    - containerPort: 80
[root@master kubectl]# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: podtest
    image: nginx
    ports:
    - containerPort: 80
[root@master kubectl]# ./kubectl apply -f pod-scheduler.yaml 
[root@master kubectl]# ./kubectl apply -f pod.yaml 
[root@master kubectl]# ./kubectl get pods 
NAME            READY   STATUS    RESTARTS   AGE
test            0/1     Pending   0          83s
test-schduler   1/1     Running   0          13m

You can see that the pod with schedulerName set, i.e. test-schduler, is already running successfully.

[root@master kubectl]# ./kubectl describe pod test-schduler
...
Events:
  Type     Reason             Age                 From                  Message
  ----     ------             ----                ----                  -------
  Normal   Scheduled          12m                 my-scheduler          Successfully assigned default/test-schduler to 172.21.0.12
  Normal   Pulling            12m                 kubelet, 172.21.0.12  pulling image "nginx"
  Normal   Pulled             11m                 kubelet, 172.21.0.12  Successfully pulled image "nginx"
  Normal   Created            11m                 kubelet, 172.21.0.12  Created container
  Normal   Started            11m                 kubelet, 172.21.0.12  Started container
  Warning  MissingClusterDNS  62s (x12 over 12m)  kubelet, 172.21.0.12  pod: "test-schduler_default(213933b8-efda-11e9-9434-525400d54f7e)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.

The pod without schedulerName, i.e. test, stays in Pending: a pod that does not set schedulerName is handled by the k8s default scheduler, but the default scheduler is not running at the moment, so the pod cannot be scheduled. If we now start another scheduler without the --config flag, that pod gets scheduled (see the example below). For the default scheduler, refer to [k8s source code analysis][kube-scheduler] scheduler/algorithmprovider registering default-scheduler.
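
For example, a second scheduler started with the same flags minus --config:

./kube-scheduler --master=http://localhost:8080

runs as the default scheduler, after which the test pod leaves Pending and gets scheduled.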

2.2 Source code analysis

2.2.1 Parsing the config file

Most of the code in this part was already analyzed in [k8s source code analysis][kube-scheduler] scheduler startup run (1), so this section keeps it brief. The config in kube-scheduler is parsed along the following call chain:

NewSchedulerCommand -> runCommand -> opts.Config() -> o.ApplyTo(c) 

So we eventually arrive here, where the contents of the config file are loaded:

// cmd/kube-scheduler/app/options/options.go

func (o *Options) ApplyTo(c *schedulerappconfig.Config) error {
    // if kube-scheduler was started without --config, fall back to the default configuration (o.ComponentConfig)
    if len(o.ConfigFile) == 0 {
        ...
    } else {
        // if kube-scheduler was started with --config, read from the config file
        cfg, err := loadConfigFromFile(o.ConfigFile)
        if err != nil {
            return err
        }

        // use the loaded config file only, with the exception of --address and --port. This means that
        // none of the deprecated flags in o.Deprecated are taken into consideration. This is the old
        // behaviour of the flags we have to keep.
        c.ComponentConfig = *cfg

        if err := o.CombinedInsecureServing.ApplyToFromLoadedConfig(c, &c.ComponentConfig); err != nil {
            return err
        }
    }
    ...
}

// cmd/kube-scheduler/app/options/configfile.go

func loadConfigFromFile(file string) (*kubeschedulerconfig.KubeSchedulerConfiguration, error) {
    data, err := ioutil.ReadFile(file)
    if err != nil {
        return nil, err
    }

    return loadConfig(data)
}

func loadConfig(data []byte) (*kubeschedulerconfig.KubeSchedulerConfiguration, error) {
    configObj := &kubeschedulerconfig.KubeSchedulerConfiguration{}
    if err := runtime.DecodeInto(kubeschedulerscheme.Codecs.UniversalDecoder(), data, configObj); err != nil {
        return nil, err
    }

    return configObj, nil
}

The contents of the config file (schedulerConfig.yaml) are then converted into a kubeschedulerconfig.KubeSchedulerConfiguration object:

// pkg/scheduler/apis/config/types.go 

type KubeSchedulerConfiguration struct {
    metav1.TypeMeta
    // SchedulerName is name of the scheduler, used to select which pods
    // will be processed by this scheduler, based on pod's "spec.SchedulerName".
    SchedulerName string
    // AlgorithmSource specifies the scheduler algorithm source.
    AlgorithmSource SchedulerAlgorithmSource
    // RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule
    // corresponding to every RequiredDuringScheduling affinity rule.
    // HardPodAffinitySymmetricWeight represents the weight of implicit PreferredDuringScheduling affinity rule, in the range 0-100.
    HardPodAffinitySymmetricWeight int32
    // analyzed later when covering high availability (leader election)
    LeaderElection KubeSchedulerLeaderElectionConfiguration
    ClientConnection apimachineryconfig.ClientConnectionConfiguration
    // HealthzBindAddress is the IP address and port for the health check server to serve on, defaulting to 0.0.0.0:10251
    HealthzBindAddress string
    // MetricsBindAddress is the IP address and port for the metrics server to serve on, defaulting to 0.0.0.0:10251
    MetricsBindAddress string
    apiserverconfig.DebuggingConfiguration
    // whether preemption is disabled
    DisablePreemption bool
    PercentageOfNodesToScore int32
    FailureDomains string
    BindTimeoutSeconds *int64
}

type SchedulerAlgorithmSource struct {
    // Policy is a policy source
    Policy *SchedulerPolicySource
    // Provider is the name of a scheduling algorithm provider to use.
    Provider *string
}
type SchedulerPolicySource struct {
    // read the policy from a file
    File *SchedulerPolicyFileSource
    // read the policy from a ConfigMap
    ConfigMap *SchedulerPolicyConfigMapSource
}

// for high availability (leader election)
type KubeSchedulerLeaderElectionConfiguration struct {
    apiserverconfig.LeaderElectionConfiguration
    // LockObjectNamespace defines the namespace of the lock object
    LockObjectNamespace string
    // LockObjectName defines the lock object name
    LockObjectName string
}
type LeaderElectionConfiguration struct {
    LeaderElect bool
    LeaseDuration metav1.Duration
    RenewDeadline metav1.Duration
    RetryPeriod metav1.Duration
    ResourceLock string
}

// k8s.io/apimachinery/pkg/apis/meta/v1/types.go
type TypeMeta struct {
    Kind string `json:"kind,omitempty" protobuf:"bytes,1,opt,name=kind"`
    APIVersion string `json:"apiVersion,omitempty" protobuf:"bytes,2,opt,name=apiVersion"`
}

You can see that c.ComponentConfig = *cfg is exactly the kubeschedulerconfig.KubeSchedulerConfiguration that schedulerConfig.yaml was converted into.

2.2.2 Parsing algorithmSource

What follows is:

runCommand -> Run(cc, stopCh) -> scheduler.New

Note: the kubeschedulerconfig.SchedulerAlgorithmSource passed into scheduler.New is cc.ComponentConfig.AlgorithmSource, i.e. the algorithmSource section in schedulerConfig.yaml.

// New returns a Scheduler
func New(client clientset.Interface,
    ...
    schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
    stopCh <-chan struct{},
    opts ...func(o *schedulerOptions)) (*Scheduler, error) {
    ...
    var config *factory.Config
    source := schedulerAlgorithmSource
    switch {
    case source.Provider != nil:
        // the default scheduler takes this branch (*source.Provider == DefaultProvider)
        ...
    case source.Policy != nil:
        // a custom scheduler takes this branch
        // Create the config from a user specified policy source.
        policy := &schedulerapi.Policy{}
        switch {
        case source.Policy.File != nil:
            if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
                return nil, err
            }
        case source.Policy.ConfigMap != nil:
            if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
                return nil, err
            }
        }
        sc, err := configurator.CreateFromConfig(*policy)
        if err != nil {
            return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
        }
        config = sc
    default:
        return nil, fmt.Errorf("unsupported algorithm source: %v", source)
    }
    ...
}

You can see that the initPolicyFromFile method reads the file at source.Policy.File.Path, i.e. the path of policy.yaml, parses its contents, and converts them into a schedulerapi.Policy object (a sketch of this helper follows the struct definitions below):

type Policy struct {
    metav1.TypeMeta
    Predicates []PredicatePolicy
    Priorities []PriorityPolicy
    ExtenderConfigs []ExtenderConfig
    HardPodAffinitySymmetricWeight int32
    AlwaysCheckAllPredicates bool
}
type PredicatePolicy struct {
    Name string
    Argument *PredicateArgument
}
type PriorityPolicy struct {
    Name string
    Weight int
    Argument *PriorityArgument
}
type PredicateArgument struct {
    ServiceAffinity *ServiceAffinity
    LabelsPresence *LabelsPresence
}
type PriorityArgument struct {
    ServiceAntiAffinity *ServiceAntiAffinity
    LabelPreference *LabelPreference
    RequestedToCapacityRatioArguments *RequestedToCapacityRatioArguments
}
type ExtenderConfig struct {
    URLPrefix string
    FilterVerb string
    PreemptVerb string
    PrioritizeVerb string
    Weight int
    BindVerb string
    EnableHTTPS bool
    TLSConfig *restclient.TLSClientConfig
    HTTPTimeout time.Duration
    NodeCacheCapable bool
    ManagedResources []ExtenderManagedResource
    Ignorable bool
}
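
For completeness, initPolicyFromFile follows the same read-then-decode pattern as loadConfigFromFile shown earlier; a sketch of what it roughly looks like in v1.13 (pkg/scheduler/scheduler.go, error messages abridged):

func initPolicyFromFile(policyFile string, policy *schedulerapi.Policy) error {
    // read the raw policy file (policy.yaml in our example)
    data, err := ioutil.ReadFile(policyFile)
    if err != nil {
        return fmt.Errorf("couldn't read policy config: %v", err)
    }
    // decode the bytes into the schedulerapi.Policy object
    if err := runtime.DecodeInto(latestschedulerapi.Codec, data, policy); err != nil {
        return fmt.Errorf("invalid policy: %v", err)
    }
    return nil
}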

The structs for configuring extenders are listed above as well; not much needs to be said, since the struct fields are parsed from the corresponding entries in the policy file, as the hypothetical example below illustrates.
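
For illustration only (our example configures no extenders; they are the topic of a follow-up article), a hypothetical extenders entry in the policy file might look like the following, where the URL and verbs are made-up values:

"extenders" : [
  {
    "urlPrefix": "http://127.0.0.1:12345/scheduler",
    "filterVerb": "filter",
    "prioritizeVerb": "prioritize",
    "weight": 1,
    "enableHttps": false,
    "nodeCacheCapable": false
  }
]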

2.2.3 Generating a factory.Config from the policy

func (c *configFactory) CreateFromConfig(policy schedulerapi.Policy) (*Config, error) {
    klog.V(2).Infof("Creating scheduler from configuration: %v", policy)

    // validate the policy configuration
    if err := validation.ValidatePolicy(policy); err != nil {
        return nil, err
    }

    // collect the predicate keys
    // if the policy defines none, fall back to the default provider's predicate keys
    predicateKeys := sets.NewString()
    if policy.Predicates == nil {
        klog.V(2).Infof("Using predicates from algorithm provider '%v'", DefaultProvider)
        provider, err := GetAlgorithmProvider(DefaultProvider)
        if err != nil {
            return nil, err
        }
        predicateKeys = provider.FitPredicateKeys
    } else {
        for _, predicate := range policy.Predicates {
            klog.V(2).Infof("Registering predicate: %s", predicate.Name)
            predicateKeys.Insert(RegisterCustomFitPredicate(predicate))
        }
    }

    // collect the priority keys
    // if the policy defines none, fall back to the default provider's priority keys
    priorityKeys := sets.NewString()
    if policy.Priorities == nil {
        klog.V(2).Infof("Using priorities from algorithm provider '%v'", DefaultProvider)
        provider, err := GetAlgorithmProvider(DefaultProvider)
        if err != nil {
            return nil, err
        }
        priorityKeys = provider.PriorityFunctionKeys
    } else {
        for _, priority := range policy.Priorities {
            klog.V(2).Infof("Registering priority: %s", priority.Name)
            priorityKeys.Insert(RegisterCustomPriorityFunction(priority))
        }
    }

    // build the extenders
    var extenders []algorithm.SchedulerExtender
    if len(policy.ExtenderConfigs) != 0 {
        ignoredExtendedResources := sets.NewString()
        for ii := range policy.ExtenderConfigs {
            klog.V(2).Infof("Creating extender with config %+v", policy.ExtenderConfigs[ii])
            extender, err := core.NewHTTPExtender(&policy.ExtenderConfigs[ii])
            if err != nil {
                return nil, err
            }
            extenders = append(extenders, extender)
            for _, r := range policy.ExtenderConfigs[ii].ManagedResources {
                if r.IgnoredByScheduler {
                    ignoredExtendedResources.Insert(string(r.Name))
                }
            }
        }
        predicates.RegisterPredicateMetadataProducerWithExtendedResourceOptions(ignoredExtendedResources)
    }
    // Providing HardPodAffinitySymmetricWeight in the policy config is the new and preferred way of providing the value.
    // Give it higher precedence than scheduler CLI configuration when it is provided.
    if policy.HardPodAffinitySymmetricWeight != 0 {
        c.hardPodAffinitySymmetricWeight = policy.HardPodAffinitySymmetricWeight
    }
    // When AlwaysCheckAllPredicates is set to true, scheduler checks all the configured
    // predicates even after one or more of them fails.
    if policy.AlwaysCheckAllPredicates {
        c.alwaysCheckAllPredicates = policy.AlwaysCheckAllPredicates
    }

    // generate the Config from the predicate keys, priority keys, and extenders
    return c.CreateFromKeys(predicateKeys, priorityKeys, extenders)
}
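
The predicate branch above calls RegisterCustomFitPredicate for each policy entry. Roughly: if the entry carries no Argument and its name matches an already registered predicate (as with every entry in our policy.yaml), the existing key is simply reused; only entries with a ServiceAffinity or LabelsPresence argument create a new predicate factory. An abridged sketch (simplified from pkg/scheduler/factory/plugins.go):

func RegisterCustomFitPredicate(policy schedulerapi.PredicatePolicy) string {
    var predicateFactory FitPredicateFactory
    var ok bool
    if policy.Argument != nil {
        // build a new factory from ServiceAffinity / LabelsPresence arguments
        ...
    } else if predicateFactory, ok = fitPredicateMap[policy.Name]; ok {
        // a pre-defined predicate with this name is already registered; reuse its key
        return policy.Name
    }
    ...
    return RegisterFitPredicateFactory(policy.Name, predicateFactory)
}

RegisterCustomPriorityFunction works the same way for the priorities list.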

CreateFromKeys was already analyzed in [k8s source code analysis][kube-scheduler] scheduler startup run (1); here we mainly pay attention to extenders.

func (c *configFactory) CreateFromKeys(predicateKeys, priorityKeys sets.String, extenders []algorithm.SchedulerExtender) (*Config, error) {
    klog.V(2).Infof("Creating scheduler with fit predicates '%v' and priority functions '%v'", predicateKeys, priorityKeys)

    if c.GetHardPodAffinitySymmetricWeight() < 1 || c.GetHardPodAffinitySymmetricWeight() > 100 {
        return nil, fmt.Errorf("invalid hardPodAffinitySymmetricWeight: %d, must be in the range 1-100", c.GetHardPodAffinitySymmetricWeight())
    }

    // resolve the current predicate keys into the actual predicate functions
    predicateFuncs, err := c.GetPredicates(predicateKeys)
    if err != nil {
        return nil, err
    }

    // resolve the current priority keys into the actual priority function configs
    priorityConfigs, err := c.GetPriorityFunctionConfigs(priorityKeys)
    if err != nil {
        return nil, err
    }

    // priorityMetaProducer is used later when scoring nodes
    priorityMetaProducer, err := c.GetPriorityMetadataProducer()
    if err != nil {
        return nil, err
    }
    // predicateMetaProducer is used later during the actual predicate phase
    predicateMetaProducer, err := c.GetPredicateMetadataProducer()
    if err != nil {
        return nil, err
    }

    // whether the equivalence class cache (which speeds up predicates) is enabled
    // Init equivalence class cache
    if c.enableEquivalenceClassCache {
        c.equivalencePodCache = equivalence.NewCache(predicates.Ordering())
        klog.Info("Created equivalence class cache")
    }

    // build the algorithm.ScheduleAlgorithm that performs the actual scheduling computation
    algo := core.NewGenericScheduler(
        c.schedulerCache,
        c.equivalencePodCache,
        c.podQueue,
        predicateFuncs,
        predicateMetaProducer,
        priorityConfigs,
        priorityMetaProducer,
        extenders,
        c.volumeBinder,
        c.pVCLister,
        c.pdbLister,
        c.alwaysCheckAllPredicates,
        c.disablePreemption,
        c.percentageOfNodesToScore,
    )
    ...
}
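
The extenders built here are consumed later during scheduling: after the built-in predicates have filtered the nodes, genericScheduler calls each extender's Filter over the surviving ones. Abridged from findNodesThatFit in pkg/scheduler/core/generic_scheduler.go (v1.13):

if len(g.extenders) != 0 && len(filtered) > 0 {
    for _, extender := range g.extenders {
        // skip extenders that are not interested in this pod's resources
        if !extender.IsInterested(pod) {
            continue
        }
        filteredList, failedMap, err := extender.Filter(pod, filtered, g.cachedNodeInfoMap)
        if err != nil {
            if extender.IsIgnorable() {
                // an ignorable extender's failure does not fail scheduling
                continue
            }
            return []*v1.Node{}, FailedPredicateMap{}, err
        }
        ...
        filtered = filteredList
        if len(filtered) == 0 {
            break
        }
    }
}

Since our policy.yaml defines no extenders, this slice is empty and the loop is skipped.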

3. Summary

This article used an example to show how a custom scheduler is used, for instance deciding yourself which predicates and priorities to apply. Follow-up articles will analyze how to use your own extended predicate and priority methods.

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市,隨后出現的幾起案子,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 228,119評論 6 531
  • 序言:濱河連續發生了三起死亡事件,死亡現場離奇詭異,居然都是意外死亡,警方通過查閱死者的電腦和手機,發現死者居然都...
    沈念sama閱讀 98,382評論 3 415
  • 文/潘曉璐 我一進店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人,你說我怎么就攤上這事。” “怎么了?”我有些...
    開封第一講書人閱讀 176,038評論 0 373
  • 文/不壞的土叔 我叫張陵,是天一觀的道長。 經常有香客問我,道長,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 62,853評論 1 309
  • 正文 為了忘掉前任,我火速辦了婚禮,結果婚禮上,老公的妹妹穿的比我還像新娘。我一直安慰自己,他們只是感情好,可當我...
    茶點故事閱讀 71,616評論 6 408
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著,像睡著了一般。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發上,一...
    開封第一講書人閱讀 55,112評論 1 323
  • 那天,我揣著相機與錄音,去河邊找鬼。 笑死,一個胖子當著我的面吹牛,可吹牛的內容都是我干的。 我是一名探鬼主播,決...
    沈念sama閱讀 43,192評論 3 441
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了?” 一聲冷哼從身側響起,我...
    開封第一講書人閱讀 42,355評論 0 288
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后,有當地人在樹林里發現了一具尸體,經...
    沈念sama閱讀 48,869評論 1 334
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內容為張勛視角 年9月15日...
    茶點故事閱讀 40,727評論 3 354
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發現自己被綠了。 大學時的朋友給我發了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 42,928評論 1 369
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖,靈堂內的尸體忽然破棺而出,到底是詐尸還是另有隱情,我是刑警寧澤,帶...
    沈念sama閱讀 38,467評論 5 358
  • 正文 年R本政府宣布,位于F島的核電站,受9級特大地震影響,放射性物質發生泄漏。R本人自食惡果不足惜,卻給世界環境...
    茶點故事閱讀 44,165評論 3 347
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧,春花似錦、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 34,570評論 0 26
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至,卻和暖如春,著一層夾襖步出監牢的瞬間,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 35,813評論 1 282
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人。 一個月前我還...
    沈念sama閱讀 51,585評論 3 390
  • 正文 我出身青樓,卻偏偏與公主長得像,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當晚...
    茶點故事閱讀 47,892評論 2 372