1. Overview

Everything about cloud-init, a set of python scripts and utilities to make your cloud images be all they can be!


1.1 User-configurable


  • user-data string
  • user-data file

1.2 Feature detection

You can check which features the installed cloud-init supports through its features list, which is stored in cloudinit.version.FEATURES.
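
A minimal sketch of how a script might test for a feature, assuming the cloud-init Python package is importable on the host. Only cloudinit.version.FEATURES comes from the text above; load_features and has_feature are hypothetical helper names.

```python
def load_features():
    """Return cloud-init's feature list, or [] if cloud-init is not importable."""
    try:
        # FEATURES is the list named in the text above.
        from cloudinit.version import FEATURES
    except ImportError:
        return []
    return list(FEATURES)


def has_feature(name, features=None):
    """Report whether `name` appears in the feature list."""
    if features is None:
        features = load_features()
    return name in features
```

Passing the list explicitly, as in has_feature("SOME_FEATURE", ["SOME_FEATURE"]), keeps the check usable on machines where cloud-init is not installed.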

1.3 Boot Stages

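To provide its functionality, cloud-init must be integrated into the system boot in a fairly controlled way. There are five stages: Generator, Local, Network, Config, and Final. They are mentioned briefly here to make the command-line interface easier to follow.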

1.4 Command-line interface


cloud-init init


  • --local: Run init-local stage instead of init.

cloud-init modules

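cloud-init has five boot stages. The modules run in the Network, Config, and Final stages are declared in /etc/cloud/cloud.cfg under the corresponding keys. They can also be run from the command line, but because of the semaphores in /var/lib/cloud/, each module runs only once.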

  • --mode (init|config|final): Run the modules:init, modules:config, or modules:final cloud-init stage.
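
The subcommands above can be tied back to the boot stages with a small lookup. This is a sketch, not part of cloud-init: STAGE_COMMANDS and stage_command are hypothetical names, and mapping the Network stage to a plain "cloud-init init" is my reading of the options listed above.

```python
# Argument vectors for re-running a stage by hand.
# The mapping itself is an assumption made for illustration.
STAGE_COMMANDS = {
    "local": ["cloud-init", "init", "--local"],
    "network": ["cloud-init", "init"],
    "config": ["cloud-init", "modules", "--mode", "config"],
    "final": ["cloud-init", "modules", "--mode", "final"],
}


def stage_command(stage):
    """Return a fresh argv list for the given boot stage name."""
    try:
        return list(STAGE_COMMANDS[stage.lower()])
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None
```

The returned list can be handed to subprocess.run on a host where cloud-init is installed. Because of the semaphores mentioned above, a module that has already run is not expected to run again.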

2. User-Data formats

  • Gzip Compressed Content
  • Mime Multi Part Archive
  • User-Data Script (Begins with: #! or Content-Type: text/x-shellscript when using a MIME archive.)
  • Include File (Begins with: #include or Content-Type: text/x-include-url when using a MIME archive. )
  • Cloud Config Data (Begins with: #cloud-config or Content-Type: text/cloud-config when using a MIME archive.)
  • Upstart Job (Begins with: #upstart-job or Content-Type: text/upstart-job when using a MIME archive.)
  • Cloud Boothook
  • Part Handler
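
The "Begins with" prefixes in the list above suggest a simple way to classify a user-data payload. The sketch below is an illustration rather than cloud-init's own detection code: guess_content_type is a hypothetical name, only the four prefixes spelled out in the list are covered, and gzip input is recognised by its standard two-byte magic number.

```python
import gzip

# Prefix -> MIME type, taken from the list above.
PREFIX_TO_CONTENT_TYPE = [
    ("#cloud-config", "text/cloud-config"),
    ("#upstart-job", "text/upstart-job"),
    ("#include", "text/x-include-url"),
    ("#!", "text/x-shellscript"),
]

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def guess_content_type(user_data: bytes):
    """Return the MIME type implied by the payload's first line, or None."""
    if user_data.startswith(GZIP_MAGIC):
        # Gzip Compressed Content: look at what is inside.
        user_data = gzip.decompress(user_data)
    text = user_data.decode("utf-8", errors="replace")
    for prefix, content_type in PREFIX_TO_CONTENT_TYPE:
        if text.startswith(prefix):
            return content_type
    return None
```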

Cloud Config Data (cloud.cfg) Examples
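
As one hedged example, cloud-config data can be combined with a shell script in a MIME multi-part archive using Python's standard email package. The part contents, file names, and the build_archive helper are made up for illustration. The Content-Type subtypes are the ones listed in the formats above.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Illustrative payloads; runcmd is the cloud-config key for user commands.
CLOUD_CONFIG = "#cloud-config\nruncmd:\n  - echo hello from cloud-config\n"
SCRIPT = "#!/bin/sh\necho hello from a script\n"


def build_archive(parts):
    """Build a multipart/mixed message from (filename, subtype, content) tuples."""
    archive = MIMEMultipart()
    for filename, subtype, content in parts:
        part = MIMEText(content, subtype)  # yields Content-Type: text/<subtype>
        part.add_header("Content-Disposition", "attachment", filename=filename)
        archive.attach(part)
    return archive.as_string()


archive = build_archive([
    ("config.yaml", "cloud-config", CLOUD_CONFIG),
    ("hello.sh", "x-shellscript", SCRIPT),
])

# Parse the archive back to confirm each part carries the expected type.
content_types = [
    part.get_content_type()
    for part in message_from_string(archive).walk()
    if not part.is_multipart()
]
```

The resulting string is what would be supplied as user-data. Each part is then handled according to its own Content-Type.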

3. Boot Stages

To provide its functionality, cloud-init must be integrated into the system boot in a fairly controlled way. There are five stages:

  • Generator
  • Local
  • Network
  • Config
  • Final

3.1 Generator


  • A file exists: /etc/cloud/cloud-init.disabled
  • The kernel command line as found in /proc/cmdline contains cloud-init=disabled. When running in a container, the kernel command line is not honored, but cloud-init will read an environment variable named KERNEL_CMDLINE in its place.
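
Reading the two bullets above as the conditions under which cloud-init is left disabled, the check can be sketched as follows. This is not the generator's actual code. cloud_init_disabled and its parameters are hypothetical, and whether the process runs in a container is left to the caller.

```python
import os


def cloud_init_disabled(
    disable_file="/etc/cloud/cloud-init.disabled",
    cmdline_path="/proc/cmdline",
    in_container=False,
    environ=None,
):
    """Return True if either disabling condition from the list above holds."""
    if os.path.exists(disable_file):
        return True
    if in_container:
        # In a container the KERNEL_CMDLINE variable replaces /proc/cmdline.
        env = os.environ if environ is None else environ
        cmdline = env.get("KERNEL_CMDLINE", "")
    else:
        try:
            with open(cmdline_path) as f:
                cmdline = f.read()
        except OSError:
            cmdline = ""
    # Match the whole token, not a substring of some other argument.
    return "cloud-init=disabled" in cmdline.split()
```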

3.2 Local

  • systemd service: cloud-init-local.service
  • runs: As soon as possible with / mounted read-write.
  • blocks: as much of boot as possible, must block network bringup.
  • modules: none


  • Locate the datasource
  • Apply network configuration to the system


  • datasource
  • fallback
  • none (disabled)

3.3 Network

  • systemd service: cloud-init.service
  • runs: After local stage and configured networking is up.
  • blocks: As much of remaining boot as possible.
  • modules: cloud_init_modules in /etc/cloud/cloud.cfg

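This stage requires all configured networking to be online, because it fully processes any user-data that is found (which may involve fetching data over the network).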


Runs the cloud_init_modules listed in /etc/cloud/cloud.cfg.

3.4 Config

  • systemd service: cloud-config.service
  • runs: After network stage.
  • blocks: None.
  • modules: cloud_config_modules in /etc/cloud/cloud.cfg

This stage runs config modules only. Modules that do not really have an effect on other stages of boot are run here.

Runs the cloud_config_modules listed in /etc/cloud/cloud.cfg.

3.5 Final

  • systemd service: cloud-final.service
  • runs: As final part of boot (traditional “rc.local”)
  • blocks: None.
  • modules: cloud_final_modules in /etc/cloud/cloud.cfg


  • package installations
  • configuration management plugins (puppet, chef, salt-minion)
  • user-scripts (including runcmd).

Runs the cloud_final_modules listed in /etc/cloud/cloud.cfg.
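
The per-stage bullets in this section can be collected into one table. The sketch below only restates the service names and cloud.cfg keys given above. The Stage type and the helper name are hypothetical, and the Generator entry is left empty because no service or module list is given for it here.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    service: Optional[str]      # systemd unit, if one is listed above
    modules_key: Optional[str]  # key in /etc/cloud/cloud.cfg, if any


STAGES = [
    Stage("Generator", None, None),
    Stage("Local", "cloud-init-local.service", None),
    Stage("Network", "cloud-init.service", "cloud_init_modules"),
    Stage("Config", "cloud-config.service", "cloud_config_modules"),
    Stage("Final", "cloud-final.service", "cloud_final_modules"),
]


def stages_running_modules():
    """Names of the stages that run a module list from cloud.cfg."""
    return [stage.name for stage in STAGES if stage.modules_key]
```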

4. Datasources



  • files
  • yaml
  • shell scripts


  • server name
  • instance id
  • display name
  • other cloud specific details



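cloud-init saves all metadata, vendordata, and userdata to /run/cloud-init/instance-data.json. This JSON file is the instance-data. It contains keys and names specific to the datasource, but cloud-init also maintains a minimal set of standardized keys that stay stable on any cloud. These keys appear under the v1 key. Any datasource metadata consumed by cloud-init is placed under the ds key.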
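
A short sketch of reading that file from Python and separating the two top-level groups. The function names are hypothetical. Only the file path and the v1 and ds keys come from the text.

```python
import json


def load_instance_data(path="/run/cloud-init/instance-data.json"):
    """Parse the instance-data file written by cloud-init."""
    with open(path) as f:
        return json.load(f)


def standardized_keys(instance_data):
    """Return the cloud-independent keys stored under "v1"."""
    return dict(instance_data.get("v1", {}))


def datasource_metadata(instance_data):
    """Return the datasource-specific metadata stored under "ds"."""
    return dict(instance_data.get("ds", {}))
```

Scripts that need to work across clouds would read only from standardized_keys, since the layout under ds differs per datasource.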

Below is an instance-data.json example from an OpenStack instance:

{
    "base64-encoded-keys": [ ... ],
    "ds": {
        "ec2_metadata": {
            "ami-id": "ami-0000032f",
            "ami-launch-index": "0",
            "ami-manifest-path": "FIXME",
            "block-device-mapping": {
                "ami": "vda",
                "ephemeral0": "/dev/vdb",
                "root": "/dev/vda"
            },
            "hostname": "xenial-test.novalocal",
            "instance-action": "none",
            "instance-id": "i-0006e030",
            "instance-type": "m1.small",
            "local-hostname": "xenial-test.novalocal",
            "local-ipv4": "",
            "placement": {
                "availability-zone": "None"
            },
            "public-hostname": "xenial-test.novalocal",
            "public-ipv4": "",
            "reservation-id": "r-fxm623oa",
            "security-groups": "default"
        },
        "meta-data": {
            "availability_zone": null,
            "devices": [],
            "hostname": "xenial-test.novalocal",
            "instance-id": "3e39d278-0644-4728-9479-678f9212d8f0",
            "launch_index": 0,
            "local-hostname": "xenial-test.novalocal",
            "name": "xenial-test",
            "project_id": "e0eb2d2538814...",
            "random_seed": "A6yPN...",
            "uuid": "3e39d278-0644-4728-9479-678f92..."
        },
        "network_json": {
            "links": [{
                    "ethernet_mac_address": "fa:16:3e:7d:74:9b",
                    "id": "tap9ca524d5-6e",
                    "mtu": 8958,
                    "type": "ovs",
                    "vif_id": "9ca524d5-6e5a-4809-936a-6901..."
            }],
            "networks": [{
                    "id": "network0",
                    "link": "tap9ca524d5-6e",
                    "network_id": "c6adfc18-9753-42eb-b3ea-18b57e6b837f",
                    "type": "ipv4_dhcp"
            }],
            "services": [{
                    "address": "",
                    "type": "dns"
            }]
        },
        "user-data": "I2Nsb3VkLWNvbmZpZ...",
        "vendor-data": null
    },
    "v1": {
        "availability-zone": null,
        "cloud-name": "openstack",
        "instance-id": "3e39d278-0644-4728-9479-678f9212d8f0",
        "local-hostname": "xenial-test",
        "region": null
    }
}

Datasource API



# returns a MIME multipart message that contains
# all the various fully-expanded components that
# were found from processing the raw userdata string
# - when filtering, only the MIME messages targeting
#   this instance id will be returned (or messages with
#   no instance id)
def get_userdata(self, apply_filter=False): ...

# returns the raw userdata string (or None)
def get_userdata_raw(self): ...

# returns an integer (or None) which can be used to identify
# this instance in a group of instances which are typically
# created from a single command, thus allowing programmatic
# filtering on this launch index (or other selective actions)
def launch_index(self): ...

# the data source's config_obj is a cloud-config formatted
# object that came to it from ways other than cloud-config
# because cloud-config content would be handled elsewhere
def get_config_obj(self): ...

# returns a list of public ssh keys
def get_public_ssh_keys(self): ...

# translates a device 'short' name into the actual physical device
# fully qualified name (or None if said physical device is not attached
# or does not exist)
def device_name_to_device(self, name): ...

# gets the locale string this instance should be applying,
# which is typically used to adjust the instance's locale settings files
def get_locale(self): ...

def availability_zone(self): ...

# gets the instance id that was assigned to this instance by the
# cloud provider, or when said instance id does not exist in the backing
# metadata this will return 'iid-datasource'
def get_instance_id(self): ...

# gets the fully qualified domain name that this host should be using
# when configuring network or hostname related settings, typically
# assigned either by the cloud provider or the user creating the vm
def get_hostname(self, fqdn=False): ...

def get_package_mirror_info(self): ...
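
To make a few of these signatures concrete, here is a toy stand-in that answers from a plain dictionary. It is not a cloud-init class. DictDataSource, its constructor, the metadata key names, and the "localhost" default are assumptions for illustration. The 'iid-datasource' fallback follows the comment on get_instance_id above.

```python
class DictDataSource:
    """Hypothetical stand-in that serves values from a plain dict."""

    def __init__(self, metadata=None, userdata_raw=None):
        self.metadata = metadata or {}
        self.userdata_raw = userdata_raw

    def get_userdata_raw(self):
        # Raw userdata string, or None when nothing was supplied.
        return self.userdata_raw

    def get_instance_id(self):
        # Fall back to 'iid-datasource' when the metadata has no id.
        return self.metadata.get("instance-id", "iid-datasource")

    def get_hostname(self, fqdn=False):
        # Assumed behaviour: strip the domain part unless fqdn is requested.
        hostname = self.metadata.get("local-hostname", "localhost")
        return hostname if fqdn else hostname.split(".")[0]

    def get_public_ssh_keys(self):
        return list(self.metadata.get("public-keys", []))
```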



  • config drive
  • OpenStack
  • Amazon EC2

config drive

The config drive datasource supports the OpenStack configuration drive disk.





Amazon EC2

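Amazon serves metadata through a "magic IP" address. OpenStack simply imitates Amazon here, down to using the same IP address. The network must be online. For the related configuration, refer to the link above.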
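
A rough sketch of querying such a metadata service with the standard library. The address and path below are not given in the text above. They are the commonly documented EC2-style endpoint and should be treated as an assumption, as should the helper names. Some environments also require a session token, which this sketch does not handle.

```python
from urllib.request import urlopen

# Assumed EC2-style metadata endpoint (not stated in the text above).
METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def metadata_url(key=""):
    """Build the URL for one metadata key, e.g. "instance-id"."""
    return METADATA_BASE + key.lstrip("/")


def fetch_metadata(key, timeout=2.0):
    """Fetch one metadata value; only works from inside an instance."""
    with urlopen(metadata_url(key), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keys such as instance-id and local-hostname match the names that appear under ec2_metadata in the instance-data example.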

5. Directory layout


/var/lib/cloud/
    - data/
       - instance-id
       - previous-instance-id
       - datasource
       - previous-datasource
       - previous-hostname
    - handlers/
    - instance
    - instances/
        <instance-id>/
          - boot-finished
          - cloud-config.txt
          - datasource
          - handlers/
          - obj.pkl
          - scripts/
          - sem/
          - user-data.txt
          - user-data.txt.i
    - scripts/
       - per-boot/
       - per-instance/
       - per-once/
    - seed/
    - sem/
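
Two small helpers show how this layout might be inspected from Python. They assume the tree is rooted at /var/lib/cloud (the directory named earlier in connection with semaphores) and that data/instance-id holds the id as plain text. Both function names are hypothetical.

```python
import os


def current_instance_id(cloud_dir="/var/lib/cloud"):
    """Return the id recorded in data/instance-id, or None if absent."""
    path = os.path.join(cloud_dir, "data", "instance-id")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def boot_finished(cloud_dir="/var/lib/cloud"):
    """True once the active instance directory contains boot-finished."""
    # "instance" is the symlink to the active instances/ subdirectory.
    return os.path.exists(os.path.join(cloud_dir, "instance", "boot-finished"))
```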




data/

Contains information related to instance ids, datasources and hostnames of the previous and current instance if they are different. These can be examined as needed to determine any information related to a previous boot (if applicable).

handlers/

Custom part-handler code is written out here. Files that end up here are named part-handler-XYZ, where XYZ is the handler number (the first handler found starts at 0).

instance

A symlink to the instances/ subdirectory of the currently active instance (which instance is active depends on the datasource loaded).

instances/

All instances that were created using this image end up with instance identifier subdirectories (and corresponding data for each instance). The currently active instance is the target of the instance symlink described previously.

scripts/

Scripts that are downloaded/created by the corresponding part-handler will end up in one of these subdirectories.

sem/

Cloud-init has a concept of a module semaphore, which basically consists of the module name and its frequency. These files are used to ensure a module is only run per-once, per-instance, or per-always. This folder contains semaphore files for modules that are only supposed to run per-once (not tied to the instance id).
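
The semaphore idea can be illustrated with marker files. This is a sketch of the concept only: run_with_semaphore is a hypothetical helper, and the "<module>.<frequency>" file naming is invented here rather than taken from cloud-init's on-disk format.

```python
import os


def run_with_semaphore(sem_dir, module_name, frequency, func):
    """Run func unless a semaphore file shows it already ran.

    frequency is "once", "instance", or "always". For "instance" the
    caller is expected to pass that instance's own sem directory.
    Returns True if func was executed.
    """
    if frequency == "always":
        func()
        return True
    sem_path = os.path.join(sem_dir, f"{module_name}.{frequency}")
    if os.path.exists(sem_path):
        return False  # already ran, skip
    func()
    # Record success only after func returns without raising.
    os.makedirs(sem_dir, exist_ok=True)
    with open(sem_path, "w") as f:
        f.write("done\n")
    return True
```

Writing the marker after the call means a module that fails partway is retried on the next run. Writing it first would give the opposite trade-off.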