MaaS Part 3 - Remaining Issues
Parts 1 and 2 gave me the basis for a simple network-based hardware provisioning system. However, it has some issues that I’d like to list here to drive future improvements.
Update: part 4 addresses several of these issues plus some new ones using Ansible-Container, some shell scripting, and InSpec.
-
Lack of central inventory of systems and asset attributes
There is no central place that defines an asset and its attributes, such as hostname, MAC address, system type, CPU, RAM, and disks. Since Ansible is my usual go-to tool for lightweight orchestration and configuration management, it seems logical to define these in an inventory file and a vars file.
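To make the idea of one record per asset concrete, here is a minimal Python sketch of the data such an inventory would hold. It is only an illustration of the shape of the data, not the Ansible inventory itself; the field names, the sample hosts, the MAC addresses, and the find_by_mac helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Asset:
    """One machine and the attributes the provisioning system cares about."""
    hostname: str
    mac: str
    system_type: str
    cpu_cores: int
    ram_gb: int
    disks: Tuple[str, ...]


# Hypothetical sample data; real values would come from the inventory/vars files.
INVENTORY = [
    Asset("node01", "52:54:00:aa:bb:01", "example-type", 4, 16, ("sda",)),
    Asset("node02", "52:54:00:aa:bb:02", "example-type", 8, 32, ("sda", "sdb")),
]


def find_by_mac(inventory: Sequence[Asset], mac: str) -> Optional[Asset]:
    """Return the asset with the given MAC address (case-insensitive), or None."""
    wanted = mac.lower()
    for asset in inventory:
        if asset.mac.lower() == wanted:
            return asset
    return None
```

With a single list like this as the source of truth, the DHCP reservations and kickstart files could be derived from it instead of being maintained by hand.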
-
Hardcoded paths and configuration files/settings in Docker containers
While Docker containers help make the required configuration settings explicit, it’s difficult to avoid embedding paths that change per deployment. What if I don’t like /nbi as my TFTP root? Right now, it’d be very painful to change that path and re-test everything. I’m curious to see whether Ansible-Container takes the sting out of templating configuration files inside Docker containers.
-
Hardcoded DHCP Leases
Because I have a fixed set of hardware, this is a one-time effort, but it won’t scale beyond its current size. These static reservations should either be generated from the asset system, or the leases should be dynamic with IP updates tied directly into DNS. A quick looping Ansible playbook that generates the reservations from the inventory/vars files could do the trick.
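The looping idea can be sketched in a few lines of Python. This is a stand-in for the Ansible playbook rather than a replacement for it, and it assumes the DHCP server reads ISC dhcpd-style host declarations; the ASSETS list, its keys, and the addresses are hypothetical placeholders.

```python
# Hypothetical asset records; in practice these would come from the inventory/vars files.
ASSETS = [
    {"hostname": "node01", "mac": "52:54:00:aa:bb:01", "ip": "192.168.1.11"},
    {"hostname": "node02", "mac": "52:54:00:aa:bb:02", "ip": "192.168.1.12"},
]


def dhcp_host_entry(hostname: str, mac: str, ip: str) -> str:
    """Render one static reservation in ISC dhcpd host-declaration syntax (assumed)."""
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"}}\n"
    )


def render_reservations(assets) -> str:
    """Loop over every asset and join the generated host blocks into one fragment."""
    return "\n".join(
        dhcp_host_entry(a["hostname"], a["mac"], a["ip"]) for a in assets
    )


if __name__ == "__main__":
    print(render_reservations(ASSETS))
```

Adding a machine then means adding one record to the asset data and regenerating the fragment, rather than hand-editing the DHCP configuration.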
-
Individually crafted Kickstart configuration files
About 95% of each kickstart file is identical, so it would be better to generate the final kickstart files from a template with a couple of variables pulled from the asset system. Another quick looping Ansible playbook driven by the inventory/vars files would solve this.
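As a rough illustration of the template approach, the sketch below uses Python’s string.Template in place of the Jinja2 templates an Ansible playbook would use. The kickstart fragment is deliberately tiny and is not one of my real files; the choice of per-host variables (hostname and install disk) and the sample values are assumptions.

```python
from string import Template

# A deliberately small kickstart fragment; a real file has many more directives.
KICKSTART_TEMPLATE = Template("""\
# Shared settings, identical for every host
lang en_US.UTF-8
keyboard us
timezone UTC
# Per-host settings, filled in from the asset data
network --bootproto=dhcp --hostname=$hostname
ignoredisk --only-use=$install_disk
""")


def render_kickstart(asset: dict) -> str:
    """Fill in the per-host variables; substitute() raises KeyError if one is missing."""
    return KICKSTART_TEMPLATE.substitute(
        hostname=asset["hostname"],
        install_disk=asset["install_disk"],
    )


if __name__ == "__main__":
    # Hypothetical asset record for demonstration only.
    print(render_kickstart({"hostname": "node01", "install_disk": "sda"}))
```

Because substitute() fails loudly on a missing variable, an incomplete asset record shows up at generation time instead of partway through an install.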
-
SELinux is disabled on deploy
This was a matter of avoiding difficult-to-debug issues during rapid iteration. It’s not “production ready” until it has an SELinux policy that aligns with the needs of the app, and this one just needs some time and elbow grease.
-
One version of one operating system supported
I’m standardizing on CentOS 7.x here, but eventually I’ll need to support newer releases. Assuming the other issues are handled with Ansible roles/playbooks, this problem most likely becomes much easier, too.
-
Logging from the containers
I want to know what’s happening: DHCP leases handed out, TFTP files downloaded, Kickstart configs downloaded, and any errors as they occur. This will require more investigation into best practices.