Getting Started with DevOps

A guide to learning DevOps and the most important technologies used


DevOps is an emerging field in tech that has gained a lot of traction from developers across the world in recent years. This, in turn, has led to a lot of people trying to get into the field and learn about it so they can add it as a new skill to their developer toolkits.

For most beginners, this domain is pretty confusing, as various technologies are required to be successful in it and new tools are being developed every day.

Where do you start then? Why do we need so many technologies? Which technologies and tools should I learn? In what order should I learn them? How do these tools work together? These are questions that I myself wanted answered when I started learning about DevOps.

Resources do exist to help beginners get started, but a lot of them are not to the point and leave a lot of ambiguity. This leaves many newcomers confused and makes them believe that DevOps is extremely complex and not a possible field for their development journey, which is not at all the case.

So I thought why don't I try and take a stab at answering these questions?

Let's get started!

What is DevOps?

DevOps combines development and operations into a single unified field. It is simply a concept consisting of various practices and tools that try to improve an organization's ability to deliver services and applications to users faster.

It does so by automating most, if not all, of the processes tied to delivering an application, so much so that, for example, all a developer needs to do in order to host a website is push their commits to GitHub.

Conceptually, it also deals with software development lifecycle models, such as agile methods, that aim to improve the rate at which applications are delivered.

At its core, what would be expected of you as a DevOps engineer is to create and maintain automated systems in the form of Continuous Integration (CI) and Continuous Deployment (CD) pipelines using various tools and technologies, covering the steps leading up to the code being written as well as everything needed afterwards until the application is delivered to its users.

Why DevOps?

DevOps as a field allows you to learn about important concepts such as systems design along with teaching you how extremely large projects are maintained.

Also, there is mind-blowingly high demand for DevOps engineers, and these roles pay very well for work that is fairly manageable in nature. This is a great incentive for anyone wanting to earn a lot of money and gain new opportunities.

In short, DevOps allows you to rake in a lot of money (especially via remote work), has great career development potential, offers immense community and network building, and the work is not nearly as hectic as in other development fields.

What is CI/CD?

CI/CD, as mentioned above, stands for Continuous Integration and Continuous Deployment/Delivery. It is the backbone of any DevOps engineer's work, as a CI/CD pipeline makes up the majority of the work required in DevOps.

It breaks the entire developer-to-user workflow into two components: one deals with ensuring the maintenance of the software or application, and the other deals with ensuring that the maintained application reaches its users without fail, all the while keeping in mind that the application is still in development and future changes and updates are to be expected.

What is CI?

Continuous Integration, or CI for short, deals with automating the process of working together and collaborating on an application. Since there will be a lot of developers working on the application at any given time, the updated codebase needs to be tested, built, measured for code coverage and so on after every change, big or small.

If these changes were tested manually, the developer would have to build the entire codebase on their system, run all the tests, and generate analysis data, vulnerability reports and so on for every small change, which would take up a lot of time if the project is anything bigger than hobby work.

So, to save time, improve efficiency and maintain consistency, we automate this entire process using various technologies and build a pipeline.

A pipeline is simply a sequence of steps executed one after the other, where each step takes the output of the previous step as its input and produces output for the next step.
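As a rough mental model, the stages of a typical CI pipeline might look like the sketch below. This is purely conceptual and not the syntax of any particular tool; the stage names are illustrative.

```yaml
# Conceptual pipeline stages; each stage consumes what the previous one produced.
stages:
  - checkout   # pull the latest code from the repository
  - test       # run the test suite against it
  - analyze    # code coverage, static analysis, vulnerability scans
  - build      # produce a runnable artifact (for example a Docker image)
  - publish    # store the artifact somewhere it can be deployed from later
```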

What is CD?

Continuous Delivery or Continuous Deployment deals with creating a sequence of steps that execute in order to release the application or project for the users to use.

If this sequence needs to be manually triggered, it is referred to as Continuous Delivery. Continuous Deployment is the next step of automation, in which the entire sequence is executed automatically and requires no manual intervention from anyone.

If this process were done manually instead, someone would have to build the code and copy it to the hosting service being used, then reconfigure the hosting or deployment settings, and this would have to be done for every new version, release, build, change or commit.

Again, if we automate these steps as well, it saves a lot of time and, if done correctly, largely removes the scope for human error on our side.

Prior Knowledge Required?

This blog is aimed at complete newcomers to the topic, but to fully understand the things listed here, a basic amount of knowledge of topics such as the following will help:

  • Git

  • GitHub

  • Basic project development knowledge (Web, Android, ML, Blockchain etc)

  • How testing works in development

  • Basic scripting (Python, Bash etc)

  • Linux

Even if these topics are not familiar to you right now, you can look into them once you are done with the blog and have a basic idea of what DevOps is and how it works. Or, if you prefer, you could first go through the points above and then pick up the blog from here. Either option would be a good decision in my view.

Tools for CI?

Our entire workflow begins at a developer's local setup, where changes are made to the codebase and then committed and pushed to the project repository, after which our work starts.

For CI our focus will be on tools that allow us to create a pipeline that can test changes, build the project, analyze and scan the project and save the new build somewhere for deployment later.

GitHub Actions

The first tool we use is GitHub Actions; a similar feature also exists on GitLab and other version control platforms.

GitHub Actions allows us to create a basic pipeline for our project directly on GitHub. We can make this pipeline build the project, run tests on it and do most of everything else we require. These runs can be serverless in nature but are managed by GitHub, i.e. your pipeline isn't completely under your control, but for the most part you won't feel the difference unless you really want to customize your setup.

We will have to enable triggers for GitHub Actions, which are simply conditions for when to run the pipeline, i.e. you can set the workflow to run on every commit, on every PR, only for specific users, and so on.

It also allows you to set up secret tokens and passwords so that you can integrate other tools with your pipeline. This is mainly done via the Actions Marketplace, but you can do it yourself as well.
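As a minimal sketch of what such a workflow can look like (the job name, test command and image name below are placeholders for whatever your project actually needs), a pipeline that runs on every push and pull request could be:

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # pull the repository onto the runner
      - name: Run tests
        run: ./run-tests.sh                # placeholder for your project's test command
      - name: Build Docker image
        run: docker build -t your-org/your-app:${{ github.sha }} .
```

GitHub picks up any workflow file placed under .github/workflows/ and runs it whenever one of the listed triggers fires.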

Jenkins

Jenkins, or a similar alternative such as TravisCI, is a server-based pipeline tool that you have to set up on your own server somewhere, building the entire pipeline from scratch and keeping complete control over it.

Like GitHub Actions, Jenkins requires a trigger to be set up on the GitHub project to define when the pipeline should run. Jenkins, too, can be integrated with various other tools via plugins and scripting.
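With Jenkins, the pipeline itself is commonly described in a Jenkinsfile checked into the repository. A minimal declarative sketch (the repository URL, stage names and shell commands are placeholders) might look like this:

```groovy
// Jenkinsfile (declarative pipeline)
pipeline {
    agent any

    triggers {
        // Poll the repository every 5 minutes; a GitHub webhook trigger is the usual choice instead.
        pollSCM('H/5 * * * *')
    }

    stages {
        stage('Checkout') {
            steps {
                git url: 'https://github.com/your-org/your-app.git', branch: 'main'
            }
        }
        stage('Test') {
            steps {
                sh './run-tests.sh'   // placeholder test command
            }
        }
        stage('Build image') {
            steps {
                sh 'docker build -t your-org/your-app:${BUILD_NUMBER} .'
            }
        }
    }
}
```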

Since you can add your own machines to GitHub Actions as self-hosted runners, and since support for extensions is vast for both, you must be wondering: why should I even learn Jenkins after GitHub Actions?

This is because GitHub Actions and its advancements are fairly new. Jenkins was the go-to tool for pipeline creation and management for a long time, so a lot of organizations nowadays still continue to use Jenkins as their pipeline setup, as they don't really have a need to shift to anything else, i.e. technical debt.

Having knowledge of Jenkins will surely help you when you are working with the CI/CD setup of any older organization.

Docker

The first steps of your pipeline are to connect with GitHub, pull the codebase from your project repository to the pipeline server, run tests on the code and record the results. Then we can go ahead with running analysis on the project, such as code coverage and security checks, all the while recording these results as well.

Once we have generated all the reports we require, what we need to do in most cases is build the project and store it in a runnable state so that it can later be served to users.

Nowadays, one of the best ways to achieve this is Docker. Docker containerizes the application by converting the entire project into an image, based on the instructions provided in its Dockerfile.

The image can easily be stored and later used to create a container and run or deploy the application when required.

To save the image, we tag it with a new version, and once the image is built by Docker in the pipeline, we push it to an image registry such as DockerHub, which again can easily be integrated with the pipeline manager of your choice (Jenkins, GitHub Actions etc).
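As a hedged sketch, assuming a simple Node.js web app (the base image, port, start command, tag and DockerHub username are all placeholders), the Dockerfile and the build-and-push steps could look like this:

```dockerfile
# Dockerfile: instructions Docker follows to turn the project into an image
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci                 # install dependencies
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]  # placeholder start command
```

```bash
# Build the image, tag it with a version, and push it to DockerHub
docker build -t your-dockerhub-user/your-app:1.2.0 .
docker push your-dockerhub-user/your-app:1.2.0
```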

Tools for CD?

So far in the CI/CD process, we have created a pipeline that runs when a trigger from GitHub is detected, pulls the new code from the project repository to the pipeline server, runs tests, scans, analysis and report generation on it, then containerizes the project using Docker and stores the created image in a remote image registry such as DockerHub.

Now that a new version of the project is available, we will create a process for deploying the created image as containers; this is where the work of CD begins.

Kubernetes

Kubernetes, or K8s, is a container orchestration tool that handles and manages numerous containers simultaneously, ensures that the application is never down, and makes it easy to scale the application with very small changes in response to varying traffic as and when required.

It also has an entire technological ecosystem built around it, with a lot of add-ons and extensions already available and many more in the works, improving the capabilities of the tool with things like security checks, monitoring and so on.

Nowadays, Kubernetes is generally used to run and manage applications in clusters, ensuring uptime and gradually rolling out updates.

At its core, you can think of Kubernetes as a collection of machines called nodes, of which some are master nodes and the others are worker nodes. As a Kubernetes user, you tell the master which image to deploy as containers and how many such containers you want; the master then internally runs the desired number of containers of that image on the worker nodes and load balances them to distribute traffic evenly.

To update the image, you simply run a command telling the master node to use a different image in the containers, and it automatically does so. In our case, we will simply tell it to use the newly created image on DockerHub produced by our CI pipeline.
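A minimal sketch of such a deployment, assuming the image we pushed earlier (all names, labels and ports here are placeholders):

```yaml
# deployment.yaml: asks the cluster to keep 3 containers of our image running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app
spec:
  replicas: 3                 # how many copies (pods) we want running
  selector:
    matchLabels:
      app: your-app
  template:
    metadata:
      labels:
        app: your-app
    spec:
      containers:
        - name: your-app
          image: your-dockerhub-user/your-app:1.2.0
          ports:
            - containerPort: 3000
```

Applying this file with kubectl apply -f deployment.yaml creates the containers, and switching to a newer image is a single command, e.g. kubectl set image deployment/your-app your-app=your-dockerhub-user/your-app:1.3.0.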

Kubernetes is one of the most important tools if not THE most important tool in CD and DevOps.

Amazon Web Services

Now the entire workflow is almost ready: once we push changes, a new image is created, we inform the Kubernetes cluster about it and update the version, and the new version is up and running for all users.

The only thing that remains is that, as mentioned before, Kubernetes is simply a collection of machines referred to as nodes, so to create the Kubernetes cluster and add or remove nodes, we actually need access to machines that we can use.

One approach would be to buy your own hardware as required, but this would be extremely infeasible for most organizations, as the cost of maintaining such machines, i.e. servers, would be high, so we look for an alternative.

The best general alternative to this problem is to get our hardware requirements fulfilled by renting machines from cloud providers such as Amazon Web Services (AWS), Microsoft Azure or Google Cloud Platform (GCP), in the form of virtual machines and other required components.

Hence the cloud is essential, as it allows us to rent hardware resources for our servers, such as those Kubernetes and Jenkins run on, without having to personally purchase and maintain the hardware.

You can learn any cloud platform, but I would prefer AWS because it is the most widely used of the top three.

Terraform

Terraform is an automation tool created to provision and manage all kinds of resources; in our use case, it allows us to provision and manage resources on a cloud platform such as AWS.

So why do we even need this? Well, provisioning resources through a cloud console is largely a manual, one-at-a-time affair. If you are supposed to create 100 VMs for your Kubernetes cluster, you would have to go through the VM creation process 100 times using the user interface, which is simply tedious. Even if you decide to use the provider's command line tool, using multiple cloud providers would mean managing the command line tools of every provider you use, which again makes the process cumbersome.

Enter Terraform. It provides an interface that lets us write simple declarative syntax to create any kind of resource, any number of times, on any cloud provider of our choice. Updating and deleting such resources is ridiculously easy as well.

Terraform is built on the concept of IaC, or Infrastructure as Code, which aims to abstract the process of provisioning resources by defining your requirements as code that then runs and creates all the required resources.

So Terraform can be used to manage the cloud resources we require (and it can work with local or other resources as well if needed), ensuring that we are in complete control of our infrastructure, i.e. our hardware and everything that lets us access and use that hardware. It also automates the process, which increases efficiency and saves time.
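To make the idea concrete, here is a minimal Terraform sketch that would provision a handful of AWS VMs to use as cluster nodes (the region, AMI ID, instance type and count are all placeholder assumptions):

```hcl
# main.tf: declare the provider and the VMs we want; `terraform apply` creates them
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"        # placeholder region
}

resource "aws_instance" "k8s_worker" {
  count         = 3                          # change this number to scale the node pool
  ami           = "ami-0123456789abcdef0"    # placeholder AMI ID
  instance_type = "t3.medium"                # placeholder instance size

  tags = {
    Name = "k8s-worker-${count.index}"
  }
}
```

Running terraform apply creates all three instances in one go, changing count and applying again resizes the pool, and terraform destroy tears everything down.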

Ansible

Now that we have used Terraform to provision the cloud resources that Kubernetes will use as nodes, one small problem still remains.

Kubernetes needs to be set up on these nodes, and each node needs to be joined to the cluster so that it can be used as a worker as and when required. This is typically done with a tool called kubeadm (along with the kubelet agent and a container runtime), so we will now have to set this up on every provisioned VM.

Let's say we have 100 such VMs: we would have to SSH into each VM one by one, install Docker and kubeadm, and run kubeadm join to connect it to the master Kubernetes node.

You can imagine just how tedious and time-consuming this entire process would be, so we automate this process as well, using Ansible.

Ansible uses sequences of instructions called playbooks. It takes in a list of all the VMs (an inventory), SSHes into each of them and runs the playbook on it, which in our case includes the instructions to set up Docker and kubeadm on the machine and then run the service.
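A minimal sketch of such a playbook, assuming Ubuntu-based VMs grouped under a hypothetical k8s_workers inventory group (in practice you would also add the Kubernetes package repository and finish with a kubeadm join step):

```yaml
# setup-k8s-node.yml: run against every VM in the "k8s_workers" inventory group
- hosts: k8s_workers
  become: true
  tasks:
    - name: Install Docker
      apt:
        name: docker.io
        state: present
        update_cache: true

    - name: Install kubeadm and kubelet
      apt:
        name:
          - kubeadm
          - kubelet
        state: present

    - name: Make sure the kubelet service is running
      service:
        name: kubelet
        state: started
        enabled: true
```

Running ansible-playbook -i inventory.ini setup-k8s-node.yml applies this to every VM in the inventory in one shot.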

This way we automate the process of setting up Kubernetes on all the provisioned VM resources after which the process of building our Kubernetes cluster is complete.

Prometheus

Now that our Kubernetes cluster is set up and running, we use tools such as Prometheus to monitor the cluster and collect metrics from it. This lets us know how the cluster is performing, how healthy it is and whether everything is running as expected, and in case of an error it also tells us what the exact problem was, which helps with debugging as well as monitoring and maintenance.

We can also use Grafana on top of Prometheus, which turns the metrics and information from Prometheus into visualizations and dashboards that are easier to comprehend and interpret.
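Under the hood, Prometheus works by scraping metrics endpoints that you point it at. A minimal sketch of a scrape configuration (the job name and target address are placeholders; inside a cluster you would usually rely on Kubernetes service discovery or the Prometheus Operator instead of a static list):

```yaml
# prometheus.yml
global:
  scrape_interval: 15s          # how often to pull metrics

scrape_configs:
  - job_name: "your-app"
    static_configs:
      - targets: ["your-app-service:3000"]   # placeholder host:port exposing /metrics
```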

Prometheus and Grafana are standalone tools, but they integrate tightly with Kubernetes and are a core part of the cloud-native ecosystem that has grown around it.

Next Steps?

If you have made it this far,

Congratulations! You have your entire CI/CD pipeline and infrastructure set up!

Now all you need to do is ensure smooth functioning by keeping track of the metrics, making changes to the infrastructure as required using the tools mentioned above, and getting paid handsomely for this work.

Extensions of DevOps?

DevOps is a major field, and in recent years some other interesting fields, which are technically subfields of DevOps, have also been gaining a lot of traction, so let's take a quick look at them as well.

DevSecOps?

It is pretty similar to DevOps, but here we also pay extra attention to the security of our system and of the entire CI/CD workflow.

It aims to ensure that the application we are creating is as free of vulnerabilities as possible and is not at risk of being attacked by someone nefarious.

There is also an amazing blog by Christophe Limpalair, whom you can connect with on LinkedIn and Twitter, which dives deep into the entire concept of DevSecOps and teaches everything from the ground up.

GitOps?

GitOps is another emerging field, which takes automating the deployment to another level. Here we treat the Git repository as the only source of truth for the Kubernetes cluster. What this means is that earlier, once we had updated the code in the GitHub repository, the CI pipeline would automatically build the image for the project and push it to DockerHub, but in order to deploy it to Kubernetes we still had to run a command manually.

Moreover there could be a mismatch between what we think the configuration of the cluster is and what the configuration actually is, as someone with access to the cluster could have modified it.

So we use tools such as ArgoCD or Flux (v2), which again are built on top of Kubernetes and are part of its ecosystem. These tools watch the configuration files of the Kubernetes cluster, which are stored in the project's GitHub repository, and if any changes are detected in those files in the repo, they are automatically applied to the cluster.

Now, if we push a change to the repository, the cluster is updated automatically, instead of us having to run a command to tell the cluster to update the image it is using, change the number of containers or anything else. Moreover, even if someone changes the configuration of the cluster by logging into it manually, the cluster will simply pull the state defined in the GitHub repository and revert to it.
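A minimal sketch of how this can be wired up with ArgoCD (the repository URL, path and namespaces are placeholder assumptions), using an Application resource that points the cluster at the config repo and keeps the two in sync:

```yaml
# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: your-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-app-config.git   # repo holding the manifests
    targetRevision: main
    path: k8s                       # directory with the Kubernetes configuration files
  destination:
    server: https://kubernetes.default.svc
    namespace: your-app
  syncPolicy:
    automated:
      prune: true                   # delete resources removed from the repo
      selfHeal: true                # revert manual changes made directly on the cluster
```

With selfHeal enabled, manual edits on the cluster are reverted to whatever the repository says, which is exactly the GitOps behaviour described above.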

Final Workflow?

Now, let's say I make a change in my codebase and push those commits to GitHub. This triggers a hook for the pipeline to run, which could be something like GitHub Actions or Jenkins. The pipeline pulls the code in the repository to its server, where it runs tests on it, scans and analyzes the code as instructed and records those results, then uses Docker to containerize the code by converting it into an image and pushing it to a remote image registry such as DockerHub.

Then I would edit the Kubernetes configuration files to change the version of the image used, manually access the cluster and run the command to apply the updated configuration, after which the cluster applies all the specified changes and runs the updated version of the application for users to access.

Apart from this, from time to time we keep track of the metrics generated by Prometheus and Grafana to make sure that the cluster is healthy and running exactly as expected.

To set up the Kubernetes cluster in the first place, we use Terraform to provision resources such as VMs on a cloud provider like AWS, after which we use Ansible to SSH into each of the created VMs and set up Docker and kubeadm on each so they can be connected together to form a cluster.

If we are practicing DevSecOps, we also perform security and vulnerability scans at these steps to ensure that we don't leave any point of exploit or attack exposed for someone nefarious to use.

If instead we were using GitOps, we would add a step in the pipeline to edit the configuration files for the Kubernetes cluster, update them with the latest image created by the pipeline, and commit those changes back to a repository. (We should set up a separate repository for these configuration files; otherwise the new commit would trigger the pipeline again, leading to a new version and an infinite loop of image creation and updates.) ArgoCD or Flux running in the cluster then detects the new state in the repository and automatically updates the cluster to match that configuration, pulling the newly pushed image.

End Thoughts?

This blog aims to teach the reader about the concepts of DevOps and how the different tools work together to complement each other and create the entire pipeline and infrastructure for automation and efficiency.

I understand that the explanations of the tools mentioned in this blog are very bare-bones and are only aimed at letting the reader know the order in which to learn these topics. Future blogs will cover these tools individually with practical samples, but for now I suppose this much information is more than enough for anyone to get started with DevOps.

Special Thanks

I would like to thank my great close friend Ved Ratan, who suggested and motivated me to get started with DevOps; he himself is an amazing DevOps engineer and enthusiast. If not for him I would never have gotten into DevOps and this blog would never have existed in the first place!