By Tyler Charbonneau
This is the final installment of our multi-article series exploring cloud-native technologies. To learn more about optimizing for developer experience and its critical role in implementation success, check out part four here. Parts one, two, and three are also available if you want to learn more about defining goals and responsibilities, establishing your cloud-native infrastructure, and distribution and rollout.
Nailing your initial cloud-native setup is one thing, but the process doesn’t end there. Throughout its existence, your cloud-native infrastructure will require maintenance and continual improvements to stay fully functional. Feedback is essential to this mission.
In this guide, you’ll learn the merits of gathering feedback, how to collect it, and which voices are critically important.
When maintaining a cloud ecosystem, many usage considerations drive how that ecosystem develops over time. On one hand, you have your development and engineering teams, who navigate the complex inner workings of your setup and are tasked with preserving uptime. On the other, you have internal users who depend on the system for mission-critical business operations, accessing key resources through internal portals, applications, and other proprietary mechanisms.
Finally, you have to consider your external users. This is especially important for SaaS, PaaS, and IaaS companies who sell services, host outside applications, and generally target diverse users who simply need to get things done. These users can be from the technical, business, or everyday realms (like someone accessing Netflix).
While it’s true that many users won’t have direct exposure to your backend systems, the health and behavior of that infrastructure impact user experiences on the frontend; therefore, stable functionality begets satisfaction. No matter the user’s background, updated and reliable systems are generally much nicer to use. They boast higher availability and cause much less friction.
Feedback has real, measurable value. For the growing number of DevOps acolytes, feedback is also integral to the greater development pipeline.
A sincere emphasis on feedback gathering signifies meaningful cultural change. That open-mindedness ultimately empowers teams to build the best products and services possible, given their resource constraints.
That last point is important for many organizations. You must consider both DevOps hours and cost when implementing important cloud infrastructure changes. Feedback analysis lets teams prioritize any changes and ensures that resource expenditure leads to clear improvements.
However, those important improvements won’t always be visible to everyone. Consider roadway infrastructure, for example. While drivers may not fully appreciate the merits of a bridge reconstruction, the civil engineers behind the scenes are privy to cracks and other faults that, left unchecked, would prove catastrophic. Feedback assessment contributes to both obvious and inconspicuous fixes.
Gathering feedback is, undoubtedly, a process that takes time, from compilation to response. Because cloud-native deployments (and especially multi-cloud setups) can be so complex, monitoring strategies must be all-encompassing.
Your initial infrastructure serves as a foundation for optimization, and it’s crucial to always ask, “How can you do better?” That’s especially important when your backend does so much heavy lifting.
Feedback leads to improvements that strengthen your systems at scale. Below, you’ll assess common infrastructure areas that typically require continuous improvement and how to approach them:
Perhaps the most important measures of a setup’s success are its performance indicators, which give deep insight into the overall health of your infrastructure. By monitoring availability (uptime) percentage, resource utilization, load balancing, and user activity, it’s easier to know whether each system component is doing its job. This is immensely important for cloud setups leveraging Kubernetes, which can be particularly sensitive to configuration changes. That said, the importance of performance, from computing to networking, cannot be overstated for any virtualized system.
How are servers and nodes behaving? Are users encountering bottlenecks during their engagement with a service? Applying these questions to real-time observations can help you form a solid improvement plan.
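To make those questions concrete, here is a minimal sketch of turning raw request logs into two of the indicators mentioned above: availability percentage and tail latency. The function names and data shapes are illustrative assumptions, not a specific monitoring product’s API.

```python
from statistics import quantiles

def availability_pct(status_codes):
    """Share of requests that succeeded (i.e., did not return a 5xx), as a percentage."""
    ok = sum(1 for s in status_codes if s < 500)
    return 100.0 * ok / len(status_codes)

def p95_latency(latencies_ms):
    """95th-percentile latency in milliseconds -- a common bottleneck indicator,
    since averages hide the slow requests users actually complain about."""
    # quantiles(..., n=100) returns the 99 percentile cut points; index 94 is p95.
    return quantiles(latencies_ms, n=100)[94]
```

Tracking p95 (rather than the mean) is a deliberate choice: a handful of very slow requests can leave the average looking healthy while a visible slice of users hits a bottleneck.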
Additionally, note that performance is partly a matter of perception, and whether something feels slow or fast influences how feedback rolls in. When users expect simple web-based interactions to be snappy and the opposite is true, frustration mounts. During complete service outages, tech support and the help desk can field numerous inquiries. Be sure to log these concerns and monitor their persistence over time. Performance issues that are particularly tricky or obscure can take engineers quite a while to pin down, which is why asking detailed follow-up questions is so critical.
Creating what’s known as a Minimum Viable Product (MVP) is common in the software world, and cloud deployments sometimes follow the same doctrine. While almost everything coded and configured by people will have some vulnerabilities, the severity of those vulnerabilities differs. It can be troubling when security holes sneak through to production, especially when those issues aren’t immediately apparent.
Cloud deployments include a host of vital mechanisms, including networking, storage, and computing. An exploit impacting any of these links can have disastrous results. For example, data at rest resides in various storage locations, either internal or external, and vulnerabilities there can lead to data leaks. You also have data in transit, which traverses the network during API requests and responses; these pathways must be hardened as well.
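TLS is the primary protection for data in transit, but an application-level integrity check is a useful complement when payloads pass through intermediaries. Below is a minimal sketch of HMAC-signing an API payload so the receiver can detect tampering; the key and function names are hypothetical (in practice the key would come from a secret manager, not source code).

```python
import hashlib
import hmac
import json

SHARED_KEY = b"example-key"  # hypothetical; load from a secret manager in practice

def sign_payload(payload):
    """Produce an HMAC-SHA256 signature over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()

def verify_payload(payload, signature):
    """Constant-time comparison, so verification doesn't leak timing information."""
    return hmac.compare_digest(sign_payload(payload), signature)
```

Sorting the JSON keys matters: without a canonical encoding, the same logical payload could serialize two different ways and fail verification.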
The situation worsens when that information is proprietary or personally identifiable, which has its own compliance concerns and monetary implications. It’s proven that just one data breach can sink a company. Breaches erode customer trust and drive them to competing solutions.
When choosing a service, it’s also vital to maintain the security of external components and integrations and to weigh the vendor’s commitment to security. You need to understand which factors you can control and which you can’t. Listen to your engineers, who should be performing regular security audits, penetration testing, and chaos testing. Additionally, outside groups like Google’s Project Zero can raise red flags when vulnerabilities are likely to be highly severe, impactful, or visible.
Maintaining strong security will keep your services running, your data safer, and your users happier.
Gathering feedback on feature satisfaction and requests helps engineers assemble a decisive roadmap. Whether users are internal or external, they often gauge a product’s success by the features it brings to the table. Release too frequently, and you risk pushing half-baked software into production. Release too sporadically, and it appears as though you’ve stagnated.
While catering to end users is important, catering to internal engineers is equally paramount. The base components of a cloud-native deployment matter, but so does the supporting tooling: the monitoring tools, the remediation tools, and the automation workflows that help teams manage what’s in front of them. Upgrading these programs (provided compatibility is preserved) can deliver new management features for DevOps pros.
Overall, you want to move forward from day one. Your initial implementation is a springboard for the testing and introduction of useful new features that can benefit numerous users. By implementing features based on popular feedback, you’ll ensure that each introduction positively impacts the largest subset of users.
Keeping your system up to date is integral and plays into what you’ve learned already. Updates ensure that you’re equipped with the latest security patches, bug fixes, and feature tweaks. From a development standpoint, each major (or even minor) update can influence compatibility with other tools and system components. This is why updates are important, but with a caveat.
Blindly updating without doing research isn’t advisable. Consulting your engineers and admins about any dependency changes, deprecations, and regressions can save you from introducing breaking changes. Gathering feedback from these groups is also important, as it leads to informed decision-making.
Being on the bleeding edge in the tech world is exciting but can come at a cost. Always weigh benefits against any costs before clicking install or running that script.
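One simple, automatable piece of that pre-update research is flagging which pending upgrades cross a major version boundary, since under semantic versioning those are the ones most likely to carry breaking changes. The sketch below assumes plain `MAJOR.MINOR.PATCH` version strings and hypothetical tool names; it is a triage aid, not a substitute for reading changelogs.

```python
def parse_semver(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (assumes plain semver)."""
    return tuple(int(part) for part in version.split("."))

def risky_upgrades(current, candidate):
    """Return dependencies whose candidate version bumps the major number --
    the upgrades most likely to introduce breaking changes under semver."""
    flagged = []
    for name, version in candidate.items():
        if name in current and parse_semver(version)[0] > parse_semver(current[name])[0]:
            flagged.append(name)
    return flagged
```

Anything this flags goes to engineers and admins for a closer look at deprecations and regressions before the upgrade ships.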
Above all, continuous improvement needs to be evidence-based. You shouldn’t make improvements based simply on what feels right. Instead, base your decisions on feedback from users and engineers: What’s working? What’s not? What changes would they like to see above all else?
Assessing the state of your infrastructure based on the answers you receive to these questions is key. From there, it’s much easier to triage issues and apply resources.
The main goal in this instance is to gauge satisfaction with your system from all groups and monitor how that evolves as changes are made. Assessing opinions regularly with each notable change can be a great way to do this. This can be facilitated via communication channels or even applications themselves.
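One lightweight way to monitor how satisfaction evolves across changes is to average survey scores per release and track the delta from the previous one. This sketch assumes 1–5 survey scores keyed by release label; the shape of the data is an assumption for illustration.

```python
from statistics import mean

def satisfaction_trend(scores_by_release):
    """Average satisfaction per release (in insertion order), plus the change
    from the previous release -- None for the first release."""
    trend = []
    previous = None
    for release, scores in scores_by_release.items():
        avg = mean(scores)
        delta = None if previous is None else round(avg - previous, 2)
        trend.append((release, round(avg, 2), delta))
        previous = avg
    return trend
```

A falling delta right after a notable change is exactly the signal that tells you where to triage first.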
That said, here are some reliable ways in which you can gather feedback on your cloud-based system:
While gathering feedback from an analytics standpoint is important, the methods for doing so shouldn’t be disruptive or heavy-handed. A good system needs a good user experience, and any intrusive trackers might interfere with workflows. They may be distracting, or even viewed in a poor light from a privacy standpoint, even if nothing nefarious goes on behind the scenes.
Ideally, you’ll want some system (for customers and end users) that silently collects usage information in the background. By examining this data, you can see how interactions progress as users go about their journeys. Additionally, you can see how certain behaviors and loads impact the system.
Of course, those findings must also make it back to DevOps and engineering teams. The most efficient, least intrusive way to do this is to use a collection agent that extracts valuable metrics and ships them over the network to a destination, often a centralized metrics dashboard. The nice thing about that implementation is that the resulting visualizations can provide value for multiple teams.
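The core of such an agent is small: buffer samples locally and ship them in batches, keeping the network hop off the hot path. Here is a minimal sketch with a pluggable `send` callable standing in for the real transport (HTTP, gRPC, or whatever your dashboard ingests); the class and field names are illustrative.

```python
class MetricsAgent:
    """Buffers metric samples and flushes them in batches via a pluggable sender,
    so recording a metric never blocks on the network."""

    def __init__(self, send, batch_size=10):
        self.send = send            # callable accepting a list of samples
        self.batch_size = batch_size
        self.buffer = []

    def record(self, name, value):
        """Queue one sample; flush automatically when the batch is full."""
        self.buffer.append({"metric": name, "value": value})
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Ship everything buffered so far as one batch."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
```

Making the sender pluggable is also what lets the same batches feed multiple consumers, which is how one agent can serve several teams’ dashboards.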
For internal engineers, the process is a little more performance-driven. There are a host of monitoring tools, both built-in and external, that capture statistics. These can monitor the following:
- User activity and times of peak activity
- Resource utilization and any associated trends
- Network performance
- Average engagement time
- Statefulness of components
- Control plane metrics, like API latency or queue waiting time
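As a small example of working with the first item on that list, here is a sketch that picks peak-activity windows out of passively collected hourly request counters. The data shape (hour label to request count) is an assumption for illustration.

```python
def peak_windows(requests_per_hour, top_n=2):
    """Return the hours with the highest request counts -- a simple way to
    identify peak-activity windows worth provisioning (or alerting) around."""
    ranked = sorted(requests_per_hour.items(), key=lambda kv: kv[1], reverse=True)
    return [hour for hour, _ in ranked[:top_n]]
```

Knowing the peaks also tells you when *not* to schedule disruptive maintenance.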
Metrics tracking is mostly passive and excludes any real human interaction. The process is more objective and numbers-driven.
One great thing about Slack’s rise is how it welcomes the creation of public channels. Many companies have established these channels to build communities around their products and services. They’re a great forum for active discussion of development pipelines, bugs, vulnerabilities, and more. One popular example is the #devspace channel within the Kubernetes Slack workspace, which lets users and engineers connect with DevSpace’s maintainers.
Consider leveraging Slack channels to engage directly with your users. This is an excellent way to assess public opinion and uncover widespread issues. Likewise, understanding what makes users happy will help you identify development priorities.
You can also utilize closed Slack channels. This is another effective way to communicate simultaneously with multiple engineers, share tickets, and escalate issues according to feedback.
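Feedback gathered elsewhere can also be routed *into* those channels programmatically via Slack’s incoming webhooks, which accept a JSON payload with a `text` field. Below is a minimal sketch; the webhook URL, user, and channel names are placeholders, and only the payload-building step is exercised here (the actual POST needs a real webhook URL).

```python
import json
from urllib import request

def feedback_message(user, source_channel, text):
    """Shape a feedback report as a Slack incoming-webhook payload."""
    return {"text": f"Feedback from {user} in {source_channel}: {text}"}

def post_feedback(webhook_url, payload):
    """POST the payload to a Slack incoming webhook (requires a real URL)."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```

Piping escalations from a closed engineering channel and raw reports from public channels through one formatter like this keeps the feedback stream consistent and searchable.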
Sometimes, the most effective and approachable way to gather feedback is through traditional mechanisms. Email remains a powerful tool for capturing relevant information about system performance and any manifested symptoms stemming from different problems.
The key is ensuring that all support emails feed into one specialized inbox. From there, team members can understand each problem and escalate it appropriately. If these submissions were grouped with general inquiries, critical technical feedback would be lost amongst the noise.
Everyone knows how email works. Additionally, feedback in written format is immensely useful for reference later on. It’s possible to send emails from anywhere. Best of all, email is a passive channel at the collection stage since mail servers operate 24/7, unlike help desk agents. From there, the feedback loop can operate as intended.
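A first pass over that specialized inbox can even be automated with naive keyword triage, sorting messages into severity buckets before a human escalates. The keyword lists and bucket names below are illustrative assumptions, not a production classifier.

```python
# Hypothetical keyword buckets; tune these to your own incident vocabulary.
SEVERITY_KEYWORDS = {
    "high": ("outage", "down", "data loss"),
    "medium": ("slow", "error", "timeout"),
}

def triage(subject):
    """Route a support email to a severity bucket by subject-line keywords --
    a naive first pass, checked in order from most to least severe."""
    lowered = subject.lower()
    for severity, words in SEVERITY_KEYWORDS.items():
        if any(word in lowered for word in words):
            return severity
    return "low"
```

Anything landing in the high bucket pages an engineer; the rest waits for normal triage, which keeps the feedback loop responsive without drowning the team.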
Stagnation is the enemy of advancement in the technology world. Engineers and developers can’t simply rest on their laurels once an ecosystem is officially up and running. That infrastructure requires constant maintenance to perform well and improve over time. This is especially important and feasible in a microservices context since smaller chunks of code are much easier to work with than monoliths.
That said, companies are flying blind without gathering valuable feedback. This can be done actively (polling, crowdsourcing, outreach) or passively through channels that are always available (if not always monitored). Both technical and functional feedback matter, and objective metrics and subjective opinion carry equal weight depending on the subject at hand. By harnessing this feedback, you’ll be in an excellent position to succeed on the backend.