About ZTech
ZTech is the new tech department of the Ziegert Group. Its mission is to develop innovative solutions in the areas of Tech & IT for a modern real estate industry and the residential quarters of the future. The Ziegert Group includes the real estate company Ziegert EverEstate GmbH, the project developer INCEPT GmbH and the human proptech METATRUST. The headquarters of the Ziegert Group are located at Checkpoint Charlie in Berlin-Kreuzberg.
The Challenge
Ship often! Ship fast! Ship safely! That was the goal of the cooperation between PCG and ZTech, the digital department of the Ziegert Group, an established real estate company with more than 35 years of experience. Over the last several years, ZTech had successfully built a series of digital platforms for managing real estate assets and was ready to take it to the next level: Transforming the existing single-tenant systems into multi-tenant SaaS platforms.
The existing setup already provided excellent preconditions: Modern cloud-native, event-driven serverless architectures, several cross-functional teams working together in an agile way and a clear product vision. However, the transformation from a single-tenant to a multi-tenant solution is far from trivial. ZTech therefore asked PCG to help them prepare for the future: Create the technical foundation for increased speed, throughput and quality in delivery, and empower the existing development teams to deploy with confidence and be ready for further growth.
The Solution
Teams from ZTech and PCG worked closely together to implement a set of measures spanning tools, processes and people, making shipping faster, more frequent and safer. Software could be delivered more quickly because automation was increased with Infrastructure as Code, the CI/CD tooling was migrated to a more effective solution and cognitive load was reduced. Establishing the preconditions and then migrating from the traditional GitFlow process to Continuous Deployment enabled more frequent delivery. And lastly, shifting left on security and empowering teams to take complete ownership and adopt a DevOps mindset led to safer delivery.
Ship Fast!
Remove blockers by implementing automation with Infrastructure as Code
One major factor behind the high lead time, i.e. the time from code being committed to that code running in production, was that changes often required manual modifications of AWS infrastructure components. Full automation is often not a main concern when getting started with a new product, and might even be a misguided effort while the scope changes very frequently. At a certain point, however, the system and its interdependencies become so complex that applying changes correctly by hand is very time-consuming. This is especially true for a serverless architecture, where infrastructure components such as queues, streams and databases form a central part of the application logic itself, and where changes to the application code often need to stay in sync with the infrastructure code and vice versa.
To achieve the desired level of automation, Infrastructure as Code with the AWS Cloud Development Kit (CDK) and TypeScript was introduced. After using AWS Resource Manager to create an inventory of all existing resources, they were either imported into existing stacks or created from scratch and put into production with appropriate migration and cutover processes. The infrastructure code was included in the application's code repositories, and management of the resources' complete lifecycle was integrated into the application's build and deployment pipelines. Afterwards, all changes could be made in sync and rolled out in a repeatable, secure and reliable way, allowing anyone to make small, frequent changes.
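To give a sense of what this looks like in practice, here is a minimal sketch of a CDK stack in TypeScript for a hypothetical queue-driven ingest service; the stack name, resource names, paths and settings are illustrative assumptions and not taken from ZTech's actual codebase.

```typescript
// Illustrative only: names and settings are assumptions, not ZTech's actual setup.
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { Queue } from 'aws-cdk-lib/aws-sqs';
import { Code, Function, Runtime } from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

export class ListingIngestStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // The queue is part of the application logic and lives next to the code
    // that consumes it, so both change in the same commit and pipeline run.
    const ingestQueue = new Queue(this, 'IngestQueue', {
      visibilityTimeout: Duration.seconds(60),
    });

    // Handler code is bundled from the same repository (path is assumed).
    const ingestHandler = new Function(this, 'IngestHandler', {
      runtime: Runtime.NODEJS_18_X,
      handler: 'ingest.handler',
      code: Code.fromAsset('dist/handlers'),
      timeout: Duration.seconds(30),
    });

    // The wiring between infrastructure and application logic is code as well.
    ingestHandler.addEventSource(new SqsEventSource(ingestQueue, { batchSize: 10 }));
  }
}
```

Because the queue, the function and their wiring are all expressed in code, infrastructure and application changes appear in the same pull request and are rolled out together by the pipeline.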
Increase developer productivity by replacing CI/CD tooling
Another unnecessary delay was caused by the build and deployment tooling itself: AWS CodeBuild and AWS CodePipeline. While these two built-in AWS services are a great choice for small teams getting started on AWS, they are not the best fit for larger teams with more advanced processes.
Thus, PCG suggested switching to the native CI/CD offering of the version control system already in use, in this case GitHub with GitHub Actions. Not only did it offer a more advanced feature set, it also integrated smoothly into the teams' regular development workflow, reducing context switches and thus increasing developer productivity. As the first step in the migration effort, PCG consultants moved one service's pipeline as a proof of concept. After PCG presented it to the teams, discussed questions and concerns and resolved potential blockers, the teams were convinced of the advantages, and in a shared effort all pipelines were successfully moved to GitHub Actions, resulting in more flexible and faster pipelines.
Reduce cognitive load by reducing code complexity
The last major step to ship faster was to reduce overall complexity where possible, lowering the teams' cognitive load and making it easier for future developers to apply changes quickly and confidently. Both the overall architecture approach and the individual services' implementations were analyzed. A thorough review by PCG's Serverless experts concluded that the chosen architecture approach was indeed a good fit for the specific business problems to be solved, so it was not fundamentally changed. However, the analysis revealed several low-level parts that could be simplified.
PCG consultants drew on their existing knowledge of the frameworks in use and their long-standing AWS and Serverless experience to support the teams in replacing custom implementations with out-of-the-box solutions, identifying and removing obsolete parts of the code base and making use of more native features, resulting in more maintainable code.
Ship Often!
Resolve issues quickly by improving observability setup
After one of the major preconditions, automation, had been established, the only remaining step before drastically increasing the deployment frequency was to tighten the observability setup. A good observability setup is crucial for maintaining a low mean time to restore (MTTR), especially when every commit is released directly to production. All applications already had logs, metrics and tracing in place, but one main issue remained: Production issues were often detected by diligent manual checks after a release, which would no longer be feasible with more frequent deployments. In addition, the number of production issues was expected to rise with more frequent deployments. To lay the foundation for speedy incident resolution, root cause analysis of production issues also needed to become easier.
To detect and alert on failures automatically, PCG collaborated with ZTech to define and implement Amazon CloudWatch alarms based on technical and business KPIs. The alarms were integrated into Microsoft Teams so that developers on call were notified immediately in case something went wrong. To ease root cause analysis, dashboards were created based on default and custom metrics, log collection was standardized and noise-generating false positives were significantly reduced by carefully reviewing and adapting the error handling. With all this in place, everyone involved felt ready to move to more frequent deployments.
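As an illustration, an alarm on a business KPI could be defined in CDK roughly as in the sketch below; the metric namespace, names and thresholds are assumptions, and the forwarding of notifications from SNS to Microsoft Teams is only hinted at.

```typescript
// Minimal sketch with assumed names; the actual KPIs, thresholds and Teams
// integration in the ZTech setup may differ.
import { Duration } from 'aws-cdk-lib';
import { Alarm, ComparisonOperator, Metric, TreatMissingData } from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';

export class ContractUploadAlarms extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // A subscriber on this topic (e.g. a small Lambda, not shown) forwards
    // notifications to the on-call channel in Microsoft Teams.
    const alertTopic = new Topic(this, 'AlertTopic');

    // Business KPI published by the application as a custom metric (names assumed).
    const failedUploads = new Metric({
      namespace: 'ZTech/Contracts',
      metricName: 'FailedUploads',
      statistic: 'Sum',
      period: Duration.minutes(5),
    });

    // Alarm as soon as a single upload fails within a five-minute window.
    const alarm = new Alarm(this, 'FailedUploadsAlarm', {
      metric: failedUploads,
      threshold: 1,
      evaluationPeriods: 1,
      comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
      treatMissingData: TreatMissingData.NOT_BREACHING,
    });
    alarm.addAlarmAction(new SnsAction(alertTopic));
  }
}
```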
Go all the way by introducing continuous deployment
With all the previous measures successfully in place, the applications were moved from the previous GitFlow process to Continuous Deployment. On a technical level, this simply boiled down to deleting the develop branch, making the main branch the default branch and adapting the deployment pipeline to deploy straight to production once the deployment to staging was successful.
Ship Safely!
Keep production failures low by shifting left on security
With the significantly increased tempo, the stability of the system could have been at risk. Even though automation, observability and simplification already acted as risk mitigators, additional actions were taken to be on the safe side. One of them was the adoption of a shift-left approach to security. To keep the change fail percentage low, the aim is to catch bugs as early as possible, and preferably not in production; this is especially critical for security-related issues. To support the shift-left approach, several actions were taken: Making it close to impossible to perform insecure operations by putting guardrails in place, making it hard to go against best practices by providing AWS CDK blueprints with sensible defaults and reference implementations for IAM policies adhering to the principle of least privilege, and making it easy to stay ahead by extending the build and deployment pipelines with automatic security alerts and updates using GitHub's Dependabot.
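A CDK blueprint of this kind could look roughly like the sketch below; the construct, its defaults and the grant pattern are illustrative assumptions rather than ZTech's actual reference implementation. The point is that the secure configuration is the default and each consumer receives only the narrowly scoped permissions it needs.

```typescript
// Hypothetical blueprint construct; names and defaults are assumptions for illustration.
import { RemovalPolicy } from 'aws-cdk-lib';
import { AttributeType, BillingMode, Table, TableEncryption } from 'aws-cdk-lib/aws-dynamodb';
import { IGrantable } from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export interface TenantTableProps {
  readonly reader?: IGrantable;
  readonly writer?: IGrantable;
}

// Blueprint with secure defaults: encryption at rest, point-in-time recovery
// and no hand-written wildcard IAM statements. Access is granted per table
// and per operation via the CDK grant methods.
export class TenantTable extends Construct {
  public readonly table: Table;

  constructor(scope: Construct, id: string, props: TenantTableProps = {}) {
    super(scope, id);

    this.table = new Table(this, 'Table', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey: { name: 'sk', type: AttributeType.STRING },
      billingMode: BillingMode.PAY_PER_REQUEST,
      encryption: TableEncryption.AWS_MANAGED,
      pointInTimeRecovery: true,
      removalPolicy: RemovalPolicy.RETAIN,
    });

    // Least privilege: each consumer only gets the actions it actually needs.
    if (props.reader) this.table.grantReadData(props.reader);
    if (props.writer) this.table.grantWriteData(props.writer);
  }
}
```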
Ensure longevity by increasing confidence and trust
Last but not least, shipping safely is also a matter of feeling confident and trusting in both the team's and the system's capabilities. This can turn out to be more challenging than expected when team members face an unfamiliar technology stack combined with little insight into the original design decisions. To increase the level of confidence across all teams and to lay the foundation for future growth, PCG invested heavily in knowledge sharing and enablement.
Several formal activities such as bi-weekly interactive brown bag sessions and a DevOps seat rotation were established to combine theoretical knowledge with hands-on experience. However, the main leverage came from the continuous, ongoing collaboration: PCG consultants worked embedded in the ZTech development teams, initiated and participated in mob and pair programming sessions, and were always available for on-demand mentoring and individual coaching. Over time, the teams grew more confident and PCG gradually stepped back, with ZTech developers taking over as multipliers on topics such as the DevOps mindset, specific technical details of the services in use and AWS best practices.
Results and Benefits
"With the tooling we have in place now, largely thanks to PCG, it will become possible for us to improve speed, throughput, and quality."
Jonathan Hansen, Head of Engineering & Agile
The combination of all these steps, among them increased automation, higher developer productivity and greater confidence, set ZTech up for the future: Being able to deliver high-quality software at a high pace for the ambitious product roadmap and the technical challenges ahead. The close collaboration between ZTech and PCG left the teams empowered to own and expand their existing Serverless stack on AWS, the introduced DevOps tools and the newly established processes and approaches. Teams at ZTech are prepared to ship fast, often and safely for years to come.
About PCG
Public Cloud Group (PCG) supports companies in their digital transformation through the use of public cloud solutions.
With a product portfolio designed to accompany organisations of all sizes on their cloud journey, and highly qualified staff that clients and partners like to work with, PCG is positioned as a reliable and trustworthy partner for the hyperscalers, with repeatedly validated competence and credibility.
We hold the highest partnership status with the three relevant hyperscalers: Amazon Web Services (AWS), Google, and Microsoft. As an experienced provider, we advise our customers independently on cloud implementation, application development, and managed services.