
GSoC 2020 @ CERN - Part II

📅 2020-11-01


Workflow configuration import and validation for AliECS

[Image: ALICE Gopher]

General Overview

The ALICE Experiment Control System (AliECS) is a distributed system based on state-of-the-art cluster management and microservices, technologies that have recently emerged in the distributed computing ecosystem. AliECS is being built from scratch in Go and takes advantage of Apache Mesos for its cluster resource management capabilities, with the goal of controlling tens of thousands of processes over hundreds of nodes.

In preparation for LHC Run 3, starting in spring 2021, the ALICE Experiment at the CERN LHC is undergoing a major upgrade that includes a new computing system called O² (Online-Offline). In AliECS, a workflow template is a YAML file that describes a set of data-driven processes to run and control throughout the O²/FLP cluster at LHC Point 2.

For developers of processing software, the high-level interface is DPL (the O² Data Processing Layer). DPL can generate dumps of data-driven workflows (files describing a set of processes to run and how they talk to each other), but these dumps cannot currently be imported directly into AliECS.
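As a rough illustration of what "how they talk to each other" means, the Go sketch below decodes a small, hypothetical DPL dump and derives producer-to-consumer connections by matching each output's origin/description against the inputs of the other processes. It assumes the dump is JSON, and the field names (`workflow`, `name`, `inputs`, `outputs`, `binding`, `origin`, `description`) are simplified stand-ins chosen for this example; the real dump format carries more information and its matching rules are richer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spec identifies a data stream. Hypothetical and simplified:
// real DPL data specs carry more fields.
type Spec struct {
	Binding     string `json:"binding"`
	Origin      string `json:"origin"`
	Description string `json:"description"`
}

// Device is one data-processing process listed in the dump.
type Device struct {
	Name    string `json:"name"`
	Inputs  []Spec `json:"inputs"`
	Outputs []Spec `json:"outputs"`
}

// Dump is the top-level object of the (assumed) JSON dump.
type Dump struct {
	Workflow []Device `json:"workflow"`
}

// Edge records that From produces data which To consumes.
type Edge struct {
	From, To, Data string
}

// ParseDump decodes raw JSON into a Dump.
func ParseDump(raw []byte) (Dump, error) {
	var d Dump
	err := json.Unmarshal(raw, &d)
	return d, err
}

// Edges links every output to every input with the same origin/description.
func Edges(d Dump) []Edge {
	var edges []Edge
	for _, producer := range d.Workflow {
		for _, out := range producer.Outputs {
			for _, consumer := range d.Workflow {
				for _, in := range consumer.Inputs {
					if in.Origin == out.Origin && in.Description == out.Description {
						edges = append(edges, Edge{producer.Name, consumer.Name, out.Origin + "/" + out.Description})
					}
				}
			}
		}
	}
	return edges
}

// sample is an invented three-process pipeline: reader -> processor -> writer.
const sample = `{
  "workflow": [
    {"name": "reader", "inputs": [], "outputs": [{"binding": "raw", "origin": "TST", "description": "RAW"}]},
    {"name": "processor", "inputs": [{"binding": "x", "origin": "TST", "description": "RAW"}],
     "outputs": [{"binding": "out", "origin": "TST", "description": "CLUSTERS"}]},
    {"name": "writer", "inputs": [{"binding": "y", "origin": "TST", "description": "CLUSTERS"}], "outputs": []}
  ]
}`

func main() {
	d, err := ParseDump([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, e := range Edges(d) {
		fmt.Printf("%s -> %s (%s)\n", e.From, e.To, e.Data)
	}
}
```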

Until now, the corresponding workflow templates had to be handcrafted from each DPL dump, which was time-consuming, and the results did not follow any formal schema. The goal of this project was to develop a converter tool that takes a DPL dump and outputs the required number of task templates along with a single workflow template.

[Image: Intended conversion inputs and outputs]

The following image shows a typical data flow pipeline, from generating DPL dumps to using coconut (the AliECS command-line client) and starting environments:

[Image: Data-driven workflow pipeline]

Project Goals

The following were the core goals defined in the project proposal:

- Develop a tool to convert a DPL dump into workflow and task templates
- Define formal schemata that these templates adhere to
- Develop a validation tool to verify whether templates adhere to these schemata
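The validation goal boils down to checking a decoded template against a description of its required keys and their types. The sketch below is a deliberately tiny, hand-rolled checker with a hypothetical two-key schema for workflow templates (`workflowSchema`, `Validate` and `Kind` are invented names); it decodes JSON rather than YAML to stay within the Go standard library. A real tool would more likely rely on an established schema language such as JSON Schema, which also covers nesting, optional keys and value constraints.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Kind is the JSON type a field must have.
type Kind string

const (
	String Kind = "string"
	Array  Kind = "array"
	Object Kind = "object"
)

// Schema maps required top-level keys to their expected kind.
// Hypothetical and minimal: no nesting, optional keys or value constraints.
type Schema map[string]Kind

// kindOf reports the Kind of a value decoded by encoding/json into `any`.
func kindOf(v any) Kind {
	switch v.(type) {
	case string:
		return String
	case []any:
		return Array
	case map[string]any:
		return Object
	default:
		return "other"
	}
}

// Validate decodes raw JSON and returns one message per schema violation.
// A decode error is returned separately from schema violations.
func Validate(raw []byte, s Schema) ([]string, error) {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order of reported problems
	var problems []string
	for _, k := range keys {
		v, ok := doc[k]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("missing required key %q", k))
		case kindOf(v) != s[k]:
			problems = append(problems, fmt.Sprintf("key %q: want %s, got %s", k, s[k], kindOf(v)))
		}
	}
	return problems, nil
}

// workflowSchema is an invented example, not the project's actual schema.
var workflowSchema = Schema{"name": String, "roles": Array}

func main() {
	good := []byte(`{"name": "example", "roles": [{"name": "reader"}]}`)
	bad := []byte(`{"roles": "reader"}`)
	for _, doc := range [][]byte{good, bad} {
		problems, err := Validate(doc, workflowSchema)
		if err != nil {
			panic(err)
		}
		fmt.Println(len(problems) == 0, problems)
	}
}
```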

Beyond the above, workflow.Graft() was also implemented. This function converts a fresh DPL dump on the fly and appends its contents to an existing workflow template. Had GSoC not ended so soon, I would've liked to work on:

Highlights

Challenges and Lessons

Conclusion

Overall, GSoC has been a phenomenal learning experience for me. The knowledge I gained is not limited to programming: I have learned how to work in a team and how to present our work in a structured, digestible way to the other engineers who depend on it. I'm immensely grateful to CERN, Google and, most of all, my mentor, Teo Mrnjavac.

My experience with GSoC 2020 @ CERN: Part 0, Part I