Workflow configuration import and validation for AliECS
The ALICE Experiment Control System (AliECS) is a distributed system based on state-of-the-art cluster management and microservice technologies that have recently emerged in the distributed computing ecosystem. AliECS is being built from scratch in Go and relies on Apache Mesos for cluster resource management, with the goal of controlling tens of thousands of processes over hundreds of nodes.
In preparation for LHC Run 3, starting in spring 2021, the ALICE Experiment at the CERN LHC is undergoing a major upgrade, which includes a new computing system called O² (Online-Offline). A workflow template is a YAML file that describes a set of data-driven processes to run and control across the O²/FLP cluster at LHC Point 2.
For developers of processing software, the high-level interface is called DPL (O² Data Processing Layer). DPL can generate dumps of data-driven workflows (i.e. files describing a set of processes to run and how they communicate with each other), but these dumps cannot currently be imported directly into AliECS.
As a result, workflow templates had to be handcrafted to match the DPL dump, which was time-consuming, and the resulting templates did not follow a formal schema. The goal of this project was to develop a converter tool that takes a DPL dump and outputs the required number of task templates along with a single workflow template.
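As a rough illustration of the converter's overall shape, the Go sketch below assumes a heavily simplified DPL dump: a JSON object whose "workflow" array holds one entry per data processor, each with a "name" field. Real dumps carry much more (inputs, outputs, options, metadata), and walnut writes YAML files rather than returning Go values; the types dplDump, taskTemplate and workflowTemplate and the function convert are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dplProcessor and dplDump are a hypothetical, heavily simplified view of a
// DPL dump: one entry per data processor. Real dumps carry many more fields.
type dplProcessor struct {
	Name string `json:"name"`
}

type dplDump struct {
	Workflow []dplProcessor `json:"workflow"`
}

// taskTemplate and workflowTemplate are hypothetical in-memory shapes of the
// converter's output; the real tool serializes templates to YAML.
type taskTemplate struct {
	Name       string
	Executable string
}

type workflowTemplate struct {
	Name  string
	Roles []string // one role per task template, referenced by name
}

// convert turns a DPL dump into one task template per processor plus a
// single workflow template that references all of them.
func convert(raw []byte, workflowName, executable string) ([]taskTemplate, workflowTemplate, error) {
	var dump dplDump
	if err := json.Unmarshal(raw, &dump); err != nil {
		return nil, workflowTemplate{}, fmt.Errorf("parsing DPL dump: %w", err)
	}
	wf := workflowTemplate{Name: workflowName}
	tasks := make([]taskTemplate, 0, len(dump.Workflow))
	for _, p := range dump.Workflow {
		tasks = append(tasks, taskTemplate{Name: p.Name, Executable: executable})
		wf.Roles = append(wf.Roles, p.Name)
	}
	return tasks, wf, nil
}
```

Each processor becomes its own task template, while the single workflow template only refers to them by name, mirroring the one-workflow, many-tasks split described above.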
The following image shows a typical data flow pipeline, from generating DPL dumps to using coconut and starting environments:
The following were the core goals defined in the project proposal:
| Goal | Status |
|---|---|
| Develop a tool to convert a DPL dump into workflow and task templates | ✅ |
| Define formal schemata that these templates adhere to | ✅ |
| Develop a validation tool to verify if templates adhere to the said schemata | ✅ |
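The post does not say how the schemata are expressed. As a minimal sketch of what template validation involves, the code below assumes a hypothetical rule list of required keys and expected value kinds, checked against a template already decoded into map[string]interface{}; fieldRule, validate and hasKind are illustrative names, not walnut's API.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldRule is a hypothetical, minimal schema rule: a required key and the
// kind of value it must decode to. Real schemata are far richer.
type fieldRule struct {
	Key  string
	Kind string // "string", "map" or "list"
}

// validate checks a decoded template against every rule and returns all
// violations found, sorted, instead of stopping at the first one.
func validate(doc map[string]interface{}, rules []fieldRule) []string {
	var errs []string
	for _, r := range rules {
		v, ok := doc[r.Key]
		if !ok {
			errs = append(errs, fmt.Sprintf("missing required key %q", r.Key))
			continue
		}
		if !hasKind(v, r.Kind) {
			errs = append(errs, fmt.Sprintf("key %q: expected %s, got %T", r.Key, r.Kind, v))
		}
	}
	sort.Strings(errs)
	return errs
}

// hasKind reports whether v has the Go type a decoder produces for kind.
func hasKind(v interface{}, kind string) bool {
	switch kind {
	case "string":
		_, ok := v.(string)
		return ok
	case "map":
		_, ok := v.(map[string]interface{})
		return ok
	case "list":
		_, ok := v.([]interface{})
		return ok
	}
	return false
}
```

Collecting every violation rather than failing on the first one suits a linter, since a template author can fix all problems in a single pass.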
Beyond the above, workflow.Graft() was also implemented. This function allows us to convert a fresh DPL dump on the fly and append its contents to an existing workflow template. Had GSoC not ended so soon, I would’ve liked to work on:
- Adding commit hooks to validate all templates uploaded to AliceO2Group/ControlWorkflows
- Preserving custom ordering of marshaled YAML elements
- Before we began work on the project, we needed a name for the utility that would house the above features. Given that AliceO2Group/Control (the repository for AliECS) already had a couple of tools called coconut and peanut, we decided that walnut was an appropriate name. walnut stands for the Workflow Administration and LiNting UTility.
- Twice during GSoC, I presented my work to fellow members of the ALICE community. These presentations were encapsulated in:
Challenges and Lessons
- Using hard-coded paths to marshal complex types like iteratorRole resulted in overly complicated MarshalYAML() functions. Instead, calling MarshalYAML() on each of its constituent fields, and thereby reusing the custom marshalers we had already defined, proved to be a cleaner and more elegant approach. Type-asserting these results to map[string]interface{} and iterating over the key-value pairs, adding each one to the result as we go, gave us exactly what we wanted in a much smaller package.
- During marshaling of YAML files, omitempty proved unreliable for slices. Further research revealed that the slices were not empty but rather held nil values. You can read more about this here.
- Before we could begin work on workflow.Graft(), we needed a way to load existing workflow templates into walnut. We couldn’t use the UnmarshalYAML methods we had already defined, since doing so would lose all ordering and comments present in the workflow template. Using the yaml.Node implementation solved this problem, allowing us to insert new elements while preserving ordering and comments.
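The iteratorRole lesson above can be sketched as follows. The marshaler interface here only mirrors the MarshalYAML() (interface{}, error) signature used by gopkg.in/yaml.v3, redeclared locally so the example needs nothing beyond the standard library; roleBase, iteratorInfo and their fields are hypothetical stand-ins for the real role types.

```go
package main

import "fmt"

// marshaler mirrors the Marshaler interface of gopkg.in/yaml.v3, redeclared
// here so the sketch compiles with the standard library alone.
type marshaler interface {
	MarshalYAML() (interface{}, error)
}

// roleBase and iteratorInfo are hypothetical constituent parts of an
// iterator role, each with its own custom marshaler already defined.
type roleBase struct{ Name string }

func (r roleBase) MarshalYAML() (interface{}, error) {
	return map[string]interface{}{"name": r.Name}, nil
}

type iteratorInfo struct{ Var, Range string }

func (i iteratorInfo) MarshalYAML() (interface{}, error) {
	return map[string]interface{}{
		"for": map[string]interface{}{"var": i.Var, "range": i.Range},
	}, nil
}

// iteratorRole embeds both parts and defines its own MarshalYAML, which
// takes precedence over the promoted methods of the embedded types.
type iteratorRole struct {
	roleBase
	iteratorInfo
}

// MarshalYAML reuses the constituent marshalers, type-asserts each result to
// a map and merges the key-value pairs, instead of hard-coding paths.
func (r iteratorRole) MarshalYAML() (interface{}, error) {
	result := make(map[string]interface{})
	for _, part := range []marshaler{r.roleBase, r.iteratorInfo} {
		out, err := part.MarshalYAML()
		if err != nil {
			return nil, err
		}
		fields, ok := out.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("unexpected marshaled type %T", out)
		}
		for k, v := range fields {
			result[k] = v
		}
	}
	return result, nil
}
```

Adding a new constituent to the role then only means adding it to the slice of parts, rather than writing another hard-coded path.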
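To illustrate the omitempty pitfall mentioned above, reading "held nil values" as slices whose elements are nil, the sketch below uses encoding/json, which, like gopkg.in/yaml.v3, omits a slice under omitempty only when its length is zero. A slice such as []*task{nil} therefore still gets emitted. Filtering out nil entries before marshaling is one possible remedy, not necessarily the one walnut uses; task, templateDoc, compactTasks and marshalTemplate are hypothetical.

```go
package main

import "encoding/json"

// task is a hypothetical element type standing in for a template entry.
type task struct {
	Name string `json:"name"`
}

// templateDoc is a hypothetical container whose Tasks field uses omitempty.
type templateDoc struct {
	Tasks []*task `json:"tasks,omitempty"`
}

// compactTasks drops nil entries; if nothing remains it returns a nil slice,
// so omitempty sees a zero-length slice and omits the field.
func compactTasks(in []*task) []*task {
	var out []*task
	for _, t := range in {
		if t != nil {
			out = append(out, t)
		}
	}
	return out
}

// marshalTemplate encodes the document and returns it as a string.
func marshalTemplate(t templateDoc) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}
```

Marshaling templateDoc{Tasks: []*task{nil}} directly keeps the field as a list containing null; compacting first removes it entirely.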
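For the yaml.Node point above, the idea behind grafting can be sketched without third-party modules. The node type below is a simplified, hypothetical stand-in loosely modelled on yaml.Node, whose mapping nodes keep keys and values alternately in Content and which carries comments on each node. graftRoles and the assumption that roles live under a top-level "roles" sequence are illustrative, not walnut's actual code.

```go
package main

import "errors"

// kind mirrors a subset of yaml.Node's kinds; this simplified, hypothetical
// type keeps the sketch free of third-party modules.
type kind int

const (
	scalarNode kind = iota
	sequenceNode
	mappingNode
)

// node is loosely modelled on yaml.Node: mapping nodes store keys and values
// alternately in Content, and comments travel with the node they belong to.
type node struct {
	Kind        kind
	Value       string
	Content     []*node
	HeadComment string
}

// lookup returns the value node stored under key in a mapping node, or nil.
func lookup(m *node, key string) *node {
	for i := 0; i+1 < len(m.Content); i += 2 {
		if m.Content[i].Value == key {
			return m.Content[i+1]
		}
	}
	return nil
}

// graftRoles appends new role nodes to the "roles" sequence of a workflow
// mapping. The tree is edited in place, so existing entries keep their
// order and comments.
func graftRoles(workflow *node, roles ...*node) error {
	if workflow.Kind != mappingNode {
		return errors.New("workflow root is not a mapping")
	}
	seq := lookup(workflow, "roles")
	if seq == nil || seq.Kind != sequenceNode {
		return errors.New(`no "roles" sequence found`)
	}
	seq.Content = append(seq.Content, roles...)
	return nil
}
```

Because new nodes are appended to the parsed tree rather than re-marshaled from structs, everything the parser recorded, such as key order and comments, survives the round trip.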
- Over the 90 days of Google Summer of Code, I submitted:
- 19 pull requests with a total of 138 commits
- 3,600+ additions to
- I spent ~300 hours working on this project which is around 25 hours/week
Overall, GSoC has been a phenomenal learning experience for me. The knowledge I gained is not limited to programming: I have also learned how to work in a team and how to present our work in a structured, digestible fashion to other engineers who depend on it. I’m immensely grateful to CERN, Google and, most of all, my mentor, Teo Mrnjavac.
My experience with GSoC 2020 @ CERN: Part 0, Part I