My Experience with Google Summer of Code 2020 @ CERN - Part I
I was confident, yet nervous. I had been in light communication with the project mentor during the selection period, but you never know for sure. I had applied to only one project, so I’m glad it worked out.
The official title for the project is:
AliECS Workflow configuration import and validation for AliECS
AliECS is an experiment control system for ALICE (A Large Ion Collider Experiment).
The data-driven workflow dumps generated by the DPL (O2 Data Processing Layer) cannot be directly imported into AliECS. These workflow dumps are JSON files that describe a set of processes to run and how those processes communicate with each other, which makes them essential.
The project revolves around developing an importer that takes a DPL workflow dump and converts it into a format that AliECS can work with. Here’s a rough diagram of the dataflow:
The main goals of the project are:
- Define a formal schema for AliECS workflow input templates
- Develop a schema validation package that does not require conversion from YAML to JSON (or vice versa)
- Develop an importer to convert DPL workflow dumps into AliECS workflow templates and the required number of task templates
- Integrate the above into AliECS, along with proper documentation and tests
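To give a feel for what "importing" a workflow dump involves, here is a minimal sketch in Go (the language AliECS is written in). It is not walnut's code and not the real DPL dump format: the `dplProcess`, `dplDump` and `taskTemplate` types, their fields, and the `importDump` function are all hypothetical simplifications. The sketch decodes a JSON dump into typed structs, runs a couple of basic validation checks (every process has a name, and names are unique), and maps each process onto a task-template stand-in.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// dplProcess is a HYPOTHETICAL, heavily simplified view of one process in a
// DPL workflow dump; the real dump contains many more fields.
type dplProcess struct {
	Name    string   `json:"name"`
	Inputs  []string `json:"inputs"`
	Outputs []string `json:"outputs"`
}

// dplDump is a hypothetical top-level wrapper around the list of processes.
type dplDump struct {
	Workflow []dplProcess `json:"workflow"`
}

// taskTemplate is a hypothetical, minimal stand-in for an AliECS task template.
type taskTemplate struct {
	Name     string
	Consumes []string
	Produces []string
}

// importDump decodes a dump, validates it and converts each process
// into a task template, in the order the processes appear.
func importDump(data []byte) ([]taskTemplate, error) {
	var dump dplDump
	if err := json.Unmarshal(data, &dump); err != nil {
		return nil, fmt.Errorf("decoding dump: %w", err)
	}
	if len(dump.Workflow) == 0 {
		return nil, errors.New("dump contains no processes")
	}
	seen := make(map[string]bool)
	tasks := make([]taskTemplate, 0, len(dump.Workflow))
	for i, p := range dump.Workflow {
		// Validation: every process needs a unique, non-empty name.
		if p.Name == "" {
			return nil, fmt.Errorf("process %d has no name", i)
		}
		if seen[p.Name] {
			return nil, fmt.Errorf("duplicate process name %q", p.Name)
		}
		seen[p.Name] = true
		tasks = append(tasks, taskTemplate{Name: p.Name, Consumes: p.Inputs, Produces: p.Outputs})
	}
	return tasks, nil
}

func main() {
	dump := []byte(`{"workflow": [
		{"name": "producer", "outputs": ["TST/RAW"]},
		{"name": "consumer", "inputs": ["TST/RAW"]}
	]}`)
	tasks, err := importDump(dump)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tasks {
		fmt.Printf("task %s: consumes %v, produces %v\n", t.Name, t.Consumes, t.Produces)
	}
}
```

A real importer would go on to write the resulting templates out as YAML for AliECS. One reason to validate the decoded Go structs rather than the raw text is that the same checks can then run whether the input originally came from JSON or YAML, which is in the spirit of the second goal above.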
For reference, here’s the link to the project page. You can also read the paper Towards the ALICE Online-Offline (O2) Control System to learn more about ALICE O2.
Google’s timeline dictates that the first month of the program be reserved for Community Bonding. You get to know the community behind the project and the people you’ll be working with. Judging from the documentation Google provides, most Open Source projects operate on IRC, and that’s certainly the case for KDE.
However, this was not the case for the High Energy Physics community. Here, the community is made up not of Open Source hackers, volunteers and Open Source companies, but of physicists and engineers who are almost always employed by an institute or university. The community for this project is the ALICE collaboration.
It is a far more formal environment than most of the IRC channels I’m part of. Initially, I communicated with my mentor through email. We soon shifted to Mattermost, an open source, self-hosted alternative to Slack.
Before I could join the CERN Mattermost, I needed an account. CERN provided me with an @cern.ch email address, which gave me access to the following:
- CERN Outlook E-Mail
- CERNBox Service (1TB of storage)
- CERN OpenStack (up to 5 instances!)
- CERN Mattermost
- CERN GitLab
- CERN’s JIRA instance
...and a host of other services that I haven’t explored yet :^)
Every new CERN account holder must read the security guidelines and sign the computing rules. Serious stuff!
Apart from communicating with their mentors, students also occasionally receive emails from the CERN GSoC admins. The admins also scheduled an introductory video call, where the students introduced themselves and talked about their projects.
Coding Period I
Even though I have access to the private CERN GitLab instance, most of the work happens in the AliceO2Group/Control repository on GitHub. I’m currently developing walnut, which stands for Workflow Administration and Linting Utility.
1,300 lines of code later, I merged my first PR into the Control repo. As of June 29th, I have two more in the works. My mentor is responsive and reviews my work frequently, which is by far the most valuable part of GSoC. My goal for GSoC was to learn Go: not just the syntax, but how to write idiomatic Go, the best practices and, most importantly, how to write production-grade code.
This is much more than I could’ve learned working through Go on my own. My mentor regularly gives me reading material to Go through, and any doubts I have are cleared up with concise explanations.
If you’ve read my other posts, you might know that I track my time. Since the beginning of the coding period, I’ve devoted approximately 140 hours to this project, which works out to an average of 35 hours/week over four weeks. According to Google, the GSoC program expects 30+ hours/week of work. Since I’m comfortably ahead of that pace, this feels like a sustainable amount of time that still leaves me free to work on and study other things that interest me.
GSoC has been a blast. I’m enjoying working on these problems and learning from the best. Most importantly, this project allows me to contribute towards LHC Run 3, scheduled for 2021. I cannot express how proud I feel to have the opportunity to contribute to the combined efforts of 10,000 scientists and hundreds of universities and laboratories behind the world’s largest and highest-energy particle collider :^)
For that, I’d like to thank the GSoC program, the CERN admins and, last but not least, my mentor, Teo Mrnjavac.
My experience with GSoC 2020 @ CERN: Part 0, Part II