Where MeteoGroup begins: The processing of immense amounts of data
You have to be a puzzle-solver. And he is. Karsten Becker is Director of the Data Provisioning department, and he and his team are the foundation of MeteoGroup. His department is responsible for correctly processing all raw data that enters MeteoGroup so that it can be used in the weather room, for example, where every kind of model information is studied and interpreted. If something goes wrong in that processing, if parameters are missing or incomplete, Data Provisioning tracks down the leak. But the senior software engineers of Data Provisioning do much more. They streamline data processes, are building the transition of all data processes into the cloud and extract all kinds of derived weather data from the incoming information.
The software engineers’ computer screens show strings of information that mean nothing to someone without ICT knowledge. A large screen on the wall next to the workstations monitors all running processes; its graphs show erratic patterns with several peaks, reflecting the processing of extra incoming data. Another screen displays problems reported by colleagues around the world. Karsten and his colleagues start working on these issues immediately, 24 hours a day.
The Data Provisioning Department consists of two teams: the Observation Team and the Model Team. The former handles the correct processing of all incoming observation data, while the Model Team does the same for all acquired model information. Some of Karsten's people work in Wageningen, a few in Utrecht and Belgium, and three colleagues are based in the Berlin office.
The team keeps a close eye on all running data processes
"Those who work here know how to program, of course, and enjoy it," says Karsten. "Affinity with the weather is useful but not necessary. The most important skill is strong analytical ability. If something does not work, all kinds of cogs start spinning in our heads. Where do I look, what system is running, what is the first thing I need to find out? In addition, it is important to know older computer systems, so that you can dig into them and also find creative ways to achieve a solid, future-proof solution. And one condition is that you enjoy a complicated puzzle."
Bytes and bits
Anyone who talks to Karsten is pulled into the puzzle too and gets a glimpse of the complex tasks his teams face. Just look at the immense amount of data that enters the company to be processed: radar observations, model data, WMO observations, storm information, satellite images. There is so much information that providers deliver it in compressed, binary form, and these 'ones and zeros' must be decoded again upon entry.
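To give a feel for what decoding those ones and zeros involves, here is a minimal sketch. It assumes, as an illustration only, that the data arrives in the WMO GRIB format using "simple packing", where each value is stored as an unsigned integer X of fixed bit width together with a reference value R, a binary scale factor E and a decimal scale factor D, so that the physical value is Y = (R + X × 2^E) / 10^D. The function `unpack_simple` is hypothetical and not MeteoGroup's decoder; a production system would rely on a full GRIB library such as ECMWF's ecCodes.

```python
def unpack_simple(packed: bytes, n_values: int, bits_per_value: int,
                  reference: float, binary_scale: int, decimal_scale: int) -> list[float]:
    """Decode GRIB-style 'simple packing' (hypothetical helper, for illustration)."""
    total_bits = len(packed) * 8
    if n_values * bits_per_value > total_bits:
        raise ValueError("payload too short for the requested number of values")
    # Treat the whole payload as one big-endian integer so bits can be sliced out.
    as_int = int.from_bytes(packed, "big")
    mask = (1 << bits_per_value) - 1
    values = []
    for i in range(n_values):
        # Shift so that the i-th packed integer lands in the lowest bits.
        shift = total_bits - (i + 1) * bits_per_value
        x = (as_int >> shift) & mask
        # Y = (R + X * 2^E) / 10^D
        values.append((reference + x * 2.0 ** binary_scale) / 10.0 ** decimal_scale)
    return values
```

For instance, three 8-bit values packed as `bytes([0, 10, 23])` with a reference value of 250.0 K and both scale factors at zero decode to `[250.0, 260.0, 273.0]`: small integers on the wire, full temperatures after decoding.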
Take, for example, the data delivery from ECMWF (the European Centre for Medium-Range Weather Forecasts). MeteoGroup purchases model data from this renowned weather institute; among other things, it serves as input for our own post-processing and is used by meteorologists in the weather rooms and at the BBC. "An enormous number of data files come our way from the ECMWF," says Karsten. "The information comes per weather element, per forecast period and per altitude in the atmosphere, twice every 24 hours. All that data belongs to separate grid points, about 6.5 million of them covering the Earth. This amounts to so much data that the ECMWF itself interpolates it to a somewhat coarser grid and sends that data to us in binary form. At Data Provisioning all these files come in, and we pick them up in a way that ensures the information flows smoothly through our systems. A second interpolation follows, and the generated files are then sent off to all kinds of internal customers. Each division requests different parts of the ECMWF data, so we cut the bulk of the information into pieces and send them through."
But the scope of Data Provisioning goes well beyond this. "We also derive a lot from the incoming information. For example, relative humidity is not included in a number of models, so you have to derive it yourself from temperature and dew point. We make sure that this parameter is calculated from those two elements for every forecast period. In the same way we also derive, for example, the Boyden index, an indicator of the likelihood of a thunderstorm."
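The article does not say which interpolation method is used in either step. As an assumption, the sketch below uses bilinear interpolation, one common choice, to resample a field from a regular latitude/longitude grid onto a different set of target points. Real model grids are often more complex (for example reduced Gaussian grids), so treat this as a simplified illustration; `bilinear` and `regrid` are hypothetical names.

```python
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Bilinear interpolation on a regular grid with ascending lats and lons.

    field[i][j] holds the value at (lats[i], lons[j]); each axis needs >= 2 points.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("target point outside source grid")
    # Lower-corner indices of the enclosing cell, clamped so edge points still work.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the target inside the cell (0..1 on each axis).
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    f00, f01 = field[i][j], field[i][j + 1]
    f10, f11 = field[i + 1][j], field[i + 1][j + 1]
    # Interpolate along longitude on both rows, then along latitude between them.
    lower = f00 * (1 - tx) + f01 * tx
    upper = f10 * (1 - tx) + f11 * tx
    return lower * (1 - ty) + upper * ty


def regrid(lats, lons, field, new_lats, new_lons):
    """Resample a field onto every (lat, lon) combination of a target grid."""
    return [[bilinear(lats, lons, field, la, lo) for lo in new_lons]
            for la in new_lats]
```

A quick sanity check: bilinear interpolation reproduces any field that is linear in latitude and longitude exactly, so a field defined as `2 * lat + lon` resampled to `(5, 0)` gives `10.0`. Each further interpolation step trades detail for a smaller, cheaper grid.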
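Both derived parameters can be written as short formulas. The sketch below is not MeteoGroup's code and makes two assumptions: relative humidity uses the Magnus approximation for saturation vapour pressure over water (coefficients 17.62 and 243.12 °C; other variants exist), and the Boyden index uses its usual form BI = 0.1 × (Z700 − Z1000) − T700 − 200, with geopotential heights in metres and the 700 hPa temperature in °C. Higher Boyden values indicate a greater likelihood of thunderstorms.

```python
import math

# Magnus-formula coefficients over water (an assumption: several variants exist).
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def relative_humidity(temp_c: float, dewpoint_c: float) -> float:
    """Relative humidity (%) from air temperature and dew point, both in degC."""
    # RH = e_s(Td) / e_s(T); the Magnus prefactor (6.112 hPa) cancels in the ratio,
    # leaving only the difference of the two exponents.
    gamma_td = MAGNUS_A * dewpoint_c / (MAGNUS_B + dewpoint_c)
    gamma_t = MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    return 100.0 * math.exp(gamma_td - gamma_t)


def boyden_index(z700_m: float, z1000_m: float, t700_c: float) -> float:
    """Boyden thunderstorm index.

    z700_m, z1000_m: geopotential heights of the 700 and 1000 hPa levels (m);
    t700_c: temperature at 700 hPa (degC).
    """
    return 0.1 * (z700_m - z1000_m) - t700_c - 200.0
```

For example, air at 20 °C with a dew point of 10 °C gives a relative humidity of about 52.6%, and equal temperature and dew point give 100%. Applied per grid point and per forecast period, these functions turn two delivered fields into one new one.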
In the cloud
MeteoGroup is in the midst of an operation to get all processes running in the cloud. As Karsten explains, "We used to do everything on servers and in separate steps, because server capacity had its limits. When data came in, the next step in the process could begin, and then the steps after that. This is completely different in the cloud. Processes no longer all have to run one after another; they can run in parallel, and if you want, you can run 1,000 processes at the same time. It also means you can scale up and down almost without limit. What’s great about the cloud is that when you're done with a server, you can say 'and now shut down'. You can tune these servers to your needs."
"We started this project of moving all data processes into the cloud about four years ago. As far as the models are concerned, we’ve made a lot of progress: this year, 2019, we will move all model processing into the cloud. When it comes to observations, many steps still have to be taken. Approximately 200 different observation streams currently come in, some every minute, others every hour. Ultimately, those must move to the cloud as well."
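The difference between the old step-by-step approach and the parallel cloud approach can be sketched on a small scale. In the cloud the parallel workers would be separate machines or containers; as a local stand-in, this example uses a thread pool from Python's standard `concurrent.futures` module. `process_file` is a hypothetical placeholder for one independent processing step such as decoding or interpolating a single file.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(name: str) -> str:
    """Hypothetical stand-in for one independent step (decode, interpolate, ...)."""
    return f"processed:{name}"


def run_sequential(files):
    # Old style: each file is handled only after the previous one has finished.
    return [process_file(f) for f in files]


def run_parallel(files, workers=8):
    # Cloud style: independent files are handed to a pool of workers at once.
    # Executor.map returns results in the same order as the inputs.
    # Leaving the with-block shuts the pool down, a small-scale "and now down".
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(process_file, files))
```

Both functions produce the same results; only the scheduling differs. The parallel version pays off when the steps are independent of each other, which is exactly what lets cloud processing run many of them at the same time and then release the capacity again.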
Challenges aplenty for Karsten’s department; the puzzle now has many dimensions.
High-resolution satellite images are also transferred as bits and bytes and decoded again upon entry.