My first job at the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS) Dresden was also my first contact with applied ML research. The goal was to recognize text on scanned maps (OCR) by training a custom neural network, since existing OCR tools such as Tesseract did not produce usable results due to the fuzziness and complexity of the scanned map images.

The starting point was a paper that implemented an OCR model trained with artificially generated training data using real-world photos [1]. Since manually annotating thousands of map images was not feasible, the idea was to use a similar approach based on artificially augmented map data.

 

Excerpt from a scan of an old German map, showing the area around the village of Neuhof at the center, with a bog at the top left. Because the background is covered with squiggly lines that significantly interfere with the text, these maps pose a considerable challenge for any OCR algorithm. Image Source: [2]

 

The artificial training data generator selected random map areas without text and placed randomly generated text on them using random fonts. To make the results a bit more realistic, the text was rendered at a lower resolution and then upscaled before being placed onto the background. Additionally, the text color was adapted to fit the overall map color.
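
The compositing steps of such a generator can be sketched roughly as follows. This is a minimal illustration in plain Python, not the original project code: images are grayscale lists of rows, the text is assumed to be already rasterized into a low-resolution binary mask (in practice a font rendering library would produce it), and all function names as well as the `darkness` factor are hypothetical.

```python
import random


def random_crop(background, height, width, rng):
    """Cut a random height x width window out of a 2D grayscale image."""
    top = rng.randrange(len(background) - height + 1)
    left = rng.randrange(len(background[0]) - width + 1)
    return [row[left:left + width] for row in background[top:top + height]]


def upscale_nearest(mask, factor):
    """Nearest-neighbour upscaling: every pixel becomes a factor x factor block."""
    return [
        [value for value in row for _ in range(factor)]
        for row in mask
        for _ in range(factor)
    ]


def adapted_text_color(patch, darkness=0.3):
    """Derive the text gray value from the mean brightness of the patch.

    The darkness factor is an assumed heuristic, not a value from the project.
    """
    pixels = [pixel for row in patch for pixel in row]
    return round(sum(pixels) / len(pixels) * darkness)


def compose_sample(background, low_res_mask, factor=2, seed=None):
    """Place an upscaled low-resolution text mask onto a random map patch."""
    rng = random.Random(seed)
    mask = upscale_nearest(low_res_mask, factor)
    patch = random_crop(background, len(mask), len(mask[0]), rng)
    color = adapted_text_color(patch)
    # Text pixels take the adapted color, all other pixels keep the map background.
    return [
        [color if is_text else pixel for pixel, is_text in zip(patch_row, mask_row)]
        for patch_row, mask_row in zip(patch, mask)
    ]
```

Upscaling a small mask produces the blocky, slightly blurred glyph edges that make the synthetic text look closer to a scan than crisp, full-resolution rendering would.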

 

Generated Samples

 

During the project, I also implemented a grid search for hyperparameter optimization and learned how to work in an HPC environment.
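
A grid search of this kind can be sketched in a few lines. The following is a generic illustration rather than the code used in the project: `train_and_evaluate` is a hypothetical callback that trains a model with the given hyperparameters and returns a validation score, and the toy objective and parameter names are placeholders.

```python
from itertools import product


def grid_search(train_and_evaluate, grid):
    """Try every combination of hyperparameter values and keep the best one.

    grid maps a parameter name to a list of candidate values;
    train_and_evaluate(**params) must return a score where higher is better.
    """
    names = list(grid)
    best_params, best_score = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(learning_rate, batch_size):
    """Stand-in for a real training run; peaks at learning_rate=0.01, batch_size=64."""
    return -abs(learning_rate - 0.01) - abs(batch_size - 64)


best_params, best_score = grid_search(
    toy_objective,
    {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]},
)
```

Because each combination is evaluated independently, the runs are easy to distribute as separate jobs on a cluster, which is one reason grid search fits well into an HPC environment.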

Note that this was a job as an undergraduate assistant that I started at a very early stage of my studies. As it was also my first contact with neural networks and applied machine learning in general, I would pursue different approaches today than I did back then. Nevertheless, it was an important contribution to my understanding of machine learning pipelines and training-data-oriented solutions.



References

[1] Jaderberg et al.: Reading Text in the Wild with Convolutional Neural Networks, https://arxiv.org/pdf/1412.1842.pdf (04.12.2014)

[2] SLUB / Deutsche Fotothek, Excerpt of GROSS-BRÜTZ (1881)