We apply big data and machine learning techniques to massive open-source code databases to create tools and services that improve the software development process.
We are currently working on the following projects:
Improving decompilation of binary files.
Classifying source code for purposes such as predicting programmers' expertise level from their code.
Representing and analyzing source code with overlaid graph representations.
Predicting the impact of type annotations on the runtime performance of gradually typed programs.
Programming language recognition from source code excerpts (PLangRec).
New structured data mining algorithms for large volumes of data.