This section describes the composition of the different measured program suites. An in depth description of the contents of each suite can be found here. The test battery is composed of four levels of programs, depending on its complexity. The majority of the programs that compose each of these levels have been taken from already existing benchmarking suites and programs. Programs that use third party non-standard Python libraries or GUI toolkits were not considered, to maximize the amount of python implementations that are able to execute the programs. For the same reason, only standard Python code is measured, without considering any implementation-specific language extension.
A very small fraction of the measured programs have been created to measure certain language characteristics. This was done to test those programming language characteristics that were not covered with the existing ones. These custom tests were created following the structure of existing benchmark suites. All programs have been stripped from its screen output code to obtain more accurate performance results. When user input is needed to operate, the program have been modified to simulate it, so the user have not to input any information to continue its processing. The four levels of programs are:
Small programs that deal with only one language characteristic (integer arithmetic, double arithmetic …). They are used to compare the performance of Python implementations considering exclusively that particular characteristic. Already existing benchmarks were used whenever possible, although as we said some of them have to be developed in order to test some characteristics, especially metaprogramming ones. Several microbenchmarks have been ported from other languages with the help of the Java2Python. In total, we measured 309 different microbenchmarks.
Microbenchmark suite | Description |
---|---|
Tommti |
A Python translation of the Tommti benchmarking suite: http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html . This suite is composed by 13 microbenchmarks that measure typical language characteristics such as arithmetic operations and data structure usage among others. |
Pybench |
Pybench (http://svn.python.org/projects/python/trunk/Tools/pybench/ ) is a collection of 15 files that comprises 57 microbenchmarks, providing a standardized way to measure the performance of Python implementations. It measures different aspects of the Python programming language, allowing us to obtain a different performance number for each characteristic instead of providing a single overall performance estimation. |
Java Grande |
A Python translation of the Java Grande benchmarks (http://www2.epcc.ed.ac.uk/computing/research_activities/java_grande/index_1.html ). This benchmark suite has the purpose of measuring and comparing alternative Java execution environments in ways which are important to Grande applications. A Grande application is one which uses large amounts of processing, I/O, network bandwidth, or memory. They include applications in science and engineering among others. For our work, the three-layered structure of increasingly complex programs that compose this suite adapts well to our testing layout. Therefore we opted to translate this suite to Python and use its different layers to the corresponding parts of our testing suite. For our measurement work, we used the sequential version of these benchmarks. In this part, only the section 1 of the benchmarks (Low-level operations such as arithmetic and math library operations, method calls, casting…) were used, comprising 9 different code files that executes 83 different microbenchmarks. |
Pypy translator microbenchmarks |
The PyPy python implementation sources includes 10 code files in its source code distribution that holds up to 82 microbenchmarks. These test individual programming language characteristics in the same way as the other microbenchmarks we used:https://github.com/pypy/pypy/blob/master/pypy/translator/microbench/test_bltn.py |
Custom Characteristics |
We developed some microbenchmarks to measure those programming language characteristics that we considered that were not appropriately covered by the other microbenchmarks we used. In total we developed 24 microbenchmarks:
|
Functional Programming |
These benchmarks cover typical functional programming usage scenarios. We developed 7 microbenchmarks following the same structure of existing benchmarking suites. These benchmarks cover:
|
Metaprogramming |
Based on the work developed in: How (and why) developers use the dynamic features of programming languages: the case of Smalltalk (Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger. Empirical Software Engineering DOI 10.1007/s10664-012-9203-2, 2012), and in our past research ( A hybrid class- and prototype-based object model to support language-neutral structural intercession; Francisco Ortin, Miguel A. Labrador, Jose M. Redondo; Information and Software Technology, Volume 56, Issue 2, February 2014, Pages 199-219), we developed a benchmarking suite composed by 46 microbenchmarks that covers the metaprogramming characteristics in the Python language, divided in four layers:
|
This group of programs contains a compendium of popular open source benchmarks that are either coded in Python or translated from other programming languages. Each benchmark deals with several language characteristics at once to achieve its final result. So, while not being "real" programs per se, it allows us to test several characteristics in a more realistic environment. We used 61 benchmarks in our measurements. These are:
Benchmark suite | Description |
---|---|
Unladen swallow benchmarks |
Unladen Swallow was an optimization branch of CPython, intended to be fully compatible and significantly faster. The project was abandoned due to failing to reach its performance milestones, but its source code is still available and contains a suite of performance programs ( http://code.google.com/p/unladen-swallow/source/browse/#svn%2Ftests%2Fperformance ). This suite is composed by benchmarking programs and tests that use commercial products. We included 24 of these benchmarks in this group. |
The Computer Language Benchmarks Game (CLSB) |
This is a compendium of benchmarking programs of different nature that have a python version. These are freely available on http://benchmarksgame.alioth.debian.org/ . For our tests we used 10 of these benchmarks. |
Parrotbench |
Parrot http://www.python.org/ftp/python/parrotbench/is a virtual machine designed to efficiently compile and execute bytecode for dynamic languages. Parrot currently hosts a variety of language implementations in various stages of completion, including Tcl, Javascript, Ruby, Lua, Scheme, PHP, Python, Perl 6, APL, and a .NET bytecode translator. Its distribution comes with 7 benchmarks that cover several dynamic language features, including metaprogramming. We adapted these benchmarks to be used in our measurements. |
Jolden - TSP |
The popular Jolden benchmark suite has a Java version that can be found in http://www.software.imdea.org/~marron/benchmarks/bench.html. We translated the TSP benchmark from this suite to be used in our measurements. |
Java Grande |
As we described previously, the Java Grande benchmarks are composed by 3 sections of increasingly complex benchmarks. We translated 6 benchmarks from section 2 and 3 benchmarks of Java Grande section 3, to be included in this group. |
Crypto - AES |
As cryptography algorithms are commonly used as benchmarking tools, we used a Python implementation of the AES (Advanced Encryption Standard) by Josh Davis (http://www.josh-davis.org ) as a benchmark. The code was obtained from http://www.nullege.com/codes/show/src@p@y@pytoshare-HEAD@pytoshare@lib@encryption@aes.py |
Crypto - MD5 |
For the previous reason, a Python implementation of the MD5 hashing algorithm by Dinu C. Gherman ( http://python.net/~gherman/programs/md5py/ ) was also used as a benchmark |
Intercessive Pybench |
These are 3 benchmarks that use intercession, developed as part of our previous research. http://www.sciencedirect.com/science/article/pii/S0950584913001778 |
Dynamic inheritance |
These are 4 benchmarks that use dynamic inheritance, developed as part of our previous research. http://www.sciencedirect.com/science/article/pii/S0950584913001778 |
Pystone |
This benchmark is the Python version of the Dhrystone benchmark and is commonly used to compare different implementations of the Python programming language. Pystone is included in the standard CPython distribution. |
This group is composed by applications that are created for a specific and realistic purpose: perform scientific calculations, popular problem solving, data manipulation and parsing, etc. These programs serve much different purposes and have different sizes, due to the different complexity required by each solved problem. The programs in this group are not specifically designed as benchmarks, but as Python applications that, independently of its size, were designed with a realistic purpose in mind.
To obtain programs that fit in this category we used the program collection provided by the Shed Skin Python compiler distribution. Shed Skin (http://code.google.com/p/shedskin/) is an experimental compiler that can translate pure, but implicitly statically typed Python programs into optimized C++. Although this distribution is promising, it lacks support for certain Python programming language features at this moment, so we decided not to include it yet in our measurements. However, the example library it provides (http://code.google.com/p/shedskin/downloads/list ) is excellent, and suits to our purposes.
The Shed Skin program collection is composed by a wide variety of programs of different nature and authors. We adapted the majority of them to perform our measures. The following table describes each one of the 54 programs we used:
Program | Description |
---|---|
Adatron Support Vector Machine |
An Adatron Support Vector Machine with polynomial kernel placed in the public domain by Stavros Korokithakis. |
Ambient Occlusion Renderer |
This program is an ambient occlusion renderer written by Syoyo Fujita http://syoyo.wordpress.com/2009/01/26/ao-bench-is-evolving/ |
Ant colony optimization |
This program generates a random array of distances between cities, then uses Ant Colony Optimization to find a short path traversing all the cities the Travelling Salesman Problem. By Eric Rollins http://eric_rollins.home.mindspring.com/ |
Arithmetic compressor encoder and decoder |
Arithmetic coding compressor and uncompressor for binary data by David MacKay http://www.inference.phy.cam.ac.uk/mackay/python/compress/ |
Barnes-Hut |
A Python implementation of the BH* Olden benchmark ( http://www.irisa.fr/caps/people/truong/M2COct99/Benchmarks/Olden/ ). It implements the Barnes-Hut benchmark that is described in: A hierarchical o(N log N) force-calculation algorithm, ; Barnes, J., Hut, P., Nature, 324:446-449, Dec. 1986 |
Block compression |
A Huffman block compression algorithm by David MacKay http://www.inference.phy.cam.ac.uk/mackay/python/compress/ |
Brainfuck interpreter |
A Brainfuck programming language (http://www.muppetlabs.com/~breadbox/bf/ ) interpreter by Philippe Biondi http://www.secdev.org/ |
Chaosgame fractals |
This program creates chaosgame-like fractals. By Carl Friedrich Bolz https://bitbucket.org/cfbolz/compost/src/a11f10cc7026b4bb138d6f995c91cf990e0a7e23/python/chaosgame.py?at=default |
Chess |
Simple chess like speed test program written in Python by Jyrki Alakuijala |
Color Patterns Shells |
This program Models for the simulations of the color pattern on the shells of mollusks |
Connect four game |
This program implements the connect four (also known as four-in-a-row) game http://users.softlab.ece.ntua.gr/~ttsiod/score4.html |
Conway game of life |
This program is an implementation of the Game of Life, a cellular automaton devised by the British mathematician John Horton Conway. Its author is Francesco Frassinelli https://github.com/frafra |
Dijkstra |
Program that uses the Dijkstra shortest-distance algorithm by Gustavo J.A.M. Carneiro http://gjcarneiro.blogspot.com.es/2007/01/dijkstra-in-python.html |
Dijkstra Bidirectional |
Bidirectional Dijkstra/search algorithm extracted from the NetworkX program (http://networkx.lanl.gov/) |
Genetic algorithm |
A genetic algorithm implementation |
Genetic algorithm 2 |
Another genetic algorithm implementation by Stavros Korokithakis |
Go |
An implementation of the popular Go game by Mark Dufour https://sites.google.com/site/markdufour/home |
Hq2x filter |
Python adaptation of the Hq2x filter demo program by Maxim Stepin http://spacy51.sp.funpic.de/hq_filters/hq2x.html |
Jpeg decoder |
A JPEG decoding program by Tong Lin http://www.cis.pku.edu.cn/faculty/vision/lintong/default.htm |
Kanoodle puzzle |
A program that finds solutions to the popular kanoodle puzzle game by David Austin http://www.ams.org/samplings/feature-column/fcarc-kanoodle |
Lempel-Ziv compression |
A program that performs compression of data using the Lempel-Ziv code by David Mackay http://www.inference.phy.cam.ac.uk/mackay/python/compress/#LZ |
Linear algebra |
A program that uses various linear algebra operations by Mladen Bestvina |
Loop nodes |
A program that performs loop recognition in a node graph by Leonardo Maffi http://www.fantascienza.net/leonardo/js/index.html |
Mandelbrot |
Program that generates and serializes Mandelbrot fractals by Tony Veijalainen http://www.daniweb.com/software-development/python/code/371844/mandelbrot-with-tkinter-and-shedskin |
Mastermind |
An implementation of the Mastermind game, by Sean McCarthy |
Mastermind2 |
Another implementation of the Mastermind game by Leonardo Maffi http://code.activestate.com/recipes/496907/ |
Maze solver |
A random maze generator/solver by ActiveState: http://code.activestate.com/recipes/496884/ |
Minimal global illumination renderer |
This program is a minimal global illumination renderer (minilight) written by Harrison Ainsworth HXA7241 and Juraj Sukop http://www.hxa.name/minilight/ |
Neural network |
This program implements a Back-Propagation Neural Networks. Its author is Neil Schemenauer http://arctrix.com/nas/python/ |
Othello game |
This program is an implementation of the Othello (also named Reversi or Yang) game by Mark Dufour https://sites.google.com/site/markdufour/home |
Path tracer |
This program is an implementation of Path tracing, a way of solving a rendering equation using Montecarlo integration. It consist on a Python port of the works of Jonas Wagner (http://29a.ch/ ) |
Primes by sieve of Atkin |
This program computes a finite list of prime numbers that are smaller than a predefined limit by using the sieve of Atkin. Its author is Steve Krenzel https://github.com/stevekrenzel |
Probabilistic linear context-free rewriting systems parser for natural language |
This program is a natural language parser for PLCFRS (probabilistic linear context-free rewriting systems), an extension of context-free grammar which rewrites tuples of strings instead of strings. Its author is Andreas van Cranenburgh http://staff.science.uva.nl/~acranenb/ |
RGB Converter |
Conversion functions between RGB and other color systems |
Richards task management |
A Python translation by Mario Wolczko that implements a task management algorithm originally written by Dr. Martin Richards |
RSync |
This is a Python implementation of the Rsync algorithm: The rsync algorithm, Tridgell A., Mackerras, P.; Technical Report TR-CS-96-05, Canberra 0200 ACT, Australia, 1996. http://rsync.samba.org/ |
Rubik solver |
This program implements a Rubik cube solver algorithm http://pastebin.com/KwGMujyB . It was adapted by Mark Dufour https://sites.google.com/site/markdufour/home |
Rubik solver 2 |
Another Rubik's cube solver using Thistlethwaite's algorithm, originally implemented on C++ by Stefan Pochmann but translated to Python by Mark Dufour https://sites.google.com/site/markdufour/home |
Satisfiability solver |
This program implements a satisfiability solver. Its author is Mark Dufour https://sites.google.com/site/markdufour/home |
SHA |
This program is a Python implementation of the SHA-1 algorithm by J. Hall´en and L. Creighton https://seattle.poly.edu/svn/seattle/trunk/seattlelib/sha.repy |
Simple mandelbrot |
This is an implementation of the Mandelbrot fractal by Daniel Rosengren http://www.timestretch.com/article/mandelbrot_fractal_benchmark |
Sokoban |
An implementation of the Sokoban game |
Solitaire encryption |
Python implementation of Bruce Schneier's Solitaire Encryption algorithm by John Dell'Aquila. https://www.schneier.com/ |
Sudoku solvers |
We included 5 different Sudoku solvers by various authors:
|
Tic Tac Toe |
An implementation of the Tic Tac Toe game by Peter Goodspeed |
Voronoi |
This program implements a Voronoi diagram, which is a way of dividing space into a number of regions. A set of points is specified beforehand and for each seed there will be a corresponding region consisting of all points closer to that seed than to any other. The regions are called Voronoi cells. |
Yopyra ray tracer |
This program is a complete Python raytracer by Carlos Gonzalez Morcillo https://github.com/tomspur/shedskin/blob/master/examples/yopyra.py It was adapted from the examples included in the Shed Skin experimental Python implementation (http://code.google.com/p/shedskin/ ) |
Another block of testing programs is composed by the BioPython suite ( http://biopython.org/wiki/Main_Page ). BioPython is a set of freely available tools for biological computation written in Python. Its aim is to develop Python libraries and applications which address the needs of works in bioinformatics. This program suite has a high number of test programs included, that execute different functionalities of the suite using unit testing. Therefore, it makes use of metaprogramming features during its execution. We adapted 85 of these programs and individualized it as part of our measurements.
The final group comprises realistic commercial-grade Python applications. It also includes programs that use the services of a commercial framework, library or API. These programs use a large amount of the program language characteristics we measure, and are a good opportunity to check the performance of the Python implementations we measured with real workloads.
Looking for programs to include in this category is difficult because we couldn’t include those that use third party non-standard libraries or GUI toolkits, in order to try to execute them in the majority of Python implementations we measured. In the end, we included 14 applications in this category:
Program | Description |
---|---|
2to3 python code tanslation |
The 2to3 application is included in the Python 3.X distribution to help developers to adapt Python 2 code to the Python 3 specification. This test uses the 2to3 tool to translate itself. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
Bazaar version control |
Bazaar ( http://bazaar.canonical.com/en/ ) is a version control system that helps you track project history over time and to collaborate easily with others. This application uses this application at command line issuing some commands. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
BlindElephant web application fingerprinter |
The BlindElephant Web Application Fingerprinter attempts to discover the version of a web application by comparing static files at known locations against precomputed hashes for versions of those files in all available releases. http://blindelephant.sourceforge.net/ . Testing was done with a local sample web page. |
Cog code generation tool |
Runs Cog http://www.nedbatchelder.com/code/cog, a simple code generation tool written in Python. http://www.python.org/about/success/cog/ |
Django template system |
Django template system (https://www.djangoproject.com/ ) to build a 150x150-cell HTML table. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
Html5 library |
This application parses the HTML 5 spec (http://svn.whatwg.org/webapps/index ) using html5lib. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
Mako template system |
Mako ( http://www.makotemplates.org/) is a web template library written in Python. It is used by popular web sites such as www.python.org. This application parses a sample web page with this template system. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
Pyramid web framework |
Runs the Pyramid web framework, a small and fast Python web application development framework http://pyramid.readthedocs.org/en/latest/ |
Rietveld code review |
This application uses the Rietveld code review app http://code.google.com/p/rietveld/ along with the Django template system (https://www.djangoproject.com/ ). It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
Spam Bayesian classifier |
This application runs a “canned” mailbox through a SpamBayes (http://spambayes.sourceforge.net/ ) ham/spam classifier. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
Spitfire template system |
This application uses the Spitfire template system (http://code.google.com/p/spitfire/ ) to build a 1000x1000-cell HTML table. It was adapted from the unladen swallow examples. http://code.google.com/p/unladen-swallow/wiki/Benchmarks |
SQLMap Injection testing tool |
Runs SQL Map, an open source penetration testing tool that automates the process of detecting and exploiting SQL injection flaws and taking over of database servers. http://sqlmap.org/ |
Volatility forensics framework |
The Volatility Framework is a completely open collection of tools, implemented in Python under the GNU General Public License, for the extraction of digital artifacts from volatile memory (RAM) samples. https://code.google.com/p/volatility/. A sample RAM dump was used for its execution. |
Web2Py web framework |
Runs the Web2Py web framework, a free open source full-stack framework for rapid development of fast, scalable, secure and portable database-driven web-based applications. http://www.web2py.com/ |