Section 8.2 The Tools of Data Science

Over the past few chapters, we’ve gotten a pretty quick jump start on an analytical tool used by thousands of data analysts worldwide: the open source R system for data analysis and visualization. Despite the many capabilities of R, however, there are hundreds of other tools used by data scientists, depending on the particular aspects of the data problem they focus on.
The single most popular and powerful tool, outside of R, is a proprietary statistical system called SAS (pronounced "sass"). SAS contains a powerful programming language that provides access to many data types, functions, and language features. Learning SAS is arguably as difficult (or as easy, depending upon your perspective) as learning R, but SAS is used by many large corporations because, unlike R, it comes with extensive technical and product support. Of course, this support does not come cheap, so most SAS users work in large organizations that have sufficient resources to purchase the necessary licenses and support plans.
Next in line in the statistics realm is SPSS, a package used by many scientists (the acronym used to stand for Statistical Package for the Social Sciences). SPSS is much friendlier than SAS, in the opinion of many analysts, but not quite as flexible and powerful.
R, SPSS, and SAS grew up as statistics packages, but there are also many general-purpose programming languages that incorporate features valuable to data scientists. One very exciting development in programming languages has the odd name of "Processing." Processing is a programming language specifically geared toward creating data visualizations. Like R, Processing is an open source project, so it is freely available at http://processing.org/. Also like R, Processing is a cross-platform program, so it will run happily on Mac, Windows, and Linux. There are many books available for learning Processing, and the website contains plenty of examples for getting started. Besides R, Processing might be one of the most important tools in the data scientist’s toolbox, at least for those who need to use data to draw conclusions and communicate with others.