Researchers and organizations that want to dip their toes into sophisticated data analysis, or solve a particular problem at scale, can now do so without spinning up their own Hadoop cluster. Pivotal Analytics Workbench is a project that provides free access, in 90-day segments, to the largest publicly available Hadoop cluster, offering researchers and data scientists a low-friction way to take advantage of Hadoop’s prodigious storage and analytics capabilities.
Pivotal Analytics Workbench demonstrates Hadoop’s capabilities by offering the basic infrastructure for companies and organizations to run algorithms in a collaborative environment. Researchers can harness the Analytics Workbench’s mixed-mode environment to analyze structured and unstructured data at a scale enabled by Pivotal’s 1000-node Hadoop cluster. Located at the state-of-the-art SwitchNap Las Vegas data center, the Analytics Workbench boasts 1000 servers equipped with dual Intel Xeon processors, 12,000 Seagate 2 TB hard drives, and lightning-fast 40Gb Ethernet adapters. Standard Hadoop software including MapReduce, HDFS, Hive, Jenkins, Plato, HBase, Pig, and Mahout is also available to users.
To encourage adoption of the cluster and facilitate collaboration, Analytics Workbench includes a wealth of pre-loaded data relevant to researchers in numerous fields, including over 1.2 TB of Twitter data, more than 2 TB of Comscore data, weather data from NASA and the Goddard Space Flight Center, and all publicly available human genome data, amounting to over 2 TB of genomic data. Users can load their own datasets into the cluster, but these readily available collections can help shorten the time to insight.
Since Analytics Workbench was first announced at last year’s EMC World, a diverse assortment of companies and research organizations including Intel, NASA, Mellanox, VMware, and Booz Allen Hamilton have used it to test new products, automate video analysis, and study historical weather patterns. Data integration company Informatica used the Analytics Workbench to conduct large-scale tests of its Identity Resolution for Hadoop product during development. The company’s researchers generated 100-million-row datasets to identify performance bottlenecks when running the product’s algorithms on a large cluster, and to optimize performance.
Alpine Data Labs has used the Analytics Workbench extensively to develop and fine-tune algorithms that work at scale, and to prototype new ones. The company tested its graphical data mining engine, Alpine Illuminator, at scale, and earlier this year at the Strata conference it was able to demo the product on datasets with billions of rows and hundreds of columns against many complex models. During this demonstration, the company was able to identify naturally occurring clusters of people living in the US based on factors such as education level, citizenship, gender, and military service. It also determined which factors seemed most likely to be linked to higher or lower incomes. The demonstration was well received, showing off both the product’s capabilities and the scalability and speed of the Analytics Workbench cluster.
Going forward, the Analytics Workbench team plans to add more publicly available datasets to the cluster, and to expand the range and number of companies and organizations taking advantage of the free platform for complex analytics. Find out more, and take advantage of the power of Pivotal’s 1000-node Hadoop cluster, at analyticsworkbench.com.