This is meant to be a repository of useful Spark information, links, projects, etc.
Spark For Bioinformatics
- ADAM - an API for genomics on Hadoop/Spark from Big Data Genomics
- Check out my description of Spark with a simple bioinformatics example
Helpful Links
- Apache Spark Project home at apache.org
- Dean Wampler’s Spark Workshop, complete with solutions and excellent follow-along examples
- Spark Quick Start Guide - a good intro to Spark Shell and simple Spark apps in Scala, Python, or Java
- Spark & Data Science - an article by Cloudera’s Sean Owen on the Data Science landscape, and how it slopes towards Spark
- Databricks - a Spark vendor
- Apache Spark YouTube Channel - chock full of wonderful content
Interesting Spark Projects
- MLlib - a machine learning library built on Spark
- SparkR - run functions and/or closures on a Spark cluster from an R script
- GraphX - efficiently express graph computation at scale through Spark
Apache Spark originated at Berkeley’s AMPLab.
Spark on Twitter
- @ApacheSpark
- @AMPLab - the Berkeley lab where Spark began
- @deanwampler - from @typesafe
- @sean_r_owen - data science at Cloudera
- @GeoTrellis - geospatial framework on Spark
Talks
- Dave Patterson - faculty member at @AMPLab talking about the future of big data and genomics research
- Dean Wampler talking about the connection between Databricks and Typesafe, and the ecosystem in general