Apache Spark
Apache Spark is a distributed computing platform that processes data from a variety of sources in near real time. These sources include HDFS, Kafka, Flume, and relational databases.
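Spark ships with several components built on its core engine: Spark SQL, Spark Streaming, MLlib, and GraphX. Spark SQL and Spark Streaming handle most of the integration with external data sources, while MLlib and GraphX provide machine learning and graph processing on top of the loaded data.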
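As a rough illustration of how reading from such sources can look in PySpark, here is a minimal sketch. The helper names reader_spec and load are hypothetical (they are not part of Spark), the Parquet file format for HDFS is an assumption, and Flume is left out. The Spark calls themselves (read, readStream, format, options, load) are the standard reader API.

```python
from typing import Dict, Tuple


def reader_spec(kind: str, location: str, name: str = "") -> Tuple[str, Dict[str, str]]:
    """Return (format, options) for a Spark reader.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    if kind == "hdfs":
        # Files on HDFS are addressed by path; Parquet is assumed here.
        return "parquet", {"path": location}
    if kind == "jdbc":
        # Relational databases are read through the JDBC data source.
        return "jdbc", {"url": location, "dbtable": name}
    if kind == "kafka":
        # Kafka is read by broker address and topic name.
        return "kafka", {"kafka.bootstrap.servers": location, "subscribe": name}
    raise ValueError(f"unsupported source kind: {kind}")


def load(spark, kind: str, location: str, name: str = ""):
    """Build a DataFrame from the given source using an existing SparkSession."""
    fmt, options = reader_spec(kind, location, name)
    # Kafka is consumed as a stream; the other sources are read as batches.
    reader = spark.readStream if kind == "kafka" else spark.read
    return reader.format(fmt).options(**options).load()
```

Given an existing SparkSession named spark, a call such as load(spark, "jdbc", "jdbc:postgresql://db.example.com/sales", "orders") would return a DataFrame for that table; the URL and table name here are made up. The JDBC source needs the database driver on the classpath, and the Kafka source needs the separate Spark Kafka connector package.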