XGBoost: Implementing the Winningest Kaggle Algorithm in Spark and Flink
An overview of XGBoost4J, a JVM-based implementation of XGBoost, one of the most successful recent machine learning algorithms in Kaggle competitions, with distributed support for Spark and Flink.
on Mar 24, 2016 in Apache Spark, Distributed Systems, Flink, Kaggle, XGBoost
scikit-feature: Open-Source Feature Selection Repository in Python
scikit-feature is an open-source feature selection repository in Python, with around 40 popular algorithms from feature selection research. It is developed by the Data Mining and Machine Learning Lab at Arizona State University.
on Mar 3, 2016 in Data Mining, Data Science, Feature Extraction, Feature Selection, Machine Learning, Python
Top Spark Ecosystem Projects
Apache Spark has developed a rich ecosystem, including both official and third-party tools. We look at 5 third-party projects that complement Spark in 5 different ways.
on Mar 2, 2016 in Apache Mesos, Apache Spark, Cassandra, Databricks, Distributed Systems
New Salford Predictive Modeler 8
Salford Predictive Modeler software suite: Faster. More Comprehensive Machine Learning. More Automation. Better results. Take a giant step forward in your data science productivity with SPM 8. Download and try it today!
on Mar 1, 2016 in Data Science Platform, Decision Trees, Gradient Boosting, Predictive Modeler, Regression, Salford Systems
Distributed TensorFlow Has Arrived
Google has open sourced its distributed version of TensorFlow. Get the info on it here, and catch up on some other TensorFlow news at the same time.
on Mar 1, 2016 in Deep Learning, Distributed Systems, Google, Matthew Mayo, TensorFlow