2019 Sep

How AI will transform healthcare (and can it fix the US healthcare system?)

This thorough review focuses on the impact of AI, 5G, and edge computing on the healthcare sector in the 2020s, and also looks at quantum computing's potential impact on AI, healthcare, and financial services.

on Sep 30, 2019 in AI, Healthcare, Quantum Computing, Startups, USA
Know Your Data: Part 1

This article introduces the different types of data sets, data objects, and attributes.

on Sep 30, 2019 in Beginners, Datasets
DeepMind Has Quietly Open Sourced Three New Impressive Reinforcement Learning Frameworks

Three new releases that will help researchers streamline the implementation of reinforcement learning programs.

on Sep 30, 2019 in DeepMind, Reinforcement Learning
Using Time Series Encodings to Discover Baseball History’s Most Interesting Seasons

Take me out to the ballgame! Take me out to the crowd! For the 2,829 seasons that have been played for 101 baseball teams since 1880, which seasons were unlike any others? Using SAX Encoding to recognize patterns in time series data, the most special years in baseball can be found.

on Sep 27, 2019 in Baseball, History, Sports, TIBCO, Time Series
What is Hierarchical Clustering?

This article contains a brief introduction to various concepts related to the hierarchical clustering algorithm.

on Sep 27, 2019 in Clustering, Machine Learning, Python
Data Mapping Using Machine Learning

Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.

on Sep 27, 2019 in Data Cleaning, Data Preparation, Machine Learning
Why data analysts should choose stories over statistics

Join the Crunch Data Conference in Budapest, Oct 16-18, with stellar speakers from companies like Facebook, Netflix and LinkedIn. Use the discount code ‘KDNuggets’ to save $100 off your conference ticket.

on Sep 26, 2019 in Budapest, Career Advice, Crunch Conference, Data Analytics, Data Science, Hungary, Storytelling
The Future of Analytics and Data Science

Learn about the current and future issues of data science and possible solutions from this interview with IADSS Co-founder, Dr. Usama Fayyad following his keynote speech at ODSC Boston 2019.

on Sep 26, 2019 in Analytics, IADSS, Kate Strachnyi, ODSC, Trends, Usama Fayyad
Natural Language in Python using spaCy: An Introduction

This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.

on Sep 26, 2019 in NLP, Paco Nathan, Python, spaCy
Customer Segmentation for R Users

This article shows you how to separate your customers into distinct groups based on their purchase behavior. For the R enthusiasts out there, I demonstrate what you can do with r/stats, ggradar, ggplot2, animation, and factoextra.

on Sep 26, 2019 in Customer Analytics, R, Segmentation
6 bits of advice for Data Scientists

As a data scientist, you can get lost in your daily dives into the data. Follow this advice in your work to stay diligent and be more impactful for your organization.

on Sep 25, 2019 in Advice, Data Cleaning, Data Scientist, Metrics, Overfitting, Statistics
The thin line between data science and data engineering

Today, as companies have finally come to understand the value that data science can bring, more and more emphasis is being placed on the implementation of data science in production systems. And as these implementations have required models that can perform on larger and larger datasets in real-time, an awful lot of data science problems have become engineering problems.

on Sep 25, 2019 in Data Engineering, Data Science, Podcast
Beta Distribution: What, When & How

This article covers the beta distribution, and explains it using baseball batting averages.

on Sep 25, 2019 in Distribution, Probability, Statistics
Automatic Version Control for Data Scientists

How can you keep your machine learning models and data organized so you can collaborate effectively? Discover this new tool set available for better version control designed for the data scientist workflow.

on Sep 24, 2019 in Data Scientist, GitHub, Jupyter, Version Control
Data Quality Assessment Is Not All Roses. What Challenges Should You Be Aware Of?

Of all data quality characteristics, we consider consistency and accuracy to be the most difficult ones to measure. Here, we describe the challenges that you may encounter and the ways to overcome them.

on Sep 24, 2019 in Challenges, Data Quality
A 2019 Guide for Automatic Speech Recognition

In this article, we’ll look at a couple of papers aimed at solving the problem of automated speech recognition with machine and deep learning.

on Sep 24, 2019 in NLP, Speech Recognition
12 Deep Learning Researchers and Leaders

Our list of deep learning researchers and industry leaders features the people you should follow to stay current with this wildly expanding field in AI. From early practitioners and established academics to entrepreneurs and today’s top corporate influencers, this diverse group of individuals is leading the way into tomorrow’s deep learning landscape.

on Sep 23, 2019 in Andrej Karpathy, Andrew Ng, Deep Learning, Demis Hassabis, Fei-Fei Li, Geoff Hinton, Ian Goodfellow, Influencers, Jeremy Howard, Research, Yann LeCun
A Single Function to Streamline Image Classification with Keras

We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.

on Sep 23, 2019 in Image Classification, Image Recognition, Keras, Python
Introducing IceCAPS: Microsoft’s Framework for Advanced Conversation Modeling

The new open source framework that brings multi-task learning to conversational agents.

on Sep 23, 2019 in Microsoft, Multitask Learning, NLP
The Hidden Risk of AI and Big Data

With recent advances in AI being enabled through access to so much “Big Data” and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?

on Sep 20, 2019 in AI, Big Data, Causation, Correlation, Overfitting, Risks
A Gentle Introduction to PyTorch 1.2

This comprehensive tutorial aims to introduce the fundamentals of PyTorch building blocks for training neural networks.

on Sep 20, 2019 in Neural Networks, Python, PyTorch
Automate Hyperparameter Tuning for Your Models

When we create our machine learning models, a common task that falls on us is how to tune them. So that brings us to the quintessential question: Can we automate this process?

on Sep 20, 2019 in Automated Machine Learning, Hyperparameter, Machine Learning, Modeling
Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning

While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.

on Sep 19, 2019 in Dataset, Machine Learning, scikit-learn, Synthetic Data
Applying Data Science to Cybersecurity Network Attacks & Events

Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.

on Sep 19, 2019 in Cybersecurity, Data Science, Machine Learning, Python, Security
5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python

“I want to learn machine learning and artificial intelligence, where do I start?” Here.

on Sep 19, 2019 in Beginners, Data Science, Machine Learning, Python
The 5 Sampling Algorithms every Data Scientist need to know

Algorithms are at the core of data science, and sampling is a critical technique that can make or break a project. Learn more about the most common sampling techniques used, so you can select the best approach while working with your data.

on Sep 18, 2019 in Algorithms, Sampling
Reddit Post Classification

This article covers the implementation of a data scraping and natural language processing project which had two parts: scrape as many posts from Reddit’s API as allowed, and then use classification models to predict the origin of the posts.

on Sep 18, 2019 in Classification, NLP, Reddit
Which Data Science Skills are core and which are hot/emerging ones?

We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.

on Sep 17, 2019 in Career, Data Science Skills, Data Visualization, Deep Learning, Excel, Machine Learning, Poll, Python, PyTorch, Scala, Skills, Statistics, TensorFlow
Explore the world of Bioinformatics with Machine Learning

This article contains a brief introduction to bioinformatics and shows how a machine learning classification algorithm can be used to classify the type of cancer in each patient by their gene expressions.

on Sep 17, 2019 in Bioinformatics, Machine Learning, Python
BERT, RoBERTa, DistilBERT, XLNet: Which one to use?

Lately, varying improvements over BERT have been shown — and here I will contrast the main similarities and differences so you can choose which one to use in your research or application.

on Sep 17, 2019 in BERT, NLP, Transformer
How Bad is Multicollinearity?

For some people, anything below 60% is acceptable, and for certain others, even a correlation of 30% to 40% is considered too high because one variable may just end up exaggerating the performance of the model or completely messing up parameter estimates.

on Sep 17, 2019 in Analytics, Multicollinearity, Regression, Statistics
My journey path from a Software Engineer to BI Specialist to a Data Scientist

The career path of the Data Scientist remains a hot target for many with its continuing high demand. Becoming one requires developing a broad set of skills including statistics, programming, and even business acumen. Learn more about one person's experience making this journey, and discover the many resources available to help you find your way into a world of data science.

on Sep 16, 2019 in Career, Data Scientist, Software Engineer
5 Step Guide to Scalable Deep Learning Pipelines with d6tflow

How to turn a typical PyTorch script into a scalable d6tflow DAG for faster research & development.

on Sep 16, 2019 in Deep Learning, Pipeline, Python, PyTorch, Workflow
What is Machine Behavior?

The new emerging field that wants to study AI agents the way social scientists study humans.

on Sep 16, 2019 in AI, Humans vs Machines, Machine Intelligence
Cartoon: Unsupervised Machine Learning?

New KDnuggets Cartoon looks at one of the hottest directions in Machine Learning and asks "Can Machine Learning be too unsupervised?"

on Sep 14, 2019 in Cartoon, Humor, Machine Learning, Unsupervised Learning, Yann LeCun
Many Heads Are Better Than One: The Case For Ensemble Learning

While ensembling techniques are notoriously hard to set up, operate, and explain, with the latest modeling, explainability and monitoring tools, they can produce more accurate and stable predictions. And better predictions can be better for business.

on Sep 13, 2019 in Bagging, Boosting, Ensemble Methods, Machine Learning, XGBoost
Version Control for Data Science: Tracking Machine Learning Models and Datasets

I am a Git god, why do I need another version control system for Machine Learning Projects?

on Sep 13, 2019 in Data Science, Datasets, Machine Learning, Modeling, Version Control
There is No Free Lunch in Data Science

There is no such thing as a free lunch in life or data science. Here, we'll explore some science philosophy and discuss the No Free Lunch theorems to find out what they mean for the field of data science.

on Sep 12, 2019 in Algorithms, Data Science, Machine Learning, Optimization
Ensemble Methods for Machine Learning: AdaBoost

It turns out that if we ask the weak algorithm to create a whole bunch of classifiers (all weak by definition) and then combine them all, what may come out is a stronger classifier.

on Sep 12, 2019 in Adaboost, Ensemble Methods, Machine Learning, Python
A Friendly Introduction to Support Vector Machines

This article explains the Support Vector Machines (SVM) algorithm in an easy way.

on Sep 12, 2019 in Algorithms, Explained, Machine Learning, Support Vector Machines, SVM
Can graph machine learning identify hate speech in online social networks?

Online hate speech is a complex subject. Follow this demonstration using state-of-the-art graph neural network models to detect hateful users based on their activities on the Twitter social network.

on Sep 11, 2019 in Graph Analytics, Machine Learning, Social Network Analysis, Twitter
Train sklearn 100x Faster

As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.

on Sep 11, 2019 in Distributed Systems, Machine Learning, Python, scikit-learn, Training
Scikit-Learn vs mlr for Machine Learning

How does the scikit-learn machine learning library for Python compare to the mlr package for R? Follow along with a machine learning workflow through each approach, and see if you can gain a competitive advantage by knowing both frameworks.

on Sep 10, 2019 in Exxact, Machine Learning, R, scikit-learn
The 5 Graph Algorithms That Data Scientists Should Know

In this post, I am going to be talking about some of the most important graph algorithms you should know and how to implement them using Python.

on Sep 10, 2019 in Algorithms, Data Science, Data Scientist, Graph, Python
Common Machine Learning Obstacles

In this blog, Seth DeLand of MathWorks discusses two of the most common obstacles, which relate to choosing the right classification model and eliminating data overfitting.

on Sep 9, 2019 in Cross-validation, Decision Trees, Logistic Regression, Machine Learning, MathWorks, Overfitting, SVM
A 2019 Guide to Speech Synthesis with Deep Learning

In this article, we’ll look at research and model architectures that have been written and developed to do just that using deep learning.

on Sep 9, 2019 in Deep Learning, NLP, Speech
OpenStreetMap Data to ML Training Labels for Object Detection

I am really interested in creating a tight, clean pipeline for disaster relief applications, where we can use something like crowd sourced building polygons from OSM to train a supervised object detector to discover buildings in an unmapped location.

on Sep 9, 2019 in Geospatial, Machine Learning, Object Detection, Python
10 Great Python Resources for Aspiring Data Scientists

This is a collection of 10 interesting resources in the form of articles and tutorials for the aspiring data scientist new to Python, meant to provide both insight and practical instruction when starting on your journey.

on Sep 9, 2019 in Data Science, Data Scientist, Programming, Python
What’s the difference between analytics and statistics?

From asking the best questions about data to answering those questions with certainty, the value of these two seemingly different professions becomes clear when you see how they should work together.

on Sep 6, 2019 in Analytics, Explained, Statistics
I wasn’t getting hired as a Data Scientist. So I sought data on who is.

Instead of focusing on skills thought to be required of data scientists, we can look at what they have actually done before.

on Sep 6, 2019 in Career, Career Advice, Data Science, Data Science Skills, Data Scientist
Build Your First Voice Assistant

Hone your practical speech recognition application skills with this overview of building a voice assistant using Python.

on Sep 6, 2019 in Machine Learning, NLP, Python, Speech Recognition
TensorFlow Optimization Showdown: ActiveState vs. Anaconda

In this TensorFlow tutorial, you’ll learn the impact of optimizing both operators and entire graphs, how to efficiently organize data in training and testing datasets to minimize data shuffling, and how to identify a well-optimized model using Anaconda and ActivePython.

on Sep 5, 2019 in ActiveState, Anaconda, TensorFlow
Advice on building a machine learning career and reading research papers by Prof. Andrew Ng

This blog summarizes the career advice/reading research papers lecture in the CS230 Deep learning course by Stanford University on YouTube, and includes advice from Andrew Ng on how to read research papers.

on Sep 5, 2019 in Andrew Ng, Career, Machine Learning, Research
An Easy Introduction to Machine Learning Recommender Systems

Recommender systems are an important class of machine learning algorithms that offer "relevant" suggestions to users. Categorized as either collaborative filtering or a content-based system, check out how these approaches work along with implementations to follow from example code.

on Sep 4, 2019 in Beginners, Machine Learning, Python, Recommendation Engine, Recommender Systems
Python Libraries for Interpretable Machine Learning

In the following post, I am going to give a brief guide to four of the most established packages for interpreting and explaining machine learning models.

on Sep 4, 2019 in Bias, Interpretability, LIME, Machine Learning, Python, SHAP
An Overview of Topics Extraction in Python with Latent Dirichlet Allocation

A recurring subject in NLP is to understand a large corpus of texts through topics extraction. Whether you analyze users’ online reviews, products’ descriptions, or text entered in search bars, understanding key topics will always come in handy.

on Sep 4, 2019 in LDA, NLP, Python, Text Analytics, Topic Modeling
TensorFlow vs PyTorch vs Keras for NLP

These three deep learning frameworks are your go-to tools for NLP, so which is the best? Check out this comparative analysis based on the needs of NLP, and find out where things are headed in the future.

on Sep 3, 2019 in Deep Learning, Exxact, Keras, NLP, PyTorch, TensorFlow
Beyond Neurons: Five Cognitive Functions of the Human Brain that we are Trying to Recreate with Artificial Intelligence

The quest for recreating cognitive capabilities of the brain in deep neural networks remains one of the elusive goals of AI. Let’s explore some human cognitive skills that are serving as inspiration to a new generation of AI techniques.

on Sep 3, 2019 in AI, Attention, Cognitive Computing, Inference, Neuroscience
Automate your Python Scripts with Task Scheduler: Windows Task Scheduler to Scrape Alternative Data

In this tutorial, you will learn how to use Task Scheduler to scrape data from the Lazada e-commerce website and dump it into an SQLite database.

on Sep 3, 2019 in Data Science, Python, Web Scraping
6 Tips for Building a Training Data Strategy for Machine Learning

Without a well-defined approach for collecting and structuring training data, launching an AI initiative becomes an uphill battle. These six recommendations will help you craft a successful strategy.

on Sep 2, 2019 in Advice, Machine Learning, Training Data
Cartoon: Labor Day in the age of AI

KDnuggets cartoon looks at how AI will impact Labor Day in the year 2050.

on Sep 2, 2019 in AI, Cartoon, Labor Day, Robots
Top 10 Data Science Use Cases in Energy and Utilities

In this article, we will consider the most vivid data science use cases in the energy and utilities industry.

on Sep 2, 2019 in Data Science, Energy, Use Cases, Utilities
