2018 Oct

Labeling Unstructured Text for Meaning to Achieve Predictive Lift

In this post, we examine several advanced NLP techniques, including labeling nouns and noun phrases for meaning, labeling (most often) adverbs and adjectives for sentiment, and labeling verbs for intent.

on Oct 31, 2018 in NLP, Overfitting, Text Mining, Unstructured data
Cartoon: Halloween Costume for Big Data

We revisit a KDnuggets cartoon about the appropriate Halloween costume for Big Data and its companion, No Privacy.

on Oct 31, 2018 in Big Data, Cartoon, Halloween, Privacy
Amazing consistency: Largest Dataset Analyzed / Data Mined – Poll Results and Trends

The poll results show amazing consistency with past years, with the median answer still in the 10-100 gigabyte range. Really Big Data Scientists (100 petabytes and more) continue to stand apart but remain a small segment, in which Asian data scientists lead for the first time in this poll.

on Oct 29, 2018 in Asia, Dataset, Europe, Largest, Poll, USA
Introduction to Deep Learning with Keras

In this article, we build a simple neural network using Keras and use it to solve a real business problem: an insurance company wants you to develop a model to help them predict which claims look fraudulent.

on Oct 29, 2018 in Deep Learning, Keras, Neural Networks, Python
SQL, Python, & R in One Platform

No more jumping between applications. Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform.

on Oct 26, 2018 in Data Visualization, Mode Analytics, Python, R, SQL
Notes on Feature Preprocessing: The What, the Why, and the How

This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.

on Oct 26, 2018 in Data Preparation, Data Preprocessing, numpy, Python, scikit-learn, SciPy
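As a minimal illustration of why outliers matter when scaling feature values, the sketch below (standard-library `statistics` only, Python 3.8+, not the article's code) compares z-score standardization with median/IQR "robust" scaling. The function names and the sample data are hypothetical.

```python
import statistics


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Subtract the median, divide by the interquartile range (less sensitive to outliers)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    return [(v - median) / (q3 - q1) for v in values]


data = [10, 11, 12, 13, 14, 15, 16, 1000]  # seven typical values plus one outlier
print([round(z, 2) for z in standardize(data)])
print([round(r, 2) for r in robust_scale(data)])
```

With z-scores, the single outlier inflates the mean and standard deviation and squeezes the seven typical values into a narrow band; the median and IQR are barely affected by it, so robust scaling keeps those values spread out.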
Naive Bayes from Scratch using Python only – No Fancy Frameworks

We provide a complete, step-by-step Pythonic implementation of Naive Bayes. Because it keeps in mind the mathematical and probabilistic difficulties we usually face when diving deep into the algorithmic details of ML algorithms, this post should be ideal for beginners.

on Oct 25, 2018 in Machine Learning, Naive Bayes, Python
Named Entity Recognition and Classification with Scikit-Learn

Named Entity Recognition and Classification is the process of recognizing information units, such as names (including person, organization, and location names) and numeric expressions, in unstructured text. The goal is to develop practical, domain-independent techniques that automatically detect named entities with high accuracy.

on Oct 25, 2018 in NLP, Text Classification, Text Mining
Implementing Automated Machine Learning Systems with Open Source Tools

What if you want to implement an automated machine learning pipeline of your very own, or automate particular aspects of a machine learning pipeline? Rest assured that there is no need to reinvent any wheels.

on Oct 25, 2018 in Automated Machine Learning, Feature Engineering, Feature Selection, Hyperparameter, Machine Learning, Open Source
Generative Adversarial Networks – Paper Reading Road Map

To help others who want to learn more about the technical side of GANs, I wanted to share some papers I have read, in the order that I read them.

on Oct 24, 2018 in GANs, Generative Adversarial Network, Neural Networks
Building a Question-Answering System from Scratch

This part focuses on introducing Facebook sentence embeddings and how they can be used in building QA systems. In future parts, we will try to implement deep learning techniques, specifically sequence modeling, for this problem.

on Oct 24, 2018 in Machine Learning, NLP, Question answering
Introduction to Active Learning

An extensive overview of Active Learning, with an explanation of how it works and how it can assist with data labeling, as well as its performance and potential limitations.

on Oct 23, 2018 in Active Learning, Data Preparation, Figure Eight, Machine Learning
Get a 2–6x Speed-up on Your Data Pre-processing with Python

Get a 2–6x speed-up on your pre-processing with these 3 lines of code!

on Oct 23, 2018 in Data Preprocessing, Efficiency, Programming, Python
How to Define a Machine Learning Problem Like a Detective

The common refrain among machine learning practitioners is that it’s as much an art as a science. True enough, but in this discipline, you can only appreciate the former if you understand the latter.

on Oct 22, 2018 in Crime, Data journalism, Machine Learning
The Intuitions Behind Bayesian Optimization with Gaussian Processes

Bayesian Optimization adds a Bayesian methodology to the iterative optimizer paradigm by incorporating a prior model on the space of possible target functions. This article introduces the basic concepts and intuitions behind Bayesian Optimization with Gaussian Processes.

on Oct 19, 2018 in Bayesian, Distribution, Hyperparameter, Machine Learning, Optimization
Apache Spark Introduction for Beginners

An extensive introduction to Apache Spark, including a look at the evolution of the product, use cases, architecture, ecosystem components, core concepts and more.

on Oct 18, 2018 in Apache Spark, Beginners, Hadoop, R
Graphs Are The Next Frontier In Data Science

GraphConnect 2018, Neo4j’s bi-annual conference, was held in New York City in mid-September. Read about what happened, and why graphs are the next big thing in data science.

on Oct 18, 2018 in Conference, Data Science, Graph Analytics, Neo4j
Music for Data Scientists? Music by Data Scientists? …What…?!

Introducing Mean Reversion, an NYC-based songwriting duo composed of data scientist Foster Provost and statistician Cliff Hurvich.

on Oct 17, 2018 in Foster Provost, Humor, Music
The Main Approaches to Natural Language Processing Tasks

Let's have a look at the main approaches to NLP tasks that we have at our disposal. We will then have a look at the concrete NLP tasks we can tackle with said approaches.

on Oct 17, 2018 in Machine Learning, Neural Networks, NLP, Text Classification
Adversarial Examples, Explained

Deep neural networks—the kind of machine learning models that have recently led to dramatic performance improvements in a wide range of applications—are vulnerable to tiny perturbations of their inputs. We investigate how to deal with these vulnerabilities.

on Oct 16, 2018 in Adversarial, Deep Learning
Applied Data Science: Solving a Predictive Maintenance Business Problem Part 3

In this post, we expand our analysis to multiple variables and then see how the intuitions we develop during the exploration phase can lead to generating new features for modelling.

on Oct 16, 2018 in Business Context, Data Science, Predictive Maintenance
5 “Clean Code” Tips That Will Dramatically Improve Your Productivity

TL;DR: If it isn’t tested, it’s broken; Choose meaningful names; Classes and functions should be small and obey the Single Responsibility Principle (SRP); Catch and handle exceptions, even if you don’t think you need to; Logs, logs, logs

on Oct 15, 2018 in Efficiency, Programming
Machine Reading Comprehension: Learning to Ask & Answer

Investigating the dual ask-answer network, covering the embedding, encoding, attention, and output layers, as well as the loss function, with code examples to help you get started.

on Oct 11, 2018 in Tencent, Text Analysis, Text Classification
Using Confusion Matrices to Quantify the Cost of Being Wrong

The terms ‘true condition’ (the actual outcome) and ‘predicted condition’ (the model's predicted outcome), each of which can be positive or negative, are used when discussing confusion matrices. This means that you need to understand the differences between Type I and Type II errors (and, eventually, the costs associated with each).

on Oct 11, 2018 in Confusion Matrix, Data Science, Machine Learning, Metrics, Predictive Modeling
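To make the Type I / Type II distinction concrete, here is a minimal plain-Python sketch (not the article's code) that counts the four confusion-matrix cells for binary labels, with 1 as the positive class, and weights the two error types by a cost. The function names, labels, and cost values are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type I error: false alarm
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type II error: miss
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def error_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of mistakes, given an assumed price per error type."""
    return fp * cost_fp + fn * cost_fn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                                # 3 2 1 2
print(error_cost(fp, fn, cost_fp=1.0, cost_fn=5.0))  # 7.0
```

Here a missed positive is assumed to cost five times as much as a false alarm, so the single Type II error outweighs the two Type I errors.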
Evaluating the Business Value of Predictive Models in Python and R

In these blog posts for R and Python, we explain four valuable evaluation plots for assessing the business value of a predictive model. We show how you can easily create these plots and use them to explain your predictive model to non-techies.

on Oct 11, 2018 in Business Value, Data Visualization, Lift charts, Predictive Models, Python, R
10 Best Mobile Apps for Data Scientists / Data Analysts

A collection of useful mobile applications that will help enhance your vital data science and analytic skills. These free apps can improve your listening abilities, logical skills, basic leadership qualities and more.

on Oct 10, 2018 in Apps, Data Scientist, Mobile, Python
Preprocessing for Deep Learning: From covariance matrix to image whitening

The goal of this post/notebook is to go from the basics of data preprocessing to modern techniques used in deep learning. My point is that we can use code (Python/NumPy etc.) to better understand abstract mathematical notions!

on Oct 10, 2018 in Data Preprocessing, Deep Learning, Image Processing, Mathematics
Top 8 Python Machine Learning Libraries

Part 1 of a new series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.

on Oct 9, 2018 in GitHub, Keras, Machine Learning, Python
How To Learn Data Science If You’re Broke

A first-hand account on how to learn data science on a budget, with advice covering useful resources, a recommended curriculum, typical concepts, building a portfolio and more.

on Oct 9, 2018 in Beginners, Career, Data Science, Data Science Education
Semantic Interoperability: Are you training your AI by mixing data sources that look the same but aren’t?

Semantic interoperability is a challenge in AI systems, especially as data has become increasingly complex. Another issue is that semantic interoperability may be compromised when people use the same system differently.

on Oct 9, 2018 in AI, Datasets, Healthcare, Semantic Analysis
Building an Image Classifier Running on Raspberry Pi

The tutorial starts by building the physical network connecting the Raspberry Pi to the PC via a router. After their IPv4 addresses are configured, an SSH session is created to access the Raspberry Pi remotely. Once the classification project is uploaded using FTP, clients can access it from web browsers to classify images.

on Oct 9, 2018 in Classifier, Image Recognition, Raspberry Pi
BIG, small or Right Data: Which is the proper focus?

For most businesses, having and using big data is either impossible, impractical, too costly to justify, or difficult to outsource due to high demand for qualified resources. So, what are the benefits of using small data?

on Oct 8, 2018 in Big Data, Big Data Analytics, Data Analytics, Small Data
Things you should know when traveling via the Big Data Engineering hype-train

Maybe you want to join the Big Data world? Or maybe you are already there and want to validate your knowledge? Or maybe you just want to know what Big Data Engineers do and what skills they use? If so, you may find the following article quite useful.

on Oct 8, 2018 in Big Data, Big Data Hype, Data Engineering, Hype
Basic Image Data Analysis Using Python – Part 4

Accessing the internal components of digital images using Python packages helps the user understand their properties, as well as their nature.

on Oct 5, 2018 in Computer Vision, Image Processing, Python
A Concise Explanation of Learning Algorithms with the Mitchell Paradigm

A single quote from Tom Mitchell can shed light on both the abstract concept and concrete implementations of machine learning algorithms.

on Oct 5, 2018 in Algorithms, Learning, Machine Learning, Tom Mitchell
Understand Why ODSC is the Most Recommended Conference for Applied Data Science

With 4 days, 40 training sessions, 50 workshops, and over 200 speakers, an ODSC conference offers unparalleled depth and breadth in deep learning, machine learning, and other data science topics. The 20% discount offer ends tomorrow. Register now!

on Oct 4, 2018 in CA, Data Science, ODSC, San Francisco
Semantic Segmentation: Wiki, Applications and Resources

An extensive overview covering the features of Semantic Segmentation and its possible uses, including GeoSensing, Autonomous Driving, Facial Recognition, and more.

on Oct 4, 2018 in Deep Learning, Image Recognition, Machine Learning, Object Detection, Segmentation
Top 3 Trends in Deep Learning

We investigate the intermediate stage of deep learning and the trends emerging in response to the challenges at this stage, including interoperability and multi-deployment options.

on Oct 3, 2018 in Cloud Computing, Deep Learning, MathWorks
Linear Regression in the Wild

We take a look at how to use linear regression when the dependent variables have measurement errors.

on Oct 3, 2018 in Algorithms, Linear Regression, Python
Sequence Modeling with Neural Networks – Part I

In the context of this post, we will focus on modeling sequences as a well-known data structure and will study its specific learning framework.

on Oct 3, 2018 in Neural Networks, NLP, Recurrent Neural Networks, Sequences
How to Create a Simple Neural Network in Python

The best way to understand how neural networks work is to create one yourself. This article will demonstrate how to do just that.

By Dr. Michael J. Garbade on Oct 2, 2018 in Machine Learning, Neural Networks, Python
5 Reasons Why You Should Use Cross-Validation in Your Data Science Projects

In cross-validation, we do more than one split. We can do 3, 5, 10, or any K number of splits. These splits are called folds, and there are many strategies for creating them.

on Oct 2, 2018 in Cross-validation, Data Science, Machine Learning
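The idea of K folds can be sketched in a few lines of plain Python. This is a minimal, hypothetical helper (not code from the article) that splits sample indices into K contiguous, unshuffled folds; each fold serves once as the test set while the remaining indices form the training set. Giving the first folds one extra sample when the data does not divide evenly follows the same convention as scikit-learn's `KFold` with `shuffle=False`, though real projects would often shuffle or stratify instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample, so sizes differ by at most one.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print(len(train), test)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Every index lands in exactly one test fold, which is what lets the K scores be averaged into a single, less split-dependent estimate.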
Top Stories, Sep 24-30: Machine Learning Cheat Sheets; Learning the Mathematics of Machine Learning

Also: Math for Machine Learning; Introducing Path Analysis Using R; Introduction to Deep Learning; Essential Math for Data Science: Why and How; 6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study

on Oct 1, 2018 in Top stories
Recent Advances for a Better Understanding of Deep Learning

A summary of the newest deep learning trends, including Non-Convex Optimization, Overparametrization and Generalization, Generative Models, Stochastic Gradient Descent (SGD), and more.

on Oct 1, 2018 in Deep Learning, Explained, Flat Minima, Linear Networks, Machine Learning, Optimization, SGD
More Effective Transfer Learning for NLP

Until recently, the natural language processing community was lacking its ImageNet equivalent — a standardized dataset and training objective to use for training base models.

on Oct 1, 2018 in Neural Networks, NLP, Transfer Learning, Word Embeddings
