2019 Apr


Naive Bayes: A Baseline Model for Machine Learning Classification Performance

We can use pandas to work through Bayes' theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.

on May 7, 2019 in Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
Learn About Data Science & the Future of Investing from Hedge Fund Leaders at Rev 2

Rev 2 features interactive sessions, Q&A with industry luminaries, poster sessions for interesting modeling techniques and accomplishments, and stimulating conversations about how to make data science an enterprise-grade capability.

on Apr 30, 2019 in Data Science, Domino, Hedge fund, Investment, New York City, NY
Interview Questions for Data Science – Three Case Interview Examples

Part two in this series of useful posts for aspiring data scientists focuses on case interviews and how you can best go about answering them.

on Apr 30, 2019 in Career, Data Science, Interview Questions, Kaiser Fung
Top Data Science and Machine Learning Methods Used in 2018, 2019

Once again, the most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests. The greatest relative increases this year are overwhelmingly Deep Learning techniques, while SVD, SVMs and Association Rules show the greatest decline.

on Apr 29, 2019 in Algorithms, Clustering, Data Science, Deep Learning, Machine Learning, Poll, Regression
Pandas DataFrame Indexing

The goal of this post is to identify a single strategy for pulling data from a DataFrame using the Pandas Python library that is straightforward to interpret and produces reliable results.

on Apr 29, 2019 in Data Science, Pandas, Python
The most desired skill in data science

What is the biggest skill gap in data science according to hiring managers looking to hire recent graduates? Hint: it’s not coding.

on Apr 26, 2019 in Data Science, Data Science Skills, Kaiser Fung, Self-Driving Car
Projects to Include in a Data Science Portfolio

“Don’t pick just random projects to work on and add it to your resume or portfolio. Solve a problem that relates to the companies that you’re interested in.”

on Apr 26, 2019 in Career Advice, Data Science, Dataquest, Portfolio
Graduating in GANs: Going From Understanding Generative Adversarial Networks to Running Your Own

Read how research on and evaluation of generative adversarial networks (GANs) have developed, then implement your own GAN to generate handwritten digits.

on Apr 25, 2019 in Deep Learning, GANs, Generative Adversarial Network, Generative Models, MNIST, Neural Networks, Python
Generative Adversarial Networks – Key Milestones and State of the Art

We provide an overview of Generative Adversarial Networks (GANs), discuss challenges in GAN learning, and examine two promising GANs: the RadialGAN, designed for numbers, and the StyleGAN, which does style transfer for images.

on Apr 24, 2019 in GANs, Generative Adversarial Network, NVIDIA
Machine Learning and Deep Link Graph Analytics: A Powerful Combination

We investigate how graphs can help machine learning and how they are related to deep link graph analytics for Big Data.

on Apr 23, 2019 in Fraud Detection, Graph Analytics, Graph Databases, Machine Learning, XAI
2019 Best Masters in Data Science and Analytics – Online

We provide an updated, comprehensive, and objective survey of online Masters in Analytics and Data Science, including rankings, tuition, and duration of the education program.

on Apr 23, 2019 in Data Analytics, Data Science, Education, Master of Science, MS in Analytics, MS in Data Science, Online Education
Was it Worth Studying a Data Science Masters?

As I started to apply for Data Science roles it quickly became apparent that I was lacking two key skills: applying Machine Learning and coding.

on Apr 23, 2019 in Advice, Career, Data Science, Data Scientist, UK
Easy Way to Scrape Data from Website By Yourself

Introducing Octoparse, a simple cloud-based website data scraper that lets you extract any web data in real time, with no coding needed.

on Apr 22, 2019 in Cloud Computing, Octoparse, Web Scraping
The Mueller Report Word Cloud: A brief tutorial in R

Word clouds are simple visual summaries of the most frequently used words in a text, presenting essentially the same information as a histogram but somewhat less precisely and vastly more eye-catchingly. Get a quick sense of the themes in the recently released Mueller Report and its 448 pages of legal content.

on Apr 22, 2019 in Donald Trump, Politics, R, Word Cloud
How To Go Into Data Science: Ultimate Q&A for Aspiring Data Scientists with Serious Guides

To learn ALL the skill sets in data science is next to impossible as the scope is way too wide. There’ll always be some skills (technical/non-technical) that data scientists don’t know or haven’t learned as different businesses require different skill sets.

on Apr 22, 2019 in Advice, Career, Data Science, Data Science Education, Data Scientist, Online Education
The Rise of Generative Adversarial Networks

A comprehensive overview of Generative Adversarial Networks, covering their birth, different architectures including DCGAN, StyleGAN and BigGAN, as well as some real-world examples.

on Apr 19, 2019 in Art, Deepfakes, GANs, Generative Adversarial Network, Ian Goodfellow
Data Visualization in Python: Matplotlib vs Seaborn

Seaborn and Matplotlib are two of Python's most powerful visualization libraries. Seaborn requires less syntax and has stunning default themes, while Matplotlib is more easily customizable through access to its classes.

on Apr 19, 2019 in Advice, Data Visualization, Matplotlib, Python, Seaborn
Sisense BloX – Go Beyond Dashboards

Introducing Sisense BloX, the tool that allows you to integrate your business platforms inside your dashboards using prebuilt templates. Users stay within the dashboard environment and go from understanding insights to taking action—in one click.

on Apr 18, 2019 in Analytics, Dashboard, Sisense
3 Big Problems with Big Data and How to Solve Them

We discuss some of the negatives of using big data, including false equivalences and bias, vulnerability to security breaches, protecting against unauthorized access and the lack of international standards for data privacy regulations.

on Apr 18, 2019 in Advice, Bias, Big Data, Privacy, Security
Distributed Artificial Intelligence: A primer on Multi-Agent Systems, Agent-Based Modeling, and Swarm Intelligence

Distributed Artificial Intelligence (DAI) is a class of technologies and methods that span from swarm intelligence to multi-agent technologies. It is one of the subsets of AI where simulation has greater importance than point prediction.

on Apr 18, 2019 in AI, Distributed Systems, Modeling, Swarm Intelligence
How Optimization Works

Optimization problems are naturally described in terms of costs - money, time, resources - rather than benefits. In math it's convenient to make all your problems look the same before you work out a solution, so that you can just solve it the one time.

on Apr 18, 2019 in Data Science, Data Scientist, Gradient Descent, Optimization, Prescriptive Analytics
Best Data Visualization Techniques for small and large data

Data visualization is used in many areas to model complex events and visualize phenomena that cannot be observed directly, such as weather patterns, medical conditions or mathematical relationships. Here we review basic data visualization tools and techniques.

on Apr 17, 2019 in Big Data, Charts, Data Visualization, Histogram, Sciforce
Building a Flask API to Automatically Extract Named Entities Using SpaCy

This article discusses how to use the Named Entity Recognition module in spaCy to identify people, organizations, or locations in text, then deploy a Python API with Flask.

on Apr 17, 2019 in API, Flask, NLP, Python
How Machines Make Sense of Big Data: an Introduction to Clustering Algorithms

We outline three different clustering algorithms - k-means clustering, hierarchical clustering and Graph Community Detection - explaining when to use each and how they work, with a worked example.

on Apr 16, 2019 in Algorithms, Clustering, Explained
2019 Best Masters in Data Science and Analytics – Europe Edition

We provide an updated version of our comprehensive, unbiased survey of graduate programs in Data Science and Analytics from across Europe.

on Apr 16, 2019 in Data Analytics, Data Science, Education, Europe, Master of Science, MS in Analytics, MS in Data Science
Data Science with Optimus Part 2: Setting your DataOps Environment

Breaking down data science with Python, Spark and Optimus. Today: Data Operations for Data Science. Here we’ll learn to set up Git, Travis CI and DVC for our project.

on Apr 16, 2019 in Apache Spark, Data Operations, Data Science, Python, Workflow
An introduction to explainable AI, and why we need it

We introduce explainable AI, explain why it is needed, and present the Reversed Time Attention Model, Local Interpretable Model-Agnostic Explanation and Layer-wise Relevance Propagation.

on Apr 15, 2019 in AI, Explainable AI, LIME, Machine Learning, XAI
Data Science with Optimus Part 1: Intro

With Optimus you can clean your data, prepare it, analyze it, create profilers and plots, and perform machine learning and deep learning, all in a distributed fashion, because on the back-end we have Spark, TensorFlow, Sparkling Water and Keras. It’s super easy to use.

on Apr 15, 2019 in Apache Spark, Data Science, Python, Workflow
How can quantum computing be useful for Machine Learning

We investigate where quantum computing and machine learning could intersect, providing plenty of use cases, examples and technical analysis.

on Apr 12, 2019 in Machine Learning, Quantum Computing, SVM
Why Data Scientists Need To Work In Groups

If you read this article you will see that the job of data scientist is NOT listed. The rest of this article will explore why it is true that data scientists need to work in groups.

on Apr 12, 2019 in Career Advice, Data Scientist
Beyond Siri, Google Assistant, and Alexa – what you need to know about AI Conversational Applications

We discuss industry trends in Artificial Intelligence with Vijay Ramakrishnan, a machine learning engineer and expert in conversational applications.

on Apr 10, 2019 in AI, Alexa, Google, Siri, Virtual Assistant
S2DS, a 5-week data science bootcamp helping analytical PhDs transition from academia to industry

Introducing Europe’s largest data science training programme. Five weeks of intensive, project-based training turning exceptional analytical PhDs and MScs into Data Scientists.

on Apr 9, 2019 in Bootcamp, Data Science, Pivigo
All you need to know about text preprocessing for NLP and Machine Learning

We present a comprehensive introduction to text preprocessing, covering techniques including stemming, lemmatization, noise removal, and normalization, with examples and explanations of when you should use each of them.

on Apr 9, 2019 in Data Preprocessing, Machine Learning, NLP, Python, Text Analysis, Text Mining
Which Data Science / Machine Learning methods and algorithms did you use in 2018/2019 for a real-world application?

Which Data Science / Machine Learning methods and algorithms did you use in 2018/2019 for a real-world application? Take part in the latest KDnuggets survey and have your say.

on Apr 9, 2019 in Algorithms, Data Science, Machine Learning, Poll
Advice for New Data Scientists

We provide advice for junior data scientists as they begin their career, with tips and commentary from a tech lead at Airbnb.

on Apr 8, 2019 in Advice, Beginners, Data Scientist
Spatio-Temporal Statistics: A Primer

Marketing scientist Kevin Gray asks University of Missouri Professor Chris Wikle about Spatio-Temporal Statistics and how it can be used in science and business.

on Apr 5, 2019 in Interview, Spatio-Temporal, Statistics
Another 10 Free Must-See Courses for Machine Learning and Data Science

Check out another follow-up collection of free machine learning and data science courses to give you some spring study ideas.

on Apr 5, 2019 in AI, Data Science, Deep Learning, Keras, Machine Learning, NLP, Reinforcement Learning, TensorFlow, U. of Washington, UC Berkeley, Unsupervised Learning
Training a Champion: Building Deep Neural Nets for Big Data Analytics

Introducing Sisense Hunch, the new way of handling Big Data sets that uses AQP technology to construct Deep Neural Networks (DNNs) which are trained to learn the relationships between queries and their results in these huge datasets.

on Apr 4, 2019 in Big Data Analytics, Deep Learning, Neural Networks, Sisense, SQL
Building a Recommender System

A beginner's guide to building a recommendation system, with a step-by-step guide on how to create a content-based filtering system to recommend movies for a user to watch.

on Apr 4, 2019 in Movies, Python, Recommendation Engine, Recommender Systems
Predict Age and Gender Using Convolutional Neural Network and OpenCV

Age and gender estimation from a single face image are important tasks in intelligent applications. As such, let's build a simple age and gender detection model in this detailed article.

on Apr 4, 2019 in Computer Vision, Convolutional Neural Networks, OpenCV, Python
Getting started with NLP using the PyTorch framework

We discuss the classes that PyTorch provides for helping with Natural Language Processing (NLP) and how they can be used for related tasks using recurrent layers.

on Apr 3, 2019 in Neural Networks, NLP, PyTorch, Recurrent Neural Networks
How to DIY Your Data Science Education

Some people find the path of formal education works well for them, but this may not work for everyone, in every situation. Here are eight ways that you can take a DIY approach to your data science education.

on Apr 3, 2019 in Books, Data Science, Data Science Education, MOOC, Podcast, Programming Languages, Youtube
Top 8 Data Science Use Cases in Gaming

Understanding the value of data for optimizing and improving games leads specialists to search for new ways to apply data science in the gaming business, and a variety of specific use cases has emerged as a result. Here is our list of the most efficient and widely applied data science use cases in gaming.

on Apr 3, 2019 in Data Science, Gaming, Use Cases
7 Qualities Your Big Data Visualization Tools Absolutely Must Have and 10 Tools That Have Them

Without the right visualization tools, raw data is of little use. Data visualization helps present the data in an interactive visual format. Here are the qualities to look for in a data visualization tool.

on Apr 2, 2019 in Big Data, Data Visualization, Domo, Plotly, Power BI, QlikView, Sisense, Tableau
Which Face is Real?

Which Face Is Real? was developed based on Generative Adversarial Networks as a web application in which users select which image they believe shows a real person and which was synthetically generated. The person in the synthetically generated photo does not exist.

on Apr 2, 2019 in Deep Learning, GANs, Generative Adversarial Network, Neural Networks, NVIDIA, Python
Top 10 Coding Mistakes Made by Data Scientists

Here is a list of 10 common mistakes that a senior data scientist, who is ranked in the top 1% on Stack Overflow for Python coding and who works with a lot of (junior) data scientists, frequently sees.

on Apr 2, 2019 in Data Science, Data Scientist, Mistakes, Programming
