What makes a data scientist today? Consider this review of data collected from three years' worth of data scientist LinkedIn profiles to gain insight into how this important new career path is shaping up.
Developing machine learning predictive models from time series data is an important skill in Data Science. While the time element in the data provides valuable information for your model, it can also lead you down a path that fools you into seeing patterns that aren't real. Follow this example to learn how to spot trouble in time series data before it's too late.
Mask R-CNN is the new state of the art in instance segmentation. Here I share a simple, intuitive understanding of it to give you a first look before we move ahead and build our own model.
This blog is meant to show that nearly everyone has had to expend quite a bit of effort to get where they are. They have to work hard, sometimes experience failure, show discipline, be persistent, be dedicated to their goals, and sometimes sacrifice or take risks.
With election cycles always seeming to be in season, predictions on outcomes remain intriguing content for voting citizens. Misinterpretation of election forecasts also runs rampant and can impact perceptions both of candidates and of those who post these predictions. A better fundamental understanding of probability can help improve our collective notion of futurism and how we monitor elections.
Clear, succinct data visualizations can be powerful tools for telling stories and explaining phenomena. This article demonstrates this concept as it relates to the COVID-19 pandemic.
Kubeflow recently announced its first major release, 1.0. This post introduces the MPI Operator, one of Kubeflow's core components, currently in alpha, which makes it easy to run synchronized, allreduce-style distributed training on Kubernetes.
Deep Learning sits at the forefront of many important advances underway in machine learning. With backpropagation as the primary training method, its computational inefficiency requires sophisticated hardware, such as GPUs. Learn about a recent algorithmic breakthrough that improves backpropagation calculations so that training large neural networks on a CPU can outperform a GPU.
The Matrix Profile is a powerful tool to help solve the dual problem of anomaly detection and motif discovery. The Matrix Profile is robust, scalable, and largely parameter-free: we’ve seen it work for a wide range of metrics, including website user data, order volume, and other business-critical applications.
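To make the idea concrete, here is a minimal, numpy-only sketch of the matrix profile: for each window of the series, the distance to its nearest non-trivial match. This is the naive O(n²) formulation, not the optimized algorithms that production libraries such as stumpy implement; the synthetic series and window length are illustrative assumptions.

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive matrix profile: for each length-m subsequence, the z-normalized
    Euclidean distance to its nearest non-overlapping neighbor."""
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)])
    # z-normalize each subsequence so only shape matters
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    mp = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - m // 2): i + m // 2 + 1] = np.inf  # trivial-match exclusion zone
        mp[i] = d.min()
    return mp

# Periodic signal with one injected anomaly
ts = np.sin(np.linspace(0, 16 * np.pi, 400))
ts[200:210] += 3.0
mp = matrix_profile(ts, m=25)
anomaly_start = int(np.argmax(mp))  # most unusual subsequence (the discord)
motif_start = int(np.argmin(mp))    # most repeated subsequence (the motif)
```

A high matrix-profile value flags an anomaly (nothing else looks like it), while a low value flags a motif (an almost exact repeat exists elsewhere).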
If your team has started using ​Ray​ and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
Different types of data beyond your typical dollars and cents have been used in the finance industry for many years. By leveraging machine learning, sentiment data is expected to play an increasingly dominant role in the investment industry, and this article highlights some special challenges of its use in trading models.
This article aims to introduce a manifold learning technique called the Diffusion Map. This technique enables us to understand the underlying geometric structure of high-dimensional data, as well as to reduce the dimensions, if required, by neatly capturing the non-linear relationships between the original dimensions.
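The core recipe can be sketched in a few lines of numpy: build Gaussian affinities, normalize them into a Markov transition matrix, and embed with its leading nontrivial eigenvectors. This is a minimal illustration with an assumed kernel bandwidth `epsilon`; full implementations add refinements such as density normalization.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=1.0, t=1):
    """Minimal diffusion map: Gaussian affinities -> Markov matrix ->
    embed with the leading nontrivial eigenvectors scaled by eigenvalue^t."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-D2 / epsilon)                             # affinity (kernel) matrix
    P = K / K.sum(axis=1, keepdims=True)                  # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)                        # sort eigenvalues descending
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Skip the trivial (constant) top eigenvector; scale the rest by lambda^t
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1] ** t

# Example: points on a circle embedded into 2 diffusion coordinates
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
Y = diffusion_map(X, n_components=2, epsilon=0.5)
```

The parameter t (the diffusion time) controls how much the embedding emphasizes large-scale structure over local detail.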
Whether you are just learning Data Science, are a current professional, or are simply interested, it's crucial to keep your mind stimulated and stay current. With conferences, schools, and travel largely canceled because of #coronavirus, these remote resources will help you stay engaged.
The deployment of large transformer-based models in commercial environments often yields poor results, because these environments are usually dynamic, with continuous domain shifts between training and inference data.
This is a short introduction to Made With ML, a useful resource for machine learning engineers looking to get ideas for projects to build, and for those looking to share innovative portfolio projects once built.
What is it like to be a Data Scientist? There can be many hats to wear, and so many problems to solve that are fed with data, churned by data science, and guided by business results. Find out about lessons learned from one Data Scientist about how best to work and perform in the role.
We are excited to announce that ModelDB 2.0 is now available! We have learned a lot since building ModelDB 1.0, so we decided to rebuild from the ground up.
We introduce FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.
Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.
Support Vector Machines (SVMs) are powerful for solving regression and classification problems. You should have this approach in your machine learning arsenal, and this article provides all the mathematics you need to know -- it's not as hard as you might think.
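As a taste of the mathematics involved, here is a hedged, numpy-only sketch of a linear SVM trained by sub-gradient descent on the hinge loss. The synthetic blobs and hyperparameters (lam, lr, epochs) are illustrative assumptions; the article's full treatment covers kernels and the dual formulation.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Linear SVM by sub-gradient descent on the regularized hinge loss:
       minimize lam*||w||^2 + (1/n) * sum(max(0, 1 - y_i*(w.x_i + b))),
       with labels y_i in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points inside or beyond the margin
        grad_w = 2 * lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two separable blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.4, (40, 2)),
               rng.normal([-2, -2], 0.4, (40, 2))])
y = np.array([1] * 40 + [-1] * 40)
w, b = train_linear_svm(X, y)
accuracy = (np.sign(X @ w + b) == y).mean()
```

Only margin-violating points contribute to the gradient, which is exactly why the final boundary depends only on the support vectors.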
An integrated BI system has a trickle-down effect on all business processes, especially reporting and analytics. Find out how integration can help you leverage the power of BI.
As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) are reviewed here that you should pay attention to for your data pipeline work.
This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.
Kicking off a series of forecasting stories, starting with seasonality and its business applications. This first article covers course corrections based on weather- and calendar-driven seasonality.
Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, this is about the same answer as 5 years ago. We examine the trends by AutoML experience, industry, and region.
This framework was created after much time spent thinking about the paths software companies take toward ML maturity, and is meant to be followed as you adopt ML and then mature as an organization. It covers every aspect of building a team, including product, process, technical, and organizational readiness, and recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.
I train a series of machine learning models on the iris dataset, construct synthetic data from the extreme points within the data, and test a number of models in order to draw the decision boundaries from which they make predictions in a 2D space. This is useful for illustrative purposes and for understanding how different machine learning models make predictions.
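The mechanics of drawing a decision boundary can be sketched in a few lines: evaluate any fitted classifier over a dense 2D mesh and plot the resulting class regions. This is a hypothetical numpy-only illustration using a nearest-centroid classifier as a stand-in for the article's trained models; the grid limits and centroids are made-up examples.

```python
import numpy as np

def decision_surface(predict, xlim, ylim, steps=100):
    """Evaluate a classifier over a 2D mesh; the result can be passed to
    matplotlib's contourf to draw the decision boundary."""
    xx, yy = np.meshgrid(np.linspace(*xlim, steps), np.linspace(*ylim, steps))
    grid = np.c_[xx.ravel(), yy.ravel()]          # every mesh point as a row
    return xx, yy, predict(grid).reshape(xx.shape)

# Stand-in model: nearest-centroid classifier on two synthetic clusters
centroids = np.array([[2.0, 2.0], [-2.0, -2.0]])
predict = lambda pts: np.argmin(
    ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)
xx, yy, zz = decision_surface(predict, (-4, 4), (-4, 4))
```

Swapping in any model's predict function (an SVM, a tree, a neural network) against the same mesh is what makes the side-by-side boundary comparisons in such articles possible.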
Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.
Also: The three phases of #COVID19 – and how we can make it manageable; 50 Must-Read Free Books For Every Data Scientist in 2020; Binary classification is a core machine learning technique, but is there a better way to evaluate its performance than ROC-AUC?
While building a machine learning model might be the fun part, it won't do much for anyone else unless it can be deployed into a production environment. How to implement machine learning deployments is a special challenge with differences from traditional software engineering, and this post examines a fundamental first step -- how to create software interfaces so you can develop deployments that are automated and repeatable.
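One way to picture such a software interface is a small, uniform contract that every deployable model implements, so the surrounding automation never needs model-specific code. The class names, methods, and toy implementation below are hypothetical illustrations, not the post's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence

class Model(ABC):
    """A hypothetical deployment contract: any model implementing this
    interface can be loaded and served by the same automated pipeline."""

    @abstractmethod
    def load(self, artifact_path: str) -> None:
        """Restore weights/parameters from a stored artifact."""

    @abstractmethod
    def predict(self, records: Sequence[dict]) -> Sequence[Any]:
        """Score a batch of feature records."""

class MeanThresholdModel(Model):
    """Toy implementation used only to show the contract in action."""
    def load(self, artifact_path: str) -> None:
        self.threshold = 0.5  # in reality, read from the artifact

    def predict(self, records: Sequence[dict]) -> Sequence[Any]:
        return [int(r["score"] > self.threshold) for r in records]

model = MeanThresholdModel()
model.load("artifacts/model-v1")  # hypothetical artifact path
preds = model.predict([{"score": 0.9}, {"score": 0.2}])
```

Because the deployment tooling depends only on the abstract interface, retraining or swapping the model becomes a repeatable, automatable step rather than a bespoke engineering effort.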
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
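A few of the core features those steps cover can be previewed in one short snippet: construction, boolean filtering, and split-apply-combine aggregation. The toy dataset below is an assumption for illustration only.

```python
import pandas as pd

# Toy dataset with made-up values, to exercise Pandas' core features
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2019, 2020, 2019, 2020],
    "sales": [100, 120, 80, 95],
})

recent = df[df["year"] == 2020]               # boolean filtering
by_city = df.groupby("city")["sales"].mean()  # split-apply-combine
# derived column: standardized sales
df["sales_z"] = (df["sales"] - df["sales"].mean()) / df["sales"].std()
```

These three idioms (masks, groupby, vectorized column arithmetic) account for a surprising share of day-to-day exploratory work.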
Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.
For International Women's Day, we feature resources to help more women enter and succeed in the AI, Big Data, Data Science, and Machine Learning fields.
Fines under the GDPR have been rolling in since its inception in 2018. This article investigates the largest penalty recipients by country, the amounts involved, and the fines levied on private individuals.
Since phishing is such a widespread problem in the cybersecurity domain, let us take a look at the application of machine learning for phishing website detection.
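A typical first step in such systems is turning a raw URL into lexical features that a classifier can consume. The feature set below is a small, standard-library-only sketch of commonly used signals; it is an illustration, not the specific feature set of any particular detector.

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """Hand-crafted lexical features often used in phishing-detection models."""
    parsed = urlparse(url)
    host = parsed.netloc.split(":")[0]  # drop any port
    return {
        "url_length": len(url),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at": "@" in url,  # '@' is often used to disguise the real host
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "num_subdomains": max(host.count(".") - 1, 0),
        "uses_https": parsed.scheme == "https",
    }

feats = url_features("http://192.168.0.1/login@secure")
```

Feature dictionaries like this can then be vectorized and fed to any standard classifier (logistic regression, random forests, gradient boosting) trained on labeled phishing and benign URLs.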
Many industries realize the potential of Machine Learning and are incorporating it as a core technology. Progress and new applications of these tools are moving quickly in the field, and we discuss expected upcoming trends in Machine Learning for 2020.
How do we make sure our training data is more accurate than the rest? Partners like Supahands eliminate the headache that comes with labeling work by providing end-to-end managed labeling solutions, completed by a fully managed workforce that is trained to work on your model specifics.
Despite recognizing the importance of data quality, many companies still fail to implement a data quality framework that could protect them from making costly mistakes. Poor data does not just cause revenue loss – it’s the reason your company could lose employees, customers and reputation!
Binary classification tasks are the bread and butter of machine learning. However, the standard statistic for measuring their performance, the ROC-AUC, is a mathematical tool that is difficult to interpret. Here, a performance measure is introduced that simply considers the probability of making a correct binary classification.
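To see the contrast, recall that ROC-AUC equals the probability that a random positive example outranks a random negative one, whereas a threshold-based measure asks how often a single classification is correct. The sketch below computes both from scratch; it illustrates the general idea, not the article's exact proposed measure.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diffs = pos[:, None] - neg[None, :]          # every positive-negative pair
    return ((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / (len(pos) * len(neg))

def prob_correct(y_true, scores, threshold=0.5):
    """Probability of a correct classification at a fixed decision threshold."""
    return ((scores >= threshold).astype(int) == y_true).mean()

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc(y_true, scores)
acc = prob_correct(y_true, scores)
```

The pairwise-ranking view makes clear why AUC is hard to act on operationally: it says nothing about any single prediction at the threshold you actually deploy.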
Problem definition and solution development are key ingredients of being a consultant. Structuring the problem definition phase is critical to project success, even though it may seem like a purely creative process.
Logistic Regression is a core supervised learning technique for solving classification problems. This article goes beyond its simple code to first understand the concepts behind the approach, and how it all emerges from the more basic technique of Linear Regression.
Our goal here is to see if we can build a classifier that can identify patterns in Scanning Electron Microscope (SEM) images, and compare the performance of our classifier to the current state-of-the-art.
We explain important AI, ML, Data Science terms you should know in 2020, including Double Descent, Ethics in AI, Explainability (Explainable AI), Full Stack Data Science, Geospatial, GPT-2, NLG (Natural Language Generation), PyTorch, Reinforcement Learning, and Transformer Architecture.