What makes a data scientist today? Consider this review of data collected from three years' worth of data scientist LinkedIn profiles to gain insight into how this important new career path is shaping up.
Developing machine learning predictive models from time series data is an important skill in Data Science. While the time element in the data provides valuable information for your model, it can also lead you down a path that fools you into seeing patterns that aren't real. Follow this example to learn how to spot trouble in time series data before it's too late.
Mask R-CNN is the new state of the art in instance segmentation. Here I share a simple, intuitive understanding of it to give you a first look before we move ahead and build our own model.
This blog is meant to show that nearly everyone has had to expend quite a bit of effort to get where they are. They have to work hard, sometimes experience failure, show discipline, be persistent, be dedicated to their goals, and sometimes sacrifice or take risks.
With election cycles always seeming to be in season, predictions on outcomes remain intriguing content for voting citizens. Misinterpretation of election forecasts also runs rampant and can impact perceptions both of candidates and of those who post these predictions. A better fundamental understanding of probability can help improve our collective notion of futurism and how we monitor elections.
Clear, succinct data visualizations can be powerful tools for telling stories and explaining phenomena. This article demonstrates this concept as it relates to the COVID-19 pandemic.
Kubeflow recently announced its first major release, 1.0. This post introduces the MPI Operator, one of Kubeflow's core components, currently in alpha, which makes it easy to run synchronized, allreduce-style distributed training on Kubernetes.
Deep Learning sits at the forefront of many important advances underway in machine learning. With backpropagation as the primary training method, its computational inefficiency requires sophisticated hardware, such as GPUs. Learn about a recent algorithmic breakthrough that improves backpropagation calculations so that training large neural networks on a CPU can outperform a GPU.
The Matrix Profile is a powerful tool to help solve the dual problem of anomaly detection and motif discovery. The Matrix Profile is robust, scalable, and largely parameter-free: we’ve seen it work for a wide range of metrics, including website user data, order volume, and other business-critical applications.
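To make the idea concrete, here is a minimal, numpy-only sketch of the matrix profile: for each window of the series, the distance to its nearest non-trivial match. This is the naive O(n²) formulation, not the optimized algorithms that production libraries such as stumpy implement; the synthetic series and window length are illustrative assumptions.

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive matrix profile: for each length-m subsequence, the z-normalized
    Euclidean distance to its nearest non-overlapping neighbor."""
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)])
    # z-normalize each subsequence so only shape matters
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    mp = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - m // 2): i + m // 2 + 1] = np.inf  # trivial-match exclusion zone
        mp[i] = d.min()
    return mp

# Periodic signal with one injected anomaly
ts = np.sin(np.linspace(0, 16 * np.pi, 400))
ts[200:210] += 3.0
mp = matrix_profile(ts, m=25)
anomaly_start = int(np.argmax(mp))  # most unusual subsequence (the discord)
motif_start = int(np.argmin(mp))    # most repeated subsequence (the motif)
```

A high matrix-profile value flags an anomaly (nothing else looks like it), while a low value flags a motif (an almost exact repeat exists elsewhere).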
If your team has started using ​Ray​ and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
Different types of data beyond your typical dollars and cents have been used in the finance industry for many years. By leveraging machine learning, sentiment data is expected to play an increasingly dominant role in the investment industry, and this article highlights some special challenges of its use in trading models.
This article aims to introduce a manifold learning technique called the Diffusion Map. This technique enables us to understand the underlying geometric structure of high-dimensional data, as well as to reduce the dimensions, if required, by neatly capturing the non-linear relationships between the original dimensions.
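The core recipe can be sketched in a few lines of numpy: build Gaussian affinities, normalize them into a Markov transition matrix, and embed with its leading nontrivial eigenvectors. This is a minimal illustration with an assumed kernel bandwidth `epsilon`; full implementations add refinements such as density normalization.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=1.0, t=1):
    """Minimal diffusion map: Gaussian affinities -> Markov matrix ->
    embed with the leading nontrivial eigenvectors scaled by eigenvalue^t."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-D2 / epsilon)                             # affinity (kernel) matrix
    P = K / K.sum(axis=1, keepdims=True)                  # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)                        # sort eigenvalues descending
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Skip the trivial (constant) top eigenvector; scale the rest by lambda^t
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1] ** t

# Example: points on a circle embedded into 2 diffusion coordinates
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
Y = diffusion_map(X, n_components=2, epsilon=0.5)
```

The parameter t (the diffusion time) controls how much the embedding emphasizes large-scale structure over local detail.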
Whether you are just learning Data Science, are a current professional, or are simply interested, it's crucial to keep your mind stimulated and stay current. With conferences, schools, and travel largely canceled because of #coronavirus, these remote resources will help you stay engaged.
The deployment of large transformer-based models in commercial environments often yields poor results, because these environments are usually dynamic, with continuous domain shifts between training and inference data.
This is a short introduction to Made With ML, a useful resource for machine learning engineers looking to get ideas for projects to build, and for those looking to share innovative portfolio projects once built.
What is it like to be a Data Scientist? There can be many hats to wear, and so many problems to solve that are fed with data, churned by data science, and guided by business results. Find out about lessons learned from one Data Scientist about how best to work and perform in the role.
We are excited to announce that ModelDB 2.0 is now available! We have learned a lot since building ModelDB 1.0, so we decided to rebuild from the ground up.
We introduce FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.
Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.
Support Vector Machines (SVMs) are powerful for solving regression and classification problems. You should have this approach in your machine learning arsenal, and this article provides all the mathematics you need to know -- it's not as hard as you might think.
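As a taste of the mathematics involved, here is a hedged, numpy-only sketch of a linear SVM trained by sub-gradient descent on the hinge loss. The synthetic blobs and hyperparameters (lam, lr, epochs) are illustrative assumptions; the article's full treatment covers kernels and the dual formulation.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Linear SVM by sub-gradient descent on the regularized hinge loss:
       minimize lam*||w||^2 + (1/n) * sum(max(0, 1 - y_i*(w.x_i + b))),
       with labels y_i in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points inside or beyond the margin
        grad_w = 2 * lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two separable blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.4, (40, 2)),
               rng.normal([-2, -2], 0.4, (40, 2))])
y = np.array([1] * 40 + [-1] * 40)
w, b = train_linear_svm(X, y)
accuracy = (np.sign(X @ w + b) == y).mean()
```

Only margin-violating points contribute to the gradient, which is exactly why the final boundary depends only on the support vectors.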
An integrated BI system has a trickle-down effect on all business processes, especially reporting and analytics. Find out how integration can help you leverage the power of BI.
As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) are reviewed here that you should pay attention to for your data pipeline work.
This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.
Kicking off a series of forecasting stories, starting with seasonality and its business applications. This first article covers course corrections based on weather- and calendar-driven seasonality.
Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, this is about the same answer as 5 years ago. We examine the trends by AutoML experience, industry, and region.
This framework was created after much time spent thinking about the paths software companies take toward ML maturity, and is meant to be followed as you adopt ML and then mature as an organization. It covers every aspect of building a team, including product, process, technical, and organizational readiness, and recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.
I train a series of machine learning models on the iris dataset, construct synthetic data from the extreme points within the data, and test a number of models in order to draw the decision boundaries from which they make predictions in a 2D space. This is useful for illustrative purposes and for understanding how different machine learning models make predictions.
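The mechanics of drawing a decision boundary can be sketched in a few lines: evaluate any fitted classifier over a dense 2D mesh and plot the resulting class regions. This is a hypothetical numpy-only illustration using a nearest-centroid classifier as a stand-in for the article's trained models; the grid limits and centroids are made-up examples.

```python
import numpy as np

def decision_surface(predict, xlim, ylim, steps=100):
    """Evaluate a classifier over a 2D mesh; the result can be passed to
    matplotlib's contourf to draw the decision boundary."""
    xx, yy = np.meshgrid(np.linspace(*xlim, steps), np.linspace(*ylim, steps))
    grid = np.c_[xx.ravel(), yy.ravel()]          # every mesh point as a row
    return xx, yy, predict(grid).reshape(xx.shape)

# Stand-in model: nearest-centroid classifier on two synthetic clusters
centroids = np.array([[2.0, 2.0], [-2.0, -2.0]])
predict = lambda pts: np.argmin(
    ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)
xx, yy, zz = decision_surface(predict, (-4, 4), (-4, 4))
```

Swapping in any model's predict function (an SVM, a tree, a neural network) against the same mesh is what makes the side-by-side boundary comparisons in such articles possible.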
Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.
Also: The three phases of #COVID19 – and how we can make it manageable; 50 Must-Read Free Books For Every Data Scientist in 2020; Binary classification is a core machine learning technique, but is there a better way to evaluate its performance than ROC-AUC?
While building a machine learning model might be the fun part, it won't do much for anyone else unless it can be deployed into a production environment. How to implement machine learning deployments is a special challenge with differences from traditional software engineering, and this post examines a fundamental first step -- how to create software interfaces so you can develop deployments that are automated and repeatable.
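One way to picture such a software interface is a small, uniform contract that every deployable model implements, so the surrounding automation never needs model-specific code. The class names, methods, and toy implementation below are hypothetical illustrations, not the post's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence

class Model(ABC):
    """A hypothetical deployment contract: any model implementing this
    interface can be loaded and served by the same automated pipeline."""

    @abstractmethod
    def load(self, artifact_path: str) -> None:
        """Restore weights/parameters from a stored artifact."""

    @abstractmethod
    def predict(self, records: Sequence[dict]) -> Sequence[Any]:
        """Score a batch of feature records."""

class MeanThresholdModel(Model):
    """Toy implementation used only to show the contract in action."""
    def load(self, artifact_path: str) -> None:
        self.threshold = 0.5  # in reality, read from the artifact

    def predict(self, records: Sequence[dict]) -> Sequence[Any]:
        return [int(r["score"] > self.threshold) for r in records]

model = MeanThresholdModel()
model.load("artifacts/model-v1")  # hypothetical artifact path
preds = model.predict([{"score": 0.9}, {"score": 0.2}])
```

Because the deployment tooling depends only on the abstract interface, retraining or swapping the model becomes a repeatable, automatable step rather than a bespoke engineering effort.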
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
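A few of the core features those steps cover can be previewed in one short snippet: construction, boolean filtering, and split-apply-combine aggregation. The toy dataset below is an assumption for illustration only.

```python
import pandas as pd

# Toy dataset with made-up values, to exercise Pandas' core features
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2019, 2020, 2019, 2020],
    "sales": [100, 120, 80, 95],
})

recent = df[df["year"] == 2020]               # boolean filtering
by_city = df.groupby("city")["sales"].mean()  # split-apply-combine
# derived column: standardized sales
df["sales_z"] = (df["sales"] - df["sales"].mean()) / df["sales"].std()
```

These three idioms (masks, groupby, vectorized column arithmetic) account for a surprising share of day-to-day exploratory work.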
Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.
For International Women's Day, we feature resources to help more women enter and succeed in the AI, Big Data, Data Science, and Machine Learning fields.
Fines under the GDPR have been rolling in since its inception in 2018. This article investigates the largest penalty recipients by country, the amounts involved, and the fines levied on private individuals.
Since phishing is such a widespread problem in the cybersecurity domain, let us take a look at the application of machine learning for phishing website detection.
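A typical first step in such systems is turning a raw URL into lexical features that a classifier can consume. The feature set below is a small, standard-library-only sketch of commonly used signals; it is an illustration, not the specific feature set of any particular detector.

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """Hand-crafted lexical features often used in phishing-detection models."""
    parsed = urlparse(url)
    host = parsed.netloc.split(":")[0]  # drop any port
    return {
        "url_length": len(url),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at": "@" in url,  # '@' is often used to disguise the real host
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "num_subdomains": max(host.count(".") - 1, 0),
        "uses_https": parsed.scheme == "https",
    }

feats = url_features("http://192.168.0.1/login@secure")
```

Feature dictionaries like this can then be vectorized and fed to any standard classifier (logistic regression, random forests, gradient boosting) trained on labeled phishing and benign URLs.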
Many industries realize the potential of Machine Learning and are incorporating it as a core technology. Progress and new applications of these tools are moving quickly in the field, and we discuss expected upcoming trends in Machine Learning for 2020.
How do we make sure our training data is more accurate than the rest? Partners like Supahands eliminate the headache that comes with labeling work by providing end-to-end managed labeling solutions, completed by a fully managed workforce that is trained to work on your model specifics.
Despite recognizing the importance of data quality, many companies still fail to implement a data quality framework that could protect them from making costly mistakes. Poor data does not just cause revenue loss – it’s the reason your company could lose employees, customers and reputation!
Binary classification tasks are the bread and butter of machine learning. However, the standard statistic for measuring their performance, the ROC-AUC, is a mathematical tool that is difficult to interpret. Here, a performance measure is introduced that simply considers the probability of making a correct binary classification.
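To see the contrast, recall that ROC-AUC equals the probability that a random positive example outranks a random negative one, whereas a threshold-based measure asks how often a single classification is correct. The sketch below computes both from scratch; it illustrates the general idea, not the article's exact proposed measure.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diffs = pos[:, None] - neg[None, :]          # every positive-negative pair
    return ((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / (len(pos) * len(neg))

def prob_correct(y_true, scores, threshold=0.5):
    """Probability of a correct classification at a fixed decision threshold."""
    return ((scores >= threshold).astype(int) == y_true).mean()

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc(y_true, scores)
acc = prob_correct(y_true, scores)
```

The pairwise-ranking view makes clear why AUC is hard to act on operationally: it says nothing about any single prediction at the threshold you actually deploy.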
Problem definition and solution development are key ingredients of being a consultant. Structuring the problem definition phase is critical to project success, even though it may seem like a purely creative process.
Logistic Regression is a core supervised learning technique for solving classification problems. This article goes beyond its simple code to first understand the concepts behind the approach, and how it all emerges from the more basic technique of Linear Regression.
Our goal here is to see if we can build a classifier that can identify patterns in Scanning Electron Microscope (SEM) images, and compare the performance of our classifier to the current state-of-the-art.
We explain important AI, ML, Data Science terms you should know in 2020, including Double Descent, Ethics in AI, Explainability (Explainable AI), Full Stack Data Science, Geospatial, GPT-2, NLG (Natural Language Generation), PyTorch, Reinforcement Learning, and Transformer Architecture.