As a data scientist writing code for your models, it's quite possible that your work will make its way into a production environment to be used by the masses. But writing code that is deployed as software is very different from writing code for exploratory data analysis. Learn about the key approaches for making your code production-ready that will save you time and future headaches.
Natural language processing has made incredible advances through deep learning techniques. Learn about these powerful models, and find out how close (or far away) these approaches are to human-level understanding.
This week's free eBook is a classic of data science, An Introduction to Statistical Learning, with Applications in R. If interested in picking up elementary statistical learning concepts, and learning how to implement them in R, this book is for you.
This is a slightly more intricate MCMC example than most, which typically use a fairly simple model with a single predictor (maybe two) and not much else. It highlights a couple of issues and tricks worth noting for a handwritten implementation.
What has been happening to the definition of Data Scientist over the past 5 years? Does it still exist or has it morphed into a new version of its old self? Learn more about the recent trends in job descriptions and salaries for data scientists, ML engineers, and others to understand the best fit for your career trajectory and interests.
Here is a selection of courses for those interested in diversifying their domain knowledge into the related realms of economics and finance, with the goal of being able to apply your data science skills to these domains.
If you want to learn simple and practical rules for coding and refactoring, "Five Lines of Code" from Manning is the guide for you, teaching you concrete principles for refactoring. Save 40% with code nlfive40 until July 24.
By using R, Flexdashboard and Leaflet, we can build a customized and branded web application to showcase location-based data interactively across the organization. Instead of crowding the application with many widgets, we use menu tabs and pages to separate the interactive aspects.
When we consider the complexity of an algorithm, we shouldn’t really care about the exact number of operations that are performed; instead, we should care about how the number of operations relates to the problem size.
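As a minimal sketch of that idea, the two hypothetical functions below count the steps taken by a single pass over n items and by a comparison of every pair of items. Both functions and the chosen problem sizes are illustrative assumptions, not taken from the article.

```python
def count_single_pass(n):
    """Count the steps of one pass over n items (grows linearly with n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def count_all_pairs(n):
    """Count the steps of comparing every pair of n items (grows quadratically)."""
    steps = 0
    for i in range(n):
        for _ in range(i + 1, n):
            steps += 1
    return steps


# Print how each step count changes as the problem size doubles.
for n in (100, 200, 400):
    print(n, count_single_pass(n), count_all_pairs(n))
```

Doubling n doubles the first count but roughly quadruples the second. That growth pattern, rather than the exact totals, is what the complexity of an algorithm describes.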
The technologies that generate deepfake content are at the forefront of manipulating humans. While the research developing these algorithms is fascinating and will lead to powerful tools that enhance the way people create and work, in the wrong hands, these same tools drive misinformation at a scale we can't yet imagine. Stopping these bad actors using awesome tools is in your hands.
Those interested in studying AI bias, but who lack a starting point, would do well to check out this introductory set of slides and the accompanying talk on the subject from Google researcher Margaret Mitchell.
Just as there is no Data Science without data, there's no science in data without mathematics. Strengthening your foundational math skills will level you up as a data scientist and enable you to perform with greater expertise.
This work explores how genetic relationships can be exploited alongside genomic information to predict genetic traits with the aid of graph machine learning algorithms.
modelStudio is an R package that automates the exploration of ML models and allows for interactive examination. It works in a model-agnostic fashion and is therefore compatible with most ML frameworks.
PyTorch is a constantly developing deep learning framework with many exciting additions and features. We review its basic elements and show an example of building a simple Deep Neural Network (DNN) step-by-step.
LightGBM is a histogram-based algorithm which places continuous values into discrete bins, which leads to faster training and more efficient memory usage. In this piece, we’ll explore LightGBM in depth.
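To illustrate the binning idea in isolation, here is a rough pure-Python sketch. It assumes equal-width bins and synthetic data, both chosen for simplicity; it does not reproduce LightGBM's own binning strategy or call its API, and the function names are hypothetical.

```python
import random


def bin_index(value, lo, width, num_bins):
    """Map a continuous value to a bin index in [0, num_bins - 1]."""
    return min(int((value - lo) / width), num_bins - 1)


def build_histogram(values, targets, num_bins):
    """Accumulate per-bin counts and target sums using equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if the feature is constant (assumption).
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    sums = [0.0] * num_bins
    for value, target in zip(values, targets):
        b = bin_index(value, lo, width, num_bins)
        counts[b] += 1
        sums[b] += target
    return counts, sums


# Synthetic feature and target, made up for this illustration.
random.seed(0)
values = [random.uniform(0.0, 100.0) for _ in range(10_000)]
targets = [1.0 if v > 60.0 else 0.0 for v in values]

counts, sums = build_histogram(values, targets, num_bins=16)

# A split search now scans boundaries between bins, not between raw values.
candidate_splits = len(counts) - 1
print(candidate_splits)
```

With 16 bins, a split search only has to consider 15 boundaries instead of one between every pair of adjacent raw values, and each value can be stored as a small integer. This is the general reason binning can reduce training time and memory use.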
Foster Provost writes in memoriam of Tom Fawcett, who was killed on June 4th in a freak bicycle accident. Tom was a brilliant scholar, a selfless collaborator, a substantial contributor to Data Science for three decades, and a unique individual.
This post takes you through the basic steps for creating a cloud-based deep learning dog classifier, with everything accomplished from the AWS Management Console.
Classification is a core technique in the fields of data science and machine learning that is used to predict the categories to which data should belong. Follow this learning guide that demonstrates how to consider multiple classification models to predict data scraped from the web.
Computer vision has tremendous promise for improving crop monitoring at scale. We present our learnings from building such models for detecting stem and wheat rust in crops.
The YouTube videos on this list cover concepts such as what machine learning is, the basics of natural language processing, how computer vision works, and machine learning in video games.
Google Colab is a widely popular cloud service for machine learning that features free access to GPU and TPU computing. Follow this detailed guide to help you get up and running fast to develop your next deep learning algorithms with Colab.
There is a quick and easy way to perform preprocessing on mixed feature type data in Scikit-Learn, which can be integrated into your machine learning pipelines.
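One common way to do this, sketched below, is Scikit-Learn's ColumnTransformer, which applies a different preprocessing step to each group of columns and can sit inside a Pipeline. The blurb does not name the technique the article uses, so treat this as one plausible approach. The toy data, column names, and choice of classifier are all made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with numeric and categorical features (invented for this sketch).
df = pd.DataFrame({
    "age": [25, 32, None, 51, 46, 38],
    "income": [40000, 52000, 61000, None, 83000, 58000],
    "city": ["Leeds", "York", "Leeds", "Hull", "Hull", "York"],
    "bought": [0, 0, 1, 1, 1, 0],
})
X = df[["age", "income", "city"]]
y = df["bought"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column group to its own transformer.
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Chain preprocessing and a model so both are fitted together.
model = Pipeline([
    ("preprocess", preprocess),
    ("classify", LogisticRegression()),
])
model.fit(X, y)
predictions = model.predict(X)
transformed = preprocess.fit_transform(X)
print(transformed.shape)
```

Keeping the preprocessing inside the pipeline means the same imputation, scaling, and encoding learned on the training data are applied automatically at prediction time.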
Time to get back to basics. This week we have a look at a book on foundational machine learning concepts, Understanding Machine Learning: From Theory to Algorithms.
In this tutorial, we will use a previously-built machine learning pipeline and Flask app to demonstrate how to deploy a machine learning pipeline as a web app using the Microsoft Azure Web App Service.
Everyone is prey to cognitive biases that skew thinking, but data scientists must prevent them from spoiling their work. Learn more about five biases that can all too easily make your seemingly objective work become surprisingly subjective.
There are many reasons why data scientists should learn Java. Read this overview of 6 specific reasons to help decide if Java might be right for your projects.
Technical progress in mobile app development, along with digital enhancements, has created new opportunities for brands to attract and retain customers. In bridging the individualization gap, Machine Learning comes to the rescue.
Concept shift driven by COVID-19 has raised concern over continuing to use AI/ML to drive business value, following cases of inaccurate outputs and misleading results in a variety of fields. Data Science teams must invest effort in post-model tracking and management, and build agility into the AI/ML process, to curb problems related to concept shift.
This hands-on book bridges the gap between theory and practice, showing you the math of deep learning algorithms side by side with an implementation in PyTorch. Save 40% on Math and Architectures of Deep Learning with code nlkdarch40.
In this blog, I demonstrate how to convert speech to text using Python, with the help of the “Speech Recognition” API and the “PyAudio” library.
With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.
Dashboards have been the primary weapon of choice for distributing data over the last few decades, but they have brought with them a new set of problems. To increasingly democratise access to data we need to think again.
Recently, OpenAI announced GPT-3, the new successor to its language model and, with 175 billion parameters, the largest model trained so far. Training a language model this large has its merits and limitations, so this article covers some of its most interesting and important aspects.
All is not well with artificial intelligence-based systems during the coronavirus pandemic. No, the virus does not impact AI – however, it does impact humans, without whom AI and ML systems cannot function properly. Surprised?
If you are just diving into learning statistics, then where do you begin? Find insight from those who have trodden these waters before, and see what they might have done differently along their personal journeys in statistics.
Data pre-processing is not only the largest time sink for most Data Scientists, but it is also the most crucial aspect of the work. Learn more about training data and data processing tasks from 5 leading academic papers.
This article jumps into the latest skill set observations in the Data Engineering Job Market which could definitely add a boost to your existing career or assist you in starting off your Data Engineering journey.
The process of understanding your data begins by asking 3 questions at the highest level, and then iteratively asking hundreds of cascading questions to get deeper insights.
Natural language processing (NLP) is increasingly used to review unstructured content or spot trends in markets. How is Refinitiv Labs applying NLP in financial services to meet challenges around investment decision-making and risk management?
This article tells the story of making effective business decisions based on a combined model. Let us study together how its components work hand in hand.
If you are interested in a top-down, example-driven book on deep learning, check out the draft of the upcoming Deep Learning for Coders with fastai & PyTorch from the fast.ai team.