With today’s tools, anyone can collect data from almost anywhere, but not everyone can pull the important nuggets out of that data. Whacking your data into Tableau is an OK start, but it’s not going to give you the business-critical insights you’re looking for. To truly make your data come alive, you need to mine it. Dig deep. Play around. And tease out the diamond in the rough.
Jumpstarting your data mining journey can be an uphill battle if you didn’t study data science in school. Not to worry! Few of today’s brightest data scientists did. So, for those of us who may need a little refresher on data mining or are starting from scratch, here are 38 great resources to learn data mining concepts and techniques.
Learn data mining languages: R, Python and SQL
W3Schools - Fantastic set of interactive tutorials for learning different languages. Their SQL tutorial is second to none. You’ll learn how to manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2 and other database systems.
Treasure Data - The best way to learn is to work towards a goal. That’s what this helpful blog series is all about. You’ll learn SQL from scratch by following along with a simple, but common, data analysis scenario.
10 Queries - This course is recommended for the intermediate SQL-er who wants to brush up on their skills. It’s a series of 10 challenges coupled with forums and external videos to help you improve your SQL knowledge and understanding of the underlying principles.
TryR - Created by Code School, this interactive online tutorial system is designed to step you through R for statistics and data modeling. As you work through their seven modules, you’ll earn badges that track your progress and help you stay on course.
Leada - If you’re a complete R novice, try Leada’s introduction to R. In their 1 hour 30 minute course, they’ll cover installation, basic usage, common functions, data structures, and data types. They’ll even set you up with your own development environment in RStudio.
Advanced R - Once you’ve mastered the basics of R, bookmark this page. It’s a fantastically comprehensive style guide to using R. We should all strive to write beautiful code, and this resource (based on Google’s R style guide) is your key to that ideal.
Swirl - Learn R in R - a radical idea certainly. But that’s exactly what Swirl does. They’ll interactively teach you how to program in R and do some basic data science at your own pace. Right in the R console.
Python for beginners - The Python website actually has a pretty comprehensive and easy-to-follow set of tutorials. You can learn everything from installation to complex analyses. It also gives you access to the Python community, who will be happy to answer your questions.
PythonSpot - A complete list of Python tutorials to take you from zero to Python hero. There are tutorials for beginner, intermediate and advanced learners.
Read all about it: data mining books
Data Jujitsu: The Art of Turning Data into Product - This free book by DJ Patil gives you a brief introduction to the complexity of data problems and how to approach them. He gives nice, understandable examples that cover the most important thought processes of data mining. It’s a great book for beginners but still interesting to the data mining expert. Plus, it’s free!
Data Mining: Concepts and Techniques - The third (and most recent) edition will give you an understanding of the theory and practice of discovering patterns in large data sets. Each chapter is a stand-alone guide to a particular topic, making it a good resource if you’re not into reading in sequence or you want to know about a particular topic.
Mining of Massive Datasets - Based on the Stanford Computer Science course, this book is often cited by data scientists as one of the most helpful resources around. It’s designed at the undergraduate level with no formal prerequisites. It’s the next best thing to actually going to Stanford!
Hadoop: The Definitive Guide - As a data scientist, you will undoubtedly be asked about Hadoop. So you’d better know how it works. This comprehensive guide will teach you how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. Make sure you get the most recent edition to keep up with this fast-changing service.
Learn from the best: top data miners to follow
John Foreman - Chief Data Scientist at MailChimp and author of Data Smart, John is worth a follow for his witty yet poignant tweets on data science.
DJ Patil - Author and Chief Data Scientist at The White House OSTP, DJ tweets everything you’ve ever wanted to know about data in politics.
Nate Silver - He’s Editor-in-Chief of FiveThirtyEight, a blog that uses data to analyze news stories in Politics, Sports, and Current Events.
Andrew Ng - As the Chief Data Scientist at Baidu, Andrew is responsible for some of the most groundbreaking developments in Machine Learning and Data Science.
Bernard Marr - He might know pretty much everything there is to know about Big Data.
Gregory Piatetsky - He’s the author of popular data science blog KDNuggets, the leading newsletter on data mining and knowledge discovery.
Christian Rudder - As the Co-founder of OKCupid, Christian has access to one of the most unique datasets on the planet, and he uses it to give fascinating insight into human nature, love, and relationships.
Dean Abbott - He’s contributed to a number of data blogs and authored his own book on Applied Predictive Analytics. At the moment, Dean is Chief Data Scientist at SmarterHQ.
Practice what you’ve learned: data mining competitions
Kaggle - This is the ultimate data mining competition. The world’s biggest corporations offer big prizes for solving their toughest data problems.
Stack Overflow - The best way to learn is to teach. Stack Overflow offers the perfect forum for you to prove your data mining know-how by answering fellow enthusiasts’ questions.
TunedIT - With a live leaderboard and interactive participation, TunedIT offers a great platform to flex your data mining muscles.
DrivenData - You can find a number of nonprofit data mining challenges on DrivenData. All of your mining efforts will go towards a good cause.
Quora - Another great site to answer questions on just about everything. There are plenty of curious data lovers on there asking for help with data mining and data science.