I have been waiting a long time to finally be able to write this blog post. Last Friday, Julia Silge and I led a useR! 2020 online tutorial on "predictive modeling with text using tidy data principles". The tutorial was hosted by R-Ladies en Argentina, and I could not be more grateful for all the work the organizers put into making this event happen.
Materials for this tutorial are available on GitHub, with two main resources in the repo:
- Slides, which you can see rendered here, and their source here

If you get stuck, you can post a question as an issue on the repo or on RStudio Community.
During the tutorial, I was excited and proud to publicly announce the book Julia and I are working on! The book is called "Supervised Machine Learning for Text Analysis in R" and will be published in the Chapman & Hall/CRC Data Science Series! An online version is, and will continue to be, available at smltar.com. This year-long project has been an exciting time in my life, and I have been learning not just about the subject matter at hand but also about publishing, polishing, and reviewing.
The book is divided into three main sections:
Natural language features: This section covers the considerations and methods one can use to turn text into a numerical representation we can feed into a model. Topics include, but are not limited to, tokenization, stemming, and stop words (yes! you read that right! we have a whole chapter about stop words! And it was needed). This section is in really good shape.
Machine learning methods: We investigate the power of some of the simpler and more lightweight models in our toolbox. We do full walkthroughs of classification and regression, with commentary and considerations along the way. We drew from these chapters for our useR! tutorial.
Deep learning methods: Given more time and resources, we see what is possible once we turn to neural networks. This section is still to come.
I already have a lot of people to thank for making this possible!
- Julia for seeing the promise in this book idea and taking on this big task with me
- Our Chapman & Hall editor, John Kimmel
- The helpful technical reviewers
- Desirée De Leon for the beautiful design of the book's website
- Max Kuhn and Davis Vaughan for their amazing work on tidymodels, which we are using in the second section of the book
- My wife for her continued support and her faint attempts to feign interest when I talk about the book ❤️
- Alberto Cairo for lending an ear and offering encouragement for this idea