PYCON UK

Why you should do text analysis in python

Bhargav Srinivasa Desikan | Friday 12:00 | Room C

The purpose of this talk is to convince the python community to do text analysis - and explain both the hows and the whys. Python has traditionally been a very good parsing language, aruguably replacing perl for all text file handling tasks. Reading files, regular expressions, writring to files, crawling on the web for textual data have all been standard ways to use python - and now with the Machine Learning and AI explosion - we have a great set of tools in python to understand all the textual data we can so easily play with.

I will be briefly talking aboubt the merits, de-merits and use-cases of the most popular text processing libraries. In particular, these will be spaCy, NLTK, gensim. I will also talk about how to use traditional Machine Learning libraries for text analysis, such as scikit-learn, Keras and TensorFlow. Pre-processing is the one of the most important steps of Text Analysis, and I will talk more about this - after all, garbage in, garbage out!

The final part of the talk will be about where to get your data - and how to create your own textual data as well. You could analyse anything, from your own emails and whatsapp conversations to freely available British Parliament transcripts!