Responsibilities
In-depth data analysis: Extract, manipulate, calculate, format, and combine data into
presentable reports, charts, and graphs. Analyze and interpret data to find outliers,
understand root causes and business impact, identify correlations and discrepancies,
and propose changes or alternative solutions.
Discover patterns/root causes, and generate insights to drive product enhancements.
Bring together disparate data sources to create a complete analysis.
Analyze and evaluate the quality of data used for model training and testing.
Create and present proposals and results in an intuitive, data-backed manner, along
with actionable insights and recommendations to drive business decisions.
Collaborate with other data scientists and engineers on data collection and feature
design efforts across teams.
Communicate results to diverse audiences through effective writing and data
visualizations (BI reports and Dashboards).
Desired Skills
Solid experience with Natural Language Processing (NLP).
Text Extraction from various sources (MS Word, plain text files, PDF files,
HTML pages, etc.), Text Cleaning, Text Pre-Processing, Tokenization, POS
tagging, NER, Dependency Parsing, Coreference Resolution, Feature Vector
Generation (binary, count, tf-idf, etc.), word2vec, doc2vec, GloVe, RAKE,
document similarity (Cosine, Jaccard, etc.), fuzzy text matching, Lexical and
Semantic Information Extraction.
Understanding of various NL constructs like Parts of Speech, Sentence
structures, Subject-Verb-Object relationships, and word dependencies
(ROOT, compound, etc.).
Strong expertise in Python.
Expert-level skills with packages like NLTK, spaCy, gensim, Pattern,
TextBlob, and Vocabulary; extraction tools like PDFMiner, Apache Tika
with Python, and PyPDF2; and pandas, sklearn, numpy, xgboost,
matplotlib, keras, etc.
Expertise in Command Line usage (e.g., Bash) and SQL.
Robust knowledge of statistical modelling and machine learning techniques
Techniques: text clustering (k-means,