Starting a Career in Data Science is a Nightmare: Is it?
Getting started with data science is something that many professionals in the IT space aspire to do. Data science as a field has gained popularity only recently, though data mining algorithms date back to the 1960s. Today, data science is one of the most in-demand careers in our highly connected, highly digitized world.
In fact, the charm of data science has also influenced academia, and every year more and more students opt for data science as their career option. But there is a catch: there is a lot of confusion all around, many questions are left unanswered, and proper guidance and mentoring are lacking. Many young graduates, fresh out of college with degrees in their hands and a lot of energy and passion, often ask me questions such as:
Where should I start my career in Data Science? Is it okay to start with whatever is offered to me and, once I have some years of industry experience, switch to a data science role, since that is the ultimate next step in the digital space?
There are openings for data engineers, model engineers, data analysts, machine learning engineers, etc. How do I know which one is the right fit for me?
I also often get questions from experienced professionals who want to switch to data science and start their careers afresh, such as:
I am from a software engineering background with X years of experience in my profile. Can I get into data science? If yes, where do I start if I want a Career in Data Science?
I recently completed a course on end-to-end Data Science project management from my organization’s internal Learning and Development portal, and I want to switch to a Career in Data Science. But looking at the requirements, I do not think I stand a chance. Do you think I should quit and start all over again with a lower pay package in some other organization?
All of these questions, when laid out on a piece of paper, often look scary because they are real questions, not limited to a handful of professionals who want a Career in Data Science. It would not be wrong to say that it is because of these questions, and the lack of proper guidance in answering them, that many professionals never move to Data Science even when they have a very strong knowledge and understanding of the subject.
If all of this gives you nightmares and you feel you are not the right person for a career in data science, this article will give you more clarity on what it takes to start one. I will also give you some tips and a strategic roadmap that might help you land that dream job of yours in data science. I hope that by the end of the article you will be confident enough to start working on your existing skill sets and build your way up towards a career in data science.
This article is for professionals in general, for young graduates who want to start a Career in Data Science, and for students who are looking for answers to their questions before they appear in front of an interview panel.
But before going into the details and answering these questions, let's first understand what Data Science is in a nutshell.
Data Science in a Nutshell
As per the definition of Data Science in Wikipedia,
“Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data”
Data Science is a multidisciplinary field that draws on expertise from Project Management, Statistics, Mathematics, Computer Engineering, Algorithm Design, Cloud Storage and Data Handling. It is therefore safe to say that, irrespective of the number of courses you have completed online or offline, the number of Proofs of Concept (PoCs) you have designed to prove your knowledge, or the degree that you hold, the probability of starting a career in data science with end-to-end responsibility is very low.
The reason is simple: when we refer to data in the corporate world, we mean live data from different sources, which is delicate, private and often governed by strict laws that protect customers from being exposed to unwanted agents in the digital space.
A data scientist thus has to gain a sound understanding of the different phases involved in a data science project, should be able to work out processes that optimize resources and time, and must deliver fruitful results in a live environment where a very large number of transactions take place every day between the customers and the organization. The typical steps involved in data science projects are as below:
- Data capturing: Data capturing points are the places in a business process where data is captured. For example, when we sign up for a newsletter on a website, the form that we fill in is a data capturing point, and when we swipe our cards while checking out, the Point of Sale (PoS) device is a data capturing point. In this digital world, almost any activity on the internet can be considered a data capturing activity, because our activities get stored in databases and can be accessed by organizations whenever required.
- Data Storage and maintenance: Once the data is captured at the data capturing points in an organization, it is securely routed to different stores, such as relational databases or NoSQL databases, where it is kept for further use. It is also common practice to maintain the data where it is stored by creating indexes, building data warehouses, caching the data, etc., both for faster access and, many times, for getting insights into user activity on websites.
- Data Visualization and processing: Stored data is useless if no information can be extracted from it. This is where data visualization and processing come in: the data is visualized to reveal trends and patterns so that organizations can optimize their resources for better services. It is also common practice to get a general idea of the data in statistical terms such as mean, median, mode, variance and standard deviation.
- Data Analytics: Data analytics is a more in-depth analysis of the data gathered and stored in the databases, aimed at extracting detailed information so that the results can help organizations optimize their resources and time.
- Result Communication: Once the rest of the data science process is complete, the last step is communicating the findings to the stakeholders and the key decision-makers of the organization. This is usually done with the help of graphs and charts.
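As a small illustration of the statistical terms mentioned in the data visualization and processing step, here is a minimal sketch using Python's standard `statistics` module. The `daily_orders` list is hypothetical data invented for this example, not data from any real system.

```python
import statistics

# Hypothetical daily order counts captured at a checkout point
# (illustrative data only).
daily_orders = [12, 15, 15, 18, 20, 22, 15, 30]

summary = {
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "mode": statistics.mode(daily_orders),
    "variance": statistics.variance(daily_orders),  # sample variance
    "std_dev": statistics.stdev(daily_orders),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that `variance` and `stdev` compute the sample versions; `pvariance` and `pstdev` are the population versions, and which one is appropriate depends on whether the data is a sample or the full population.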
Now that we are familiar with what a data scientist does and the various steps involved in a data science project, let us turn to how one should go about starting a lucrative career in data science.
Answering the big question: Where do I start?
This has been the biggest question in the minds of many people: where does one start in this vast space of data science, with so many things to learn, practice and gain experience in? Often, it is not the theory, the code or even the languages that are daunting; it is the list of skill sets, responsibilities and expertise that organizations want from candidates that keeps them in the back seat.
Below is a strategic roadmap for how one can start a career in data science. Please keep in mind, though, that it is just a strategy and, like any other strategy, its success depends on a number of other factors.
- Find out where you fit: The first step in getting started with a Career in Data Science is to find out where you fit in. Figure out the skill sets that are common to the role you aspire to and the role you currently hold in your organization.
For example, if you are from a software development background, you have a lot of coding experience, so good places to start are feature engineering and machine learning engineering. Both of these roles require coding so that the model is properly trained. If you are from a project management background, data visualization may be a good place to start.
You need not code to visualize data; tools such as Microsoft Power BI and Tableau can help you create great plots for your data. You can also move into optimization, where you plan how to get the best results from your model.
Getting started with a skill set close to your current one ensures that you are not starting from zero; rather, you are adding to your current skill set, which enables you to take on a different role altogether. In case you are not sure where you fit in, you can always do one of the following to gain clarity:
Get mentors from the data science field, talk to them openly about the skill set you have, and let them tell you which role best suits your current profile.
Simply pick whichever aspect of data science seems least daunting to you and build your skill set around it by attending MOOCs, reading, and solving problems online.
- Enhance your skill set to match the industry standard: The second step in getting started with a Career in Data Science is to bring your matching skill set up to the industry standards of data science. One thing to keep in mind is that no matter how closely your skill set relates to a particular role, there will always be a big gap between what you know and can handle and what is expected of you in a data science project.
For example, if you are getting into data science from software engineering, the industry will expect code that takes in data as is and produces a prediction or a label as output, which might or might not be acceptable to the business. In a software engineering role, by contrast, the output expected from your code is some desired process automated with the fewest possible side effects on the existing functionality of the tool you are working on.
In data science, you improve model performance through optimization and performance evaluation, whereas in software engineering you are judged on how well you can automate a manual process. Hence enhancing your skill set in the direction of the industry becomes your top priority.
A plethora of resources is available online and offline to help you enhance your skill set. For instance, if you are into machine learning, platforms such as Coursera, Udacity, Udemy and cognitiveclass.ai offer MOOC programs that take you step by step from one concept to the next. Make sure you enroll in one of the MOOCs and complete it. The main advantage of taking a MOOC is that you get access not only to video lectures but also to online discussion forums, live interactive classes and peer reviews of your work, all of which add value to what you are trying to learn.
- Learn a Language and then a tool: The third step in getting started with a Career in Data Science is to learn a language followed by a tool, and not vice versa. I have seen many instances where professionals and young graduates enroll in postgraduate programs, offered by various institutes across the Silicon Valley, where the focus is entirely on learning a new tool rather than learning a language.
In my personal opinion, this is not the right approach because tools have limitations in their functionality. For example, one of the oldest data analytics tools that we are all familiar with is MS Excel, known for the varied plots and functions available in it, which are further enhanced by DAX.
The limitation of MS Excel is that once the data grows beyond a few megabytes, the tool can become very slow at analyzing it, and with some hundreds of megabytes it may crash altogether. Another shortcoming of learning a tool directly is the lack of knowledge of the scientific know-how behind how an algorithm produces its output.
For example, coming back to Excel, one can run a regression analysis on the data with a few clicks, but that is not very fruitful: someone who wants a career in data science is expected to know how things get done and whether they can be tuned to generate better results. Using a tool directly keeps the user in the dark about what happens in the background, and if someone wants to enhance the plots, the tool offers either no way of doing so or only very limited scope.
The good news is that all of these limitations of a tool can be overcome if one is comfortable working with programming languages. By going through the documentation of the various functions in a package, one can understand how graphs are plotted, how models are built, and what can be tuned to further improve the performance of those models. This comes in handy especially when business executives ask for different, more intuitive plots or for a more efficient model, requirements that might not be feasible to meet with tools because of the innate limitations in their design.
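To make the regression example concrete, here is a minimal sketch of what happens behind a "few clicks" regression: an ordinary least squares fit of a straight line, written in plain Python. The function names `fit_line` and `r_squared` and the `ad_spend`/`sales` figures are hypothetical, chosen only for illustration; this is not a description of how Excel implements the feature internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line.

    Assumes ys are not all identical (otherwise the total sum of
    squares is zero and the ratio is undefined).
    """
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: advertising spend versus sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"R^2={r_squared(ad_spend, sales, slope, intercept):.3f}")
```

Writing the fit out this way exposes the pieces one could tune or replace, for example adding more input variables or a different error measure, which is the kind of control a point-and-click tool tends to hide.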
- Build PoCs and get them peer reviewed: The fourth step in getting started with a Career in Data Science is to build Proofs of Concept (PoCs) and get them peer reviewed. Data Science is a practical field that is best learned over time by doing and practising the concepts again and again. Whether you want to change your career to data science or are a young graduate who wishes to start a career in Data Science, small model programs can take you a long way towards your dream role.
In corporate language, a proof of concept is a demo program or model that works on a limited amount of data, which can be a segment of the real-world data or a hypothetical data set, and that produces output similar to what the actual model would produce on the actual data. PoCs not only show one’s expertise with the concepts but also go a long way in showing the practical experience that someone has in the field.
Once the PoC is built, getting it peer reviewed is a plus. There are thousands of communities on social media which you can join. Upload your dummy model to a shared repository, such as a Git repository, and share the link in the community to get the model reviewed by a variety of people. This helps in two ways:
- The performance of your model gets better with time, since the model is reviewed by a number of professionals, both experts and amateurs. That way you get feedback that you can work on to make your model better.
- You get visibility in the community. The more models you create and share, the more visible you become, and as a result your chances of getting hired by one of the experts in the community increase considerably.
- Keep yourself updated: The fifth step in getting started with a Career in Data Science is to keep yourself updated with the developments and releases that happen almost every day in this space. Data science is a highly dynamic field where things change drastically. It is thus very important to keep up with the latest trends in the field so that you know what is still in use and what has become obsolete.
One of the most powerful ways to keep yourself updated on these changes is to follow giants like Google, Facebook and IBM, to name a few, on various research platforms and on Twitter. Some of the renowned data scientists that I follow on Twitter are Andrew Ng, Geoffrey Hinton and Yann LeCun.
You can follow the leading researchers in your field of study in the same way. Keeping up with the latest trends in data science adds a competitive edge to your resume, marking you as someone who not only knows the basics but is also dedicated to gaining knowledge in a rapidly changing space.
- Practice Communication Skills: The last step in getting started with a Career in Data Science is to practice communication skills. As I mentioned earlier in this article, the entire data science process ends with communicating the findings to the executives who are the ultimate decision-makers in the organization. No matter how strong your technical abilities to work with data and create dashboards, plots and graphs, none of it makes sense if you cannot communicate the results to executives who are more interested in the numbers than in the technicalities involved in the process.
Another aspect of good communication skills is sharing your knowledge with your peers, suggesting pointers that would improve the model, and communicating your findings to your team members. One of the biggest reasons people get rejected at job interviews is a lack of communication skills. This might not be directly related to being a data scientist, but it is a must-have in order to start a career in data science.
Now that we have a strategy we can employ to get into a career in data science, it is time to look at some of the roles that are available in the data science field and the key responsibilities they carry in an organization.
Roles in Data Science
- Data Scientist: The role of a data scientist is perhaps the most lucrative and sought-after career in data science. Working as a data scientist in any organization is both challenging and interesting. Data Scientists are often termed data unicorns because of their obsession with data. They work with both structured and unstructured data to extract information from it and communicate it to clients, executives and their peers. Data Scientists are often wonderful communicators who present their findings with utmost precision and tact. A data scientist has knowledge of the overall data science process and is not confined to any particular step in the data science process or technology.
- Data Analyst: As the name suggests, data analysts are the professionals who deal with raw data to figure out insights. They are involved in data gathering, data cleansing and basic data analysis, and they also communicate the patterns that can be seen in the data. Data analysts build models to get key insights from the data in terms of various business Key Performance Indicators. Data Analysts must be great team players and communicators in order to deliver their best while working in tandem with a variety of teams and professionals in the organization.
- BI analyst: The BI analyst deals with the performance of the business as a whole by looking into historical data and trends. They work alongside the executives to formulate policies based on their interpretation of the data provided to them. BI analysts are usually experts in Tableau, a tool for plotting interactive graphs of the data provided as input, and they usually align their findings with the best interest of business performance. BI analysts also analyze market trends to identify highs and lows and help the business gain a competitive advantage over its peers in the market.
- Data Engineer: Data Engineers are known for their deep knowledge of data. They are specifically trained professionals who focus on data cleansing, outlier detection and data preprocessing before the data is fed into the model. Data Engineers are often data experts specializing in big data, and their focus area lies in collecting, processing, analyzing and visualizing large volumes of data. Data Engineers are also seen as experts in ETL, which stands for Extract, Transform and Load. They work closely with BI analysts, data analysts and data scientists, support them during model building, and generate visual plots for business communication.
- Machine Learning Engineer: As the name suggests, a machine learning engineer is an expert in machine learning algorithms, which include supervised, unsupervised and deep learning. Many experts consider machine learning engineers to be data engineers with an aptitude for machine learning. Machine Learning engineers are expected to have sound technical knowledge of the niche skills required for machine learning and deep learning, and a good hold on mathematics so that they can work on the existing algorithms and propose improvements to them.
- Data Architects: Data Architects are the creators of the data that digitizes the business world. They are specialized experts in data storage, data collection and data flow management in the organization. They are concerned with improving the data flow and data storage within and outside the organization. They also plan strategies that help an organization optimize the data management process, and they work closely with other data experts who access, maintain and work with sensitive data for modelling.
This concludes the list of roles available in a data science project. Please note that the list above is only an indicative list and not an exhaustive list. Depending on the requirements, business structure and organization hierarchy the roles are subject to change and there might be additional roles introduced to support these roles in a data science project.
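To give a flavour of the ETL and outlier detection work mentioned under the Data Engineer role, here is a minimal sketch built only on Python's standard library. Everything in it is an assumption made for illustration: the `RAW_CSV` sample, the `purchases` table, the function names, and the choice of the 1.5 × IQR rule as the outlier test (one common convention among several).

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw export; one row has a missing amount and one is an
# obvious outlier (illustrative data only).
RAW_CSV = """customer_id,amount
1,20.5
2,19.0
3,
4,22.0
5,21.5
6,950.0
7,18.5
8,20.0
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing amounts, then drop IQR outliers."""
    cleaned = [
        (int(r["customer_id"]), float(r["amount"]))
        for r in rows
        if r["amount"].strip()
    ]
    amounts = [amount for _, amount in cleaned]
    q1, _, q3 = statistics.quantiles(amounts, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(cid, amount) for cid, amount in cleaned if low <= amount <= high]


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
records = transform(extract(RAW_CSV))
load(records, conn)
row_count = conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0]
print(f"loaded {row_count} of 8 raw rows")
```

In a real project each stage would typically read from and write to production systems, and the rules for what counts as bad data would come from the business rather than a fixed threshold.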
Starting a Career in Data Science is a Nightmare: No, it is not
By this time, I am pretty confident that what once looked like a nightmare to many young graduates and experienced professionals is no longer one. If you have gone through this article, I am confident that you now have a strategy for landing a job in pursuit of a career in Data Science. Below is a brief summary of what it takes to start a Career in Data Science:
- Find out where you fit
- Enhance your skill set to match the industry
- Learn a Language and then a tool
- Build PoCs and get them peer reviewed
- Keep yourself updated
- Practice Communication Skills
We also looked qualitatively at the roles that a data science project usually offers. They can be summarized as below:
- Data Scientist
- Data Analyst
- BI analyst
- Data Engineer
- Machine Learning Engineer
- Data Architects
However, I should also point out that this is a qualitative list and not an exhaustive one. Depending on the organization's structure and hierarchy and the demands of the data science project, the roles and responsibilities change; there are even instances where roles are added to support the other roles involved in the project.