What is Data Science? – DAY 0
Data science is an interdisciplinary field. It uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning and big data.
Data science is a “concept to unify statistics, data analysis, machine learning, domain knowledge and their related methods” in order to “understand and analyze actual phenomena” with data. It uses techniques and theories drawn from many fields, including mathematics, statistics, computer science, domain knowledge and information science. Turing Award winner Jim Gray imagined data science as a “fourth paradigm” of science and asserted that “everything about science is changing because of the impact of information technology” and the data deluge.
Use of Python in Data Science
This blog series focuses on applied skills using the Python programming language. There are many other tools that one can use in data science, such as specialized statistical analysis languages like R or more general-purpose programming languages like Java and C. We chose Python as the basis for this series for the following reasons.
1- Easy to Learn
Python is now a common language of choice for introducing people to programming, and it is popular in top computer science programs in the US. It tends to need less boilerplate than you might have seen in other languages, and it has more natural constructs for the typical tasks you might need to accomplish.
If you have programming experience but no Python-specific experience, you can pick up Python very quickly.
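As a small illustration of the "natural constructs" point, here is a minimal sketch that counts words in a sentence using only the standard library. The sample sentence and the variable names are made up for this example.

```python
from collections import Counter

# Hypothetical sample text, invented for this illustration.
text = "data science uses data to answer questions about data"

# Split on whitespace and count how often each word appears.
word_counts = Counter(text.split())

# A list comprehension keeps only the distinct words longer than four letters.
long_words = [word for word in word_counts if len(word) > 4]

# most_common(1) returns a list with one (word, count) pair.
most_common_word, count = word_counts.most_common(1)[0]
print(most_common_word, count)
print(long_words)
```

There is no class or main function to set up: the script is just the steps of the task, written in order.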
2- Full Featured
Python is a very general programming language with a lot of built-in libraries. It excels at manipulating data, network programming and databases. There are plenty of resources available, from books to online material. Finally, Python has a significant set of data science libraries one can use.
The base of these is called the SciPy ecosystem. It even has its own conference series. Both the interface we are going to use, called Jupyter Notebook, and the main libraries, pandas and matplotlib, are part of the SciPy stack. It provides an excellent basis for moving into machine learning, text mining and network analysis. In this blog series we will mainly work through four modules.
- The first module focuses on getting prerequisites in place and reviews some of the basics of the Python language.
- In the second module, we are going to dig into the pandas toolkit. The pandas toolkit is fundamental to data science in Python. It provides a data structure for thinking about data in tabular form.
- Some of the more advanced ways to query and manipulate pandas data frames, like boolean masking and hierarchical indexing, are different from how things work in databases and require some careful discussion. We discuss these in module three.
- In the final module, we will work on projects. You will take some databases, merge and clean them, then process the data and answer some questions.
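To give a feel for what the pandas modules cover, here is a minimal sketch of a tabular data frame, a cleaning step, boolean masking, hierarchical indexing and a merge. It assumes pandas is installed; the city names, numbers and variable names are hypothetical sample data, not material from the series.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
sales = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "year": [2019, 2020, 2019, 2020],
    "units": [120, 150, 90, None],  # one missing value to clean up
})

# Cleaning: drop rows where the "units" value is missing.
cleaned = sales.dropna(subset=["units"])

# Boolean masking: the comparison yields a True/False Series,
# and indexing with it keeps only the rows marked True.
mask = cleaned["units"] > 100
high_sales = cleaned[mask]

# Hierarchical indexing: a two-level (city, year) index lets us
# look up a row with a tuple of labels.
indexed = cleaned.set_index(["city", "year"]).sort_index()
delhi_2020_units = indexed.loc[("Delhi", 2020), "units"]

# Merging: attach a region to each sale by matching on "city".
regions = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai"],
    "region": ["North", "West", "South"],
})
merged = cleaned.merge(regions, on="city", how="left")

print(high_sales)
print(delhi_2020_units)
print(merged)
```

The left merge keeps every row of the cleaned sales table and ignores regions that have no matching sale, which is usually what you want when enriching one table with another.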
In the next post, before we go into programming fundamentals, we will talk a bit more about what data science is and why it is sweeping the world.
THANK YOU, KEEP CODING.
VISIT OUR WEBSITE BRIGHTERBEES FOR MORE ABOUT DATA SCIENCE.