
_khar


Python for Data Science beginners

Python.

Python is a high-level, interpreted programming language first released by Guido van Rossum in 1991.
It is widely used for web development, data analysis, scientific computing, and machine learning.
Because Python is interpreted, code runs without a separate, explicit compilation step (the reference implementation, CPython, compiles source to bytecode behind the scenes and then executes it).
This makes it quick to write and test code, though programs often run more slowly than equivalents written in compiled languages such as C or Java.

Python supports several programming styles, including procedural, object-oriented, and functional programming.
Because it is dynamically typed, the type of a variable is determined at runtime from the value it holds rather than declared in the code.
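
To make the three styles concrete, here is a minimal sketch that computes the same sum of squares in each of them. The function and class names are hypothetical, chosen only for this illustration.

```python
from functools import reduce


# Procedural style: an explicit loop and a running total.
def sum_of_squares_procedural(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total


# Object-oriented style: data and behavior live together in a class.
class SquareSummer:
    def __init__(self, numbers):
        self.numbers = list(numbers)

    def total(self):
        return sum(n * n for n in self.numbers)


# Functional style: compose map and reduce without mutable state.
def sum_of_squares_functional(numbers):
    return reduce(lambda acc, n: acc + n, map(lambda n: n * n, numbers), 0)


values = [1, 2, 3, 4]
print(sum_of_squares_procedural(values))  # output: 30
print(SquareSummer(values).total())       # output: 30
print(sum_of_squares_functional(values))  # output: 30
```

All three give the same result; which style reads best usually depends on the size and shape of the program.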

Overall, Python is a fantastic choice for a variety of programming tasks because it is a flexible and popular language with a sizable user and developer community.

Other languages used in data science include the following:

  1. Julia is a relatively new language designed specifically for scientific and technical computing. It is known for its performance and scalability, with some benchmarks showing that Julia can be faster than Python for certain tasks. Julia also has a growing community and an active development team.

  2. R is a language and environment for statistical computing and graphics. It is widely used in academia and industry for data analysis, visualization, and modeling. R has a large user community and many specialized packages for statistical and data-related tasks.

  3. Scala is a general-purpose language that runs on the Java Virtual Machine (JVM). It is designed to be scalable and is often used for building large-scale distributed systems. Scala is known for its functional programming features and is popular in the big data ecosystem, where frameworks such as Apache Spark are written in it.

Ultimately, each of these languages has advantages and disadvantages, and the best option depends on the demands of the task. For instance, R may be better suited to statistical computing, Scala to building distributed systems, and Python to data analysis and machine learning.

Two of Python's most important characteristics are its simplicity and readability, which make it easy for beginners to learn and to write code.
Its sizable standard library and extensive ecosystem of third-party packages also make it easy to find and use tools for a wide variety of tasks.
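
As a small illustration of what the standard library offers without installing anything, the sketch below uses the `statistics` and `collections` modules. The sample data is made up for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample data, invented for this illustration.
temperatures = [21.5, 23.0, 19.5, 23.0, 22.0]
words = ["python", "r", "python", "julia", "python"]

avg_temp = mean(temperatures)    # arithmetic mean
mid_temp = median(temperatures)  # middle value of the sorted data
word_counts = Counter(words)     # tally of how often each word appears

print(avg_temp)                    # output: 21.8
print(mid_temp)                    # output: 22.0
print(word_counts.most_common(1))  # output: [('python', 3)]
```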

Variables and Data Types.

In Python, a variable is a name that refers to a value or an object. Variables can be used to store and manipulate data in a program. Here are some key things to know about variables in Python:

  1. Variable names in Python can consist of letters, digits, and underscores, but cannot start with a digit. For example, valid variable names include "my_variable", "variable2", and "myVar".
  2. Variables in Python are dynamically typed, which means that a variable can be rebound to a value of a different type at runtime. You do not need to declare the type of a variable when you create it.
  3. You can assign a value to a variable using the equal sign (=). For example, the following code creates a variable called "x" and assigns it the value 7:
 x = 7
  4. You can assign multiple variables at once using a comma-separated list. For example, the following code creates two variables, "a" and "b", and assigns them the values 1 and 2, respectively:
 a, b = 1, 2
  5. You can access the value of a variable by using its name in your code. For example, the following code prints the value of the variable "x":
print(x)
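
The points in the list above can be tried out together. The following is a small sketch: the candidate names "2fast" and "for" are extra examples added here to show names that are not allowed, and the swap at the end is one common use of multiple assignment.

```python
import keyword

# str.isidentifier() checks the naming rules; keyword.iskeyword() flags
# reserved words such as "for", which cannot be used as variable names.
candidates = ["my_variable", "variable2", "myVar", "2fast", "for"]
for name in candidates:
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, usable)

# Dynamic typing: a name can be rebound to a value of another type.
x = 7
print(type(x).__name__)  # output: int
x = "seven"
print(type(x).__name__)  # output: str

# Multiple assignment also gives a compact way to swap two values.
a, b = 1, 2
a, b = b, a
print(a, b)  # output: 2 1
```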

Python Environment.

Anaconda

Anaconda is an open-source distribution of the Python programming language that comes with a number of powerful tools and packages for data science, machine learning, and scientific computing.

Anaconda includes a package manager, conda, which lets users install, manage, and update Python packages, and it bundles useful tools and libraries such as Jupyter Notebook, JupyterLab, Spyder, and NumPy. conda can also create isolated environments (often called "virtual environments"), which keep different sets of Python packages and dependencies separate from one another.
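
When working with several environments, it helps to check which one the running interpreter belongs to. The sketch below is one possible way to do that from Python itself. `describe_environment` is a hypothetical helper written for this article, not part of Anaconda, and it assumes conda's usual behavior of setting the `CONDA_DEFAULT_ENV` variable while an environment is active.

```python
import importlib.util
import os
import sys


def describe_environment(packages=("numpy",)):
    """Report where the interpreter lives and which packages it can import."""
    return {
        # Root directory of the running Python installation or environment.
        "prefix": sys.prefix,
        # Assumption: conda sets CONDA_DEFAULT_ENV while an environment is
        # active; if the variable is absent, this value is None.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # find_spec returns None when a top-level package is not installed.
        "installed": {
            name: importlib.util.find_spec(name) is not None
            for name in packages
        },
    }


info = describe_environment()
for key, value in info.items():
    print(key, value)
```

The output differs from machine to machine, so no expected values are shown here.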

Anaconda provides a complete ecosystem for data science and machine learning, making it an ideal choice for individuals or organizations looking for a robust and easy-to-use platform for their data analysis and machine learning projects.

More information about Anaconda is available on its website, anaconda.com.

Jupyter Notebooks.

Jupyter Notebook is a web-based interactive computing environment in which users create and share documents that contain live code, equations, visualizations, and narrative text.
Originally developed for Python, Jupyter now supports many programming languages, including R, Julia, and Scala.

Cells in Jupyter Notebooks can include either markdown text or code. Users can utilize markdown cells to give explanations or documentation for the code, and run code cells to execute code and view the results within the notebook.
Moreover, interactive widgets and visualizations can be added to notebooks, enabling users to study data and change parameters in real-time.
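
Behind the scenes, a notebook is stored as a JSON document whose cells are tagged as markdown or code. The following is a minimal sketch of that layout built with only the standard library. It follows the nbformat version 4 structure as far as I know it; real notebooks written by Jupyter carry additional metadata.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout: a list of cells,
# each tagged as "markdown" or "code".
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Explanation\n", "This cell documents the code below."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],            # filled in when the cell is executed
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialize to text; saving this text under a name ending in .ipynb
# would give a file that Jupyter can open.
text = json.dumps(notebook, indent=1)

cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # output: ['markdown', 'code']
```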

JupyterLab.

JupyterLab is the next-generation web-based user interface for Jupyter Notebooks. It provides an integrated development environment (IDE) that enables users to work with multiple notebooks, text editors, terminals, and other interactive components in a single interface. JupyterLab offers a more flexible and powerful environment than the classic Jupyter Notebook interface, with features such as:

  1. Tabs and panes: JupyterLab provides a flexible layout system that allows users to arrange notebooks, code editors, and other components in a tabbed interface.

  2. Code navigation: Users can search and navigate code files, notebooks, and other documents within JupyterLab.

  3. Drag-and-drop interface: JupyterLab allows users to drag and drop files and components from the file system, desktop, and other applications into the interface.

  4. Extensions: JupyterLab supports a variety of extensions that can add functionality such as Git integration, interactive widgets, and more.

  5. Command Palette: JupyterLab includes a command palette that allows users to search for and execute commands using a keyboard shortcut.

JupyterLab is designed to be compatible with the existing Jupyter Notebook format, allowing users to easily switch between the two interfaces. It is also highly extensible, allowing developers to create custom components and extensions to meet their specific needs.

Data Science.

Data science uses a variety of statistical, computational, and analytical techniques to extract insights and knowledge from data.
Its ultimate objective is to transform raw data into knowledge that can inform business decisions, research, and other uses.

Data science involves various stages, including data collection, data cleaning, data preprocessing, data analysis, and data visualization. It involves working with large and complex datasets, often using machine learning algorithms and other advanced analytical techniques to identify patterns, make predictions, and generate insights.
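
The stages listed above can be sketched end to end on a tiny, made-up dataset. This is only an illustration using the standard library: the records are invented, and real projects would typically read data from files or databases and use dedicated libraries for analysis and plotting.

```python
from statistics import mean

# 1. Collection: hypothetical raw records, invented for this illustration.
raw = [
    {"city": "Nairobi", "temp_c": "24.5"},
    {"city": "Lagos", "temp_c": "31.0"},
    {"city": "Cairo", "temp_c": ""},  # missing reading
    {"city": "Accra", "temp_c": "29.5"},
]

# 2. Cleaning: drop records with a missing temperature.
cleaned = [row for row in raw if row["temp_c"].strip()]

# 3. Preprocessing: convert the text values to numbers.
prepared = [(row["city"], float(row["temp_c"])) for row in cleaned]

# 4. Analysis: compute summary statistics.
average = mean(temp for _, temp in prepared)
hottest = max(prepared, key=lambda pair: pair[1])

# 5. Visualization: a plain-text bar chart, one '#' per 5 degrees.
for city, temp in prepared:
    print(f"{city:<8} {'#' * int(temp // 5)} {temp}")

print(round(average, 2), hottest[0])  # output: 28.33 Lagos
```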

Data science has numerous applications across industries, including finance, healthcare, marketing, and more. It is a rapidly growing field, with increasing demand for data scientists who can help organizations make sense of their data and derive insights to inform business decisions.

Python Data Types.

  • Numbers: Python supports several types of numbers, including integers, floating-point numbers, and complex numbers. Integers are represented with the int type, and floating-point numbers are represented with the float type. Complex numbers are represented with the complex type.
# integer
x = 5
print(x, type(x))  # output: 5 <class 'int'>

# floating-point number
y = 3.14
print(y, type(y))  # output: 3.14 <class 'float'>

# complex number
z = 2 + 3j
print(z, type(z))  # output: (2+3j) <class 'complex'>

  • Strings: Strings are used to represent text in Python and are represented with the str type. They are enclosed in quotes, either single quotes ('...') or double quotes ("...").
name = 'Velma'
print(name, type(name))  # output: Velma <class 'str'>

# string concatenation
greeting = 'Hello, ' + name
print(greeting)  # output: Hello, Velma

# string indexing and slicing
print(name[0])  # output: V
print(name[1:3])  # output: el

  • Booleans: Booleans are used to represent truth values and are represented with the bool type. They can have two possible values: True and False.
is_sunny = True
print(is_sunny, type(is_sunny))  # output: True <class 'bool'>

# boolean operators
is_raining = False
print(is_sunny and is_raining)  # output: False
print(is_sunny or is_raining)  # output: True

  • Lists: Lists are used to store collections of items and are represented with the list type. They are mutable, meaning their contents can be changed after they are created.
fruits = ['apple', 'banana', 'orange']
print(fruits, type(fruits))  # output: ['apple', 'banana', 'orange'] <class 'list'>

# accessing list elements
print(fruits[0])  # output: apple
print(fruits[1:3])  # output: ['banana', 'orange']

# modifying list elements
fruits[0] = 'pear'
print(fruits)  # output: ['pear', 'banana', 'orange']

# adding to a list
fruits.append('kiwi')
print(fruits)  # output: ['pear', 'banana', 'orange', 'kiwi']

  • Tuples: Tuples are similar to lists but are immutable, meaning their contents cannot be changed after they are created. They are represented with the tuple type.
person = ('Velma', 30)
print(person, type(person))  # output: ('Velma', 30) <class 'tuple'>

# accessing tuple elements
print(person[0])  # output: Velma
print(person[1])  # output: 30

  • Sets: Sets are unordered collections of unique items and are represented with the set type. They are mutable, meaning their contents can be changed after they are created. Because sets are unordered, the order in which items are printed may differ from the order shown here.
colors = {'red', 'green', 'blue'}
print(colors, type(colors))  # output (order may vary): {'blue', 'red', 'green'} <class 'set'>

# adding to a set
colors.add('yellow')
print(colors)  # output (order may vary): {'blue', 'red', 'green', 'yellow'}

# removing from a set
colors.remove('green')
print(colors)  # output (order may vary): {'blue', 'red', 'yellow'}

  • Dictionaries: Dictionaries are used to store key-value pairs and are represented with the dict type. They are mutable, meaning their contents can be changed after they are created.
person = {'name': 'Allan', 'age': 30}
print(person, type(person))  # output: {'name': 'Allan', 'age': 30} <class 'dict'>

# accessing dictionary values
print(person['name'])  # output: Allan
print(person['age'])  # output: 30

# modifying dictionary values
person['age'] = 35
print(person)  # output: {'name': 'Allan', 'age': 35}

# adding to a dictionary
person['city'] = 'Lagos'
print(person)  # output: {'name': 'Allan', 'age': 35, 'city': 'Lagos'}

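
The terms "mutable", "immutable", and "unique" used in the list above are easiest to understand by seeing what happens when the rules are tested. The short sketch below uses made-up values; the `.get()` call at the end is an extra convenience not covered above.

```python
# Tuples are immutable: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
except TypeError as error:
    tuple_error = str(error)
    print("cannot modify tuple:", tuple_error)

# Lists are mutable, and two names can refer to the same list object.
original = [1, 2, 3]
alias = original
alias.append(4)
print(original)  # output: [1, 2, 3, 4]

# Sets keep only unique items, which makes them handy for de-duplication.
readings = [5, 3, 5, 1, 3]
unique_sorted = sorted(set(readings))
print(unique_sorted)  # output: [1, 3, 5]

# Dictionaries: .get() returns a default instead of raising KeyError.
person = {"name": "Allan", "age": 30}
print(person.get("city", "unknown"))  # output: unknown
```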

Python also supports other data types commonly found in programming languages. Beyond the basic types above, it provides more advanced ones such as byte strings (bytes) and byte arrays (bytearray), and it lets you define your own types with custom classes.
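
As a brief, hedged illustration of these additional types, the sketch below creates a bytes object, a bytearray, and a small custom class. The `Person` class is hypothetical and exists only for this example.

```python
# bytes: an immutable sequence of integers in the range 0-255.
# "caf\u00e9" is the word "cafe" with an accented e, which needs
# two bytes in UTF-8, so the encoded form is 5 bytes long.
raw = "caf\u00e9".encode("utf-8")
print(type(raw).__name__, len(raw))  # output: bytes 5

# bytearray: the mutable counterpart of bytes.
buffer = bytearray(b"data")
buffer[0] = ord("D")
print(buffer)  # output: bytearray(b'Data')


# A custom class defines a new type with its own attributes and methods.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def birthday(self):
        self.age += 1
        return self.age


velma = Person("Velma", 30)
print(type(velma).__name__, velma.birthday())  # output: Person 31
```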
