DEV Community

Simon Chalder


Part Ten: PIP, External Modules and GitHub

Welcome to part ten. In this series I hope to introduce the basics of coding in Python for absolute beginners in an easy to follow and hopefully fun way. In this final article we will look at how to install and use a package manager, import modules from an online repository, and take a look at some modules which may be of interest to land based workers and students.


"Most of you are familiar with the virtues of a programmer. There are three, of course: laziness, impatience, and hubris." - Larry Wall


Sharing Is Caring


So, why would we want to use someone else's code? That's cheating, isn't it? No, not at all! In fact, sharing our code encourages some rather fantastic things: the free sharing of experience and knowledge, and people coming together to collaborate on projects.

People share their code online for a number of reasons. First, there are the bragging rights of saying 'x number of people have downloaded or subscribed to my project this month'. Others have taken existing projects, rewritten them in a way they feel is better, and wish to share the result with the world. Some use their code as a resume when applying for coding jobs.

Whatever the reason, sharing code online for free is a great thing to do, and it has even become an online movement. So how do we do it? There are several online repositories where we can upload and store our code for all to see and use. The largest and best known is GitHub, now owned by Microsoft, but there are others such as GitLab. Anyone can browse public projects, but to upload code or contribute to other people's projects we need to be part of the GitHub community. Unless you have ethical issues with Microsoft products (some people do), go to GitHub's signup page and create an account first if you do not already have one.

OK, now that's out of the way, we can take a look at the millions of coding projects currently hosted on the site. In the search bar type 'python' and hit enter to see what we can find. At the time of writing there are 2,725,251 code repositories with the tag 'python'. That's a lot of projects! Let's go back to the search bar and this time search for 'pandas'. At the top of the search results you should see 'pandas-dev/pandas'; click on the link.

What you are looking at is the GitHub repository page for the 'Pandas' project. Unfortunately 'pandas' has nothing to do with the clumsy black and white bears. Instead it is a powerful tool for data analysis.

So let's say we have read the project introduction and we like what we read. How do we use this module with our code? If we scroll down the page we will eventually come to the 'Where to get it?' section. You won't see any installer files to download here; instead there are two options, one for 'conda' and one for 'pip'. Conda and pip are package managers for Python, and we can use them to install, upgrade or uninstall packages found in online repositories.

For this article, I am going to be using 'pip' (short for 'PIP Installs Packages'), so we need to make sure it is installed first. Python downloaded from python.org has included pip by default since version 3.4, so you may already have it. If not, take a look at this page and look for installation instructions for your operating system. Open up a shell prompt on your device and copy and paste the commands shown on the pip website to install pip.

To check if pip is installed correctly, open up a command prompt and type the following and press enter:

pip -V

If you see 'pip' followed by a version number, you are good to go.


Installing Packages With PIP


Now that we have pip installed we can install our first online module. In your command prompt type the following and then press enter:

pip install pandas

You should see pip download and install pandas along with a few other packages, finishing with a line that begins 'Successfully installed'.

We can check to see if the module installed properly as well as get a list of all installed modules with the following command:

pip list

You may have noticed that pip also installed something called 'numpy'. This is another module, for mathematical and numerical operations, which pandas needs in order to work. A module that another module relies on like this is known as a dependency, and pip installs dependencies for you automatically.
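If you would rather check from inside Python than from the command prompt, the standard library's importlib.metadata module (Python 3.8 and newer) can report what pip has installed. The sketch below is one way to do it; the helper names installed_version and dependencies are purely illustrative and are not part of pip or pandas.

```python
from importlib.metadata import PackageNotFoundError, requires, version


def installed_version(package):
    """Return the installed version of a package as a string, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def dependencies(package):
    """Return the requirements a package declares (its dependencies), or an empty list."""
    try:
        # requires() returns None when a package declares no requirements at all
        return requires(package) or []
    except PackageNotFoundError:
        return []


for name in ["pandas", "numpy"]:
    print(name, installed_version(name))

# Each entry is a requirement string; some only apply to optional extras
print(dependencies("pandas"))
```

If pandas is installed, numpy should appear somewhere in that last list, which is how pip knew to install it.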

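To install any module with pip, use the syntax pip install [module]. If you are unsure of a module's name, search for it on the Python Package Index website (pypi.org), which is where pip downloads packages from. The older pip search [module] command no longer works, because PyPI has switched off the search service it relied on. We can also remove installed modules with pip uninstall [module].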


Importing Installed Modules


If we go back to our IDE and open a new project, we should now be able to import 'pandas' as though it were a built-in module (as long as the IDE uses the same Python installation that pip installed into):

import pandas

We can confirm this is working with the dir() function:

print(dir(pandas))

All being well, we should get output similar to the following (the exact list depends on which version of pandas you have installed):

['ArrowDtype', 'BooleanDtype', 'Categorical', 'CategoricalDtype', 'CategoricalIndex', 'DataFrame', 'DateOffset', 'DatetimeIndex', 'DatetimeTZDtype', 'ExcelFile', 'ExcelWriter', 'Flags', 'Float32Dtype', 'Float64Dtype', 'Float64Index', 'Grouper', 'HDFStore', 'Index', 'IndexSlice', 'Int16Dtype', 'Int32Dtype', 'Int64Dtype', 'Int64Index', 'Int8Dtype', 'Interval', 'IntervalDtype', 'IntervalIndex', 'MultiIndex', 'NA', 'NaT', 'NamedAgg', 'Period', 'PeriodDtype', 'PeriodIndex', 'RangeIndex', 'Series', 'SparseDtype', 'StringDtype', 'Timedelta', 'TimedeltaIndex', 'Timestamp', 'UInt16Dtype', 'UInt32Dtype', 'UInt64Dtype', 'UInt64Index', 'UInt8Dtype', '__all__', '__builtins__', '__cached__', '__deprecated_num_index_names', '__dir__', '__doc__', '__docformat__', '__file__', '__getattr__', '__git_version__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_config', '_is_numpy_dev', '_libs', '_testing', '_typing', '_version', 'annotations', 'api', 'array', 'arrays', 'bdate_range', 'compat', 'concat', 'core', 
'crosstab', 'cut', 'date_range', 'describe_option', 'errors', 'eval', 'factorize', 'from_dummies', 'get_dummies', 'get_option', 'infer_freq', 'interval_range', 'io', 'isna', 'isnull', 'json_normalize', 'lreshape', 'melt', 'merge', 'merge_asof', 'merge_ordered', 'notna', 'notnull', 'offsets', 'option_context', 'options', 'pandas', 'period_range', 'pivot', 'pivot_table', 'plotting', 'qcut', 'read_clipboard', 'read_csv', 'read_excel', 'read_feather', 'read_fwf', 'read_gbq', 'read_hdf', 'read_html', 'read_json', 'read_orc', 'read_parquet', 'read_pickle', 'read_sas', 'read_spss', 'read_sql', 'read_sql_query', 'read_sql_table', 'read_stata', 'read_table', 'read_xml', 'reset_option', 'set_eng_float_format', 'set_option', 'show_versions', 'test', 'testing', 'timedelta_range', 'to_datetime', 'to_numeric', 'to_pickle', 'to_timedelta', 'tseries', 'unique', 'util', 'value_counts', 'wide_to_long']

That's a lot of names (classes, functions and submodules)! Pandas is a very large project with over 2750 people contributing to the code. This brings me to my next point: there is nothing to stop you being one of those people. Collaborating with others on projects and contributing your time and skills can be immensely rewarding. Even as a beginner, you will find that many projects have tasks you are able to complete. If you have the time and want to try it out, look for a project you are interested in (probably a large one to begin with) and get in touch!

So how do we use pandas for data analysis or numpy for some mathematical operations? What about all of those other modules?

This is where the step-by-step tutorials end and the free learning begins. You are now capable of finding, installing and using any number of online modules and projects. However, each of these modules has methods which do different things and expect different input from the developer. To learn how to use them you need to do two things: read the documentation, and experiment. Never be afraid to simply install a module and have a play around to see what it can do.
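One way to start experimenting is to explore a module from inside Python itself. This is a small sketch using pandas as the example; the same approach works for any module you install. Filtering out names that begin with an underscore is a common convention for hiding a module's internal parts, not a rule pandas enforces.

```python
import pandas

# dir() lists every name in the module; names starting with "_" are internal, so skip them
public_names = [name for name in dir(pandas) if not name.startswith("_")]
print(len(public_names), "public names")

# Many packages record their own version number in __version__
print(pandas.__version__)

# help() prints the built-in documentation for a function, class or module
help(pandas.DataFrame.head)
```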

From here you can take your coding journey in any number of directions. Want to study data science for ecology? No problem. Want to predict future wildlife populations with machine learning? Go for it! The only limit now is your imagination and the time you wish to devote to coding.

If you would prefer to work on your own projects but would like to upload your code and store it on GitHub, follow this link for a guide on how to do it.


"I have run into many coders, photographers, writers, who don't think of themselves as makers. But, I submit that making is any time you use your point of view to make something from nothing" - Adam Savage


GitHub Safari


To end the series, I am going to take a stroll through GitHub and highlight a few repositories that may be of interest in the land and wildlife field. If any of these spark your interest, follow their installation instructions (most can be installed with pip) and try them out. Maybe you could even become one of their contributors!


Numpy


GitHub Page
Official Documentation

What is it?

Numpy is a Python module which allows you to perform scientific computing and numeric operations in your project.

Why is it interesting?

Aside from providing methods for various types of scientific calculations, one of numpy's big draws is that it lets us create multi-dimensional arrays. You can think of these as lists within lists: a two-dimensional array, for example, is laid out in rows and columns much like a spreadsheet, which can be very useful when working with actual spreadsheet or comma separated value (CSV) datasets.
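Here is a small sketch of a two-dimensional array. The survey counts are made up purely for illustration: each row is a site and each column is a species.

```python
import numpy as np

# Hypothetical survey data: each row is a site, each column a species count
counts = np.array([
    [12, 3, 0],
    [8, 5, 2],
    [15, 1, 4],
])

print(counts.shape)         # (3, 3): three rows and three columns
print(counts[1, 0])         # row 1, column 0 (counting from zero): 8
print(counts.sum(axis=0))   # total for each species, adding down the columns
print(counts.mean(axis=1))  # average count at each site, across the rows
```

The axis argument is what makes arrays feel like a spreadsheet: axis=0 works down the columns and axis=1 works across the rows.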

If you have an interest in scientific computing or just like the numpy project, they are actively looking for contributors! Follow this link to offer your services for anything from writing code to administration and even graphic design.


Pandas


GitHub Page
Official Documentation

What is it?

Pandas is a powerful module for data analysis in Python. It makes it easy to work with large datasets, with features for handling missing data, grouping and modifying data, and merging multiple datasets together. It is built on top of numpy, so it works seamlessly with numpy's arrays.

Why is it interesting?

Pandas is an essential tool in data science and is used extensively in fields such as ecology. It is also used to prepare data for machine learning / A.I. Pandas can be useful for anyone who wants to analyse any kind of dataset.
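To give a flavour of the features mentioned above (handling missing data, grouping, and merging), here is a short sketch using a made-up bird survey. The site names, species and numbers are hypothetical, and filling the missing count with 0 is just one choice for illustration; dropping that row with dropna() would be another.

```python
import pandas as pd

# Hypothetical bird survey records; None marks a count that wasn't recorded
surveys = pd.DataFrame({
    "site": ["Meadow", "Meadow", "Woodland", "Woodland", "Wetland"],
    "species": ["Skylark", "Lapwing", "Skylark", "Wren", "Lapwing"],
    "count": [4, 2, None, 7, 5],
})

# Handle missing data: treat the unrecorded count as zero
surveys["count"] = surveys["count"].fillna(0)

# Group and aggregate: total birds counted at each site
totals = surveys.groupby("site")["count"].sum()
print(totals)

# Merge in a second dataset describing each site, matching rows on the "site" column
sites = pd.DataFrame({
    "site": ["Meadow", "Woodland", "Wetland"],
    "area_ha": [12.5, 30.0, 8.2],
})
combined = surveys.merge(sites, on="site")
print(combined)
```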


Matplotlib


GitHub Page
Official Documentation

What is it?

Matplotlib is an extensive module for visualising data in Python. It allows for the creation and display of many types of static and animated graphs.

Why is it interesting?

Graphs are a quick and easy way to convey data in all manner of professions and projects. Instead of exporting your data to Excel or another graphing program, you can simply do it all in Python.
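As a minimal sketch, the example below draws a simple line graph of some made-up yearly population counts and saves it as an image file. The data and the file name population.png are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly counts for a monitored population
years = [2018, 2019, 2020, 2021, 2022]
counts = [42, 47, 39, 51, 55]

fig, ax = plt.subplots()
ax.plot(years, counts, marker="o")  # a line with a dot at each data point
ax.set_xlabel("Year")
ax.set_ylabel("Individuals counted")
ax.set_title("Population count by year")

# Write the chart to an image file; plt.show() would open it in a window instead
fig.savefig("population.png")
```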


Tkinter Designer and Custom Tkinter


GitHub Page - Tkinter Designer
GitHub Page - Custom Tkinter

What is it?

Tkinter is Python's standard toolkit for creating graphical user interfaces (GUIs) for your applications, and it comes with most Python installations. If you want to make a project which has user input boxes, buttons, labels, drop-down menus and so on, Tkinter can make that happen. Tkinter Designer and Custom Tkinter are two projects which make it easier to create your own GUI with Tkinter.

Why is it interesting?

A GUI can be far more inviting for a user than a command line app. Tkinter Designer makes it possible to design your GUI using drag and drop components (you still have to code the functionality behind those buttons and other widgets). If you want more granular control, Custom Tkinter lets you code your own GUI from scratch, which is not nearly as hard as it sounds.


GeoPandas


GitHub Page
Official Documentation

What is it?

GeoPandas lets you work with geospatial data in Python and plot that data in a similar way to GIS software.

Why is it interesting?

Similar to pandas, GeoPandas has its own special data structures for dealing with geospatial data such as coordinates and geometry. This means you can get up and running much more quickly, and GeoPandas' suite of methods will let you manipulate and plot your data to your heart's content.
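As a small taste, the sketch below stores three made-up wildlife sightings as points and measures how far each one is from the first. It assumes UK data, hence the British National Grid (EPSG:27700), which measures in metres; swap in a coordinate system suited to your own area. Calling sightings.plot() would also draw the points using matplotlib.

```python
import geopandas as gpd

# Hypothetical sightings; coordinates are longitude/latitude (x, y) in degrees
sightings = gpd.GeoDataFrame(
    {"species": ["Otter", "Kingfisher", "Otter"]},
    geometry=gpd.points_from_xy([-2.59, -2.61, -2.55], [51.45, 51.47, 51.44]),
    crs="EPSG:4326",  # WGS 84, the coordinate system GPS uses
)

# Reproject to British National Grid so distances come out in metres
sightings_bng = sightings.to_crs("EPSG:27700")

# Straight-line distance (in metres) from the first sighting to each sighting
distances = sightings_bng.distance(sightings_bng.geometry.iloc[0])
print(distances)
```

Reprojecting first matters: measuring distances directly on longitude/latitude would give answers in degrees, which are not a useful unit on the ground.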


Google Maps API


GitHub Page
Official Documentation

What is it?

Simply put, this is Google's official Python client for the Google Maps API (Application Programming Interface). Think of it as a module which provides pre-made methods that let you interact with Google Maps from your application. To use the API you need a Google account (sign up if you don't already have one) and an API key (a unique ID so Google knows who is using its servers).

Why is it interesting?

The API gives you access to much of the data found in Google Maps, including location lookup, directions, distance measurements, elevation data, and more. You can run a query through the API with as little as 4 lines of code, so why not give it a go?
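Here is a sketch of such a query, a location lookup (geocoding), using the googlemaps package (pip install googlemaps), which we assume is the Python client linked above. Your API key needs the Geocoding API enabled. The key is read from an environment variable called GOOGLE_MAPS_API_KEY, a name chosen here purely for illustration, and the helper first_location is likewise hypothetical. Without a key, the script just prints a reminder.

```python
import os


def first_location(client, address):
    """Geocode an address and return (latitude, longitude) of the best match, or None."""
    results = client.geocode(address)  # a list of matching places, best match first
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


api_key = os.environ.get("GOOGLE_MAPS_API_KEY")  # hypothetical variable name
if api_key:
    import googlemaps  # only needed for the live query

    gmaps = googlemaps.Client(key=api_key)
    print(first_location(gmaps, "Kew Gardens, London"))
else:
    print("Set GOOGLE_MAPS_API_KEY to your own API key to run a live query.")
```

Stripped of the helper and the key check, the core really is about four lines: import googlemaps, create a Client with your key, call geocode(), and print the result.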


Conclusion


Well that's it. The end.

If you have read all of the series, all the way to the end, you have my undying gratitude and thanks. If you only read some of it, then you still have my thanks and I hope you were able to take something away from it.

Here is a quick recap of what we covered:

  • Variables - what they are and how to use them

  • Data Types - the common kinds of data Python supports

  • Data Structures - lists, tuples and dictionaries and how we can store multiple pieces of data in a single variable

  • Flow Control - how we can use logic to let our code perform some decision making for us

  • Loops - making our code work hard for us to perform repetitive tasks quickly and easily

  • Functions - chunks of code we can write once and recall whenever we need them

  • Error Handling - dealing with those tricky and unpredictable users

  • Classes and Objects - creating blueprints for our objects and then generating objects from the blueprints in true OOP style

  • Files and Modules - how we can create our own reusable modules for use in the future to save us a lot of time

  • GitHub and External Modules - how we can use code written by others in our projects and how we can get involved by contributing to those projects

My intention with this series was not to get you to a professional developer level (although with further study, why not?). Rather, I wanted to introduce the world of coding to those who maybe never saw a need for it, or who thought it was beyond their reach.

If the series helped you at all, please consider telling a friend. Any comments and constructive feedback are always appreciated.

Thank you.

Simon.
