As usual, if you want to go straight to the code, here it is: >>> https://github.com/hugoestradas/Python_Basics
1) Counting unique words.
Almost every modern word processor has a counting tool to get the total number of words in a document. I'll take this concept a little bit further, to practice both breaking down text and counting items.
For this exercise I'll write a Python function to count the number of unique words and how often each occurs.
My input will be the path to a text file, and the output will be the total number of words, the top 10 most frequent words, and the number of occurrences of each of those top 10:
This time I'm importing two Python modules: "re" for regular expressions and "collections" for counting.
My function begins by opening the file at the location stored in the "path" variable; then, using a regular expression, I find all the words within its text.
The search pattern looks for any sequence of one or more letters, numbers, hyphens, and/or apostrophes.
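To make that pattern concrete, here is a small sketch of how such a character class behaves on a sample sentence. The exact pattern string and the sample text are my assumptions; the original listing may spell the pattern differently.

```python
import re

# Assumed pattern: one or more letters, digits, apostrophes or hyphens.
WORD_PATTERN = r"[0-9a-zA-Z'-]+"

sample = "Don't stop: 42 well-known words!"
words = re.findall(WORD_PATTERN, sample)
print(words)  # ["Don't", 'stop', '42', 'well-known', 'words']
```

Punctuation such as the colon and the exclamation mark is not part of the character class, so it simply acts as a separator between words.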
Then I convert every word in the list to uppercase, so that different capitalizations of the same word are counted together, and print out the length of that list, which is the total number of words found.
On line 10 I create a new "Counter" object and then use a for loop to iterate through the entire list of words, incrementing each word's entry in the counter's dictionary.
In the last block of code I use the Counter's "most_common" method
to retrieve a list of the 10 most common words, along with their count values to display:
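For reference, the steps described in this section could be sketched roughly as follows. This is not the original listing: the function name "count_words", the UTF-8 encoding, and returning the results instead of printing them are assumptions made to keep the example easy to test.

```python
import re
from collections import Counter


def count_words(path):
    """Return (total word count, 10 most common (word, count) pairs)."""
    with open(path, encoding="utf-8") as file:
        # Letters, digits, hyphens and apostrophes count as word characters.
        all_words = re.findall(r"[0-9a-zA-Z'-]+", file.read())

    # Uppercase everything so "The" and "the" share one entry.
    all_words = [word.upper() for word in all_words]

    # Tally each word; a missing key in a Counter starts at 0.
    word_counts = Counter()
    for word in all_words:
        word_counts[word] += 1

    return len(all_words), word_counts.most_common(10)
```

Calling "count_words" with the path to a text file gives back the total and a list of (word, count) tuples that can be printed in a loop.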
2) Merging CSV files
Comma-Separated Values (CSV) is a file format that stores tabular data in plain text.
I'm going to write a Python function to merge multiple CSV files into one.
I'm going to receive as input a list of files in a "path" variable.
The function should be robust enough to merge files whose headers don't even match.
The fields might be in a different order, or one file could have additional fields that the others do not. It'll handle all of these cases without losing any fields or data:
The first block of code builds up a list of field names. I start by creating an empty list, then use a for loop to open each of the files in the input list of CSV files to merge.
I use the csv module's "DictReader" on line eight to extract the field names from each file, and on line nine I add them to the fieldnames list if they're not already there from a previous input file.
The second part of the function handles writing the records to the output file, based on those field names.
I use a context manager to open the output file for writing on line 12, and then I create a new "DictWriter" object from the csv module, passing in the list of field names I created.
Every record I add using this "DictWriter" includes all of the fields from that list.
On line 14, I write the header row to the output file and then use a for loop to iterate through all of the input CSV files again.
I open each one up and create a new "DictReader" from it, then I use a for loop to iterate through each record in that input file and write it to the output file.
If that row I just read is missing certain fields, the "DictWriter" will leave them blank or empty in the output.
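The two passes described above could be sketched like this. It is a minimal version under my own assumptions: the function name "merge_csv", its parameter names, and the UTF-8 encoding are illustrative and may differ from the original listing, so the line numbers mentioned in the text will not match.

```python
import csv


def merge_csv(csv_list, output_path):
    """Merge the CSV files in csv_list into one file at output_path."""
    # First pass: collect every field name, keeping first-seen order.
    fieldnames = []
    for file in csv_list:
        with open(file, newline="", encoding="utf-8") as input_csv:
            # fieldnames is None for an empty file, hence the "or []".
            for field in csv.DictReader(input_csv).fieldnames or []:
                if field not in fieldnames:
                    fieldnames.append(field)

    # Second pass: write every record under the combined header.
    with open(output_path, "w", newline="", encoding="utf-8") as output_csv:
        writer = csv.DictWriter(output_csv, fieldnames=fieldnames)
        writer.writeheader()
        for file in csv_list:
            with open(file, newline="", encoding="utf-8") as input_csv:
                for row in csv.DictReader(input_csv):
                    # Fields missing from this row are written as "".
                    writer.writerow(row)
```

Reading each input twice keeps the code simple: the full header has to be known before the first record can be written.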
These are the CSV files I will be merging with my function:
And this is the final merged CSV file:
3) Save a dictionary.
Python's dictionaries are very popular among data scientists, data engineers and other data professionals. This is because they are awesome for storing and retrieving information. The only problem is that this data is kept in memory.
What if you need to use this dictionary later?
In this exercise I'll write a Python function that stores a dictionary into a file.
My two inputs are the dictionary to save, and the path for the output file.
I'll start by importing the "pickle" module. If you're not familiar with this library, I'll leave the official documentation here:
And here's the code:
To run this program, I create a test dictionary with a few keys. After saving the dictionary to the file, I can load the content back from that file and print it to show that the data is still there:
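A minimal sketch of that save-and-reload round trip might look like this. The names "save_dict" and "load_dict", the file name, and the sample dictionary are hypothetical; only the use of the "pickle" module comes from the text above.

```python
import os
import pickle
import tempfile


def save_dict(dict_to_save, path):
    """Serialize the dictionary to a binary file at path."""
    with open(path, "wb") as file:
        pickle.dump(dict_to_save, file)


def load_dict(path):
    """Hypothetical helper: read a pickled dictionary back from path."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Made-up test data, saved to a temporary location and loaded again.
test_dict = {1: "a", 2: "b", 3: "c"}
pickle_path = os.path.join(tempfile.gettempdir(), "test_dict.pickle")
save_dict(test_dict, pickle_path)
recovered = load_dict(pickle_path)
print(recovered)  # {1: 'a', 2: 'b', 3: 'c'}
```

Pickle files are binary, so they have to be opened in "wb" and "rb" mode. Also keep in mind that unpickling can execute arbitrary code, so only load files that come from a source you trust.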
4) Create a ZIP archive
Okay, let's get straight to it. Here's my code:
As you can see, I imported the "os" module to search directories and manipulate file paths, and the "zipfile" module to actually build my zip file.
My function starts by opening the "output_zip" file using a context manager.
On the next line I use the "os.walk" function to explore and search in the directory.
As you can see, my for loop unpacks each result from "os.walk" into three variables: "root" (the directory currently being visited), "dirs" (its subdirectories) and "files" (the files it contains).
I need to maintain the relative file path for files in the output archive, but if the user calls the "zip_all" function with an absolute path, the root path I get from the "walk" function will also be absolute. That's why on line 7 I use the "os.path.relpath" function.
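The walk-and-relativize idea can be sketched as follows. The signature of "zip_all" shown here (a directory to search and an output path) is an assumption, as is the check that skips the archive itself; the original listing may take other arguments, and its line numbers will differ.

```python
import os
import zipfile


def zip_all(search_dir, output_path):
    """Add every file under search_dir to a new ZIP at output_path."""
    output_abs = os.path.abspath(output_path)
    with zipfile.ZipFile(output_path, "w") as output_zip:
        for root, dirs, files in os.walk(search_dir):
            for file in files:
                full_path = os.path.join(root, file)
                # Assumption: never add the archive to itself.
                if os.path.abspath(full_path) == output_abs:
                    continue
                # Store paths relative to search_dir, even when
                # search_dir was given as an absolute path.
                arcname = os.path.relpath(full_path, search_dir)
                output_zip.write(full_path, arcname=arcname)
```

Without the "arcname" argument, the archive would record the path exactly as it was passed to "write", so the entries would depend on how the caller happened to spell the directory.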
5) Find All List Items
Python's index method finds the index of the first item in a list;
but what if there are multiple instances of that item?
In this one, I'm writing a Python function to find the indexes for all of the
items in a list that are equal to a given value.
The inputs are the list to be searched and the value to search for.
The output should be the list of indices, each represented by a list of numbers.
It's good to keep in mind that Python lists can also contain other lists.
So, this function should be able to traverse multidimensional lists to find all
indices of the given value.
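One way to sketch this is with recursion: each match is reported as the list of positions needed to reach it, one number per nesting level. The function name "index_all" and the sample data are my assumptions, not necessarily what the original code uses.

```python
def index_all(search_list, item):
    """Return the index path of every occurrence of item."""
    indices = []
    for position, element in enumerate(search_list):
        if element == item:
            # Direct match at this level.
            indices.append([position])
        elif isinstance(element, list):
            # Search the nested list and prefix our own position.
            for sub_index in index_all(element, item):
                indices.append([position] + sub_index)
    return indices


example = [[1, 2, 3], 2, [1, [2, 1]]]
print(index_all(example, 2))  # [[0, 1], [1], [2, 1, 0]]
```

Here [2, 1, 0] means: take item 2 of the outer list, then item 1 of that list, then item 0 of the innermost one.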