In the winter of 2019, I enrolled in a student-run AI/machine learning program called LearnAI, part of the Undergraduate Artificial Intelligence Group at UofT (UAIG). The program was led by upper-years in computer science and taught machine learning and AI fundamentals to students who had not yet taken the formal courses (at the University of Toronto, there are seven prerequisite courses before you can take upper-year machine learning courses). The point being: they wanted students to be able to work with machine learning and AI without needing all the prerequisites.
Even though I didn't understand all the math behind the algorithms and concepts they were teaching (CNN, RNN, loss function, etc.), I still managed to make it to the project phase. Here comes the fun... 😃
For the project phase, each team chose its own topic; my team and I used text summarization (a natural language processing task) to analyze a given piece of text, pick out the keywords, and combine the main ideas into a coherent summary.
- What I learned: how to find datasets, understanding RNNs/CNNs, text summarization principles, techniques to use
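To illustrate the core idea (this is not our actual model, just a toy frequency-based extractive summarizer in plain Python; a real pipeline would use nltk's tokenizers and a trained neural network):

```python
import re
from collections import Counter

def summarize(text, num_sentences=2):
    """Score sentences by the frequency of their words; keep the top few."""
    # Naive sentence split on ., !, ? (real code would use nltk.sent_tokenize)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    # Score each sentence by the average frequency of its words
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Keep the highest-scoring sentences, in their original order
    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return " ".join(s for s in sentences if s in top)

article = ("The model reads the article. The model scores every sentence. "
           "Low-scoring sentences are dropped. "
           "The result is a short summary of the article.")
print(summarize(article, num_sentences=2))
```

Sentences whose words recur across the article score highest, which is a crude proxy for "main idea"; neural summarizers learn that signal instead of hand-coding it.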
It was at this stage that I realized, "Holy crap, finding the right dataset (and enough data) is the hardest part!" In a larger sense, knowing the business logic and reasoning behind the purpose of our project was another big aspect we needed to consider. How would our project benefit others? What types of problems would it solve? Has a solution already been created? To be fair, I don't think people usually understand the amount of work that goes into machine learning research projects and the amount of energy needed to sustain them. There is so much more to it: selecting the right model for your project, dealing with over-fitting, and knowing which algorithms to use, at least for me 😄.
We first went to Kaggle and Google Dataset Search and found our dataset: the CNN/Daily Mail dataset. After going through the dataset repository's code to filter out what we needed, we finally got down to training our models on the training data. We are currently in the process!
- What I learned: cleaning data, refactoring code, allocating percentages of training/testing data, working in a team, working with other people's code, using nltk and TensorFlow, version control
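The "percentage allocation" above refers to splitting the dataset, commonly something like 80% for training and 20% for testing. A minimal sketch of the idea (in practice you'd reach for scikit-learn's `train_test_split` or TensorFlow's data utilities):

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle the examples, then carve off the last fraction as a test set."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = examples[:]      # copy so we don't mutate the caller's list
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = [f"article_{i}" for i in range(100)]
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters: if the articles are ordered by date or topic, a straight slice would give the model a test set that looks nothing like its training set.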
Eventually, we'd like to take this idea and be able to integrate it into a software application to aid accessibility, and for note-taking purposes.
I recently attended the StartAI Conference 2020, the world's largest undergraduate AI conference. Guess who the keynote speaker was? None other than Professor Emeritus Geoffrey Hinton! He is often known as the "godfather of AI".
At his presentation, one slide stood out:

> Some psychological evidence that our visual systems impose coordinate "frames" in order to represent shapes. - StartAI Conference ft. Prof. Hinton, [Presentation slides]
How cool is that? Just like Plato's Theory of Forms, our brains compose a visual representation of objects within our visual frame to recognize them! In kinesiology, we learn that our brains build motor programs, modify them, and store them in memory. This is called a "motor representation". In fact, I realized that the field of motor learning is closely tied to machine learning!
LearnAI also had the honour of hosting Sam Lightstone this week, the CTO for IBM Data & AI and an IBM Fellow (I know, everyone go crazy! 🤓). It was then that I really learned about the supercomputers we currently have. Did you know that no supercomputer is as efficient as the human brain? Or that changing a couple of pixels in an image, imperceptibly to the human eye, can completely fool even the most powerful supercomputer we have?
The human brain consumes about as much power as a lightbulb!
So, after all that, I came away having learned more about AI's applications and, despite our human power to create technology, its downfalls. Before this program, I hadn't a clue about AI's capabilities nor how it melded with other fields like healthcare (something I'm studying) and medicine. After the presentation, and seeing IBM at the forefront of AI, I think it's safe to say that AI is still young, yet advancing so fast that we may run out of energy to power our training models by 2040! Again, it's up to us, our developers, and machine learning researchers and institutions to understand how to better solve current issues: most practical machine learning applications are image-recognition based, when in fact many of the issues that currently hold us back reside in JSON or text-based data. Ethically speaking, how can we make AI sustainable, fun, and ethical?
In conclusion, the field of machine learning has grown so much and has so much more room to grow. I can't say that I'm an expert, but it certainly intrigued me. Its applications reach far and wide, and as we progress, we should evaluate how far we've come, from Arthur Samuel and his checkers program to future advancements.
Comment below and let me know what you think! Lots to ponder...
Curious? 💡 I've listed my resources below:
- A Quick Introduction to Text Summarization (3 min read): https://towardsdatascience.com/a-quick-introduction-to-text-summarization-in-machine-learning-3d27ccf18a9f
- Unnecessary AI - fun stuff with AI: https://unnecessary.ai
- A Whale Fact - AI-generated whale facts!: https://t.co/RSIzbcvtqa?amp=1
- Vector Institute - Canada's leading AI Institute: https://vectorinstitute.ai
And for those who are curious about my project:
- The CNN/Daily Mail dataset on GitHub: https://github.com/abisee/cnn-dailymail