Computer Vision - Week 1

Hey! My name’s Michel Liao. I’m a computer science first-year at Princeton University. I hope to get a Ph.D. in computer vision, publishing meaningful research along the way. Join me in my CV journey!

Github: https://github.com/Michel-Liao
Personal Website: https://michelliao.com/
Medium: https://medium.com/@michel.liao


Course Progress

63% through Coursera’s Supervised Machine Learning: Regression and Classification.

Paper of the Week

“Gradient-Based Learning Applied to Document Recognition”

I’m still working on reading this paper. It’s the first CV paper I’ve read, so I don’t understand a lot of the terms. I’ll have some insights from this paper for next week!

Videos/Lectures

I finished this intro to CV playlist by Professor Shree Nayar at Columbia University. It gives a nice overview of what CV is capable of and of its first principles.

Assignments

This assignment was given by Erich Liang.

The first assignment focused on the basics of Python and on using NumPy and PIL to manipulate images. I overlaid a triangle, a Venn diagram, and a checkerboard on a picture by directly manipulating the NumPy arrays. Check it out on my GitHub!
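
For anyone curious what "directly manipulating the NumPy arrays" can look like, here is a minimal sketch of a checkerboard overlay. It is not the assignment solution: the function name `overlay_checkerboard`, the `square` and `strength` parameters, and the solid-color stand-in image are all assumptions made for illustration.

```python
import numpy as np
from PIL import Image


def overlay_checkerboard(image, square=32, strength=0.5):
    """Darken alternating squares of an RGB image and return a new PIL image.

    `square` (side length in pixels) and `strength` (how much to darken)
    are hypothetical parameters chosen for this sketch.
    """
    # PIL image -> float array of shape (height, width, 3).
    arr = np.asarray(image.convert("RGB")).astype(np.float32)
    height, width, _ = arr.shape

    # Row and column index of every pixel, then the parity of its square.
    rows, cols = np.indices((height, width))
    dark = ((rows // square) + (cols // square)) % 2 == 1

    # A 2D boolean mask selects whole pixels, so all three channels are scaled.
    arr[dark] *= 1.0 - strength
    return Image.fromarray(arr.astype(np.uint8))


# Stand-in for a real photo; with a file you would use Image.open(<path>).
photo = Image.new("RGB", (128, 96), color=(200, 180, 160))
result = overlay_checkerboard(photo)
result_arr = np.asarray(result)
```

Note that PIL reports size as (width, height), while the NumPy array comes out as (height, width, channel), so the 128 x 96 image above becomes an array of shape (96, 128, 3).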

Insights

  • Installing Mamba was a pain, but I got it working with some troubleshooting. It seems like a faster drop-in replacement for conda, so why not use it?
  • I couldn’t get OpenCV to work; I suspect there was a problem with the latest version. Instead, using this tutorial on PIL, I converted an image into a NumPy array. Going forward, I’d still like to get OpenCV working.
  • This YouTube tutorial was helpful for understanding image processing.
  • Initially, I tried overlaying NumPy patches on the pictures, but this was kind of a hack. Directly manipulating the arrays was painstaking because I just couldn’t understand slicing. But this brings me to the biggest insight of this week:
  • Thinking of accessing arrays as arr[row, col, channel] is so much easier than thinking of the array as a 3D array. It hurt my brain too much to think of 3D arrays.
    • Note: Not all libraries store images in height, width, channel order (PyTorch, for example, typically uses channel, height, width). Make sure to check the layout your library uses!
  • Drawing a circle doesn’t follow the same logic as drawing a triangle. The triangle’s for loop is pretty straightforward. I tried stacking two triangles on top of each other, but that wouldn’t make a circle. The key is to think of the image as a Cartesian plane. Then, use the distance formula to describe the circle: a pixel belongs to it if its distance from the center is at most the radius. Implement that check in a for loop.
  • Argparse doesn’t work well with Jupyter Notebook, since the notebook kernel’s own command-line arguments end up in sys.argv. Switch to regular Python scripts instead!
  • Best-fit lines aren’t as easy as they look. Fitting one is a simple case of linear regression, and it’s pretty fun to implement!
  • Coursera marks assignments as overdue, but you can still get 100% on them. I panicked for a second.
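
The circle insight above can be sketched in a few lines. This is an illustration under assumptions, not the assignment code: `draw_circle` and its parameters are hypothetical names, and the array is assumed to be in arr[row, col, channel] order.

```python
import numpy as np


def draw_circle(arr, center_row, center_col, radius, color):
    """Color every pixel within `radius` of the center, modifying arr in place."""
    height, width, _ = arr.shape
    for row in range(height):
        for col in range(width):
            # Distance formula, squared on both sides to avoid a square root.
            if (row - center_row) ** 2 + (col - center_col) ** 2 <= radius ** 2:
                arr[row, col, :] = color  # set all channels of this pixel
    return arr


# A small black canvas in (height, width, channel) order.
canvas = np.zeros((9, 9, 3), dtype=np.uint8)
draw_circle(canvas, center_row=4, center_col=4, radius=3, color=(255, 0, 0))
```

The double loop mirrors the bullet's description. On large images a boolean mask built from row and column index arrays would be much faster, but the loop makes the Cartesian-plane idea easier to see.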

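For the best-fit line mentioned in the insights, one common approach is the closed-form least-squares solution. The sketch below assumes that approach; the post doesn't say which method the assignment used, and `fit_line` is a hypothetical name.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the line minimizing squared vertical error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Four points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the same function returns the line that minimizes the sum of squared residuals, which can be cross-checked against `np.polyfit(x, y, 1)`.
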
Questions

  • Which areas of CV have been well-developed? Which areas still need lots of research?
  • When are convolutional neural networks preferable to other neural networks?
  • What does CV research look like?

Going Forward

I’ll keep reading the LeNet paper linked above. Additionally, I have another assignment that seems to be a lot more involved. It’s been pretty fun finishing them, though, so I’m looking forward to that!

Michel Liao

Boise, Idaho, United States
Hello! I'm a sophomore studying computer science at Princeton. I like reading, rock climbing, and running.