Posts

Showing posts with the label python

NumPy

NumPy is a Python library for applying high-level mathematical functions to multi-dimensional arrays and matrices. In our case, it will help us with vectorization. Vectorization means applying an operation to an entire array at once instead of one element at a time (as is done with loops). Vectorized code is usually much faster than an equivalent Python loop, because the work runs in optimized compiled routines that most computers execute efficiently ( Here is a sample of code to demonstrate just how fast it is). But before we start with matrices, vectorization, broadcasting and whatnot, let's start with the very basics and learn how to make and modify matrices. NumPy is usually installed by default; nevertheless, you can check it from the Anaconda Navigator (if you don't know about Anaconda Navigator or how to install packages, check the following link ). Now, open Jupyter Notebook, go to whatever folder you want to save the file in and let's…
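To make the loop-versus-vectorization comparison concrete, here is a minimal sketch (not the sample linked in the post). The array size, the seed and the variable names are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import time

import numpy as np

# Two random vectors; the size (200,000) is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
a = rng.random(200_000)
b = rng.random(200_000)

# Dot product with an explicit Python loop, one element at a time.
start = time.perf_counter()
loop_dot = 0.0
for i in range(len(a)):
    loop_dot += a[i] * b[i]
loop_seconds = time.perf_counter() - start

# The same dot product, vectorized: one call over the whole arrays.
start = time.perf_counter()
vec_dot = np.dot(a, b)
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_dot:.4f} in {loop_seconds * 1000:.2f} ms")
print(f"vectorized: {vec_dot:.4f} in {vec_seconds * 1000:.2f} ms")
```

Both versions should print the same value (up to floating-point rounding); only the elapsed time should differ.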

Logistic Regression

INTRO Logistic Regression is a learning algorithm used for classification: given an input feature vector x, the output y is a discrete value, either 0 or 1 (representing no or yes in binary classification) or 0, 1, 2, and so on (if we're classifying between multiple categories, say cat, dog, tree, etc.). Given an input feature vector x corresponding to an image that you want to classify as either cat (y = 1) or not cat (y = 0), we want the algorithm to output a prediction ($\hat y$) that is an estimate of y, i.e., the probability that y = 1 (the probability of the image being a cat). Suppose x is an $n_x$-dimensional vector. The parameters of logistic regression are then w, which is also an $n_x$-dimensional vector, and b, which is just a real number. Given the input x and the parameters w and b, the prediction is generated as   $$ \hat y= \sigma (w^Tx+b) $$   where $ \sigma (w^Tx+b) $ is $$  \sigma( \dots $$
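As a rough sketch of the prediction formula $\hat y = \sigma(w^Tx+b)$: the excerpt is cut off before it defines $\sigma$, so the code below assumes the standard logistic sigmoid, $\sigma(z) = 1/(1+e^{-z})$. The function names and the example values of w, b and x are made up for illustration.

```python
import numpy as np


def sigmoid(z):
    """Standard logistic sigmoid (assumed here; the excerpt is truncated)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict(w, b, x):
    """Return y_hat = sigmoid(w^T x + b) for a single feature vector x."""
    return sigmoid(np.dot(w, x) + b)


# Hypothetical parameters and input with n_x = 3 features.
w = np.array([0.5, -0.25, 0.1])
b = 0.0
x = np.array([1.0, 2.0, 0.0])

y_hat = predict(w, b, x)  # w^T x + b = 0.5 - 0.5 + 0.0 = 0, so y_hat = 0.5
print(y_hat)
```

Because the sigmoid squashes any real number into the interval (0, 1), the output can be read as a probability, which is why it suits the "cat or not cat" setting described above.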

Intro To Anaconda

Image source :  https://www.anaconda.com/ OK. First off, intro (What Anaconda? Why Anaconda? How Anaconda?). Paraphrasing from Wikipedia ( https://en.wikipedia.org/wiki/Anaconda_(Python_distribution) ): Anaconda is a free and open-source distribution of Python that serves as the perfect platform for scientific computing, data analysis and machine learning (our main interest). Anaconda makes it really simple to set up machine learning projects, and is really great at managing packages. To install Anaconda, go to https://www.anaconda.com/products/individual and scroll waaaayyyy down until you see something like this : Download the appropriate package (according to your computer's OS and architecture). Also, make sure you're downloading for Python 3.7. During the installation, don't forget to check the following options when prompted: After you're done with the installation, there's another small thing I want to introduce you to... Environments. And here we come to the main reason we

Intro to Pandas

By Marc Garcia - https://github.com/pandas-dev/pandas/blob/master/web/pandas/static/img/pandas.svg, BSD, https://commons.wikimedia.org/w/index.php?curid=73107397 Pandas is a Python library used to store and manipulate tabular data. It is an essential library that we'll use frequently in the future to manipulate our data for predictions, classifications and whatnot. So, without further ado, let's jump right into it. Let us first download the data that we will use to practice the features and functions of the library. Go to https://www.kaggle.com/sohier/calcofi    It should direct you to this page: Click on the "Download" button right beside "New Notebook". It'll download a zipped file. (Save the zipped file somewhere on the C drive.) Extract it to your desired folder (also somewhere on the C drive). I put the folder here : Now open "Anaconda Navigator" and download pandas packag
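To give a feel for what "store and manipulate tabular data" looks like in practice, here is a small sketch. It does not use the downloaded Kaggle files: the table below is a tiny made-up CSV, and its column names and values are assumptions for illustration only.

```python
import io

import pandas as pd

# A tiny, made-up table standing in for a real CSV file on disk.
csv_text = """station,depth_m,temp_c
A,0,18.2
A,50,14.1
B,0,17.5
B,50,13.9
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())       # first rows of the table
print(df.dtypes)       # column types inferred by pandas

# Keep only the surface measurements (depth 0).
surface = df[df["depth_m"] == 0]

# Average temperature per station.
mean_temp = df.groupby("station")["temp_c"].mean()
print(mean_temp)
```

With a real file you would pass its path to `pd.read_csv` instead of the `io.StringIO` object.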