Talks Tech #41: Build a Data Science Project From Scratch

Written by Kirthikka Devi Venkataram

Kirthikka is a tech professional with 10 years of concentrated experience in managing, developing, and delivering customer-centric products, including safety-critical products, increasing organizational revenue growth by 35% YoY. She holds a Master’s degree in Engineering, securing Anna University’s rank V in 2007, and successfully led a cohort as Cohort Representative in her executive education program in Product Management. Having encountered multiple career breaks herself, she strives to uplift women in technology careers as a mentor with Products by Women and as a Community Lead with Women Who Code Data Science. Kirthikka shares her talk, How to Build a Data Science Project From Scratch. She talks about using GitHub as a platform for collaborating with other coders and working on projects, and about how files can be organized and presented to the world.

GitHub is a code hosting platform built on Git, an open-source version control system. It enables collaborative code development and management, and makes it easy to move files between folders on your local system and the cloud. You can use it to build a portfolio from your individual projects, or to organize the files of a team or an organization, and its visibility setting lets you make each repository public or private. There are quite a number of inbuilt features that help you manage and develop code and automate the process in a structured way.

GitHub has inbuilt security features that help find code vulnerabilities, and these also support enterprise software, so the code in your repository stays protected. You can build automated CI/CD pipelines and task flows with quick environment support. The project management feature gives you a personalized view of your project: you can get a glimpse of the current milestone, foresee the upcoming milestones, and track the project. Team Management lets you authorize users and give them read or write access to a repository, and grant special authorization to particular users.
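The read and write access described above can also be granted programmatically. The following is a minimal sketch, assuming GitHub's REST API collaborator endpoint, where read access is named `pull` and write access is named `push`; as far as the public documentation indicates, the `permission` field applies to organization-owned repositories. The function name and the example owner, repository, and user names are hypothetical, and the sketch only builds the request parts without sending anything.

```python
import json

# GitHub's REST API names for repository permission levels (assumed mapping):
# "pull" corresponds to read access and "push" to write access.
PERMISSION_FOR_ACCESS = {"read": "pull", "write": "push"}


def collaborator_request_parts(owner, repo, username, access):
    """Return (method, url, json_body) for granting a user access to a repository."""
    if access not in PERMISSION_FOR_ACCESS:
        raise ValueError(f"access must be one of {sorted(PERMISSION_FOR_ACCESS)}")
    url = f"https://api.github.com/repos/{owner}/{repo}/collaborators/{username}"
    body = json.dumps({"permission": PERMISSION_FOR_ACCESS[access]})
    return "PUT", url, body


# Hypothetical names, for illustration only.
method, url, body = collaborator_request_parts("octo-org", "demo", "teammate", "read")
print(method, url)
print(body)
```

Sending the request would additionally need an authenticated HTTP client; most people will find the repository's settings page on the web portal simpler for one-off changes.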

There is also an inbuilt feature called the Community, where you can raise questions that can be answered by the community, that is, by other GitHub users. Who can see and answer a question depends on the visibility of the repository and the type of question that you ask on the platform. There are other options, such as the Marketplace and apps, that can be integrated into GitHub.

There are two versions currently available with GitHub. One is its web portal, and the other is the desktop version. File navigation and transfer become quite easy with these two versions. You can commit, pull, push, and merge your changes into your repository, as well as create branches, while maintaining a copy on your local computer.

Having both versions makes it easy to move files and folders between your local machine and the cloud. You can pull any repository to get a local copy of it, and just as easily upload local work to the cloud, that is, to the GitHub portal. The name you choose as an owner uniquely identifies you on GitHub.

GitHub can be explored in many different ways. You can navigate to different repositories, or follow owners whose repositories are interesting to you. To find trending repositories, just visit github.com/trending. You can press the star button on a repository that interests you, and you can also receive updates or notifications about changes.

On your profile, you can add your basic details including your biography and social media accounts. Your profile has repositories, projects, and starred repositories that you are interested in. You can also create a link that can be organized separately for you.

The first step is to create a repository, either to hold a copy of an existing repository or to load your own local files to the cloud, that is, to the web portal of your GitHub account. You can choose whether the repository is public or private, and whether it can only be edited by specific users.
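The talk creates the repository through the web portal. For readers who prefer a script, here is a minimal sketch of the same step, assuming GitHub's REST API endpoint `POST /user/repos` with its `name`, `private`, and `auto_init` fields. The function name, the repository name, and the `YOUR_TOKEN` placeholder are hypothetical. The sketch builds the request but does not send it.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_create_repo_request(name, token, private=False, add_readme=True):
    """Build (but do not send) a request that creates a repository for the token's owner."""
    payload = {
        "name": name,
        "private": private,       # False = public, True = private
        "auto_init": add_readme,  # True asks GitHub to create an initial README
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical repository name and placeholder token, for illustration only.
request = build_create_repo_request("my-data-science-project", token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually create the repository: urllib.request.urlopen(request)
```

Keeping the build step separate from the send step makes it possible to inspect exactly what would be submitted before anything is created on your account.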

The best practice is to add at least one file to your repository, and GitHub offers to add a README file for you. If you are not going to add any other local files right away, you can select that option and go straight to your repository.

The repository has a main branch, and there can also be a feature branch, which holds the changes being made to the code or files in the main branch. There are also different ways to pull content from the repository and to upload content back to it.

Once the repository is created, you add a file to it. There are a few ways to do this: you can upload files from your local storage, you can pull files in from a different repository, or you can create a new file directly in this repository. Right now, I’m going to upload a file and add it to this repository. Let me choose the file that I’m going to add.
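The upload in the talk is done by choosing a file in the web portal. As a rough scripted equivalent, the sketch below assumes GitHub's REST API contents endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`), which expects a commit message and the file content encoded as Base64. The function names, the sample CSV bytes, and the owner and repository names are hypothetical, and nothing is sent over the network.

```python
import base64
import json


def contents_url(owner, repo, path):
    """URL of the REST endpoint that creates or updates one file in a repository."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def build_upload_body(file_bytes, commit_message, branch="main"):
    """Build the JSON body for a PUT request to the contents endpoint."""
    return json.dumps({
        "message": commit_message,                                # commit message
        "content": base64.b64encode(file_bytes).decode("ascii"),  # file as Base64
        "branch": branch,                                         # target branch
    })


# Hypothetical file content and names, for illustration only.
body = build_upload_body(b"id,value\n1,42\n", "Add sample data")
print(contents_url("your-username", "my-data-science-project", "data/sample.csv"))
print(body)
```

Every upload made this way is itself a commit, which matches what the web portal does when you confirm an upload.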

The main branch has visible files. You can also create a new branch and commit to it, so that the changes can be pulled into the main branch later. Changes to this file can be committed to the feature branch. The main branch holds the complete set of changes that have been accepted by the author, according to the rights given to each user.
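The branch workflow described above can also be carried out from the command line. The sketch below lists one possible sequence of standard Git commands for it: create a branch, stage files, commit, and publish the branch. The helper name, branch name, file name, and commit message are hypothetical, and the code only prints the commands rather than running them.

```python
import shlex


def feature_branch_commands(branch, files, message, remote="origin"):
    """Return the git commands for committing files on a new branch and publishing it."""
    return [
        ["git", "switch", "-c", branch],        # create the branch and move onto it
        ["git", "add", *files],                 # stage the changed files
        ["git", "commit", "-m", message],       # record the change on the branch
        ["git", "push", "-u", remote, branch],  # publish the branch to GitHub
    ]


# Hypothetical branch, file, and message, for illustration only.
commands = feature_branch_commands(
    "feature/clean-data", ["notebook.ipynb"], "Add cleaning notebook"
)
for command in commands:
    print(shlex.join(command))
```

Once the branch is on GitHub, the changes can be proposed for the main branch through a pull request, where the repository's author decides whether to accept them.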

As mentioned earlier, GitHub comes as a web portal and a desktop version. The desktop version allows you to navigate to your files easily.

There are two terms to know: cloning a repository and forking a repository. Forking copies the complete contents of a repository into a new repository under your username, that is, under your ownership. Cloning downloads a copy of a repository to your computer that stays linked to the repository it came from, so when you push your commits they go to that original repository rather than to a separate copy, provided you have write access to it.

Forking gives you your own copy of the complete repository, so you can make changes and experiment with the files that are present in it. The changes you commit will be reflected only in your fork, and won’t affect the original repository’s content. That becomes significant when you don’t have author access to a repository: you fork it, clone your fork to get a local copy, and then you play around with it.
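To make the difference concrete, the sketch below assumes you have already forked a repository through the Fork button on the web portal. It lists the Git commands for cloning your fork and for adding the original repository as a second remote, conventionally called `upstream`. The helper name and the owner, user, and repository names are hypothetical, and the commands are printed, not run.

```python
import shlex


def fork_then_clone_commands(upstream_owner, repo, your_username):
    """Return git commands that clone your fork and link it to the original repository."""
    fork_url = f"https://github.com/{your_username}/{repo}.git"
    upstream_url = f"https://github.com/{upstream_owner}/{repo}.git"
    return [
        # Download your fork; pushes from this clone go to your fork ("origin").
        ["git", "clone", fork_url],
        # Inside the cloned folder, register the original repository as "upstream"
        # so you can fetch its updates without being able to change it.
        ["git", "-C", repo, "remote", "add", "upstream", upstream_url],
    ]


# Hypothetical names, for illustration only.
for command in fork_then_clone_commands("original-owner", "some-project", "your-username"):
    print(shlex.join(command))
```

With this setup, experiments pushed from your computer land in your fork, and the original repository changes only if its owner accepts a pull request from you.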