<h1 id="cover-letter-generator-llm">Cover Letter Generator LLM</h1>
<p>Kavish Hukmani · 2023-12-17</p>
<p><a href="https://github.com/DoubleGremlin181/cover-letter-llm">View source code on GitHub</a></p>
<h2 id="introduction">Introduction</h2>
<p>The Cover Letter Generator app was created as part of the interview process at <a href="https://www.tryalma.ai/">ALMA</a>. The project took approximately 4-5 hours to complete: around 3 hours on the backend and UI, and an additional 1-2 hours fine-tuning the prompts to optimize results.</p>
<h2 id="llm-powered">LLM Powered</h2>
<p>The LLM used is <code class="language-plaintext highlighter-rouge">gpt-3.5-turbo</code>, chosen for its cost-effectiveness and impressive capabilities at its price point.<br />
Local models were avoided due to a lack of available compute resources. However, since the app is built on LangChain, switching to an alternative model is straightforward.
The repo contains two main prompts, each designed with detailed instructions focusing on a different style of cover letter. These prompts can easily be modified or expanded to include additional styles.</p>
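<p>A minimal sketch of how the style prompts and a swappable model callable could fit together (the function names and prompt wording below are illustrative, not the repo’s actual code):</p>

```python
# Hypothetical sketch of the two style prompts and a swappable model callable.
# The prompt wording and names here are illustrative, not the repo's code.
PROMPTS = {
    "Classic": (
        "Write a traditional, formally toned cover letter.\n"
        "Resume:\n{resume}\n\nJob listing:\n{job}"
    ),
    "Modern": (
        "Write a short, casually toned cover letter.\n"
        "Resume:\n{resume}\n\nJob listing:\n{job}"
    ),
}

def generate_cover_letter(resume, job, style, llm):
    """Build the prompt for the chosen style and call any chat-model callable."""
    prompt = PROMPTS[style].format(resume=resume, job=job)
    # Swapping models means passing a different callable here, e.g. a local
    # model wrapper instead of an OpenAI client.
    return llm(prompt)
```

<p>With LangChain, <code>llm</code> would be a chat-model object such as <code>ChatOpenAI</code> invoked through a chain; swapping in a locally hosted model only changes which callable is passed.</p>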
<h2 id="steps">Steps</h2>
<ol>
<li><strong>Upload Resume:</strong>
<ul>
<li>Allows the user to upload a resume in PDF format. The resume is then converted to text using the <code class="language-plaintext highlighter-rouge">pypdf2</code> library.</li>
</ul>
</li>
<li><strong>Input Job Listing:</strong>
<ul>
<li>Allows the user to input a job listing URL or text. A simple LLM-based scraper is used to extract the relevant job description and company information from the URL.
<ul>
<li>Note: For websites which are hard to scrape, please use the text option to enter the job description manually.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Style Selector:</strong>
<ul>
<li>Allows the user to select a style of cover letter, determining which prompt will be used to generate the cover letter.
<ul>
<li>The <code class="language-plaintext highlighter-rouge">Classic</code> style is designed to be a more traditional cover letter with a suitable length and tone.</li>
<li>The <code class="language-plaintext highlighter-rouge">Modern</code> style is designed to be a more concise cover letter with a more casual tone.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Generate Cover Letter:</strong>
<ul>
<li>Generates a cover letter based on the selected style and the provided resume and job listing. The generated cover letter is displayed in the UI and can be easily copied to the clipboard.</li>
</ul>
</li>
</ol>
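<p>The URL-or-text handling in step 2 can be sketched as follows (a simplified stand-in; <code>fetch</code> and <code>llm</code> are injected stubs here, not the app’s actual helpers):</p>

```python
# Sketch of the job-listing input step: URLs are fetched and passed through
# an LLM extraction prompt, while anything else is treated as manually
# pasted text (the fallback for hard-to-scrape sites).
# `fetch` and `llm` are injected dependencies, not the app's real helpers.
def get_job_description(listing, fetch, llm):
    if listing.startswith(("http://", "https://")):
        page = fetch(listing)  # raw HTML of the posting
        return llm("Extract the job description and company info:\n" + page)
    return listing  # manual text option
```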
<h2 id="how-to-run">How to Run</h2>
<h3 id="prerequisites">Prerequisites</h3>
<p>Make sure you have <a href="https://python-poetry.org/">Poetry</a> installed.</p>
<h3 id="clone-the-repository">Clone the Repository</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/DoubleGremlin181/cover-letter-llm.git
<span class="nb">cd </span>cover-letter-llm
</code></pre></div></div>
<h3 id="install-dependencies">Install Dependencies</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>poetry <span class="nb">install</span>
</code></pre></div></div>
<h3 id="create-a-env-file">Create a <code class="language-plaintext highlighter-rouge">.env</code> File</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"OPENAI_API_KEY=<your-openai-api-key>"</span> <span class="o">>></span> .env
</code></pre></div></div>
<h3 id="run-the-application">Run the Application</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>poetry run streamlit run app.py
</code></pre></div></div>
<h2 id="screenshots">Screenshots</h2>
<p><img src="/assets/images/posts/cover-letter-llm/screenshot1.png" alt="Screenshot 1" /></p>
<p><img src="/assets/images/posts/cover-letter-llm/screenshot2.png" alt="Screenshot 2" /></p>
<h2 id="examples">Examples</h2>
<h3 id="example-1">Example 1</h3>
<p>Resume: <a href="https://github.com/DoubleGremlin181/cover-letter-llm/tree/master/assets/senior-data-scientist-resume-example.pdf">Resume</a><br />
Job Listing: <a href="https://boards.greenhouse.io/applovin/jobs/4306104006">Job Listing</a></p>
<h4 id="classic-cover-letter">Classic Cover Letter</h4>
<p>Dear Hiring Manager,</p>
<p>I am writing to express my interest in the Data Scientist position at AppLovin. With a decade of experience in data analysis and statistical modeling, I believe I would be a valuable addition to your team.</p>
<p>Throughout my career, I have demonstrated a strong ability to extract meaningful insights from large datasets and make data-driven decisions. I have extensive experience in conducting in-depth data analysis, identifying trends and patterns, and providing valuable insights to drive business strategies. I also have a deep understanding of statistics and have successfully applied statistical concepts to real-world problems.</p>
<p>In addition, I have strong proficiency in data analysis tools and packages such as Python and SQL, which I can leverage to manipulate and analyze data efficiently. I am also skilled in data visualization using tools like Matplotlib and Tableau, allowing me to effectively communicate insights to both technical and non-technical stakeholders.</p>
<p>I am highly detail-oriented and have a passion for uncovering insights within data. I am also a strong problem solver and have a track record of collaborating effectively in cross-functional teams. I am constantly learning and staying updated with the latest trends and advancements in data analysis and statistical techniques.</p>
<p>I am excited about the opportunity to contribute to AppLovin’s advertising technology and help drive its growth. I believe that my skills and experience align well with the responsibilities of the Data Scientist position, and I am confident that I would be a valuable asset to your team.</p>
<p>Thank you for considering my application. I look forward to the opportunity to discuss my qualifications further and how I can contribute to AppLovin’s success.</p>
<p>Sincerely, Terrence Coleman</p>
<h4 id="modern-cover-letter">Modern Cover Letter</h4>
<p>Hi, I’m Terrence Coleman, an analytically minded self-starter with a decade of experience collaborating with cross-functional teams to ensure data accuracy and integrity. I have a strong background in data analysis, statistical expertise, and data visualization using tools like Python and SQL. I have successfully led teams and implemented predictive modeling to drive business efficiency and strategic goals. I am excited about the opportunity to join AppLovin as a Data Scientist and apply my skills to analyze large datasets, uncover insights, and provide valuable recommendations to drive the advertising technology forward. With my strong analytical and problem-solving abilities, attention to detail, and effective communication skills, I am confident that I would be a great fit for this role. I look forward to the opportunity to contribute to AppLovin’s success. Thank you for considering my application.</p>
<h3 id="example-2">Example 2</h3>
<p>Resume: <a href="https://github.com/DoubleGremlin181/cover-letter-llm/tree/master/assets/senior-data-scientist-resume-example.pdf">Resume</a><br />
Job Listing: <a href="https://jobs.lever.co/Grid/6d7c0bcb-61ff-415f-9aa0-c6750b66754f">Job Listing</a></p>
<h4 id="classic-cover-letter-1">Classic Cover Letter</h4>
<p>Dear Hiring Manager,</p>
<p>I am excited to apply for the Staff Data Scientist position at Grid. With my decade of experience in data science and analytics, I am confident that I can make a significant impact on Grid’s progress and contribute to the success of its users.</p>
<p>In my current role as a Senior Data Scientist at Best Buy, I have led data extraction and evaluation efforts, resulting in cost savings of over 11M. I have also partnered with the product team to build a production recommendation engine in Python, which generated $450K in incremental annual revenue. These experiences demonstrate my ability to collaborate with cross-functional teams and deliver actionable insights.</p>
<p>Furthermore, my experience at 2U as a Data Scientist and Data Analyst has honed my skills in A/B testing, data extraction, and modeling. I have successfully optimized algorithms to target the learning audience by 15% and increased profitability by 4% through Python clustering methods.</p>
<p>With my proficiency in Python, SQL, and machine learning techniques, I am well-equipped to develop and validate models that align with Grid’s strategic objectives. Additionally, my understanding of the financial industry, gained through building and scaling FinTech products, will enable me to contribute domain knowledge to the team.</p>
<p>I am eager to join Grid and help build out the data science team and practice. I am confident that my analytical mindset, autonomy, and curiosity make me the ideal candidate for this role. Thank you for considering my application.</p>
<p>Sincerely, Terrence Coleman</p>
<h4 id="modern-cover-letter-1">Modern Cover Letter</h4>
<p>Hi, I’m Terrence Coleman, an analytically minded self-starter with a decade of experience in data science and analysis. I have a strong background in statistical inference and machine learning, which aligns well with Grid’s need for a Staff Data Scientist. In my current role at Best Buy, I led data extraction efforts and built a recommendation engine that resulted in increased revenue. Additionally, I developed customer attrition models and improved monthly retention. My experience at 2U allowed me to optimize algorithms and improve learning platforms through A/B testing. I have also worked extensively with Python, SQL, and Excel to extract and analyze data. With my expertise in data science, autonomy, and domain knowledge in the financial industry, I believe I would be a great fit for Grid’s team. I am excited to contribute to Grid’s mission of leveling the financial playing field and would love the opportunity to further discuss how my skills and experience can benefit your company. Thank you for considering my application.</p>
<hr />
<h1 id="better-visuals">Better Visuals</h1>
<p>2023-05-15</p>
<p><a href="https://better-visuals.kavishhukmani.me/">Better Visuals</a> is a Plotly Dash app designed to host interactive and visually appealing dashboards.</p>
<h2 id="dashboards">Dashboards</h2>
<ul>
<li><a href="https://better-visuals.kavishhukmani.me/top_100/">Your Top Songs Timeline - Spotify</a></li>
<li><a href="https://better-visuals.kavishhukmani.me/ynab/">YNAB Budget Report</a></li>
</ul>
<h2 id="data-privacy">Data Privacy</h2>
<p>Better Visuals is committed to user privacy. Any data collected is anonymized and not used for commercial purposes. The data is only used to track metrics and conduct educational analysis at an aggregated level.</p>
<h2 id="contributing">Contributing</h2>
<p>We welcome contributions to Better Visuals. Feel free to open an issue or submit a pull request if you have any suggestions or improvements.</p>
<h2 id="license">License</h2>
<p>This project is licensed under the MIT License - see the <a href="https://github.com/DoubleGremlin181/Better-Visuals/blob/master/LICENSE">LICENSE</a> file for details.</p>
<p><a href="https://github.com/DoubleGremlin181/Better-Visuals">View source code on GitHub</a></p>
<hr />
<h1 id="aspects-of-analytics-title">Aspects of Analytics That You Cannot Learn In Class</h1>
<p>2022-04-03</p>
<p><span class="evidence">This blog was initially written as part of my coursework for BAX 462 - Practicum Elaboration at UC Davis.</span></p>
<p><img src="/assets/images/posts/aspects-of-analytics-that-you-cannot-learn-in-class/image1.jpeg" alt="Header Image" /></p>
<figcaption class="caption">Photo by Markus Winkler on Unsplash</figcaption>
<p><br /></p>
<h2 id="analytics-is-hard">Analytics is hard.</h2>
<p>It is not something that can be taught entirely through a textbook. Of course, that is not to say that the classes I took as part of my master’s degree were ineffective. They ensured that I understood the theory and thought process behind every method while also giving me hands-on experience through projects and assignments. However, they do not provide training on certain aspects related to the scale and complexity of real-world projects.</p>
<p>This is where my year-long practicum came into the picture. For the uninitiated, a <a href="https://www.merriam-webster.com/dictionary/practicum">practicum</a>, as defined by Merriam-Webster, is “a course of study [..] that involves the supervised practical application of previously studied theory.” The MS in Business Analytics program at UC Davis includes a year-long practicum, which was a critical factor in my decision to join the program. The goal of the practicum is to provide real-world experience in analytics by working on real projects in a team at an external organization.</p>
<hr />
<p>In this blog, I will talk about six practical aspects of implementing analytics that I learned through the practicum. While some of these topics are mentioned in various courses, the extent of their complexity is hard to gauge from coursework alone. To this end, I will also suggest ways that you, the reader, can gain these skills (or at least a better understanding of them). I shall refer to each of these as the <i>Best Alternative to a Practicum Experience</i>, or <i>BATPE</i> for short.</p>
<h2 id="1-dealing-with-multiple-sources-of-data">1. Dealing with multiple sources of data</h2>
<p>While Kaggle datasets might be fantastic for someone new to analytics, they are not representative of working in the industry. Most companies have their data spread across multiple tables, data warehouses, and types of databases. It is common to join multiple tables at different levels of data, each with its own intricacies. This requires a combination of domain knowledge as well as SQL expertise to ensure correct and quick execution. During my practicum, I learned this lesson the hard way when our team ran some inefficient joins, which hogged all the resources, making the database inaccessible to anyone else.</p>
<p><b>BATPE:</b> Work on an analytics project with multiple related data sources. Since finding a curated dataset in this format is hard, you will likely have to combine multiple public datasets and even scrape your own data. My advice would be to choose a topic revolving around geographic or demographic differences, as fields such as <i>County</i> and <i>Gender</i> are present in many diverse datasets that are publicly available.</p>
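<p>A toy illustration of the kind of join involved, sketched with pandas (the column names and values are invented):</p>

```python
import pandas as pd

# Two invented "public datasets" at the county grain.
health = pd.DataFrame({
    "county": ["Yolo", "Sacramento"],
    "life_expectancy": [81.2, 79.5],
})
income = pd.DataFrame({
    "county": ["Yolo", "Sacramento", "Placer"],
    "median_income": [78000, 71000, 95000],
})

# An inner join keeps only counties present in both sources; `validate`
# catches accidental many-to-many joins before they hog database resources.
merged = health.merge(income, on="county", how="inner", validate="one_to_one")
```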
<h2 id="2-dealing-with-dirty-data">2. Dealing with dirty data</h2>
<p>Although this is a topic often covered in courses, it is glossed over as it is time-consuming to try out practically. While most larger companies have Data Engineering teams that deal with data cleaning, it is still an important skill to have for an analyst, especially as one should know the ins and outs of the data before performing any analysis.</p>
<p><b>BATPE:</b> This might be the easiest of my learnings to replicate, as there are quite a few dirty datasets available. It is common to find incomplete records in government-released data. I would recommend the <a href="https://www.abs.gov.au/ausstats/abs@.nsf/mf/1800.0">Australian Marriage Law Postal Survey, 2017</a> and <a href="https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9">NYC’s 311 Service Requests from 2010 to Present</a> if you are looking for a challenge. Another option would be to scrape some data yourself.</p>
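<p>A small taste of the cleaning involved, sketched with pandas on invented 311-style records:</p>

```python
import numpy as np
import pandas as pd

# Invented records with the usual problems: inconsistent casing,
# placeholder strings standing in for missing values, and duplicates.
raw = pd.DataFrame({
    "borough": ["BROOKLYN", "brooklyn", "N/A", "Queens", "Queens"],
    "complaint": ["Noise", "Noise", "Heat", "Water Leak", "Water Leak"],
})

clean = (
    raw.replace({"borough": {"N/A": np.nan}})            # placeholder -> real NaN
       .assign(borough=lambda d: d["borough"].str.title())  # normalize casing
       .drop_duplicates()
       .dropna(subset=["borough"])
)
```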
<h2 id="3-experiment-tracking-and-collaboration">3. Experiment tracking and collaboration</h2>
<p>While group projects provide some exposure to the collaborative nature of working in a team, they are seldom as structured as working in a well-oiled team at a job. They also do not emphasize tracking and versioning models and datasets.</p>
<p><b>BATPE:</b> Maintaining a proper Git repository for the code while following best practices can drastically reduce the friction of collaboration in university or boot camp group projects. <a href="https://wandb.ai/site">Weights & Biases</a> is another incredible tool that helps with experiment tracking, dataset versioning, and model management.</p>
<h2 id="4-model-runtime">4. Model runtime</h2>
<p>There are two aspects to a model’s runtime: training time and execution/prediction time. Either may or may not be significant depending on the use case. The importance of training time depends on how long it takes and how regularly the model is updated, whereas prediction time can be critical when dealing with streaming data.</p>
<p><b>BATPE:</b> Monitor times while creating models. A few tips from my experience would be to use scalable and parallelizable frameworks and algorithms such as Dask or PySpark instead of Pandas and LightGBM instead of XGBoost. Another common trick is to drop excess data and precision, trading off some accuracy for better performance.</p>
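<p>One concrete version of the precision trade-off mentioned above (an illustrative sketch, not from the practicum code): downcasting a float column halves its memory footprint, which often speeds training up too.</p>

```python
import numpy as np
import pandas as pd

# Dropping precision: float64 -> float32 halves the column's footprint,
# trading a little accuracy for better performance.
df = pd.DataFrame({"x": np.random.rand(100_000)})
bytes_before = df["x"].memory_usage(deep=True)
df["x"] = df["x"].astype(np.float32)
bytes_after = df["x"].memory_usage(deep=True)
```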
<h2 id="5-model-deployment">5. Model deployment</h2>
<p>An ML project does not end with computing the F1-Score on the test set. Deployment is a vital part of the analytics life cycle, without which nobody would be able to use the incredibly beneficial models you built. During my practicum, I deployed the models my team built on Azure and exposed them through a REST API to run nightly batch processes. While this is not very complicated, it is important to know the pros and cons of different kinds of deployments to ensure that the model is being run without any hiccups or manual intervention.</p>
<p><b>BATPE:</b> There are some high-quality online resources, such as <a href="https://stackoverflow.blog/2020/10/12/how-to-put-machine-learning-models-into-production/">this blog by StackOverflow</a> and <a href="https://www.coursera.org/learn/deploying-machine-learning-models-in-production">this course by DeepLearning.AI</a> which provide an in-depth look into deployments and the world of MLOps.</p>
<h2 id="6-performance-tracking">6. Performance tracking</h2>
<p>Deployment is also not the end of an ML project. It is critical to monitor the model inputs and KPIs for drift to detect any external changes affecting the model’s performance. This ensures that the model has not worsened over time, causing it to provide negative business value.</p>
<p><b>BATPE:</b> Neptune.ai has a beautiful <a href="https://neptune.ai/blog/how-to-monitor-your-models-in-production-guide">guide</a> that dives deep into monitoring models in production.</p>
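<p>As a minimal, library-free example of the input monitoring described above, the Population Stability Index (PSI) compares a live feature distribution against the training baseline (the 0.2 threshold below is a common rule of thumb, not a universal standard):</p>

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live sample.
    Rule of thumb (an assumption, varies by team): PSI > 0.2 flags drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct, a_pct = e_pct + 1e-6, a_pct + 1e-6   # avoid log(0) on empty bins
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)   # training-time feature distribution
stable = rng.normal(0, 1, 5000)     # live data, no drift
shifted = rng.normal(1, 1, 5000)    # live data with a simulated mean shift
```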
<h2 id="bonus-timelining">Bonus: Timelining</h2>
<p>Creating a timeline for any sort of work is hard. Even experienced professionals often misjudge the time needed to solve a problem. People often overestimate their efficiency and do not factor in complexities that might arise midway through a project. While there is no BATPE for this, working on more projects helps you get a better grasp of estimating deadlines. In accordance with <a href="https://en.wikipedia.org/wiki/Hofstadter%27s_law">Hofstadter’s law</a>, a common rule of thumb is to double any initial estimate.</p>
<hr />
<h3 id="not-impressed">Not impressed?</h3>
<p>Is there any aspect that you think that I should have mentioned? Or a BATPE that you recommend? Feel free to tweet at me using the button below.</p>
<p><a href="https://twitter.com/intent/tweet?screen_name=2gremlin181&ref_src=twsrc%5Etfw" class="twitter-mention-button" data-show-count="false">Tweet to @2gremlin181</a><script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<hr />
<h1 id="analyzing-tweets-title">Analyzing Online Conversations of the 2022 Russian Invasion of Ukraine</h1>
<p>2022-03-19</p>
<p><span class="evidence">This analysis was a part of my BAX-452 Machine Learning course final project at UC Davis.</span></p>
<p>You can find the GitHub repo <a href="https://github.com/DoubleGremlin181/Analyzing-Tweets-2022-Russian-Invasion-of-Ukraine">here</a>.</p>
<h2 id="data-characteristics">Data Characteristics</h2>
<p>For the analysis, we used data from two different datasets, which were linked temporally and topically.</p>
<h3 id="twitter-data">Twitter Data</h3>
<p>We used tweets between 02-27-2022 and 03-10-2022 from the <a href="https://www.kaggle.com/datasets/bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows">Ukraine Conflict Twitter Dataset</a>. These tweets were filtered for English only using Twitter’s language parameter, then further preprocessed and cleaned for analysis. For topic modeling, they were converted to lowercase, tokenized, and stripped of stop words, links, numbers, and symbols before being stemmed with the Porter stemmer algorithm. For sentiment analysis, the only preprocessing was removing URLs and mentions, since the other elements provide additional context to the models.</p>
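<p>The topic-modeling preprocessing can be sketched roughly as below (a simplified stand-in: the stop-word list is a tiny sample, and the actual pipeline additionally Porter-stems the tokens via NLTK):</p>

```python
import re

STOPWORDS = {"the", "a", "is", "in", "of", "to", "and"}  # tiny sample set

def preprocess_for_topics(tweet):
    """Lowercase, strip links/mentions/numbers/symbols, drop stop words.
    (Stemming, done in the project with NLTK's PorterStemmer, is omitted.)"""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)  # links and mentions
    tokens = re.findall(r"[a-z]+", tweet)             # drops numbers/symbols
    return [t for t in tokens if t not in STOPWORDS]
```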
<h3 id="acled-data">ACLED Data</h3>
<p>The Armed Conflict Location & Event Data Project (ACLED) is a non-profit organization that provides reputable granular information about worldwide conflicts such as battles and protests. We used their <a href="https://acleddata.com/data-export-tool/">Data Export Tool</a> to collect data regarding events of interest in Russia and Ukraine. This data was last updated on 03-11-2022.</p>
<h2 id="analyses-and-insights">Analyses and Insights</h2>
<h3 id="topic-modeling-on-tweets">Topic Modeling on Tweets</h3>
<p>The first step in our analysis was to understand the topics that people were talking about. We started with a naive approach of creating a frequency distribution of the words and plotting a word cloud.</p>
<p><img src="/assets/images/posts/analyzing-tweets-2022-russian-invasion-of-ukraine/image1.png" alt="Wordcloud" /></p>
<p>This approach did not provide any meaningful insights, because the most common word stems were, somewhat predictably, based on the words Ukraine, Russia, Putin, and War.</p>
<p>Our next approach was to use Latent Dirichlet Allocation (LDA) to cluster similar topics. We then mapped out the most prominent topics using the words, their weights, and some context through the content of the Tweet and news.</p>
<table>
<thead>
<tr>
<th>Word Stems</th>
<th>Interpreted Topic</th>
</tr>
</thead>
<tbody>
<tr>
<td>ukrain, student, border, indian, poland, evacu</td>
<td>Indian students stranded at the Ukraine-Poland border in poor conditions after a failed evacuation attempt.</td>
</tr>
<tr>
<td>ukrain, support, help, nuclear, plant, power</td>
<td>Russian attack on a Ukrainian nuclear power plant.</td>
</tr>
<tr>
<td>ukrain, russian, mariupol, kharkiv, kyiv, armi</td>
<td>Russian army invasion of the Ukrainian cities of Kyiv, Kharkiv, and Mariupol</td>
</tr>
<tr>
<td>close, ukrain, nato, stoprussia, stopputin, un</td>
<td>People asking the UN and NATO to intervene and assist Ukraine</td>
</tr>
<tr>
<td>weapon, provid, defend, humanitarian, putin, civilian</td>
<td>People talking about providing Ukrainian civilians humanitarian aid</td>
</tr>
</tbody>
</table>
<h3 id="sentiment-analysis-on-tweets">Sentiment Analysis on Tweets</h3>
<h4 id="bag-of-words-approach">Bag of Words Approach</h4>
<p>The first attempt at sentiment analysis used a rudimentary bag-of-words approach. We used the NLTK corpus through TextBlob, which provided two metrics: polarity and subjectivity. The polarity score ranges from -1.0 for a very negative sentiment to +1.0 for a very positive sentiment; subjectivity ranges from 0.0 for very objective statements to +1.0 for very subjective statements. The results from this approach were not promising due to the lack of depth in the sentiment tags and the inability to parse context, a limitation of the bag-of-words approach.</p>
<h4 id="transformer-models">Transformer Models</h4>
<p>To improve the sentiment analysis, we switched to transformers, using RoBERTa (Robustly Optimized BERT Pre-training Approach) models. RoBERTa is an optimized variant of BERT (Bidirectional Encoder Representations from Transformers), a bidirectional model pre-trained with token masking. We used two pre-trained models created by <a href="https://huggingface.co/cardiffnlp">Cardiff NLP</a>, a research group at Cardiff University. This allowed us to better predict the sentiments of tweets, as the models were fine-tuned on the TweetEval dataset. Combined, this gave us seven labels for each tweet: positive, neutral, negative, joy, optimism, anger, and sadness.</p>
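<p>The predictions themselves came from the pre-trained Cardiff NLP checkpoints via the <code>transformers</code> library; the post-processing from raw logits to a label for the positive/neutral/negative model looks roughly like this (the logit values below are made up):</p>

```python
import numpy as np

# Label order for the three-way sentiment task.
LABELS = ["negative", "neutral", "positive"]

def logits_to_label(logits):
    """Map a model's raw logits to a label via a numerically stable softmax."""
    exp = np.exp(logits - np.max(logits))
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], probs

# Made-up logits for one tweet.
label, probs = logits_to_label(np.array([-1.2, 0.3, 2.1]))
```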
<h4 id="hashtag-mapping">Hashtag Mapping</h4>
<p>Since hashtags are such an essential part of Tweets, it is important to also understand them. It is difficult to directly analyze the sentiment of a hashtag as they are often made up of acronyms and multiple words without delimiters. To better understand them, we extracted the hashtags from each tweet and used the seven sentiment values derived from the previous exercise as target variables for regressions. The hashtags were then converted to dummy variables whose coefficients would provide context regarding the overall sentiment. We limited the analysis to the top 40 most popular hashtags to avoid outliers.</p>
<table>
<thead>
<tr>
<th>Sentiment Label</th>
<th>Top Hashtags (Based on +ve coefficient value)</th>
</tr>
</thead>
<tbody>
<tr>
<td>negative</td>
<td>‘Mariupol’, ‘SafeAirliftUkraine’, ‘StopPutin’, ‘UkraineUnderAttack’, ‘Putin’</td>
</tr>
<tr>
<td>neutral</td>
<td>‘BREAKING’, ‘EU’, ‘China’, ‘US’, ‘NATO’</td>
</tr>
<tr>
<td>positive</td>
<td>None</td>
</tr>
<tr>
<td>anger</td>
<td>‘UKRAINE’, ‘StopRussia’, ‘StopPutin’, ‘putin’, ‘RussianUkrainianWar’</td>
</tr>
<tr>
<td>joy</td>
<td>‘SlavaUkraini’, ‘Zelenskyy’</td>
</tr>
<tr>
<td>optimism</td>
<td>‘StandWithUkraine’, ‘StandWithUkraine️’, ‘China’, ‘EU’, ‘SafeAirliftUkraine’</td>
</tr>
<tr>
<td>sadness</td>
<td>‘Mariupol’, ‘SafeAirliftUkraine’, ‘UkraineUnderAttack’, ‘BREAKING’, ‘Kharkiv’</td>
</tr>
</tbody>
</table>
<p>As can be seen from the results, we were able to successfully model the sentiment behind the hashtags, which would not otherwise be possible by looking at them directly. Seemingly neutral words, such as the names Putin and Zelenskyy, were correlated with sentiments that make sense given the context.</p>
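<p>Mechanically, the dummy-variable regression can be sketched as below (synthetic scores and hashtags; only the approach mirrors the analysis):</p>

```python
import numpy as np
import pandas as pd

# Synthetic data: per-tweet hashtags and a sentiment score from the model.
df = pd.DataFrame({
    "hashtags": [["StopPutin"], ["SlavaUkraini"], ["StopPutin", "NATO"], ["NATO"]],
    "negative": [0.9, 0.1, 0.8, 0.4],
})

# One dummy column per hashtag, aggregated back to one row per tweet.
dummies = pd.get_dummies(df["hashtags"].explode()).groupby(level=0).max()

# Ordinary least squares: sentiment ~ intercept + hashtag dummies.
X = np.column_stack([np.ones(len(df)), dummies.to_numpy(float)])
coefs, *_ = np.linalg.lstsq(X, df["negative"].to_numpy(), rcond=None)
coef_by_tag = dict(zip(["intercept", *dummies.columns], coefs))
# A large positive coefficient marks a hashtag that co-occurs with the label.
```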
<h4 id="correlation-with-real-world-events">Correlation with Real-World Events</h4>
<p>To link the online sentiment with real-world events, we merged the average values for daily sentiment with the count of each event type from the ACLED dataset.</p>
<p><img src="/assets/images/posts/analyzing-tweets-2022-russian-invasion-of-ukraine/image2.png" alt="Correlation graph" /></p>
<p>From the graph, we can infer that the sentiment is fairly stable; however, there is a noticeable lagged effect. This can be seen clearly from 03-06-2022 to 03-08-2022, when a large number of anti-war protests in Russia led to a drop in anger and negativity.</p>
<p>Additionally, we can also look at the correlation between the event type and sentiment to provide context.</p>
<p><img src="/assets/images/posts/analyzing-tweets-2022-russian-invasion-of-ukraine/image3.png" alt="Heatmap" /></p>
<p>While not perfect, we can see that event types such as protests were positively correlated with positivity and negatively correlated with sadness, whereas event types such as violence against civilians were correlated with anger.</p>
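<p>The merge-and-correlate step itself is straightforward; a toy version with invented numbers:</p>

```python
import pandas as pd

# Invented daily mean sentiment and daily ACLED-style event counts.
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2022-03-06", "2022-03-07", "2022-03-08"]),
    "anger": [0.41, 0.35, 0.33],
})
events = pd.DataFrame({
    "date": pd.to_datetime(["2022-03-06", "2022-03-07", "2022-03-08"]),
    "protests": [12, 30, 25],
})

# Join on the shared date key, then correlate sentiment with event counts.
daily = sentiment.merge(events, on="date")
corr = daily["anger"].corr(daily["protests"])
```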
<hr />
<p>Header Image by Joshua Hoehne on Unsplash</p>
<hr />
<h1 id="enhancing-client-engagement-title">Enhancing Client Engagement Through Better Branding</h1>
<p>2022-02-25</p>
<p><span class="evidence">This blog was initially written as part of my coursework for BAX 462 - Practicum Elaboration at UC Davis.</span></p>
<p><img src="/assets/images/posts/enhancing-client-engagement-through-better-branding/header.jpeg" alt="Header Image" /></p>
<figcaption class="caption">Photo by Austin Distel on Unsplash</figcaption>
<p><br /></p>
<blockquote>
<p>Every Job is a Sales Job</p>
</blockquote>
<p>I’m sure you’ve heard this quote thrown around before. You might have even read a “bestselling” <a href="https://www.goodreads.com/en/book/show/44600592-every-job-is-a-sales-job">book</a> that goes by the same title. Not that the terms <i>New York Times Bestseller</i> or <i>Wall Street Journal Bestseller</i> carry much meaning these days. As noted in <i><a href="https://observer.com/2016/02/the-truth-about-the-new-york-times-and-wall-street-journal-bestseller-lists/">this</a></i> Observer article, these lists are easily manipulated by authors and publishers purchasing a large number of books at launch to achieve bestseller status as a marketing tactic. While this might or might not have been the case for this book, I’m sure the irony is not lost on you.</p>
<p>In fact, let me blow your mind by telling you that the author of the previous article exposing these practices runs a company that offers <a href="https://booklaunch.com/">services</a> to market books and make them bestsellers.</p>
<div class="breaker"></div>
<p>Branding is the first step towards engaging with your customers. While the examples above were innovative ways of branding products, I will be talking about a different application of branding that isn’t given as much thought but is equally applicable in our day-to-day lives: <strong>Project Branding</strong>.</p>
<p>Each project in an organization has a brand attached to it, consciously or subconsciously. Generally speaking, a team working on bleeding-edge new features is seen as sexy even if it is not contributing to the bottom line, while a team maintaining a core product is seen as boring. This perception can affect the engagement and priority each team receives.</p>
<p>A similar effect also exists when dealing with clients as consultants. In this blog, I talk about my experience enhancing client engagement through project branding as a data analyst in the <a href="https://www.merriam-webster.com/dictionary/practicum">practicum</a> at the <a href="https://gsm.ucdavis.edu/master-science-business-analytics-msba">UC Davis MSBA</a>. As part of the practicum, a team of fellow students and I worked for an external organization with whom we met weekly. As the primary objective of the practicum is to offer a unique learning experience, the onus rests on the students to be the primary drivers of the projects. Since the degree of learning possible depends on the client’s engagement, it is essential to create and maintain an impression of eagerness to learn and competence in delivery.</p>
<p>This has led to one of my key takeaways from the MSBA Practicum this quarter not being about some novel algorithm and its unique application but rather about the importance of project branding. This was heavily influenced by an article in the MIT Sloan Management Review- <strong><a href="https://sloanreview.mit.edu/article/why-every-project-needs-a-brand-and-how-to-create-one/">Why Every Project Needs a Brand (and How to Create One)</a></strong>.</p>
<p>While the article mentions several key factors, some of them are not always tweakable, such as the <strong>Pitch</strong>, or desirability, of a project. This was also the case with the practicum, where the pitch stage had already been completed. However, there are several things you can do, both individually and as a team, that leave the client with a favorable impression and, in turn, lead to enhanced engagement.</p>
<h3 id="here-are-five-activities-my-team-implemented-that-contributed-towards-creating-a-better-brand-for-ourselves">Here are five activities my team implemented that contributed towards creating a better brand for ourselves</h3>
<h4 id="1-providing-an-agenda-and-minutes-for-each-meeting">1. Providing an agenda and minutes for each meeting</h4>
<p>As we generally only met with the client once a week, we sent out the meeting agenda a day prior. This not only allowed us to set expectations for the meeting but also demonstrated that we were well prepared for it. Similarly, the minutes showed our professionalism while acting as a source of truth for the discussions held.</p>
<h4 id="2-asking-intelligent-questions">2. Asking intelligent questions</h4>
<p>Asking intelligent questions is an essential aspect of one’s brand. It demonstrates the ability to quickly and accurately grasp the core elements of a discussion, and helps develop trust and strengthen rapport. I found <a href="https://hbr.org/2018/05/the-surprising-power-of-questions">The Surprising Power of Questions [HBR May–June 2018]</a> to be an excellent resource on how to ask better questions.</p>
<h4 id="3-making-our-efforts-visible">3. Making our efforts visible</h4>
<p>Not everything one does is successful. You could be an expert in a field and still face unexpected setbacks. This is especially common for students who are learning along the way. Showcasing setbacks and failures gave the client an inside view of the effort we put in, along with an opportunity for us to get feedback and suggestions, based on their domain knowledge, on how to approach the issue.</p>
<h4 id="4-creating-a-structured-plan-to-tackle-problems">4. Creating a structured plan to tackle problems</h4>
<p>Since the nature of the practicum led to rigid overall time constraints, we used Trello to keep track of all the tasks and their deadlines. This showcased our professional competency and commitment to deliver on the project in the given timeframe.</p>
<h4 id="5-setting-additional-meetings-when-required">5. Setting additional meetings when required</h4>
<p>When faced with a roadblock that required the client’s assistance, reaching out and setting up additional meetings rather than waiting for the weekly call showed a sense of urgency and priority towards the project.</p>
<div class="breaker"></div>
<h2 id="branding-is-part-of-client-engagement-it-is-a-sales-job">Branding is part of client engagement. It is a sales job.</h2>
<p>Client engagement is everywhere; in the case of the practicum, the goal was to extract more in the form of learning, and in the real world, it can ensure buy-in from your client and create customer loyalty. In this case, you, the reader, are engaging with my blog, and I want to ensure that my personal brand makes a good impression that makes you want to come back and interact more with my content. If it did, follow me on GitHub or Tweet at me with your thoughts.</p>
<!-- Place this tag where you want the button to render. -->
<p><a class="github-button" href="https://github.com/DoubleGremlin181" data-color-scheme="no-preference: light; light: light; dark: dark;" aria-label="Follow @DoubleGremlin181 on GitHub">Follow @DoubleGremlin181</a>
<a href="https://twitter.com/intent/tweet?screen_name=2gremlin181&ref_src=twsrc%5Etfw" class="twitter-mention-button" data-show-count="false">Tweet to @2gremlin181</a><script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<!-- Place this tag in your head or just before your close body tag. -->
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>
<hr />kavishhukmaniThis blog was initially written as part of my coursework for BAX 462 - Practicum Elaboration at UC Davis.My Understanding of Teamwork in the Analytics Workplace2022-01-25T23:59:00+00:002022-01-25T23:59:00+00:00https://kavishhukmani.me/my-understanding-of-teamwork-in-the-analytics-workplace<h2 id="an-evolution-through-the-uc-davis-msba-practicum">An evolution through the UC Davis MSBA Practicum</h2>
<p><span class="evidence">This blog was initially written as part of my coursework for BAX 462 - Practicum Elaboration at UC Davis.</span></p>
<p><img src="/assets/images/posts/my-understanding-of-teamwork-in-the-analytics-workplace/teamwork.jpeg" alt="Teamwork Image" class="bigger-image" /></p>
<figcaption class="caption">Photo by Nick Fewings on Unsplash</figcaption>
<p>I’m sure that everyone agrees with the statement that teamwork is, at least to some degree, imperative to success. You can find journal papers and excerpts from books studying the impact of teamwork from the early 1900s to ones published just <a href="https://scholar.google.com/scholar?q=teamwork&scisbd=1">yesterday</a>. The definition and requirement of teamwork is essentially the same whether you were a <a href="https://naldc.nal.usda.gov/download/IND43843093/PDF">farmer and his agent in 1918</a>, a <a href="https://www.tandfonline.com/doi/abs/10.1080/1750984X.2014.932423">player in a professional sports team</a> fighting for the world championship, or a part of a <a href="http://dx.doi.org/10.1136/bmjqs-2020-011447">contemporary firm and are working remotely due to the ongoing pandemic</a>.</p>
<div class="breaker"></div>
<p>In this essay, I will be talking about my perspective on teamwork from the lens of a modern analytics environment and how it matured through my practicum at UC Davis.</p>
<p>For the uninitiated, a <a href="https://www.merriam-webster.com/dictionary/practicum">practicum</a>, as defined by Merriam-Webster, is a course of study [..] that involves the supervised practical application of previously studied theory. The MS in Business Analytics program at UC Davis consists of a year-long practicum, a critical factor in my decision to join the program. As a part of it, I am currently working with <a href="https://www.angelflightwest.org/">Angel Flight West</a>(AFW), a non-profit organization that leverages a fleet of volunteer pilots to provide non-emergency medical transport to patients at no cost.</p>
<div class="breaker"></div>
<p>Before joining the program, my experience working in large teams was limited. During my time at Manipal Institute of Technology and Impact Analytics, I was primarily in teams of 3 or fewer members, where everyone came from similar backgrounds and had identical job titles. This homogeneity made working together very easy and efficient, as more often than not, we would share a common vision and approach to problem-solving. It gave me the impression that everyone within a team was an identical cog in a machine.</p>
<p><img src="/assets/images/posts/my-understanding-of-teamwork-in-the-analytics-workplace/watch.jpeg" alt="Watch Image" /></p>
<figcaption class="caption">Photo by Lukas Tennie on Unsplash</figcaption>
<p>Looking back, I can see how naive my idea of a team was. While this might have worked in smaller groups, it is not scalable for larger, more complex projects with more people working on them. I know now that it is when we put multiple different gears together that we can do great things. You can’t build a watch with just one kind of gear. My practicum team consists of four batchmates from diverse walks of life, our mentor Prof. Sanjay Saigal, and the CIO and Operations Leads at AFW. Together, we each bring a unique perspective to the table and have been able to leverage those perspectives to enable AFW to positively impact more people each day.</p>
<p><img src="/assets/images/posts/my-understanding-of-teamwork-in-the-analytics-workplace/soccer.jpeg" alt="Soccer Image" /></p>
<figcaption class="caption">Photo by leah hetteberg on Unsplash</figcaption>
<p>A prominent aspect of teamwork, particularly in the workplace, that I have come to realize through the practicum is that not everyone can be the star <a href="https://www.merriam-webster.com/dictionary/striker">striker</a>. In some sense, a modern analytics team is composed more like a sports team, where different people have dedicated roles. You cannot build a soccer team with 11 strikers and expect to score a lot of goals. A team also needs to be well-rounded, have good chemistry, and prioritize the common goal over the individual. Take Paris Saint-Germain, a soccer team with some of the biggest names in the sport, such as Messi and Neymar; yet they have repeatedly failed to deliver at the international level and have often been in the news for internal conflicts. Similarly, not everyone working on a project can be building models and tuning hyperparameters. This can be seen in the industry, where Decision Scientists, Data Engineers, Business Intelligence Analysts, and ML Ops Engineers all work together in tandem.</p>
<p><img src="/assets/images/posts/my-understanding-of-teamwork-in-the-analytics-workplace/teamworking.jpeg" alt="Teamworking Image" /></p>
<figcaption class="caption">Photo by John Schnobrich on Unsplash</figcaption>
<p>The students in a practicum team at the UC Davis MSBA are given the freedom to allocate roles and responsibilities amongst themselves as they see fit. While we all initially took up the title of Data Analyst, I soon came to realize that you can’t build a functioning team with everyone wanting to develop machine learning models. I ended up taking a role in Data Engineering, working on data preparation and querying for the remaining analysts. I was initially bummed out about it, wanting to work on the “impactful” aspects of the project. This feeling was amplified by the fact that the assignment could have such a direct positive impact on people in need, and I wanted to do my best to enable it.</p>
<p>However, I soon came to realize that my work also had its place; it might not have been the shiniest cog in the machine, but it played a crucial role, and without it, everything would have come to a halt. Besides my technical learning, my biggest takeaway was the perspective of a different role in an analytics team. This allows me to be a better team player by understanding the workflow of another role and anticipating potential issues that can arise from both perspectives. Going forward in the practicum, I look forward to working in as many different roles as possible, thereby furthering my holistic understanding of the modern analytics team. This also leads into my other key takeaway: the importance of clearly defined roles and their direct impact on teamwork. It was something I learned both in theory, through <a href="https://hbr.org/2016/06/the-secrets-of-great-teamwork">The Secrets of Great Teamwork</a>, a reading that was part of the practicum course, and in practice.</p>
<p>Through this experience, the practicum has enabled me to better understand the workings of an analytics team. It has transformed me into a well-oiled cog that works in a frictionless manner, and I am grateful that I decided to enroll in the UC Davis MSBA program.</p>
<hr />kavishhukmaniAn evolution through the UC Davis MSBA PracticumIntroduction to Mathematical Optimization2021-11-30T19:25:00+00:002021-11-30T19:25:00+00:00https://kavishhukmani.me/introduction-to-mathematical-optimization<p><span class="evidence">This report was initially written as part of my coursework for BAX 401 - Information, Impacts and Insights at UC Davis.</span></p>
<h1 id="introduction">Introduction</h1>
<p>A problem is classified as an optimization problem when the goal is to maximize the efficiency of a process by minimizing or maximizing an objective function. This is done by adjusting the values for a set of variables, known as the decision variables, such that they satisfy a set of predefined constraints while providing the best possible solution to the objective.
Optimization techniques can be classified into two groups:</p>
<h2 id="deterministic-optimization-methods">Deterministic Optimization Methods</h2>
<p>Problems that have no random or unknown elements are solved using deterministic techniques. The solution provided in such cases is guaranteed to be the global optimum. Depending on the nature of the system of equations used to represent the problem, they are classified into categories such as Linear Programming (LP), Mixed Integer Programming (MIP), and Nonlinear Programming (NLP). These problems are typically solved using solver programs, or “solvers”, which use a combination of techniques such as branch and bound, cutting planes, and the simplex method. The primary challenge lies in defining the constraints in a manner that is easy to solve. There are both open-source and proprietary solvers available with varying feature sets and efficiency.</p>
<p>Deterministic optimization is generally used for problems that can be represented using a small number of equations or when a guaranteed global optimum is required. This is because the combinatorial explosion from a large number of constraint equations causes the solution time to increase exponentially. Formulating equations for large-scale problems requires exploiting their structure with domain knowledge to decrease the time taken to solve them.</p>
<h2 id="stochastic-optimization-methods">Stochastic Optimization Methods</h2>
<p>Stochastic or probabilistic methods are used to solve problems with random or unknown elements. They cannot guarantee the optimality of the solution but instead provide its probability of being the global optimum. This probability increases as time goes on and reaches 1 given infinite time. Various methods take different approaches to solving such problems. The following are some of the most popular:</p>
<h3 id="genetic-algorithms">Genetic Algorithms</h3>
<p>Genetic algorithms are a versatile approach based on the Theory of Natural Selection. The objective is represented in the form of a fitness function, which measures the fitness of each individual. We begin with a population of potential solutions, from which we pick the two fittest individuals and create a set of children by combining them. These children then form the next generation. This process continues until some cutoff criterion, such as a limit on the number of generations, is reached.</p>
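<p>The loop described above can be sketched in a few lines of Python. The fitness function, population size, and mutation scale below are arbitrary choices for illustration:</p>

```python
import random

def fitness(x):
    # Toy objective with its peak at x = 3
    return -(x - 3) ** 2

def genetic_algorithm(pop_size=20, generations=100):
    # Start with a random population of candidate solutions
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parent_a, parent_b = population[:2]   # two fittest individuals
        children = []
        for _ in range(pop_size):
            child = (parent_a + parent_b) / 2  # crossover: blend the parents
            child += random.gauss(0, 0.5)      # mutation: random jitter
            children.append(child)
        children[0] = parent_a                 # elitism: keep the best so far
        population = children                  # next generation
    return max(population, key=fitness)

random.seed(42)
best = genetic_algorithm()
```

<p>After 100 generations, <code class="language-plaintext highlighter-rouge">best</code> lands close to the peak at 3; keeping the fittest parent each generation (elitism) guarantees the best fitness never decreases.</p>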
<h3 id="simulated-annealing">Simulated Annealing</h3>
<p>Simulated annealing is generally used in discrete problems with uneven surfaces consisting of a large number of local minima. It is based on the process of physical annealing, in which metals are heated up and cooled down to change their physical properties. In simulated annealing, the optimization function can accept suboptimal solutions when the “temperature” is high, leading to exploration. This leads to the algorithm locating the approximate global minimum rather than getting stuck at a local minimum like a purely greedy algorithm such as gradient descent.</p>
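<p>A minimal sketch of the idea, using a made-up rugged 1-D objective; the temperature schedule and proposal distribution are arbitrary illustrative choices:</p>

```python
import math
import random

def f(x):
    # Uneven surface: many local minima, global minimum at x = 0
    return x ** 2 + 10 * math.sin(3 * x) ** 2

def simulated_annealing(steps=5000, temp=10.0, cooling=0.999):
    x = random.uniform(-10, 10)
    best = x
    for _ in range(steps):
        candidate = x + random.gauss(0, 1)
        delta = f(candidate) - f(x)
        # While the temperature is high, worse moves are accepted with
        # probability exp(-delta / temp) -- this drives exploration
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        if f(x) < f(best):
            best = x
        temp *= cooling  # gradually "cool down" toward pure exploitation
    return best

random.seed(0)
# a few independent restarts make the result more reliable
best = min((simulated_annealing() for _ in range(3)), key=f)
```

<p>Gradient descent started at, say, x = 2 would settle into the nearest local dip; the annealer’s early high-temperature phase lets it hop over those barriers toward x = 0.</p>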
<h3 id="tabu-search">Tabu Search</h3>
<p>Tabu search is another approach used to solve problems with uneven surfaces. However, it can be applied to both discrete and continuous problems. It works by banning a set of moves that would lead to the same solution as those taken recently, thereby making some moves tabu or taboo. This leads to the acceptance of suboptimal solutions, which allows escaping local minima.</p>
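<p>The core mechanic, banning recently visited solutions, can be sketched as follows. The discrete 1-D objective and the tabu-list length are made up for illustration; the search starts in a local minimum at x = 8 and is forced out of it:</p>

```python
from collections import deque

def f(x):
    # Discrete objective: global minimum at x = 0,
    # local minimum at x = 8 (f(8) = 2)
    return abs(x) if abs(x) < 5 else abs(x - 8) + 2

def tabu_search(start, steps=200, tabu_size=5):
    current, best = start, start
    tabu = deque([start], maxlen=tabu_size)  # recently visited points are taboo
    for _ in range(steps):
        neighbors = [current - 1, current + 1]
        candidates = [n for n in neighbors if n not in tabu]
        if not candidates:
            break
        # Move to the best non-tabu neighbor, even if it is worse than the
        # current point -- accepting suboptimal moves escapes local minima
        current = min(candidates, key=f)
        tabu.append(current)
        if f(current) < f(best):
            best = current
    return best

best = tabu_search(start=8)  # escapes the local minimum and reaches 0
```

<p>Note how the search temporarily climbs uphill (f rises to 5 on the way out of the x = 8 basin) because the better neighbor is tabu, which is exactly the behavior described above.</p>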
<h1 id="applications">Applications</h1>
<p>The broad definition of mathematical optimization allows it to be applied across various domains, from Operations Research to Computer Science. These applications can, however, be generalized into basic problem types prevalent across fields. The following are a few examples of such problem types:</p>
<h2 id="resource-allocation-optimization-or-scheduling-problems">Resource Allocation Optimization (or Scheduling Problems)</h2>
<p>Such problems deal with effectively utilizing a set of resources given constraints to maximize an objective, usually revenue or profit. A typical example of such a problem is the optimization of machine usage/labor in a factory that is capable of producing multiple products.</p>
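<p>A toy instance of the factory example can be solved with the open-source solver behind <code class="language-plaintext highlighter-rouge">scipy.optimize.linprog</code>; every number below is made up for illustration:</p>

```python
from scipy.optimize import linprog

# Hypothetical factory: product A earns $3/unit, product B earns $5/unit.
# linprog minimizes, so the profits are negated to maximize them.
profit = [-3, -5]

# Machine-hour constraints (illustrative numbers):
#   machine 1:  1*A         <= 4 hours
#   machine 2:         2*B  <= 12 hours
#   machine 3:  3*A +  2*B  <= 18 hours
A_ub = [[1, 0], [0, 2], [3, 2]]
b_ub = [4, 12, 18]

res = linprog(profit, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal production plan and maximum profit
```

<p>Because this is a deterministic LP, the answer (produce 2 units of A and 6 of B for $36) is a guaranteed global optimum, not just a probable one.</p>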
<h2 id="route-optimization">Route Optimization</h2>
<p>Also known as the Traveling Salesman Problem (and its variants), these are problems that can be represented in the form of a graph where we need to traverse over the nodes while keeping the sum of edge costs/weights minimum. A typical example is the optimization of the routes for delivery trucks in a city.</p>kavishhukmaniThis report was initially written as part of my coursework for BAX 401 - Information, Impacts and Insights at UC Davis.How to create an interactive README for your GitHub profile2020-07-17T12:37:00+00:002020-07-17T12:37:00+00:00https://kavishhukmani.me/github-profile-interactive-readme-tutorial<p>This is a <strong>NO BS</strong> tutorial on how to create interactive READMEs for your GitHub Profile.</p>
<p>I implemented an interactive game of Tic-Tac-Toe on my <a href="https://github.com/DoubleGremlin181/">profile</a> and will be using that as an example for this tutorial while keeping it open enough to let you implement your ideas.</p>
<h3 id="prerequisites">Prerequisites</h3>
<ul>
<li>A GitHub account</li>
<li>Intermediate knowledge in any scripting language. I will be using Python.</li>
<li>Basic Markdown and HTML for formatting your README.</li>
<li>Basics of GitHub Actions.</li>
</ul>
<h3 id="let-the-games-begin">Let the games begin</h3>
<p>1) Create a new GitHub repo with the same name as your username.
<code class="language-plaintext highlighter-rouge">https://github.com/username/username</code></p>
<p>2) Create an account on <a href="https://www.linkclickcounter.com/">Link Click Counter</a> using a (temporary) email ID. This is a free service that tracks clicks on your URLs in real time. Create as many links as needed.</p>
<p>3) Create a new file in the language of your choice and scrape the results from <code class="language-plaintext highlighter-rouge">https://www.linkclickcounter.com/userAccount.php</code>. Store the values in a JSON file. The number of new clicks between two runs of the program is the difference between the old and new values. Don’t forget to store your email ID and password as environment variables.</p>
<details>
<summary>
Expand gist
</summary>
<script src="https://gist.github.com/DoubleGremlin181/674da407cfc6feedfcf2210caf558644.js"></script>
</details>
<p>4) The next thing to do is to implement your idea. Use the values calculated from the previous step to make your program interactive.</p>
<details>
<summary>
Expand gist
</summary>
<script src="https://gist.github.com/DoubleGremlin181/280ef9af0b01996d6df85b2644ba5791.js"></script>
</details>
<p>5) The last part of the program is to create the template for your README file. The magic lies in placing hyperlinks to your Link Click Counter URLs behind images:</p>
<p><code class="language-plaintext highlighter-rouge">[![Alt Text](Image link)](Link Click Counter URL)</code></p>
<details>
<summary>
Expand gist
</summary>
<script src="https://gist.github.com/DoubleGremlin181/7fbe5ec91465c60d45e4b28d4d9cf7ba.js"></script>
</details>
<p>6) Create a <code class="language-plaintext highlighter-rouge">.yml</code> file in <code class="language-plaintext highlighter-rouge">.github/workflows/</code> to automatically run your code as desired. For scheduled runs, use cron expressions. I would also recommend caching any dependencies.</p>
<details>
<summary>
Expand gist
</summary>
<script src="https://gist.github.com/DoubleGremlin181/2c8e9f73bdc0820eed8dde70522429ba.js"></script>
</details>
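<p>For reference, a minimal workflow along these lines might look like the following. The file name, the script name <code class="language-plaintext highlighter-rouge">main.py</code>, and the secret names are placeholders for whatever your setup uses, so adapt them to match your repo:</p>

```yaml
# .github/workflows/update-readme.yml
name: Update README
on:
  schedule:
    - cron: "*/15 * * * *"   # run every 15 minutes
  workflow_dispatch:          # allow manual runs too

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip          # cache dependencies between runs
      - run: pip install -r requirements.txt
      - run: python main.py   # placeholder for your script from steps 3-5
        env:
          EMAIL: ${{ secrets.EMAIL }}
          PASSWORD: ${{ secrets.PASSWORD }}
      - run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add -A
          git diff --cached --quiet || git commit -m "Update README"
          git push
```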
<p>7) Commit and push all your files</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git add -A
git commit -m "Your commit message"
git push origin master
</code></pre></div></div>
<p>8) Add your email ID and password as Secrets to your repo on GitHub</p>
<blockquote>
<p>Settings -> Secrets -> New Secret</p>
</blockquote>
<h3 id="and-youre-done">And you’re done.</h3>
<!-- Place this tag where you want the button to render. -->
<p><a class="github-button" href="https://github.com/DoubleGremlin181/DoubleGremlin181" data-icon="octicon-star" aria-label="Star DoubleGremlin181/DoubleGremlin181 on GitHub">Star</a> my repo if you found this tutorial helpful.</p>
<p><a href="https://twitter.com/intent/tweet?screen_name=2gremlin181&ref_src=twsrc%5Etfw" class="twitter-mention-button" data-show-count="false">Tweet to @2gremlin181</a><script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> with your profiles and ideas</p>
<!-- Place this tag in your head or just before your close body tag. -->
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>kavishhukmaniThis is a NO BS tutorial on how to create interactive READMEs for your GitHub Profile.Rubiks Cube Gym2020-05-24T15:52:00+00:002020-05-24T15:52:00+00:00https://kavishhukmani.me/rubiks-cube-gym<p>After my internship ended prematurely due to the COVID-19 pandemic, I decided to use my newfound free time to learn new things and hone my skills. On the programming side of things, this has been reinforcement learning. I have been fascinated by reinforcement learning and OpenAI since I first read about it. OpenAI has led to some significant breakthroughs in recent years. From <a href="https://gym.openai.com/">Gym</a>, which standardised environments and can be used with all major RL libraries to the <a href="https://openai.com/blog/openai-five/">OpenAI Five</a>, a team of five individual bots which beat the reigning champions OG at DotA2 to <a href="https://openai.com/blog/better-language-models/">GPT-2</a> which completely revolutionised the NLP field.</p>
<p>I am also a big fan of the Rubik’s Cube and various twisty puzzles. Combining these two interests, I have created an OpenAI Gym environment for twisty puzzles. Currently, it supports the 2x2 Rubik’s Cube, with more puzzles such as the Pyraminx and Skewb to come in the near future. I had initially intended to create one for the 3x3 Rubik’s Cube, which everyone knows and loves, but the number of possible states is far too large (43 quintillion, or 4.3 x 10^19), making it impractical to use.</p>
<p>The 2x2 Rubik’s Cube is available in three different environments, each with a different reward method. These include <a href="https://www.speedsolving.com/wiki/index.php/Layer_by_layer">layer by layer</a>, a beginner’s method, and the <a href="https://www.speedsolving.com/wiki/index.php/Ortega_Method">Ortega</a> method, which is commonly used by speedcubers.</p>
<p>This project has been a fantastic learning experience, and I plan to use it for some reinforcement learning in the near future (stay tuned for that).</p>
<p>The code is available on my <a href="https://github.com/DoubleGremlin181/RubiksCubeGym">GitHub</a> along with some more technical information in the README file.</p>
<p>I would love to see how anyone ends up using it. As always, I am open to suggestions and feedback.</p>
<p>Photo Credits: <a href="https://www.flickr.com/photos/scarygami/4214513596/in/photostream/">Gerwin Sturm</a></p>kavishhukmaniAfter my internship ended prematurely due to the COVID-19 pandemic, I decided to use my newfound free time to learn new things and hone my skills. On the programming side of things, this has been reinforcement learning. I have been fascinated by reinforcement learning and OpenAI since I first read about it. OpenAI has led to some significant breakthroughs in recent years. From Gym, which standardised environments and can be used with all major RL libraries to the OpenAI Five, a team of five individual bots which beat the reigning champions OG at DotA2 to GPT-2 which completely revolutionised the NLP field.COVID-19 Visualizations2020-05-09T14:10:00+00:002020-05-09T14:10:00+00:00https://kavishhukmani.me/covid-19-visualizations<p>I created some visualizations to show the spread of COVID-19 using <a href="https://flourish.studio/">Flourish</a></p>
<h2 id="number-of-covid-19-cases-per-state-in-india-over-time">Number of COVID-19 cases per state in India over time</h2>
<div class="flourish-embed flourish-bar-chart-race" data-src="visualisation/2301431" data-url="https://flo.uri.sh/visualisation/2301431/embed"><script src="https://public.flourish.studio/resources/embed.js"></script></div>
<p>Data was pulled from the <a href="https://api.covid19india.org/">COVID19-India API</a>.<br />
You can find a cleaned version of the data I used <a href="/assets/documents/covid-19-visualizations/India-time-series-COVID-19-cases-by-state.csv">here</a>.</p>
<h2 id="number-of-covid-19-cases-per-ward-in-pune-over-time">Number of COVID-19 cases per ward in Pune over time</h2>
<div class="flourish-embed flourish-bar-chart-race" data-src="visualisation/2304825" data-url="https://flo.uri.sh/visualisation/2304825/embed"><script src="https://public.flourish.studio/resources/embed.js"></script></div>
<p>Data was scraped from maps posted by <a href="https://twitter.com/SmartPune?s=20">@smartpune</a>.<br />
You can find a cleaned version of the data I used <a href="/assets/documents/covid-19-visualizations/PMC-time-series-COVID-19-cases-by-ward.csv">here</a>.</p>kavishhukmaniI created some visualizations to show the spread of COVID-19 using Flourish