Mailgist

AI-powered email processing and filtering

My motivation for this project was an email inbox overflowing with newsletters that I didn't have time to catch up on. I explored tools that might solve my problem but didn't find any that satisfied my needs.

Once every couple of weeks I'd open my newsletters inbox and go through each newsletter, often scanning for keywords based on my interests. If the title or description of a link grabbed my attention, I'd open it in a new tab and read it.

So the idea was to automate the process of going through the newsletters and manually filtering them for the information that interested me.

Implementation

My idea for this project was that each user would receive a unique email address that they could use to subscribe to various newsletters. This way my tool would have access to the incoming email. Once the system receives an email, it uses AI (an LLM) to process it and filter its content based on the user's interests.

For this project I decided to learn a new technology and went with the AWS stack using SST.

Email processing

The first and most important step was email processing, which started with providing each user with a unique email address. I decided to keep it simple and go with an email processing service that would let me create multiple email addresses; I picked Postmark (postmarkapp).

This allowed me to give each registered user a unique email address based on their user name.

Now the user can subscribe to various newsletters using the email address provided by the app.

Once the user is subscribed, the system starts receiving emails. For that I came up with an email processing flow, which you can see below:
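A minimal sketch of how such an address could be derived from a user name. The domain `inbound.example.com` and the slug rule are assumptions made for illustration; the original project may have built its addresses differently.

```typescript
// Hypothetical inbound domain; a real setup would use the domain
// configured with the email provider.
const INBOUND_DOMAIN = "inbound.example.com";

// Turn a user name into a safe local part: lowercase, with every run of
// characters outside a-z and 0-9 collapsed into a single dot.
function toLocalPart(userName: string): string {
  return userName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ".")
    .replace(/^\.+|\.+$/g, "");
}

function inboxAddressFor(userName: string): string {
  const localPart = toLocalPart(userName);
  if (localPart.length === 0) {
    throw new Error("User name has no characters usable in an email address");
  }
  return `${localPart}@${INBOUND_DOMAIN}`;
}

console.log(inboxAddressFor("Jane Doe")); // jane.doe@inbound.example.com
```

Different user names can collapse to the same local part, so a real implementation would still need to check the result for uniqueness before handing it out.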

Once Postmark receives the email, it makes a POST request to my Lambda function, and that's where the email's path through my system starts (step 1).
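The entry point could be sketched roughly as follows. The `PostmarkInboundPayload` interface lists only a subset of the fields Postmark sends with an inbound webhook, while `IncomingEmail`, `HttpEvent`, `HttpResult`, and `parseInboundEmail` are hypothetical names introduced here; the real handler would use the event types of whatever sits in front of the Lambda.

```typescript
// Subset of the JSON fields Postmark posts for an inbound email.
interface PostmarkInboundPayload {
  From: string;
  To: string;
  OriginalRecipient?: string;
  Subject?: string;
  MessageID: string;
  TextBody?: string;
  HtmlBody?: string;
}

// Hypothetical internal shape used by the rest of the pipeline.
interface IncomingEmail {
  id: string;
  sender: string;
  recipient: string;
  subject: string;
  text: string;
  html: string;
}

// Minimal stand-ins for the HTTP event and result of the Lambda (assumed).
interface HttpEvent { body?: string | null; }
interface HttpResult { statusCode: number; body: string; }

function parseInboundEmail(rawBody: string): IncomingEmail {
  const payload = (JSON.parse(rawBody) ?? {}) as Partial<PostmarkInboundPayload>;
  if (!payload.From || !payload.To || !payload.MessageID) {
    throw new Error("Missing required fields in inbound payload");
  }
  return {
    id: payload.MessageID,
    sender: payload.From.toLowerCase(),
    // Prefer the address the mail was actually delivered to.
    recipient: (payload.OriginalRecipient ?? payload.To).toLowerCase(),
    subject: payload.Subject ?? "",
    text: payload.TextBody ?? "",
    html: payload.HtmlBody ?? "",
  };
}

async function handler(event: HttpEvent): Promise<HttpResult> {
  try {
    const email = parseInboundEmail(event.body ?? "");
    // The recipient and sender checks described next would start here.
    return { statusCode: 200, body: JSON.stringify({ received: email.id }) };
  } catch (error) {
    const message = error instanceof Error ? error.message : "Invalid request";
    return { statusCode: 400, body: JSON.stringify({ error: message }) };
  }
}
```

Normalising the payload into one internal shape at the boundary keeps the rest of the flow independent of the email provider.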

Approval process and spam filtering

The first thing my system checks is who the email is addressed to, based on the custom email address each user has (step 2). If the email is addressed to an existing user, the flow continues; otherwise the email still ends up in the DB, where I'm able to investigate it further.

If the email is addressed to an existing user, I then check whether the sender of the email is already approved in the app (step 3). Each email sender needs to be approved by the user to limit potential spam and ensure that the user receives only the newsletters they're interested in.

In the app, the user can see all email senders, and that's where they can be approved.

If the email sender is not yet registered in my DB, I send the user a notification about a new email sender that needs to be approved (step 4).

If the email sender is already in the DB, we check their approval status (step 5). If it's pending, we still add the email to the DB so it can be processed once the user decides to approve the sender.

If the email sender is approved, we continue along the happy path (step 6): the user has received a newsletter they subscribed to, and it can be processed.
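The checks in steps 2 to 6 can be condensed into a single decision function. This is only a sketch: the type and function names are hypothetical, and only the two sender statuses mentioned above (pending and approved) are modelled.

```typescript
type SenderStatus = "pending" | "approved";

type RoutingDecision =
  | "store_for_investigation" // step 2: recipient is not a known user
  | "notify_user_new_sender"  // step 4: sender has not been seen before
  | "store_until_approved"    // step 5: sender is still pending
  | "process";                // step 6: happy path

interface RoutingInput {
  recipientIsKnownUser: boolean;
  // undefined means the sender is not in the DB yet
  senderStatus?: SenderStatus;
}

function routeEmail(input: RoutingInput): RoutingDecision {
  if (!input.recipientIsKnownUser) {
    return "store_for_investigation";
  }
  if (input.senderStatus === undefined) {
    return "notify_user_new_sender";
  }
  if (input.senderStatus === "pending") {
    return "store_until_approved";
  }
  return "process";
}

console.log(routeEmail({ recipientIsKnownUser: true, senderStatus: "approved" })); // process
```

Keeping the decision pure, with the DB lookups done beforehand, makes each branch of the flow easy to test in isolation.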

Processing the email with AI

Now we have two stages of processing the email with LLMs. First (step 7), we categorise the email as one of confirmation_request, newsletter, advertisement, or other. For this step we use only part of the email plus its title, since that is enough to pick a category and we don't need to process the whole email.

The categorisation uses GPT-3.5 with a simple prompt that gives the model the list of available categories and asks it to pick one of them.

Once the email is categorised as a newsletter, we can proceed with the final and most crucial part of the processing (step 8).

In this stage we take the whole email and send it to ChatGPT with the task of turning it into a structured data format:
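A sketch of what such a categorisation step could look like, leaving out the actual API call. The prompt wording, the excerpt length of 1000 characters, and the fallback to `other` for unexpected replies are all assumptions, not the prompt used in the project.

```typescript
const CATEGORIES = ["confirmation_request", "newsletter", "advertisement", "other"] as const;
type EmailCategory = (typeof CATEGORIES)[number];

// Assumed excerpt size; only the beginning of the email is sent to the model.
const EXCERPT_LENGTH = 1000;

function buildCategorizationPrompt(subject: string, body: string): string {
  const excerpt = body.slice(0, EXCERPT_LENGTH);
  return [
    "Classify the email below into exactly one of these categories:",
    CATEGORIES.join(", "),
    "Answer with the category name only.",
    "",
    `Title: ${subject}`,
    `Email excerpt: ${excerpt}`,
  ].join("\n");
}

// Map the model's free-text reply back onto a known category.
function parseCategory(modelReply: string): EmailCategory {
  // Drop whitespace, quotes, and punctuation the model may add around the name.
  const normalized = modelReply.trim().toLowerCase().replace(/[^a-z_]/g, "");
  const match = CATEGORIES.find((category) => category === normalized);
  return match ?? "other";
}
```

Validating the reply against the fixed list matters because the model can answer with extra words or an invented category; anything unrecognised is treated as `other` here.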

```typescript
interface ProcessedEmailChunk {
  idx: number;
  title: string;
  description?: string;
  url?: string;
  category?: string;
  tags?: string[];
  partial?: boolean;
}

type ProcessedEmail = Array<ProcessedEmailChunk>;
```

We ask ChatGPT to extract various data from the email sections, e.g. title, description, and url, but we also ask it to generate new data such as "tags" or "category". The possible options for tags and categories are provided as part of the prompt. The same options are used in the UI to let users select their preferences. This way we can easily match the data from an email with user preferences.

My first attempt at this task was to simply send the whole email to ChatGPT, but I quickly realised that some emails contain a lot of text, so I kept hitting the maximum context window of the LLM. That's why I decided to split each email into chunks and process them separately.

I also quickly realised that using GPT-4 was not cost-efficient and switched to GPT-3.5. This required me to engineer much more complex prompts, since version 3.5 is not as good as version 4 at following instructions and understanding user needs.

Below you can see part of the prompt.
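Because tags and categories come from a fixed list shared with the UI, the matching step can be a plain set comparison. The sketch below assumes a hypothetical `UserPreferences` shape and treats a chunk as relevant when either its category or any of its tags was selected by the user; the project's actual matching rule is not documented here.

```typescript
interface ProcessedEmailChunk {
  idx: number;
  title: string;
  description?: string;
  url?: string;
  category?: string;
  tags?: string[];
  partial?: boolean;
}

// Hypothetical shape of what the user selects in the UI.
interface UserPreferences {
  categories: string[];
  tags: string[];
}

function matchesPreferences(chunk: ProcessedEmailChunk, prefs: UserPreferences): boolean {
  const categoryMatch =
    chunk.category !== undefined && prefs.categories.includes(chunk.category);
  const tagMatch = (chunk.tags ?? []).some((tag) => prefs.tags.includes(tag));
  return categoryMatch || tagMatch;
}

function filterForUser(
  email: ProcessedEmailChunk[],
  prefs: UserPreferences
): ProcessedEmailChunk[] {
  return email.filter((chunk) => matchesPreferences(chunk, prefs));
}
```

For example, with preferences `{ categories: [], tags: ["typescript"] }`, only the chunks tagged `typescript` would be kept.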
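One simple way to do this kind of splitting is to pack paragraphs into chunks up to a size limit. The sketch below is an assumption about how it could be done, not the project's actual chunking code; it measures size in characters as a rough stand-in for tokens, and `splitIntoChunks` is a hypothetical name.

```typescript
function splitIntoChunks(text: string, maxChars: number): string[] {
  if (maxChars <= 0) {
    throw new Error("maxChars must be positive");
  }
  // Treat blank lines as paragraph boundaries.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    // Hard-split paragraphs that exceed the limit on their own.
    const pieces: string[] = [];
    for (let start = 0; start < paragraph.length; start += maxChars) {
      pieces.push(paragraph.slice(start, start + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current === "" ? piece : `${current}\n\n${piece}`;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits, keep packing
      } else {
        chunks.push(current); // chunk is full, start a new one
        current = piece;
      }
    }
  }
  if (current !== "") {
    chunks.push(current);
  }
  return chunks;
}
```

Splitting on paragraph boundaries keeps most newsletter items intact, although an item can still end up divided between two chunks when it straddles the limit.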

Main prompt template

It ended up being quite complex, but it worked reliably and ChatGPT responded with satisfying results. This was enough for the proof of concept!

And here you can see what the processed newsletters look like in the UI (this view was only for dev purposes; it's not something the user would have direct access to).

Once an email is successfully processed and saved in the DB, we can display it in the UI, focusing only on the pieces of the newsletter that matter to the user based on their interests. We can also prepare a summary and send it to the user at a selected interval.

UI

The UI was built with shadcn/ui and consisted of two main views/pages.

Main page/dashboard

This is where the user could get a full overview of the app.

Email inbox

The user also has the option to access and view the original newsletter emails rather than their processed and filtered versions.

Summary

I've decided to put the project on hold for now. This turned out to be quite a big project for a single person, but I'm happy I started working on it. I've learned a lot of new tools and approaches.

It was my first time using Lambda and DynamoDB, or setting up a project on AWS. This was a great learning experience that I already take advantage of in other projects!

Learnings

GPT-3.5 vs GPT-4

I started the project with GPT-4, which is the smarter model and better at following instructions, but I quickly realised that the amount of data I had to process would quickly add up in cost, so I had to switch to GPT-3.5.

This made the prompting much more complex, but in the end I achieved the desired output in a cost-efficient way.

DynamoDB

This seemed like a good choice initially but turned out not to be the best approach.

Once I had implemented email processing, I started thinking that vector search could work great here, making it possible to filter the information semantically and efficiently using a vector database. Unfortunately, DynamoDB was not designed with that in mind, so I'd need to rethink my approach and use a DB better suited to that use case. It was too late into the project to switch technologies.
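To illustrate the idea, semantic filtering usually comes down to comparing embedding vectors, for example with cosine similarity. The sketch below was never part of the project: the `EmbeddedItem` shape is hypothetical, the embeddings are assumed to come from some embedding model, and a vector database would replace the in-memory sort with an index.

```typescript
// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction to compare
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape: a newsletter item with a precomputed embedding.
interface EmbeddedItem {
  title: string;
  embedding: number[];
}

// Return the topK items most similar to the embedding of the user's interests.
function rankBySimilarity(
  interest: number[],
  items: EmbeddedItem[],
  topK: number
): EmbeddedItem[] {
  return items
    .slice()
    .sort(
      (x, y) =>
        cosineSimilarity(interest, y.embedding) -
        cosineSimilarity(interest, x.embedding)
    )
    .slice(0, topK);
}
```

Compared with matching on a fixed list of tags, this would let an interest such as "frontend performance" surface items that never mention those exact words.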

SST

SST made working with the AWS stack so much easier. I'll definitely stick with it for my upcoming full-stack projects!