Hacking together JournalAIst
I participated in an AI hackathon with my friends Ferdinand Schenck and Axel Nordfeldt. Over roughly 36 hours, we came up with an idea and hacked together a personal ghostwriter, which we called JournalAIst. You can try it out for yourself.
This was my first hackathon in many years, and it was incredibly well organized by Factory Network and {Tech: Berlin}. The hackathon was sponsored by Mistral, Weaviate and LumaAI, who provided free credits for their platforms.
I have experience with natural language processing from my previous role as a data science lead, but I hadn't spent much time playing around with current LLMs. These days, I mostly use them as coding assistants. It was fun to test out the latest LLM, image understanding and video generation models and discover what works well (and what doesn't).
We had written down a few ideas before the event, but after a bit of brainstorming we settled on building a ghostwriter. The idea was inspired by Axel, who once had someone write a beautiful story about an event in his life, which he can fondly look back on forever. Most of us don't have the will or the time to write down our experiences. When we do write them down and read them months or years later, they are a pleasure to have and bring the memory back. I have personally started writing summaries of the trips I go on, but unfortunately I often forget to, or am just too lazy to do a proper job.
I usually make time to write about my day to friends or family, typically sending along a few pictures that highlight my experiences throughout the day. Messaging in short snippets is easier than writing a polished story. While messaging, we don't let perfect get in the way of good enough, so we are more likely to do it.
Building
The aim: divide and conquer. Ferdi, Axel and I split the work into separate chunks that we could stitch together. We broke the problem down into a picture summarizer, an interviewer, and a writer. As a bonus (and because we could use the API for free) we added a step that generates a 50-word description of a video, which we sent to LumaAI's Dream Machine model to generate a 5-second video.
Each stage builds upon the previous one, and the aim of each step is to capture information to help with the next task.
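To make the "each stage builds upon the previous one" idea concrete, here is a minimal sketch of how such a staged pipeline could be wired together. It is not our actual hackathon code: the `LLM` callable, the `StoryContext` class and all function names are hypothetical, and the model is passed in as a plain function so the sketch runs without any API key. A real picture summarizer would send image data to an image-understanding model rather than text notes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class StoryContext:
    """Accumulates what each stage learns so later stages can use it."""
    picture_summaries: List[str] = field(default_factory=list)
    interview: List[Tuple[str, str]] = field(default_factory=list)
    story: str = ""
    video_prompt: str = ""


def summarize_pictures(llm: LLM, picture_notes: List[str], ctx: StoryContext) -> None:
    # Assumption: pictures are represented here as short text notes.
    for note in picture_notes:
        ctx.picture_summaries.append(llm(f"Summarize this picture: {note}"))


def interview(llm: LLM, answer: Callable[[str], str], ctx: StoryContext,
              turns: int = 3) -> None:
    # Each question sees the picture summaries and all earlier answers.
    for _ in range(turns):
        question = llm(
            "Ask one follow-up question.\n"
            f"Pictures: {ctx.picture_summaries}\nSo far: {ctx.interview}"
        )
        ctx.interview.append((question, answer(question)))


def write_story(llm: LLM, ctx: StoryContext) -> None:
    ctx.story = llm(
        f"Write a story from pictures {ctx.picture_summaries} "
        f"and interview {ctx.interview}"
    )


def describe_video(llm: LLM, ctx: StoryContext, max_words: int = 50) -> None:
    text = llm(f"Describe a video for this story in at most {max_words} words: {ctx.story}")
    # Models do not reliably respect word limits, so truncate as a safety net.
    ctx.video_prompt = " ".join(text.split()[:max_words])


def run_pipeline(llm: LLM, picture_notes: List[str],
                 answer: Callable[[str], str]) -> StoryContext:
    ctx = StoryContext()
    summarize_pictures(llm, picture_notes, ctx)
    interview(llm, answer, ctx)
    write_story(llm, ctx)
    describe_video(llm, ctx)
    return ctx
```

Passing one shared context object through the stages keeps each stage small and lets three people work on them in parallel, which is roughly why we split the work this way.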
Here is a general overview of the flow:
We had some trouble getting the model to say when the interview should end, so to save time we just added an 'End conversation' button that the user could press when they got tired.
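The workaround can be sketched as an interview loop that checks a user-controlled stop signal instead of trusting the model to decide when it is done. This is an illustration under assumptions, not our deployed code: `run_interview`, `ask`, `get_answer` and `end_requested` are hypothetical names. In a Streamlit app the stop signal would more likely be a flag in session state set by a button, and that wiring is omitted here.

```python
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]


def run_interview(
    ask: Callable[[Transcript], str],      # hypothetical: produces the next question
    get_answer: Callable[[str], str],      # hypothetical: collects the user's reply
    end_requested: Callable[[], bool],     # True once the user presses 'End conversation'
    max_turns: int = 20,                   # hard cap so the loop always terminates
) -> Transcript:
    transcript: Transcript = []
    for _ in range(max_turns):
        # The user, not the model, decides when the interview is over.
        if end_requested():
            break
        question = ask(transcript)
        transcript.append((question, get_answer(question)))
    return transcript
```

The `max_turns` cap is an extra assumption on my part: it guards against an endless interview if the stop signal never arrives.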
The code is available on my GitHub if you wish to play around with it. To run it locally, you will need a Mistral API key, which you can get for free from their website.
The final product
We ultimately deployed the final product to Streamlit. Feel free to give it a try and let me know if you have any feedback. Please note that this is hackathon quality and was made in 24 hours!
Here are some screenshots of the various steps:
The interview:
Pick your story adventure:
The output
Here is an example story. It is completely fictional and based on my made-up answers. I used Tesla's marketing images of the Fremont factory. The pictures are automatically placed alongside the relevant part of the story.
Video
Here are some sample videos generated by the Dream Machine. They were really a bit hit or miss. The less realistic the video was meant to be, the better it appeared.
Conclusion
We made it to the final (Top 6 of 25 teams), and really impressed the judges in our first demonstration. Our project was one of the few that actually worked, and was developed from scratch during the hackathon.
The final pitch went well, but the judges had some reservations about the number of hallucinations the models could generate. This could be reduced with better prompting and by gathering more information in the interview. Overall we were happy to make it to the final, but it was bittersweet to leave with nothing. Maybe we could have done with a product or marketing teammate who could have sold the idea better.
Overall, there is something that could be useful here, but it probably needs more than 24 hours to get right. It needs to be more convenient to use, and it needs to gather more information to limit hallucinations.
Convenience raises big privacy questions, because it means having 'someone' follow you around and ask questions about the experiences in your life. It would need to be built in a way that doesn't stream your entire life to an unknown server. Setting that aside, ideally it would read the information you are already capturing and be built into your existing flow of communication. It could summarize messages, photos and possibly even your chats about a topic with various contacts, but extract only the bits relevant to the story you would like to tell.
I hope you enjoyed the read, and please let me know if you have any comments!