Code generation using a Custom Knowledge Base

How I approached AI-based code generation aligned to my custom needs

Recently, I embarked on an interesting project where I had to generate a codebase with the help of ChatGPT. However, I needed the code to align with a specific architecture and platform-building methodology I had in mind. This meant I couldn’t just rely on generic prompts; I had to be intentional and precise.

To achieve this, I started by developing prompts based on predefined code. The idea was to feed ChatGPT structured input that would guide the generated code in the right direction. For this to work effectively, I needed a knowledge base of existing code, aggregated from different repositories and organized in a format a large language model (LLM) could consume.

The challenge was filtering out unnecessary files and directories: certain paths had to be ignored to keep the dataset clean and relevant. To streamline this, I wrote a "collect files" script that aggregates only the required files, adds an appropriate description to each, and ensures that only useful content makes it into the knowledge base.
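
To make that concrete, here is a minimal sketch of the collection step. It is not the actual script, and the ignore lists are just examples, but it captures the idea: walk each repository, skip anything on the ignore lists, and keep only readable source files.

```python
from pathlib import Path

# Example ignore rules -- the real script uses its own lists.
IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv", "dist", "build"}
IGNORED_SUFFIXES = {".png", ".jpg", ".gif", ".zip", ".lock", ".pyc"}

def collect_files(repo_root: str) -> list[tuple[Path, str]]:
    """Walk a repository and return (relative path, content) pairs for the
    files worth including in the knowledge base."""
    root = Path(repo_root)
    collected = []
    for path in root.rglob("*"):
        if any(part in IGNORED_DIRS for part in path.parts):
            continue  # skip ignored directories entirely
        if not path.is_file() or path.suffix in IGNORED_SUFFIXES:
            continue  # skip binaries, lock files, generated assets
        try:
            collected.append((path.relative_to(root), path.read_text(encoding="utf-8")))
        except UnicodeDecodeError:
            continue  # anything that isn't readable text doesn't belong in the prompt
    return collected
```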

With this system in place, the script generated a "prompt file"—a structured input that I could use directly with ChatGPT to guide the output. This helped maintain consistency and precision throughout the generated code, ensuring it followed the platform requirements I had set.
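
The prompt file itself is just a formatting pass over the collected files. Something along these lines works (again a simplified sketch; the header format, output file name, and the collect_files helper from the previous snippet are illustrative):

```python
def build_prompt_file(collected, out_path="prompt.txt", instructions=""):
    """Assemble the collected files into a single prompt file, each one
    preceded by a short header so the model knows what it is reading."""
    with open(out_path, "w", encoding="utf-8") as out:
        if instructions:
            out.write(instructions.strip() + "\n\n")
        for rel_path, content in collected:
            out.write(f"----- File: {rel_path} -----\n")
            out.write(content.rstrip() + "\n\n")

# Usage: point it at a repository, then paste or upload prompt.txt into ChatGPT.
# files = collect_files("path/to/repo")
# build_prompt_file(files, instructions="Generate new modules following the architecture below.")
```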

I ended up creating a beautiful script at writing/linkedin-scraper/collect_files.py

In the end, the approach worked beautifully. The combination of curated prompts and an LLM-ready knowledge base made the code generation process efficient and aligned with my vision. It’s a workflow I’d highly recommend if you need to generate code that’s tailored to specific standards or architectures!
