What Happens When You Give AI Control of Your Dinner Plans: An Experiment
You either love AI or hate AI. For us, we’re 50/50 on it — we want it to take over the menial tasks of life so that humans can focus on the important and creative stuff. AI is just not quite there yet. It’s overhyped. It’s great at pretending to know things!
Can it actually help with one of life's most relentless questions — what’s for dinner? We tested a handful of the most talked-about LLMs to find out if they can meal plan like a pro. Will it surprise us, or let us down?
😏 Mediocre Fact: Did you know that ChatGPT can’t reliably do multiplication? Try it out for yourself. Don’t believe us? Too lazy? Here’s an example:
It was close…
The actual answer.
But First, Let’s Talk About How AI Works
Have you ever pressed the predictive text options on your phone’s keyboard to finish a sentence? That’s what Large Language Models (LLMs) are doing. Of course, their predictions are a bit more sophisticated, but at the end of the day, it’s just guessing what the next word should be based on what’s come before.
As a result, LLMs shouldn’t be considered AI in the sense of a generalized intelligence — they aren’t actually thinking.
So how do they guess what should come next? Well, we can’t really get into that as it’s far too complex and not meant for a food blog. The tl;dr is that they use a bunch of statistics and fancy math to predict what should come next based on a gigantic set of training data that the LLM learned on.
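If you want a feel for the "guess the next word" idea without the fancy math, here is a toy sketch. It is nowhere near how a real LLM works internally (those use neural networks trained on enormous data sets); this just counts which word most often follows another in a tiny, made-up training text. The function names and the training text are our own inventions for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    options = followers.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]

# Made-up training data, purely for illustration.
training_text = (
    "greek yogurt with berries . greek yogurt with honey . "
    "greek yogurt with berries . greek salad"
)
model = train_bigrams(training_text)

print(predict_next(model, "greek"))  # yogurt
print(predict_next(model, "with"))   # berries
print(predict_next(model, "pizza"))  # None
```

Notice that the toy model has no idea what yogurt is. It only knows that "yogurt" usually comes after "greek" in what it has seen, which is the same reason it can sound confident and still be wrong.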
Since they’re not actually thinking, they have no idea whether what they’re saying is right, wrong, conditionally right, or conditionally wrong. This means that you can sometimes get what are called hallucinations. These are false statements presented as facts. Identifying what is and isn’t a hallucination in an LLM response is part of being a responsible user of LLMs.
Sidenote: We are using the free versions of these AI tools. Sure, the paid version is probably better but who wants to pay for this shit? 💩
Here are the models used in this blog post:
ChatGPT-4.1
Claude Sonnet 4
Google Gemini 2.5 Flash
Microsoft Copilot (which uses GPT-4)
🧪 Experiment 1
Okay LLMs, time to work your magic.
Yes, this prompt is broad. We know. Let’s see how it does.
Prompt: “Create a meal plan for a 5-day work week.”
Our Thoughts
We gotta hand it to the LLMs, they sure know how to make a healthy meal plan. On the flip side, dear God, there’s so much Greek yogurt. In terms of creativity, the four models appear to have been trained on very similar data sets; you will likely notice many similarities across the four plans.
ChatGPT suggests that you cook in bulk and use leftovers for lunches, then ignores its own suggestion and has you cook something unique for every meal. Qu’est-ce que fuck?
Claude suggests you have a single hard boiled egg for a snack on Thursday. So we’re just boiling one egg for the week? Okay then.
Google Gemini assumed we had leftover chicken from the weekend that we could use in a salad for Monday’s lunch (huh?)
Microsoft Copilot loves avocado. (Does it know that we’re millennials?)
With a prompt this broad, it gave us what we expected: a generic meal plan that sounds delicious, but is a fuck ton of work to implement. We (and most people, we assume) don’t have time for that.
🧪 Experiment 2
Okay. So our first prompt was obviously very, very broad. Let’s narrow it down a bit, by asking it to use similar ingredients. Who has time to cook something unique for every meal? We sure don’t.
Prompt: “Create a meal plan for a 5-day work week utilizing similar ingredients to cut down on prep work.”
Our Thoughts
AI is not good at simplifying a pre-existing list. The keen-eyed observers among you will notice that, across the four, the meal plans barely changed, if they changed at all (shoutout to Claude for actually understanding the assignment, but maybe it minimized the ingredients a bit too much). The ingredients that were reused tended to be simple things like spinach, tomatoes, and apples. For the GPT-4-based LLMs, proteins were often not recycled.
ChatGPT did very little to switch things up or simplify them. We don’t know if this is laziness or a hallucination. At least laziness is a human trait.
Claude went hard on the chicken. (Also the almonds and almond butter.) You’re eating chicken for lunch and dinner, just with slightly different combinations. Claude had the best intentions and also gave us a list of to-do items for a one-time prep session on Sunday (30 minutes!), a grocery list (only 15 items!), and said that our daily prep would be 5 minutes max. Wow. However, it does not take into consideration that we have to make the soup sometime. When is that happening? Who knows.
Claude was really aiming for the gold star with this one.
Google Gemini provided the meal plan, but didn’t anticipate our needs like Claude did with the extra information. That being said, we’re happy we don’t have to eat chicken for every meal. It does provide a one-off meal of salmon, but hey, your co-workers will be happy you’re not bringing that into the office to reheat for lunch.
Microsoft Copilot gave you basically the exact same thing as the first time, much like ChatGPT. Did you enjoy spending several hours each day cooking unique meals? Well with Copilot’s simplified plan, you can continue to spend several hours each day cooking unique meals. Must be a GPT-4 thing.
🧪 Experiment 3
Can LLMs provide recipes when asked? Let’s find out!
Prompt: “Create a meal plan for a 5-day work week including recipes.”
ChatGPT
Just cook some eggs and throw in veggies. Voila, you have an omelet.
The meal plan that ChatGPT provided us starts well, but ends up being disappointing. The first few recipes that it provided us were, technically, recipes. But then around Wednesday it just sort of gave up. How do you make an omelet? How do you crisp up tofu? ChatGPT gave no directions and just said “do the thing” essentially. So if you’re not confident in the kitchen, this isn’t great.
Another complaint is the lack of… flavor. It rarely recommended salt and pepper, instead relying on things like salsa and honey to give any sort of flavor. (Which, don’t get us wrong, is great, but we’d love to see some seasonings used too.)
Claude
This actually sounds like it would slap.
Claude beats ChatGPT, hands down. It was easy to compare, as they came up with a few similar dishes: Greek yogurt for breakfast (surprise surprise), avocado toast, stir fry, turkey wrap, and a quinoa bowl. All of the recipes Claude provided had more ingredients (but not like, an egregious amount, just enough to add more flavor) and provided better instructions.
You go, Claude.
Google Gemini
Thanks, Gemini. FOR NOTHING.
It must have been having a bad day.
Microsoft Copilot
Hey Copilot, come here real quick. We just want to talk. * whispers * WHAT’S A CURRY SPICE?! TELL US!
It’s almost cool. We like how it scraped the web for legit recipes, so you know that it’s not just pulling them out of its cybernetic brain. However, it didn’t do that for all the meals. Which would be fine if it gave a recipe for the non-sourced ones. But it fails at that, instead just listing a bunch of ingredients. This is not a recipe!
Our Thoughts
AI gets a failing grade when it comes to giving recipes. Of the four LLMs we tested, Claude was light-years ahead in terms of providing actual recipes (step-by-step instructions). Copilot gets second place for providing some sources as links without needing to be prompted. ChatGPT mostly fails because the recipes that it does provide are atrocious and would taste bland. Gemini is an auto fail because it could not complete the task. We tried twice and it couldn’t generate a meal plan either time.
🧪 Experiment 4
This is the big one. The first 3 prompts were all very simple. But this? We’re giving the LLMs a lot to go off of. Let’s see how they do!
Prompt: “Create a meal plan for a 5-day work week including recipes (provide links) for breakfast, lunch, and dinner. Also provide snack ideas. Dietary restrictions include no seafood (preference, fish sauce is okay). I like Italian, Japanese, and Indian food. I do not want to have to cook for every meal, there can be leftovers for lunch or the next day’s supper if it reheats well. I want to get at least 80g-120g of protein a day. I have rice and basic spices and condiments available. I also have eggs and some lettuce that’s about to go bad but can buy other ingredients. Create a grocery list to go with it.”
ChatGPT
Lies! Deception! Every day more lies!
We wanted this one to be ChatGPT’s redemption. It looked so promising, but it ultimately failed the test. It did technically incorporate all of our requests. However, as we said at the beginning of this post, ChatGPT cannot reliably do math. We figured the protein calculations wouldn’t be super accurate, but after spot-checking a few of them, they were just blatantly wrong.
Moving along, it failed to provide links to recipes for several of the meals listed, and when it did provide them, it hallucinated chicken where there should not have been chicken.
ChatGPT also showed favoritism towards Indian and Italian cuisine, with only breakfast and lunch on day one having Japanese recipes. Now, we didn’t specify how many we wanted for each type of cuisine, but this was interesting.
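If you want to do the same protein spot check without a calculator, a few lines of Python will do it. This is a rough sketch: the function name is ours, and the per-meal numbers below are made up for illustration (they are not ChatGPT's actual output). You would still need to look up real protein values yourself. The 80g-120g range comes from our prompt.

```python
def check_protein(day_meals, claimed_total, low=80, high=120):
    """Sum per-meal protein (in grams) and compare it with the claimed daily total."""
    actual = sum(day_meals.values())
    return {
        "actual": actual,
        "claim_matches": actual == claimed_total,
        "in_target_range": low <= actual <= high,
    }

# Hypothetical numbers, NOT taken from any model's real answer.
monday = {"breakfast": 20, "lunch": 25, "dinner": 30, "snack": 10}
result = check_protein(monday, claimed_total=105)

print(result)
# {'actual': 85, 'claim_matches': False, 'in_target_range': True}
```

In this made-up case the day still lands inside the target range, but the total the model claimed is off by 20g, which is exactly the kind of thing that is easy to miss when the answer looks tidy.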
Claude
These links are a barren wasteland of 404 errors.
Claude, you were so close to being perfect on this one. Your protein calculations? Immaculate. Your variety? Immaculate. Your ability to form an internally consistent grocery list? Immaculate. Your ability to give us working links? Hot garbage.
Interestingly, it very consistently gave us links that ended in an Error 404. We don’t know whether that’s because it was generating the links itself (and thus hallucinating) or because it was pulling from older data in which the links all worked. Our money is on hallucinations, as it seems like too much of a coincidence for so many links to all result in an Error 404.
We’ll give Claude a solid C+. (And C’s get degrees!)
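Clicking every link by hand gets old fast. Here is a minimal sketch of how you could check a batch of recipe links for 404s using only Python's standard library. The function names are ours, the example.com URLs are placeholders, and some sites refuse the lightweight HEAD request used here, so treat a "broken" result as a hint to go look, not a verdict.

```python
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404
    except (urllib.error.URLError, ValueError, OSError):
        return None  # DNS failure, malformed URL, timeout, ...

def find_broken_links(urls, get_status=fetch_status):
    """Return the URLs whose status is anything other than 200."""
    return [url for url in urls if get_status(url) != 200]

# Offline demo with a fake status lookup, so nothing is actually fetched.
# These URLs are placeholders, not links any model gave us.
fake_statuses = {
    "https://example.com/chicken-parm": 200,
    "https://example.com/missing-recipe": 404,
}
broken = find_broken_links(list(fake_statuses), get_status=fake_statuses.get)

print(broken)  # ['https://example.com/missing-recipe']
```

To check real links, call `find_broken_links(your_urls)` and leave `get_status` at its default.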
Google Gemini
Gemini: “Protein Boost!”
Us: *cringe*
Google Gemini refused to provide any protein estimate at all. Instead it just told us that chicken boosts the protein of a meal. No shit, Sherlock.
Also, just like Claude, the links were broken. Every single one. Food blogs, you should be big mad. It’s taking your content, summarizing it shittily, and then not actually citing you properly so people can ACTUALLY make your recipes. Sad days. 😢
Not much else to say here. Google Gemini, this was not good.
Microsoft Copilot
How do we have leftover chicken parm BEFORE we have made it, HMMMMM? Einstein rolling over in his grave rn.
It appears that after Bill Gates perfected beaming 5G into our heads via the COVID vaccine, he then set to work on time travel. Congrats everyone, we’re living in the future.
Hey Copilot, how are we having leftovers before we’ve actually cooked the meal? You’re breaking spacetime causality.
But hey, at least more of your links worked.
Our Thoughts
Yup, a better prompt does better. It’s not perfect, as LLMs seem to have a problem with providing proper links to their sources. Overall Claude did the best. Good job, kid. #notsponsored
Final Thoughts
Claude gets the third place ribbon. Copilot gets the participation trophy. ChatGPT slept in. Gemini didn’t even know that it was track and field day.
Let’s talk about the elephant (or cow) in the room: where’s the beef? Also, where’s the pork? LLMs really favored giving us chicken and vegetarian recipes. Also, Greek yogurt. Greek yogurt. Greek yogurt. Greek yogurt. Greek yogurt. We get it. GREEK YOGURT. Big Greek Yogurt™ has clearly funded AI. We believe that this is a result of the training data heavily being skewed towards recipes that come from bodybuilders/fitness influencers who favor super lean meat like chicken and turkey and/or vegetarian dishes including tofu.
Turns out that LLMs are also easily confused when you ask for only 5 days’ worth of meal planning. (Many assumed we’d have leftovers from Sunday to use for Monday.) We think this probably has to do with their training data heavily favoring meal prep for an entire 7-day week that repeats.
One of the biggest disappointments with most of the LLMs is that the flavor profile of the meal plans tended to be pretty boring. Take a protein, take a vegetable, heat them up. That’s about it. It wasn’t until we asked the LLMs to include different cuisines that we started seeing variety and better/more spices.
In the end, these tools aren’t going to be able to create a perfect meal plan for you. But they can give you a start if you’re stuck. You’ll still have to use your brain by double-checking that your shopping list is correct and sourcing your own recipes based on the ideas they give you (unless the links work). Of course, for these experiments we just used a single prompt. If you don’t like part of the LLM’s answer and want it changed, you can follow up and give it more directions to tweak your meal plan. It still won’t be completely logical, but again, it’s a start. And to be honest, starting is the hardest part.
Building Your Meal Plan Prompt
You could just say, “Create a meal plan for a 5-day work week” but that’s not giving the AI much to work with and is also a bit too much freedom (see experiments 1-3 above). We can do better!
When building your custom prompt, you’ll want to think about a few things:
How many meals do you want it to plan? Do you want breakfast, lunch, and dinner? Snacks too?
What dietary restrictions or allergies do you have? Those are important!
Are there any foods that you don’t like?
What cuisines do you like or gravitate to?
Do you want meals that you can prepare ahead of time, or do you want to cook every day? Do you want leftovers?
Do you have any caloric or macronutrient goals that you want to meet?
Do you have a specific ingredient that’s about to go bad in your fridge that you want to use up? What else is in your fridge or pantry?
What appliances do you have available? (e.g. oven/stove, air fryer, blender, etc.)
What’s your budget? Can you splurge on steak, or is it a rice and beans kinda week?
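If you like the "build it once, tweak it later" idea, the questions above can be turned into a small reusable template. This is just one way to sketch it: the function and parameter names are our own, and the example values are lifted from our Experiment 4 prompt.

```python
def build_meal_plan_prompt(
    days=5,
    meals=("breakfast", "lunch", "dinner"),
    snacks=False,
    restrictions=(),
    dislikes=(),
    cuisines=(),
    leftovers_ok=True,
    protein_range_g=None,  # e.g. (80, 120)
    on_hand=(),
    use_up=(),
    appliances=(),
    budget=None,
):
    """Assemble a meal plan prompt from the answers to the questions above."""
    parts = [f"Create a meal plan for {days} days with {', '.join(meals)}."]
    if snacks:
        parts.append("Also provide snack ideas.")
    if restrictions:
        parts.append(f"Dietary restrictions: {', '.join(restrictions)}.")
    if dislikes:
        parts.append(f"Foods I don't like: {', '.join(dislikes)}.")
    if cuisines:
        parts.append(f"Cuisines I like: {', '.join(cuisines)}.")
    if leftovers_ok:
        parts.append("I don't want to cook every meal; plan leftovers where they reheat well.")
    else:
        parts.append("I want to cook fresh every day.")
    if protein_range_g:
        low, high = protein_range_g
        parts.append(f"I want {low}g-{high}g of protein a day.")
    if on_hand:
        parts.append(f"I already have: {', '.join(on_hand)}.")
    if use_up:
        parts.append(f"Please use up: {', '.join(use_up)}.")
    if appliances:
        parts.append(f"Appliances available: {', '.join(appliances)}.")
    if budget:
        parts.append(f"My budget is {budget}.")
    parts.append("Create a grocery list to go with it.")
    return " ".join(parts)

prompt = build_meal_plan_prompt(
    snacks=True,
    restrictions=["no seafood (fish sauce is okay)"],
    cuisines=["Italian", "Japanese", "Indian"],
    protein_range_g=(80, 120),
    on_hand=["rice", "basic spices and condiments", "eggs"],
    use_up=["lettuce"],
)
print(prompt)
```

Next week, change `use_up` to whatever is wilting in your fridge and paste the result into your LLM of choice.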
This is a lot to think about. It’s what makes meal planning so exhausting. And it’s not just the meal planning — it’s creating the shopping list, going to the grocery store (or placing your online order), prepping your ingredients, cooking, and cleaning.
AI can’t cook or clean for us (yet), but it can, technically, make a meal plan and it can create a grocery list.
It does take some work to build your prompt, but once it’s built, it’s built! You just have to tweak it slightly the next time you use it.
If you don’t like “editing someone else’s homework,” this won’t be for you; do it yourself and start from scratch. But if you have trouble starting or have decision paralysis, ask Claude (or your favorite LLM) for help.
Hey r/ChatGPT, where is your God now?