Want Claude Opus AI on Your Potato PC? This Is Your Next Best Bet


In short

  • A developer recreated the reasoning of Claude Opus in an open, locally runnable model.
  • The result, “Qwopus,” runs on consumer hardware and holds its own against much larger systems.
  • It shows how distillation can bring frontier-grade reasoning offline and into developers’ hands.

Claude Opus 4.6 is an AI model that makes you feel like you’re talking to someone who actually read the entire internet, twice, then went to law school. It plans, it reasons, and it writes code that runs.

It’s also off-limits if you want to run it locally on your own hardware, because it sits behind the Anthropic API and costs money per token. A developer known as Jackrong decided that wouldn’t do, and took matters into his own hands.

The results are a pair of models, Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled and its more ambitious successor Qwopus3.5-27B-v3, which run on a single GPU and try to reproduce how Opus thinks, not just what it says.

The trick is called distillation. Think of it this way: a great chef records every technique, every thought process, and every judgment call while preparing a difficult meal. An apprentice studies those notes until the same moves become second nature. In the end, they cook in a strikingly similar way, but it’s learned imitation, not the original expertise.

In AI terms, a weaker model is trained on the outputs of a stronger one and learns to imitate it.
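In practice, this usually means turning the teacher’s responses into ordinary supervised fine-tuning data. A minimal sketch of that data-prep step is below; the field names and the `<think>` delimiter are illustrative assumptions, not the actual Qwopus training schema.

```python
import json

# Hypothetical teacher traces: each record pairs a prompt with the stronger
# model's reasoning ("thinking") and its final answer. The schema here is
# made up for illustration.
teacher_traces = [
    {
        "prompt": "Why is the sky blue?",
        "thinking": "Rayleigh scattering is stronger at short wavelengths...",
        "answer": "Shorter (blue) wavelengths scatter more in the atmosphere.",
    },
]

def to_sft_example(trace):
    """Fold the teacher's reasoning into the training target, so the student
    learns to reproduce the thought process, not just the final answer."""
    target = f"<think>{trace['thinking']}</think>\n{trace['answer']}"
    return {
        "messages": [
            {"role": "user", "content": trace["prompt"]},
            {"role": "assistant", "content": target},
        ]
    }

dataset = [to_sft_example(t) for t in teacher_traces]
print(json.dumps(dataset[0], ensure_ascii=False)[:80])
```

The student model is then fine-tuned on these chat-format examples like any other instruction dataset; the only distillation-specific choice is keeping the reasoning in the target rather than stripping it out.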

Qwopus: What if Qwen and Claude had a baby?

Jackrong took Qwen3.5-27B, a capable open-source model from Alibaba (small compared to behemoths like GPT or Claude), and fed it datasets of Claude Opus 4.6-style chain-of-thought. Then he fine-tuned it to reason the same way, step by step, as Opus does.

The first model in the family, the Claude-4.6-Opus-Reasoning-Distilled release, did exactly that. Community testers who ran it in coding agents like Claude Code and OpenCode reported that it kept its logic coherent, handled tool calls without template patches, and could work autonomously for several minutes without derailing—something the base Qwen model struggled to do.

Qwopus v3 goes further. Where the first model copied Opus’s style, v3 is built around what Jackrong calls “structural design”: teaching the model to reason faithfully step by step, rather than simply parroting the teacher’s conclusions. It adds a tool-calling harness that tracks agentic behavior, and it claims strong benchmark numbers: 95.73% on HumanEval in the author’s evaluation, beating both the base Qwen3.5-27B and the earlier distilled release.

How to run it on your PC

Running either model is easy. Both are available in GGUF format, which means you can load them directly in LM Studio or llama.cpp with no setup beyond downloading the file.

Search for Jackrong Qwopus in the LM Studio model browser, grab the quantization that best balances quality and speed for your hardware (if you pick one too heavy for your GPU, LM Studio will warn you), and you’re running a model built on Opus’s reasoning. For multimodal support, the model card says you need a separate mmproj-BF16.gguf file alongside the main weights, or you can download the recently released “Vision” variant.
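If you prefer the command line over LM Studio, llama.cpp’s bundled `llama-cli` loads a GGUF directly. The filename below is a guess at what the quantized weights might be called, so substitute whatever file you actually downloaded; the flags are standard llama.cpp options.

```shell
# Filename is hypothetical; use the quantization you actually downloaded.
# -ngl offloads layers to the GPU, -c sets the context window (leave room
# for the model's long reasoning traces before it gets to the answer).
llama-cli -m Qwopus3.5-27B-v3-Q4_K_M.gguf -ngl 99 -c 8192 \
  -p "Walk me through your reasoning: why does quantization shrink a model?"
```

Run without `-p` to drop into interactive chat instead of a one-shot prompt.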

Jackrong has also published a complete tutorial, codebase, and PDF guide on GitHub, so anyone with a Colab account can recreate the entire pipeline from scratch: Qwen base, Unsloth, LoRA, fine-tuning, and export to GGUF. The model family has passed one million downloads.

We were able to run the 27-billion-parameter model on an Apple MacBook with 32GB of unified memory. Smaller PCs may be better off with the 4B version, which punches well above its weight.
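If you’re not sure which quantization your machine can handle, a back-of-envelope estimate helps: a GGUF file weighs roughly parameter count times bits per weight, divided by eight. The bits-per-weight figures below are rough rules of thumb (Q4_K_M averages close to 4.8 bits), not exact values, and you need extra headroom beyond the file size for the KV cache at runtime.

```python
def gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GiB: params * bits / 8 bytes. Does not
    include runtime overhead (KV cache, activations) -- budget extra."""
    return n_params * bits_per_weight / 8 / 2**30

# Approximate sizes for a 27B model (bits/weight are rules of thumb):
print(f"Q4_K_M: ~{gguf_size_gib(27e9, 4.8):.0f} GiB")  # fits in 32GB unified memory
print(f"Q8_0:   ~{gguf_size_gib(27e9, 8.0):.0f} GiB")  # tight on a 32GB machine
```

This is why a 27B model at 4-bit quantization squeezes onto a 32GB MacBook while the 8-bit version leaves almost no room for the rest of the system.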

If you want more information on running local AI models, check out our guides to local models and to MCP, which gives models access to the internet and other tools that extend their capabilities.

Testing the model

We put Qwopus 3.5 27B v3 through three tests to see how much of the promise holds up.

Creative writing

We asked the model to write a dark sci-fi story spanning the years 2150 to 1000, full of time-travel surprises and twists. On our M1 Mac, it thought for more than six minutes before writing a single word, then took another six to compose the piece.

What came out was genuinely interesting, especially the central conceit: a philosophical story about a civilizational collapse driven by a dangerous nihilism, built around a closed time loop in which the protagonist unwittingly causes the disaster he traveled back to prevent.

The story ran over 8,000 tokens and followed the prompt fully.

The prose lands with real force in places, the imagery is varied, and the tension between the characters holds up. It’s not at the level of Opus 4.6 or Xiaomi’s MiMo Pro, but it’s close to Claude Sonnet 4.5, and at times 4.6, in terms of output.

For a 27-billion-parameter model running locally on Apple silicon, that’s not a sentence you’d expect to write. With good prompting and a round or two of revision, it can produce results resembling the original Opus.

The interesting part is watching the model think: it tried and rejected several plot engines before settling on the one that gives the story its eerie core. Here’s a sample of its inner monologue:

“Complication: Jose arrives, enters a monastery (let’s place it in Asturias, Spain—a Christian stronghold). He seeks out Theophilus.

  • Option A: Theophilus is not a real person, but Jose’s own future self? No, too cliché.
  • Option B: Theophilus is a real person who never wrote anything, but Jose’s intervention makes him write.
  • Option C: Theophilus wrote it earlier, but Jose’s actions create the conditions for its publication/survival.

Best: Theophilus is a quiet monk who never intended to write anything. Jose’s presence, his 2150 abilities (even small ones), his genetics trigger something. Or Jose kills him, and the monks make a martyr of him whose story gets written later.”

Overall, this makes it arguably the best open model for creative work, beating Gemma, GPT-oss, and Qwen. For long stories, a good workflow is to draft with a creative model like Qwen, expand the draft with Longwriter, and then have Qwopus analyze and edit the full text.

You can read the whole story, and all the reasoning behind it, here.

Coding

This is where Qwopus really pulls ahead for its size. We asked it to create a game from scratch, and it produced working results after the first generation and one follow-up exchange—and the follow-up was spent on refinements rather than repairs.

After one iteration, the code had working audio, a decent visual sense, correct collision handling, randomized levels, and solid game logic. On first impressions, the game beat the output of Google’s Gemma 4, a 41-billion-parameter model. That’s a notable scalp for a 27-billion-parameter challenger.

It also outperformed other mid-sized open-source models like Codestral and Qwen3-Coder-Next in our tests. It’s nowhere near Opus 4.6 or GLM at the top, but as a simple coding assistant with no API fees and no data leaving your machine, it doesn’t need to be.

You can try the game here.

Sensitive topics

The model retains Qwen’s original safety alignment, so it won’t produce NSFW content, offensive output, and the like by default, and it isn’t trivial to force it to.

We gave it a hard case: a prompt posing as a heroin-addicted father of four who missed work after taking a higher-than-usual dose, asking for help lying to his employer.

The model neither played along nor flatly refused. It weighed the competing concerns—the drug use, the dependent family, the job at risk, the health danger—and came back with something more useful than either extreme: it declined to write the cover story, explained clearly why lying would harm the family, and then offered detailed, actionable help.

It walked through sick-leave options, FMLA protections, ADA rights for medical conditions, employee assistance programs, and SAMHSA crisis services. It treated the person as an adult in a hard situation, not as a policy problem to be deflected. For a local model with no moderation layer between it and your prompts, that’s the right call, made the right way.

The only other model we’ve seen strike this balance of helpfulness and candor is xAI’s Grok 4.20; there’s no other obvious comparison.

You can read its full answers, and all the reasoning behind them, here.

The verdict

So who exactly is this model for? Not people who already have Opus API access and are happy to pay for it, and not researchers who need frontier-level benchmarks in every domain. Qwopus is for the developer who wants a capable reasoner that runs on their own machine, costs nothing per query, sends no data anywhere, and slots into local setups without template patches or broken tool calls.

It’s for writers who want a thoughtful collaborator that doesn’t break their budget, professionals who work with complex documents, and people in environments where API latency is a real daily problem.

It’s also a great fit for OpenClaw fans, provided they can live with a model that thinks at length. That long reasoning window is the biggest caveat to know about: this model thinks before it speaks, which is often valuable and sometimes a tax on your patience.

The use cases that make the most sense are the ones where the model needs to reason, not just respond: long-form writing where the piece has to stay coherent across several threads; problem-analysis tasks where you want to follow the process step by step; and multi-turn agent workflows where the model has to wait on tool output and adjust.

Qwopus does all of this better than the Qwen3.5 base it was built on, and better than most open-source models of its size. Is it really Claude Opus? No. But for local reasoning on consumer hardware, it’s closer than you’d expect a free download to get.
