Want a tiny voice assistant that actually follows your lead, runs on your own hardware, and won't accidentally order a dozen pineapples because it misheard you? A DIY AI Assistant with Raspberry Pi is surprisingly achievable, fun, and flexible. You'll wire up a wake word, speech recognition (ASR = automatic speech recognition), a brain for natural language (rules or an LLM), and text-to-speech (TTS). Add a few scripts, one or two services, and some careful audio tweaks, and you have a smart speaker that obeys your rules.
Let's get you from zero to talking to your Pi without the usual hair-pulling. We'll cover parts, setup, code, comparisons, gotchas... the whole burrito. 🌯
What Makes a Good DIY AI Assistant with Raspberry Pi ✅
- Private by default – keep audio local where possible. You decide what leaves the device.
- Modular – swap components like Lego: wake word engine, ASR, LLM, TTS.
- Affordable – mostly open source, commodity mics, speakers, and a Pi.
- Hackable – want home automation, dashboards, routines, custom skills? Easy.
- Reliable – service-managed, so it boots and starts listening automatically.
- Fun – you'll learn a lot about audio, processes, and event-driven design.
Tiny tip: If you're using a Raspberry Pi 5 and plan to run heavier local models, a clip-on cooler helps under sustained load. (When in doubt, pick the official Active Cooler designed for the Pi 5.) [1]
Parts and Tools You'll Need 🧰
- Raspberry Pi: a Pi 4 or Pi 5 is recommended for headroom.
- microSD card: 32 GB or larger is recommended.
- USB microphone: a simple USB conference mic works great.
- Speaker: a USB or 3.5 mm speaker, or an I2S amp HAT.
- Network: Ethernet or Wi-Fi.
- Optional niceties: a case, an active cooler for the Pi 5, a push button for push-to-talk, an LED ring. [1]
OS and Baseline Setup
- Flash Raspberry Pi OS with Raspberry Pi Imager. It's the straightforward way to get a bootable microSD with the presets you want. [1]
- Boot, connect to the network, then update packages:

      sudo apt update && sudo apt upgrade -y

- Audio basics: On Raspberry Pi OS you can set the default output, levels, and devices through the desktop UI or raspi-config. USB and HDMI audio are supported on all models; Bluetooth output is available on models with Bluetooth. [1]
- Verify devices:

      arecord -l
      aplay -l

Then test capture and playback. If levels look odd, check the mixers and defaults before blaming the mic.
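If you'd rather pick the capture device from a script than read the listing by eye, the sketch below parses `arecord -l` style output into ALSA device names. It assumes the usual `card N: ... [Label], device M:` line layout; the function names and the `plughw:` naming choice are illustrative, not something the tools mandate.

```python
import re
import subprocess

# Matches lines such as:
#   card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): \S+ \[(.*?)\], device (\d+):")

def parse_alsa_devices(listing: str):
    """Return (alsa_name, label) pairs from `arecord -l` / `aplay -l` text."""
    devices = []
    for line in listing.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, label, device = match.groups()
            devices.append((f"plughw:{card},{device}", label))
    return devices

def list_capture_devices():
    """Run `arecord -l` (from alsa-utils) and parse what it prints."""
    out = subprocess.run(["arecord", "-l"], capture_output=True, text=True, check=True)
    return parse_alsa_devices(out.stdout)

# Sample listing used for illustration only; your card numbers will differ.
SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""
print(parse_alsa_devices(SAMPLE))
```

On the Pi itself, call `list_capture_devices()` and pass the chosen name to whatever records your audio.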

Architecture at a Glance 🗺️
A sensible DIY AI Assistant with Raspberry Pi flow looks like this:
Wake word → live audio capture → ASR transcription → intent handling or LLM → response text → TTS → audio playback → optional actions via MQTT or HTTP.
- Wake word: Porcupine is small, accurate, and runs locally with per-keyword sensitivity control. [2]
- ASR: Whisper is a multilingual, general-purpose ASR model trained on roughly 680k hours of audio; it is robust to accents and background noise. For on-device use, whisper.cpp provides a lean C/C++ inference path. [3][4]
- Brain: your choice – a cloud LLM via API, a rules engine, or local inference, depending on horsepower.
- TTS: Piper generates natural speech locally, quickly enough for snappy responses on modest hardware. [5]
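One way to keep these stages swappable is to treat each as a plain callable and chain them. The sketch below is an assumption about how you might structure it, not part of any of the projects named above; `VoicePipeline` and its field names are made up for illustration, and the stub lambdas stand in for Porcupine, whisper.cpp, and Piper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VoicePipeline:
    """Each stage is a plain callable, so engines can be swapped independently."""
    wait_for_wake: Callable[[], None]      # blocks until the wake word fires
    capture: Callable[[], bytes]           # records one utterance, returns raw audio
    transcribe: Callable[[bytes], str]     # ASR: audio -> text
    think: Callable[[str], str]            # rules or LLM: text -> reply text
    speak: Callable[[str], None]           # TTS + playback
    act: Optional[Callable[[str, str], None]] = None  # optional MQTT/HTTP side effects

    def run_once(self) -> str:
        """Run one wake -> capture -> ASR -> brain -> TTS -> action cycle."""
        self.wait_for_wake()
        audio = self.capture()
        text = self.transcribe(audio)
        reply = self.think(text)
        self.speak(reply)
        if self.act is not None:
            self.act(text, reply)
        return reply

# Stub stages for a dry run without any audio hardware.
spoken = []
demo = VoicePipeline(
    wait_for_wake=lambda: None,
    capture=lambda: b"\x00\x00",
    transcribe=lambda audio: "what time is it",
    think=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(demo.run_once())
```

Replacing one lambda at a time with the real engine lets you test each stage in isolation.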
Quick Comparison Table 🔎
| Tool | Best For | Price-ish | Why It Works |
|---|---|---|---|
| Porcupine Wake Word | Always-listening trigger | Free tier + | Low CPU, accurate, easy bindings [2] |
| whisper.cpp | Local ASR on the Pi | Open source | Good accuracy, CPU-friendly [4] |
| faster-whisper | Faster ASR on CPU/GPU | Open source | CTranslate2 optimizations |
| Piper TTS | Local speech output | Open source | Fast voices, many languages [5] |
| Cloud LLM API | Rich reasoning | Usage-based | Offloads heavy compute |
| Node-RED | Orchestrating actions | Open source | Visual flows, MQTT-friendly |
Step-by-Step Build: Your First Voice Loop 🧩
We'll use Porcupine for the wake word, Whisper for transcription, a lightweight "brain" function for the reply (swap in the LLM of your choice), and Piper for speech. Keep it minimal, then iterate.
1) Install dependencies

    sudo apt install -y python3-pip portaudio19-dev sox ffmpeg
    pip3 install sounddevice numpy

- Porcupine: grab the SDK/bindings for your language and follow the quick start (access key + keyword list + audio frames → .process). [2]
- Whisper (CPU-friendly): build whisper.cpp:

      git clone https://github.com/ggml-org/whisper.cpp
      cd whisper.cpp && cmake -B build && cmake --build build -j
      ./models/download-ggml-model.sh base.en
      ./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f your.wav -otxt

The commands above mirror the project's quick start. [4]
Prefer Python? faster-whisper (built on CTranslate2) is often snappier than vanilla Python Whisper on modest CPUs.
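If you go the faster-whisper route, a minimal sketch might look like the following. It assumes `pip3 install faster-whisper` has been run; the `base.en` size, `int8` compute type, and the helper names `load_model` and `transcribe_file` are illustrative choices, not recommendations from the project.

```python
def load_model(size: str = "base.en"):
    """Load a faster-whisper model on CPU with int8 weights."""
    # Imported lazily so transcribe_file() below has no hard dependency.
    from faster_whisper import WhisperModel
    return WhisperModel(size, device="cpu", compute_type="int8")

def transcribe_file(model, wav_path: str) -> str:
    """Join the text of every segment that model.transcribe() yields."""
    segments, _info = model.transcribe(wav_path)
    return " ".join(segment.text.strip() for segment in segments).strip()

# Typical use on the Pi (not executed here):
#   model = load_model()
#   print(transcribe_file(model, "/home/pi/assistant/input.wav"))
```

Load the model once at startup and reuse it; reloading per utterance would add noticeable latency.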
2) Set up Piper TTS

    git clone https://github.com/rhasspy/piper
    cd piper
    make
    # Download a voice model you like, e.g. en_US-amy
    echo "Hello there." | ./piper --model voices/en/en_US-amy-medium.onnx --output_file hello.wav
    aplay hello.wav

Piper is designed for on-device TTS with many voice/language options. [5]
3) A minimal assistant loop in Python
Deliberately compact: it waits for a wake phrase (a stub), records, transcribes with whisper.cpp, generates a reply (a placeholder), then speaks through Piper. Swap the placeholder for your favorite LLM or rule logic.

    import os, subprocess, wave
    import sounddevice as sd

    WAKE_WORD = "hey computer"  # swap for Porcupine in production [2]
    RECORD_SECONDS = 6
    SAMPLE_RATE = 16000
    CHANNELS = 1
    WORKDIR = "/home/pi/assistant"
    ASR_BIN = "/home/pi/whisper.cpp/build/bin/whisper-cli"  # [4]
    ASR_MODEL = "/home/pi/whisper.cpp/models/ggml-base.en.bin"
    PIPER_BIN = "/home/pi/piper/build/piper"  # [5]
    PIPER_VOICE = "/home/pi/piper/voices/en/en_US-amy-medium.onnx"

    os.makedirs(WORKDIR, exist_ok=True)

    def record_wav(path, seconds=RECORD_SECONDS):
        # Record mono 16-bit audio and write it as a WAV file.
        audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                       channels=CHANNELS, dtype='int16')
        sd.wait()
        with wave.open(path, 'wb') as w:
            w.setnchannels(CHANNELS); w.setsampwidth(2); w.setframerate(SAMPLE_RATE)
            w.writeframes(audio.tobytes())

    def transcribe(path):
        # -of pins the output base name, so the transcript lands at <base>.txt
        # regardless of how the CLI names files by default.
        base = path[:-len(".wav")] if path.endswith(".wav") else path
        cmd = [ASR_BIN, "-m", ASR_MODEL, "-f", path, "-otxt", "-of", base]
        subprocess.run(cmd, check=True, cwd=WORKDIR)
        with open(base + ".txt", "r", encoding="utf-8") as f:
            return f.read().strip()

    def generate_reply(prompt):
        # Placeholder brain: replace with rules or an LLM call.
        if "weather" in prompt.lower():
            return "I can't see the clouds, but it might be fine. Bring a jacket just in case."
        return "You said: " + prompt

    def speak(text):
        # Pipe the text into Piper, then play the resulting WAV.
        proc = subprocess.Popen([PIPER_BIN, "--model", PIPER_VOICE,
                                 "--output_file", f"{WORKDIR}/reply.wav"],
                                stdin=subprocess.PIPE)
        proc.stdin.write(text.encode("utf-8")); proc.stdin.close(); proc.wait()
        subprocess.run(["aplay", f"{WORKDIR}/reply.wav"], check=True)

    print("Assistant ready. Type the wake phrase to test.")
    while True:
        typed = input("> ").strip().lower()
        if typed == WAKE_WORD:
            wav_path = f"{WORKDIR}/input.wav"
            record_wav(wav_path)
            text = transcribe(wav_path)
            reply = generate_reply(text)
            print("User:", text); print("Assistant:", reply)
            speak(reply)
        else:
            print("Type the wake phrase to test the loop.")

For real wake-word detection, integrate Porcupine's streaming detector (low CPU, per-keyword sensitivity). [2]
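A rough sketch of that swap is below. It assumes the `pvporcupine` Python package and a Picovoice access key; the helper names (`make_porcupine`, `wait_for_wake`, `mic_frames`), the built-in keyword "computer", and the 0.5 sensitivity are illustrative. The detection loop only relies on the detector exposing `process()`, `frame_length`, and `sample_rate`, so it can be exercised with a fake detector first.

```python
import struct

def make_porcupine(access_key: str, keyword: str = "computer", sensitivity: float = 0.5):
    """Create a Porcupine handle; call .delete() on it when you are done."""
    import pvporcupine  # lazy import: only needed on the device
    return pvporcupine.create(access_key=access_key,
                              keywords=[keyword],
                              sensitivities=[sensitivity])

def wait_for_wake(detector, frames) -> bool:
    """Feed int16 frames to detector.process() until it reports a keyword index >= 0."""
    for frame in frames:
        if detector.process(frame) >= 0:
            return True
    return False

def mic_frames(detector):
    """Yield microphone frames sized for the detector, captured with sounddevice."""
    import sounddevice as sd
    n = detector.frame_length
    with sd.RawInputStream(samplerate=detector.sample_rate, blocksize=n,
                           channels=1, dtype="int16") as stream:
        while True:
            data, _overflowed = stream.read(n)
            yield struct.unpack(f"<{n}h", bytes(data))

# Typical use on the Pi (not executed here):
#   porcupine = make_porcupine("YOUR_ACCESS_KEY")
#   try:
#       if wait_for_wake(porcupine, mic_frames(porcupine)):
#           ...  # record, transcribe, reply, speak
#   finally:
#       porcupine.delete()
```

In the loop above, this would replace the `input("> ")` check.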
Audio Tuning That Actually Matters 🎚️
A few tiny fixes make your assistant feel 10× smarter:
- Mic distance: 30–60 cm is a sweet spot for many USB mics.
- Levels: avoid clipping on input and keep playback sane; fix routing before chasing code ghosts. On Raspberry Pi OS, you can manage the output device and levels with the system tools or raspi-config. [1]
- Room acoustics: hard walls cause echoes; a soft mat under the mic helps.
- Wake word threshold: too sensitive → ghost triggers; too strict → you'll be yelling at plastic. Porcupine lets you tweak sensitivity per keyword. [2]
- Thermals: long transcriptions on a Pi 5 benefit from the official active cooler for sustained performance. [1]
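To put a number on "avoid clipping", you can inspect a test recording. The sketch below reports the peak level in dBFS and the share of samples at or near full scale for 16-bit audio. The function names and the `clip_margin` of 8 are arbitrary illustrative choices, and `read_wav_samples` assumes a mono 16-bit WAV on a little-endian machine such as the Pi.

```python
import math
import wave
from array import array

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample

def level_report(samples, clip_margin: int = 8):
    """Return (peak_dbfs, clipped_fraction) for a sequence of int16 samples."""
    if not samples:
        return float("-inf"), 0.0
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE - clip_margin)
    peak_dbfs = 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")
    return peak_dbfs, clipped / len(samples)

def read_wav_samples(path: str):
    """Load a mono 16-bit WAV as an array of ints (native byte order)."""
    with wave.open(path, "rb") as w:
        data = array("h")
        data.frombytes(w.readframes(w.getnframes()))
        return data

# Small synthetic example; on the Pi, pass read_wav_samples("test.wav") instead.
peak, clipped = level_report([0, 1200, -16384, 8000])
print(f"peak {peak:.1f} dBFS, clipped {clipped:.1%}")
```

A peak that sits well below 0 dBFS with no clipped samples is a reasonable starting point; if the clipped share is above zero, lower the capture gain in the mixer.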
From Toy to Appliance: Services, Autostart, Health Checks 🧯
Humans forget to run scripts. Computers forget to be nice. Turn your loop into a managed service:
- Create a systemd unit:

      [Unit]
      Description=DIY Voice Assistant
      After=network.target sound.target

      [Service]
      User=pi
      WorkingDirectory=/home/pi/assistant
      ExecStart=/usr/bin/python3 /home/pi/assistant/assistant.py
      Restart=always
      RestartSec=3

      [Install]
      WantedBy=multi-user.target

- Enable it:

      sudo cp assistant.service /etc/systemd/system/
      sudo systemctl daemon-reload
      sudo systemctl enable --now assistant.service

- Tail the logs:

      journalctl -u assistant -f

Now it starts at boot, restarts on crash, and generally behaves like an appliance. A little boring, a lot better.
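For a very basic health check, the script can verify its own prerequisites before entering the loop, so a missing model or binary shows up as one clear line in the journal instead of a traceback mid-conversation. This is a sketch under that assumption; `preflight` and `check_or_exit` are hypothetical helper names.

```python
import os
import sys

def preflight(required_paths):
    """Return the subset of required files that are missing or unreadable."""
    return [p for p in required_paths if not os.access(p, os.R_OK)]

def check_or_exit(required_paths):
    """Exit non-zero with a clear message so the failure is visible in journalctl."""
    missing = preflight(required_paths)
    if missing:
        print("Preflight failed, missing:", ", ".join(missing), file=sys.stderr)
        sys.exit(1)

# Typical use near the top of assistant.py (not executed here):
#   check_or_exit([ASR_BIN, ASR_MODEL, PIPER_BIN, PIPER_VOICE])
```

Because the process exits with a non-zero status, systemd treats it as a failure and applies the unit's restart settings.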
Skill System: Make It Actually Useful at Home 🏠✨
Once voice-in and voice-out are solid, add actions:
- Intent router: simple keyword routes for common tasks.
- Smart home: publish events to MQTT or call Home Assistant's HTTP endpoints.
- Plugins: quick Python functions like set_timer, what_is_the_time, play_radio, run_scene.
Even with a cloud LLM in the loop, route obvious local commands first for speed and reliability.
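A keyword router along those lines can be very small. In the sketch below, the `skill` decorator, the `SKILLS` registry, and the substring matching are illustrative assumptions; matching is deliberately naive (for example, "time" also matches "sometimes"), so treat it as a starting point rather than a finished intent parser.

```python
import datetime

SKILLS = []  # (keywords, handler) pairs, checked in registration order

def skill(*keywords):
    """Register a handler that fires when every keyword appears in the utterance."""
    def register(handler):
        SKILLS.append((tuple(k.lower() for k in keywords), handler))
        return handler
    return register

@skill("time")
def what_is_the_time(text):
    return datetime.datetime.now().strftime("It is %H:%M.")

@skill("turn on", "light")
def lights_on(text):
    # A real handler would publish to MQTT or call an HTTP endpoint here.
    return "Turning the lights on."

def route(text, fallback):
    """Try local keyword routes first; hand everything else to the fallback (e.g. an LLM)."""
    lowered = text.lower()
    for keywords, handler in SKILLS:
        if all(k in lowered for k in keywords):
            return handler(text)
    return fallback(text)

print(route("Please turn on the kitchen light", fallback=lambda t: "You said: " + t))
```

In the assistant loop, `route(text, fallback=generate_reply)` would take the place of the direct brain call.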
Local-Only vs Cloud Assist: Trade-offs You'll Feel 🌓
Local-only
Pros: private, offline, predictable costs.
Cons: heavier models can be slow on small boards. Whisper's multilingual training helps with robustness if you keep it on-device or on a nearby server. [3]
Cloud assist
Pros: powerful reasoning, larger context windows.
Cons: data leaves the device, network dependence, variable costs.
A hybrid often wins: wake word + ASR local → call an API for reasoning → TTS local. [2][3][5]
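One practical detail of the hybrid setup is what happens when the network is down. The sketch below tries a cloud brain first and falls back to a local one on any error; `hybrid_reply` and both brain callables are hypothetical placeholders for whatever API client and rule logic you actually use.

```python
def hybrid_reply(text, cloud_brain, local_brain):
    """Ask the cloud brain first; on any failure or empty answer, use the local one."""
    try:
        reply = cloud_brain(text)
        if reply and reply.strip():
            return reply.strip(), "cloud"
    except Exception as exc:  # network down, timeout, quota, malformed response...
        print(f"cloud brain unavailable ({exc!r}); using local fallback")
    return local_brain(text), "local"

def offline_cloud(text):
    """Stand-in for an API call that fails because there is no network."""
    raise TimeoutError("no network")

print(hybrid_reply("hello", offline_cloud, lambda t: "You said: " + t))
```

Returning the source alongside the reply makes it easy to log how often the fallback is actually used.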
Troubleshooting: Weird Gremlins and Quick Fixes 👾
- Wake word false triggers: lower the sensitivity or try a different mic. [2]
- ASR lag: use a smaller Whisper model or build whisper.cpp with release flags (-j --config Release). [4]
- Choppy TTS: pre-generate common phrases; confirm your audio device and sample rates.
- No mic detected: check arecord -l and the mixers.
- Thermal throttling: use the official Active Cooler on the Pi 5 for sustained performance. [1]
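Pre-generating common phrases can be as simple as caching synthesized WAV files by a hash of the text. In this sketch, `cached_wav`, `warm_cache`, and the `synthesize(text, path)` callback are assumptions; on the Pi the callback would wrap your Piper invocation, while the fake one below just writes a placeholder file.

```python
import hashlib
import os
import tempfile

def cached_wav(text, cache_dir, synthesize):
    """Return a WAV path for `text`, calling synthesize(text, path) only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{key}.wav")
    if not os.path.exists(path):
        synthesize(text, path)
    return path

def warm_cache(phrases, cache_dir, synthesize):
    """Pre-generate common replies (e.g. at service start) so later playback skips TTS."""
    return [cached_wav(p, cache_dir, synthesize) for p in phrases]

# Dry run with a fake synthesizer that records how often it is called.
calls = []
def fake_synthesize(text, path):
    calls.append(text)
    with open(path, "wb") as f:
        f.write(b"RIFF")  # placeholder bytes, not a real WAV

demo_dir = tempfile.mkdtemp()
first = cached_wav("Okay.", demo_dir, fake_synthesize)
second = cached_wav("okay. ", demo_dir, fake_synthesize)
print(first == second, len(calls))
```

Normalizing case and whitespace before hashing means trivially different spellings of the same reply share one file.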
Security and Privacy Notes You Should Actually Read 🔒
- Keep your Pi updated with APT.
- If you use any cloud API, log what you send and consider redacting personal bits locally first.
- Run services with least privilege; avoid sudo in ExecStart unless it is required.
- Provide a local-only mode for guests or quiet hours.
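As a sketch of "redact locally first", the snippet below masks e-mail addresses and phone-like digit runs and logs the cleaned text before it would be sent anywhere. The regular expressions are deliberately simple illustrations and will miss many kinds of personal data; `redact` and `prepare_for_cloud` are hypothetical helper names.

```python
import logging
import re

log = logging.getLogger("assistant.outbound")

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)

def prepare_for_cloud(text: str) -> str:
    """Redact, then log exactly what is about to leave the device."""
    cleaned = redact(text)
    log.info("sending to cloud: %s", cleaned)
    return cleaned

print(redact("Text Sam at +1 555 010 9999 or sam@example.com about dinner"))
```

Logging the redacted text rather than the raw transcript keeps the log itself from becoming a privacy problem.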
Build Variants: Mix and Match Like a Sandwich 🥪
- Ultra-local: Porcupine + whisper.cpp + Piper + simple rules. Private and sturdy. [2][4][5]
- Speedy cloud assist: Porcupine + (a smaller local Whisper or cloud ASR) + local TTS + cloud LLM.
- Home automation central: add Node-RED or Home Assistant flows for routines, scenes, and sensors.
Example Skill: Lights On via MQTT 💡

    import paho.mqtt.client as mqtt

    MQTT_HOST = "192.168.1.10"
    TOPIC = "home/livingroom/light/set"

    def set_light(state: str):
        # paho-mqtt 2.x expects a callback API version as the first argument,
        # e.g. mqtt.Client(mqtt.CallbackAPIVersion.VERSION2).
        client = mqtt.Client()
        client.connect(MQTT_HOST, 1883, 60)
        client.loop_start()  # run the network loop so the QoS 1 handshake can complete
        payload = "ON" if state.lower().startswith("on") else "OFF"
        info = client.publish(TOPIC, payload, qos=1, retain=False)
        info.wait_for_publish()
        client.loop_stop()
        client.disconnect()

    # if "turn on the lights" in text: set_light("on")

Add a voice line like "turn on the living room lamp," and you'll feel like a wizard.
Why This Stack Works in Practice 🧪
- Porcupine is efficient and accurate at wake-word detection on small boards, which makes always-on listening feasible. [2]
- Whisper's large, multilingual training makes it robust to varied environments and accents. [3]
- whisper.cpp keeps that power usable on CPU-only devices like the Pi. [4]
- Piper keeps responses snappy without shipping audio to a cloud TTS. [5]
Too Long, Didn't Read
Build a modular, private DIY AI Assistant with Raspberry Pi by combining Porcupine for the wake word, Whisper (via whisper.cpp) for ASR, your choice of brain for replies, and Piper for local TTS. Wrap it as a systemd service, tune the audio, and wire in MQTT or HTTP actions. It's cheaper than you think, and oddly delightful to live with. [1][2][3][4][5]
References
[1] Raspberry Pi Software and Cooling – Raspberry Pi Imager (download and use) and Pi 5 Active Cooler product info
  - Raspberry Pi Imager
  - Active Cooler (Pi 5)
[2] Porcupine Wake Word – SDK and quick start (keywords, sensitivity, local inference)
[3] Whisper (ASR model) – Robust, multilingual ASR trained on ~680k hours
  - Radford et al., Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
[4] whisper.cpp – CPU-friendly Whisper inference with CLI and build steps
[5] Piper TTS – Fast, local neural TTS with many voices/languages