Want a tiny voice assistant that follows your lead, runs on your hardware, and won't accidentally order a dozen pineapples because it misheard you? A DIY AI assistant with a Raspberry Pi is surprisingly achievable, fun, and flexible. You'll wire together a wake word, speech recognition (ASR = automatic speech recognition), a natural-language brain (rules or an LLM), and text-to-speech (TTS). Add a few scripts, one or two services, and some careful audio tuning, and you have a pocketable smart speaker that obeys your rules.
Let's get you from zero to talking-to-your-Pi without the usual hair-pulling. We'll cover parts, setup, code, comparisons, gotchas... the whole burrito. 🌯
What Makes a Good DIY AI Assistant with Raspberry Pi ✅
- Private by default: keep audio local where possible. You decide what leaves the device.
- Modular: swap components like Lego bricks: wake word engine, ASR, LLM, TTS.
- Affordable: mostly open source, commodity mics, speakers, and a Pi.
- Hackable: want home automation, dashboards, routines, custom skills? Easy.
- Reliable: service-managed, so it boots and starts listening automatically.
- Fun: you'll learn a lot about audio, processes, and event-driven design.
Small tip: If you're using a Raspberry Pi 5 and plan to run heavier local models, a clip-on cooler helps under sustained load. (When in doubt, pick the official Active Cooler designed for the Pi 5.) [1]
Parts and Tools You'll Need 🧰
- Raspberry Pi: a Pi 4 or Pi 5 is recommended for headroom.
- microSD card: 32 GB+ recommended.
- USB microphone: a simple USB conference mic works well.
- Speaker: a USB or 3.5 mm speaker, or an I2S amp HAT.
- Network: Ethernet or Wi-Fi.
- Optional niceties: a case, an active cooler for the Pi 5, a push button for push-to-talk, an LED ring. [1]
OS & Base Setup
- Flash Raspberry Pi OS with Raspberry Pi Imager. It's the straightforward way to get a bootable microSD with the presets you want. [1]
- Boot, connect to the network, then update packages:

    sudo apt update && sudo apt upgrade -y

- Audio basics: On Raspberry Pi OS you can set the default output, levels, and devices using the desktop UI or raspi-config. USB and HDMI audio are supported on all models; Bluetooth output is available on models with Bluetooth. [1]
- Verify devices:

    arecord -l
    aplay -l

Then test capture and playback. If levels look odd, check mixers and defaults before blaming the mic.
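If you'd rather not eyeball the device listing every time, here is a small sketch that pulls card and device numbers out of `arecord -l` / `aplay -l` text. The sample listing and the `plughw:card,device` naming are assumptions for illustration; your hardware will report different names.

```python
import re

# Matches lines such as:
#   card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(
    r"^card (\d+): (\S+) \[(.+?)\], device (\d+): (.+?) \[(.+?)\]"
)


def parse_alsa_devices(listing: str) -> list[dict]:
    """Turn `arecord -l` / `aplay -l` text into a list of device dicts."""
    devices = []
    for line in listing.splitlines():
        m = CARD_LINE.match(line.strip())
        if m:
            card, _short, card_name, device, _dev, _dev_name = m.groups()
            devices.append({
                "card": int(card),
                "device": int(device),
                "name": card_name,
                "alsa_id": f"plughw:{card},{device}",
            })
    return devices


# Hypothetical sample output, for illustration only.
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

print(parse_alsa_devices(SAMPLE))
```

On the Pi itself you could feed it live text with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout`.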
Architecture at a Glance 🗺️
A sensible DIY AI assistant flow on a Raspberry Pi looks like this:
Wake word → live audio capture → ASR transcription → intent handling or LLM → response text → TTS → audio playback → optional actions via MQTT or HTTP.
- Wake word: Porcupine is small, accurate, and runs locally with per-keyword sensitivity control. [2]
- ASR: Whisper is a multilingual, general-purpose ASR model trained on ~680k hours; it's robust to accents and background noise. For on-device use, whisper.cpp provides a lean C/C++ inference path. [3][4]
- Brain: your choice: a cloud LLM via API, a rules engine, or local inference, depending on horsepower.
- TTS: Piper generates natural speech locally, fast enough for snappy responses on modest hardware. [5]
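One way to keep those stages swappable is to treat each as a plain callable and chain them. This is a rough sketch, not a prescribed design: the `VoicePipeline` class and its field names are made up here, and every stage is a stub.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the stages from the flow above; every stage is swappable."""
    capture: Callable[[], bytes]        # audio capture after the wake word
    transcribe: Callable[[bytes], str]  # ASR
    think: Callable[[str], str]         # rules engine or LLM
    speak: Callable[[str], None]        # TTS + playback

    def handle_wake(self) -> str:
        audio = self.capture()
        text = self.transcribe(audio)
        reply = self.think(text)
        self.speak(reply)
        return reply


# Stub stages so the wiring can be exercised without any audio hardware.
spoken = []
pipeline = VoicePipeline(
    capture=lambda: b"\x00\x00" * 16000,
    transcribe=lambda audio: "what time is it",
    think=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(pipeline.handle_wake())
```

Swapping whisper.cpp for another ASR, or rules for an LLM, then means replacing one callable and leaving the rest alone.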
Quick Comparison Table 🔎
| Tool | Best for | Price-ish | Why it works |
|---|---|---|---|
| Porcupine Wake Word | Always-listening trigger | Free tier + | Low CPU, accurate, easy bindings [2] |
| whisper.cpp | Local ASR on the Pi | Open source | Good accuracy, CPU-friendly [4] |
| faster-whisper | Faster ASR on CPU/GPU | Open source | CTranslate2 optimizations |
| Piper TTS | Local speech output | Open source | Fast voices, many languages [5] |
| Cloud LLM API | Rich reasoning | Usage-based | Offloads heavy compute |
| Node-RED | Orchestrating actions | Open source | Visual flows, MQTT-friendly |
Step-by-Step Build: Your First Voice Loop 🧩
We'll use Porcupine for the wake word, Whisper for transcription, a lightweight "brain" function for the reply (swap in your LLM of choice), and Piper for speech. Keep it minimal, then iterate.
1) Install dependencies

    sudo apt install -y python3-pip portaudio19-dev sox ffmpeg
    pip3 install sounddevice numpy

- Porcupine: grab the SDK/bindings for your language and follow the quick start (access key + keyword list + audio frames → .process). [2]
- Whisper (CPU-friendly): build whisper.cpp:

    git clone https://github.com/ggml-org/whisper.cpp
    cd whisper.cpp && cmake -B build && cmake --build build -j
    ./models/download-ggml-model.sh base.en
    ./build/bin/whisper-cli -m .

The above shows the project's quick start. [4]
Prefer Python?
faster-whisper (CTranslate2) is often faster than vanilla Python Whisper on modest CPUs.
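If you go the faster-whisper route, transcription could look roughly like the sketch below. The model size, `compute_type="int8"`, and `beam_size=1` are assumptions picked with a Pi-class CPU in mind, not recommendations from the project, and `join_segments` / `transcribe_file` are helper names invented for this example.

```python
from types import SimpleNamespace


def join_segments(segments) -> str:
    """Concatenate segment texts into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_file(path: str, model_size: str = "base.en") -> str:
    # Imported lazily so join_segments works without the package installed.
    from faster_whisper import WhisperModel

    # int8 on CPU is an assumption that suits small boards; adjust as needed.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=1)
    return join_segments(segments)


# Stand-in segments, shaped like the objects faster-whisper yields.
fake = [SimpleNamespace(text=" turn on"), SimpleNamespace(text=" the lights ")]
print(join_segments(fake))  # turn on the lights
```

In a long-running assistant you would load the model once at startup rather than on every call.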
2) Set up Piper TTS

    git clone https://github.com/rhasspy/piper
    cd piper
    make
    # Download a voice model you like, e.g., en_US-amy
    echo "Hello there." | ./piper --model voices/en/en_US-amy-medium.onnx --output_file hello.wav
    aplay hello.wav

Piper is built for on-device TTS with many voice/language options. [5]
3) A minimal assistant loop in Python
Deliberately compact: it waits for a wake phrase (stub), records, transcribes with whisper.cpp, generates a reply (placeholder), then speaks via Piper. Swap the placeholder for your favorite LLM or rule logic.

    import os, subprocess, wave
    import sounddevice as sd

    WAKE_WORD = "hey computer"  # swap for Porcupine in production [2]
    RECORD_SECONDS = 6
    SAMPLE_RATE = 16000
    CHANNELS = 1
    WORKDIR = "/home/pi/assistant"
    ASR_BIN = "/home/pi/whisper.cpp/build/bin/whisper-cli"  # [4]
    ASR_MODEL = "/home/pi/whisper.cpp/models/ggml-base.en.bin"
    PIPER_BIN = "/home/pi/piper/build/piper"  # [5]
    PIPER_VOICE = "/home/pi/piper/voices/en/en_US-amy-medium.onnx"

    os.makedirs(WORKDIR, exist_ok=True)

    def record_wav(path, seconds=RECORD_SECONDS):
        # Record mono 16-bit PCM and write it as a WAV file.
        audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                       channels=CHANNELS, dtype='int16')
        sd.wait()
        with wave.open(path, 'wb') as w:
            w.setnchannels(CHANNELS); w.setsampwidth(2); w.setframerate(SAMPLE_RATE)
            w.writeframes(audio.tobytes())

    def transcribe(path):
        # -of sets the output base name, so the transcript lands at <base>.txt.
        out_base = os.path.splitext(path)[0]
        cmd = [ASR_BIN, "-m", ASR_MODEL, "-f", path, "-otxt", "-of", out_base]
        subprocess.run(cmd, check=True, cwd=WORKDIR)
        with open(out_base + ".txt", "r", encoding="utf-8") as f:
            return f.read().strip()

    def generate_reply(prompt):
        # Placeholder brain: replace with rules or an LLM call.
        if "weather" in prompt.lower():
            return "I can't see the clouds, but it might be fine. Bring a jacket just in case."
        return "You said: " + prompt

    def speak(text):
        # Piper reads the text from stdin and writes a WAV file, which aplay then plays.
        proc = subprocess.Popen([PIPER_BIN, "--model", PIPER_VOICE,
                                 "--output_file", f"{WORKDIR}/reply.wav"],
                                stdin=subprocess.PIPE)
        proc.stdin.write(text.encode("utf-8")); proc.stdin.close(); proc.wait()
        subprocess.run(["aplay", f"{WORKDIR}/reply.wav"], check=True)

    print("Assistant ready. Type the wake phrase to test.")
    while True:
        typed = input("> ").strip().lower()
        if typed == WAKE_WORD:
            wav_path = f"{WORKDIR}/input.wav"
            record_wav(wav_path)
            text = transcribe(wav_path)
            reply = generate_reply(text)
            print("User:", text); print("Assistant:", reply)
            speak(reply)
        else:
            print("Type the wake phrase to test the loop.")

For real wake-word detection, integrate Porcupine's streaming detector (low CPU, per-keyword sensitivity). [2]
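To give a feel for what replacing the typed stub involves, here is a sketch of the detection loop built around the `.process(frame)` contract: the detector returns a keyword index, or -1 when nothing fired. `wait_for_wake` and `FakeDetector` are invented for this example; the commented lines at the bottom assume the `pvporcupine` package and your own access key.

```python
from typing import Iterable, Sequence


def wait_for_wake(detector, frames: Iterable[Sequence[int]]) -> int:
    """Feed PCM frames to the detector; return the keyword index that fired, or -1."""
    for frame in frames:
        index = detector.process(frame)
        if index >= 0:
            return index
    return -1


class FakeDetector:
    """Test double mimicking the .process(frame) contract: -1 or a keyword index."""
    frame_length = 512

    def __init__(self, fire_on_frame: int):
        self.fire_on_frame = fire_on_frame
        self.seen = 0

    def process(self, frame):
        self.seen += 1
        return 0 if self.seen == self.fire_on_frame else -1


silence = [[0] * FakeDetector.frame_length for _ in range(5)]
print(wait_for_wake(FakeDetector(fire_on_frame=3), silence))  # 0

# With the real engine (assumes `pip install pvporcupine` and your own access key):
#   import pvporcupine
#   detector = pvporcupine.create(access_key="YOUR_ACCESS_KEY",
#                                 keywords=["porcupine"], sensitivities=[0.5])
#   ...yield frames of detector.frame_length int16 samples at detector.sample_rate...
#   detector.delete()
```

Keeping the detector behind a tiny interface like this also makes the loop testable on a laptop with no microphone attached.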
Audio Tuning That Actually Matters 🎚️
A few small fixes make your assistant feel 10× smarter:
- Mic distance: 30–60 cm is the sweet spot for many USB mics.
- Levels: avoid clipping on input and keep playback sane; fix routing before chasing code ghosts. On Raspberry Pi OS, you can manage the output device and levels via system tools or raspi-config. [1]
- Room acoustics: hard walls cause echoes; a soft mat under the mic helps.
- Wake word threshold: too sensitive → ghost triggers; too strict → you'll be yelling at plastic. Porcupine lets you tune sensitivity per keyword. [2]
- Thermals: long transcriptions on a Pi 5 benefit from the official Active Cooler for sustained performance. [1]
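To put a number on "avoid clipping", a quick level check over 16-bit samples can help. This is only a sketch: the `clip_at` threshold of 32700 is an arbitrary assumption, and `level_report` is a name made up here.

```python
import math
from array import array


def level_report(samples: array, clip_at: int = 32700) -> dict:
    """Summarise int16 PCM: peak, RMS in dBFS, and share of near-full-scale samples."""
    if len(samples) == 0:
        return {"peak": 0, "rms_dbfs": float("-inf"), "clipped_ratio": 0.0}
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = round(20 * math.log10(rms / 32768), 1) if rms > 0 else float("-inf")
    clipped = sum(1 for s in samples if abs(s) >= clip_at)
    return {"peak": peak, "rms_dbfs": rms_dbfs, "clipped_ratio": clipped / len(samples)}


# Synthetic examples: a quiet signal and one slammed against full scale.
quiet = array("h", [1000, -1000] * 100)
hot = array("h", [32767, -32768] * 100)
print(level_report(quiet))
print(level_report(hot))
```

To check a real recording, read it with `wave.open(...)` and wrap the frames in `array("h", ...)`; a `clipped_ratio` well above zero suggests the input gain is too high.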
From Toy to Appliance: Services, Autostart, Health Checks 🧯
Humans forget to run scripts. Computers forget to be nice. Turn your loop into a managed service:
- Create a systemd unit:

    [Unit]
    Description=DIY Voice Assistant
    After=network.target sound.target

    [Service]
    User=pi
    WorkingDirectory=/home/pi/assistant
    ExecStart=/usr/bin/python3 /home/pi/assistant/assistant.py
    Restart=always
    RestartSec=3

    [Install]
    WantedBy=multi-user.target

- Enable it:

    sudo cp assistant.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable --now assistant.service

- Tail the logs:

    journalctl -u assistant -f

Now it starts on boot, restarts on crash, and generally behaves like an appliance. A little boring, much better.
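Since systemd stops a service by sending SIGTERM by default, the loop can shut down cleanly if it watches for that signal. A small sketch of the idea follows; `StopFlag` and `run` are hypothetical helpers, and the `step` callable stands in for one pass of your assistant loop.

```python
import signal


class StopFlag:
    """Flips to True when the process receives SIGTERM (e.g. from `systemctl stop`)."""

    def __init__(self):
        self.stop = False
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.stop = True


def run(flag: StopFlag, step, max_steps=None) -> int:
    """Call step() until a stop is requested; returns how many steps ran."""
    done = 0
    while not flag.stop and (max_steps is None or done < max_steps):
        step()
        done += 1
    return done


flag = StopFlag()
print(run(flag, step=lambda: None, max_steps=3))  # 3
```

The `max_steps` parameter exists only so the sketch terminates on its own; a real service would loop until the flag flips.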
Skill System: Make It Actually Useful at Home 🏠✨
Once voice-in and voice-out are solid, add actions:
- Intent router: simple keyword routes for common tasks.
- Smart home: publish events to MQTT or call Home Assistant's HTTP endpoints.
- Plugins: quick Python functions like set_timer, what_is_the_time, play_radio, run_scene.
Even with a cloud LLM in the loop, route obvious local commands first for speed and reliability.
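A keyword router with "local first, LLM second" behaviour can be sketched in a few lines. The `skill` decorator, the `ROUTES` registry, and the canned replies are all made up for this example; the two skills are placeholders that borrow names from the plugin list above.

```python
from datetime import datetime
from typing import Callable, Optional

# Registry of (keywords, handler) pairs, filled in by the decorator below.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = []


def skill(*keywords: str):
    """Register a function as the handler for utterances containing any keyword."""
    def decorator(func):
        ROUTES.append((keywords, func))
        return func
    return decorator


@skill("what time", "the time")
def what_is_the_time(text: str) -> str:
    return datetime.now().strftime("It is %H:%M.")


@skill("timer")
def set_timer(text: str) -> str:
    return "Timer noted (placeholder: no real timer is started here)."


def route(text: str, fallback: Optional[Callable[[str], str]] = None) -> str:
    """Try local keyword routes first; only then hand off to the fallback (e.g. an LLM)."""
    lowered = text.lower()
    for keywords, handler in ROUTES:
        if any(k in lowered for k in keywords):
            return handler(text)
    return fallback(text) if fallback else "Sorry, I don't know that one yet."


print(route("Set a timer for ten minutes"))
print(route("Tell me a joke", fallback=lambda t: f"(LLM would answer: {t})"))
```

Substring matching is crude, but it is fast and predictable, which is the point of handling obvious commands before anything slower gets involved.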
Local Only vs Cloud Assist: Trade-offs You'll Feel 🌓
Local only
Pros: private, offline, predictable costs.
Cons: heavier models can be slow on small boards. Whisper's multilingual training helps with robustness if you keep it on-device or on a nearby server. [3]
Cloud assist
Pros: powerful reasoning, larger context windows.
Cons: data leaves the device, network dependency, variable costs.
Hybrid often wins: wake word + ASR local → call an API for reasoning → local TTS. [2][3][5]
Troubleshooting: Weird Gremlins & Quick Fixes 👾
- False wake word triggers: lower the sensitivity or try a different mic. [2]
- ASR lag: use a smaller Whisper model or build whisper.cpp with release flags (-j --config Release). [4]
- Choppy TTS: pre-generate common phrases; confirm your audio device and sample rates.
- No mic detected: check arecord -l and mixers.
- Thermal throttling: use the official Active Cooler on the Pi 5 for sustained performance. [1]
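To confirm whether throttling is actually happening, Raspberry Pi OS ships `vcgencmd get_throttled`, which prints a hex bitmask. Below is a sketch of decoding it; the flag labels are paraphrased, and `decode_throttled` is a name invented here.

```python
# Bit positions in the value reported by `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage now",
    1: "ARM frequency capped now",
    2: "throttled now",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(line: str) -> list[str]:
    """Decode a line like 'throttled=0x50005' into human-readable flags."""
    value = int(line.strip().split("=", 1)[1], 16)
    return [label for bit, label in FLAGS.items() if value & (1 << bit)]


print(decode_throttled("throttled=0x0"))
print(decode_throttled("throttled=0x50005"))
```

An empty list is what you want; "throttled now" during long transcriptions is the cue to improve cooling, while the under-voltage flags point at the power supply instead.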
Security and Privacy Notes You Should Actually Read 🔒
- Keep your Pi updated with APT.
- If you use any cloud API, log what you send and consider redacting personal bits locally first.
- Run services with least privilege; avoid sudo in ExecStart unless required.
- Offer a local-only mode for guests or quiet hours.
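Two of those notes translate into small guard functions that sit in front of any cloud call. Treat this as a sketch: the regexes are crude heuristics that err toward over-redaction, the 22:00 to 07:00 window is an arbitrary default, and `redact` / `local_only` are names invented for this example.

```python
import re
from datetime import time

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Mask e-mail addresses and phone-like digit runs before text leaves the device."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)


def local_only(now: time, start: time = time(22, 0), end: time = time(7, 0),
               guest_mode: bool = False) -> bool:
    """True when cloud calls should be skipped: guest mode, or inside quiet hours."""
    if guest_mode:
        return True
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight


print(redact("Email jane@example.com or call +1 555 123 4567"))
print(local_only(time(23, 30)), local_only(time(12, 0)))
```

A caller would check `local_only(datetime.now().time())` first and pass only `redact(text)` to the remote API when the check comes back false.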
Build Variants: Mix and Match Like a Sandwich 🥪
- Ultra-local: Porcupine + whisper.cpp + Piper + simple rules. Private and sturdy. [2][4][5]
- Speedy cloud assist: Porcupine + (a smaller local Whisper or cloud ASR) + local TTS + cloud LLM.
- Home automation hub: add Node-RED or Home Assistant flows for routines, scenes, and sensors.
Example Skill: Lights On via MQTT 💡

    import paho.mqtt.client as mqtt

    MQTT_HOST = "192.168.1.10"
    TOPIC = "home/livingroom/light/set"

    def set_light(state: str):
        # Note: paho-mqtt 2.x expects mqtt.Client(mqtt.CallbackAPIVersion.VERSION2).
        client = mqtt.Client()
        client.connect(MQTT_HOST, 1883, 60)
        # Anything starting with "on" switches the light on; everything else turns it off.
        payload = "ON" if state.lower().startswith("on") else "OFF"
        client.publish(TOPIC, payload, qos=1, retain=False)
        client.disconnect()

    # if "turn on the lights" in text: set_light("on")

Add a voice line like "turn on the living room lamp," and you'll feel like a wizard.
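Getting from a transcript to a topic and payload takes a little parsing. Here is one rough way to do it; the room names, the `home/<room>/light/set` topic layout, and the `parse_light_command` helper are all assumptions for illustration, so adapt them to your own MQTT setup.

```python
import re
from typing import Optional

# Hypothetical room names and topic slugs; adjust to your own MQTT layout.
ROOMS = {"living room": "livingroom", "kitchen": "kitchen", "bedroom": "bedroom"}

# Matches "turn on ..." / "switch off ..." as well as "turn ... on" / "switch ... off".
COMMAND = re.compile(
    r"\b(?:turn|switch)\s+(on|off)\b|\b(?:turn|switch)\b.*\b(on|off)\b"
)


def parse_light_command(text: str) -> Optional[tuple[str, str]]:
    """Return (topic, payload) for a recognised light command, else None."""
    lowered = text.lower()
    match = COMMAND.search(lowered)
    if not match:
        return None
    state = (match.group(1) or match.group(2)).upper()
    for spoken, slug in ROOMS.items():
        if spoken in lowered:
            return f"home/{slug}/light/set", state
    return None


print(parse_light_command("Turn on the living room lamp"))
print(parse_light_command("switch the kitchen light off"))
```

The returned pair can go straight into an MQTT publish call; returning `None` lets the caller fall through to other skills or an LLM.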
Why This Stack Works in Practice 🧪
- Porcupine is efficient and accurate at wake-word detection on small boards, which makes always-on listening feasible. [2]
- Whisper's large-scale, multilingual training makes it robust across environments and accents. [3]
- whisper.cpp keeps that power usable on CPU-only devices like the Pi. [4]
- Piper keeps responses snappy without sending audio to a cloud TTS. [5]
Too Long, Didn't Read
Build a modular, private DIY AI assistant with a Raspberry Pi by combining Porcupine for the wake word, Whisper (via whisper.cpp) for ASR, your choice of brain for replies, and Piper for local TTS. Wrap it as a systemd service, tune the audio, and wire in MQTT or HTTP actions. It's cheaper than you think, and oddly delightful to live with. [1][2][3][4][5]
References
- [1] Raspberry Pi Software & Cooling: Raspberry Pi Imager (download and use) and Pi 5 Active Cooler product info
- [2] Porcupine Wake Word: SDK & quick start (keywords, sensitivity, local inference)
- [3] Whisper (ASR model): multilingual, robust ASR trained on ~680k hours. Radford et al., Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
- [4] whisper.cpp: CPU-friendly Whisper inference with CLI and build steps
- [5] Piper TTS: fast, local neural TTS with many voices/languages