While this is certainly a cool concept, local voice assistants like this are currently a novelty. Cool to play around with, though!
You can expect around 5 seconds of processing before it even starts generating a response to a basic question, and that's on a fairly small model like Llama 3 8B.
For context, using Moondream2 (as recommended) on a RasPi 5, it takes around 50 seconds to process an image taken by the camera and start generating a description.
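If you want to put a number on that time-to-first-token yourself, here's a rough sketch of how I'd measure it, assuming you're serving the model through the ollama Python client and have a llama3:8b tag pulled (swap in whatever model you actually run):

```python
# Rough sketch: measure time-to-first-token for a local model served by Ollama.
# Assumes the `ollama` Python client is installed and llama3:8b has been pulled;
# both names are placeholders for whatever you actually run.
import time
import ollama

prompt = "What is the tallest mountain in Europe?"

start = time.perf_counter()
first_token_at = None

# Stream the response so we can see when generation actually begins,
# rather than waiting for the full answer.
for chunk in ollama.generate(model="llama3:8b", prompt=prompt, stream=True):
    if first_token_at is None and chunk.get("response"):
        first_token_at = time.perf_counter()

if first_token_at is not None:
    print(f"Time to first token: {first_token_at - start:.1f}s")
```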
Interesting. Using whisper-fast on Home Assistant on my server machine, it takes about 2-3 seconds to process speech and deliver an output in English.
Useful in the smart home space.
Laughably broken in most languages other than English, but then again, Google and Alexa barely work in other languages either.
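If "whisper-fast" here is the faster-whisper-based setup (the standard Home Assistant Whisper add-on wraps faster-whisper), a quick way to sanity-check that latency on your own hardware looks roughly like this; the model size and audio file name are just placeholders:

```python
# Rough latency check with faster-whisper; model size and audio file are placeholders.
import time
from faster_whisper import WhisperModel

model = WhisperModel("small", compute_type="int8")  # int8 keeps it CPU-friendly

start = time.perf_counter()
segments, info = model.transcribe("command.wav", language="en")
text = " ".join(segment.text for segment in segments)  # the generator is consumed here
print(f"Transcribed in {time.perf_counter() - start:.1f}s: {text.strip()}")
```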