A few weeks ago I worked a bit with @tmm1 on figuring most of this out, using a surprisingly simple procedure. We actually got custom commands working via both proxy-based and on-device interposing methods: <a href="http://mobile.twitter.com/tmm1/status/131520489049960449" rel="nofollow">http://mobile.twitter.com/tmm1/status/131520489049960449</a>
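For what it's worth, here is a minimal sketch in Python of the dispatch side of such a setup. It assumes some interception layer (a proxy or an on-device interposer) already hands you the text Siri recognized and lets you substitute your own spoken reply; none of that plumbing is shown, and the phrases and handler names are made up for illustration.

    import datetime

    def handle_time(_phrase: str) -> str:
        # Example custom reply: just report the proxy machine's local time.
        return "It is now " + datetime.datetime.now().strftime("%H:%M")

    def handle_lights(phrase: str) -> str:
        # Example custom reply: pretend to toggle some home-automation lights.
        state = "on" if "on" in phrase else "off"
        return f"Okay, turning the lights {state}."

    # Phrase fragment -> handler producing the text the proxy should have Siri speak.
    CUSTOM_COMMANDS = {
        "proxy time": handle_time,
        "turn the lights": handle_lights,
    }

    def intercept(recognized_text: str):
        """Return a custom reply string, or None to pass the request through untouched."""
        lowered = recognized_text.lower()
        for fragment, handler in CUSTOM_COMMANDS.items():
            if fragment in lowered:
                return handler(lowered)
        return None

    if __name__ == "__main__":
        print(intercept("Turn the lights on in the kitchen"))  # custom reply
        print(intercept("What's the weather like?"))           # None -> forward to Apple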
A little googling turns up some interesting info about the ACE request/header. From skimming, it looks like a header-compression method for VoIP on cellular/lossy connections.<p>Slide deck: <a href="http://www-rn.informatik.uni-bremen.de/ietf/rohc/ace-033100-aus.pdf" rel="nofollow">http://www-rn.informatik.uni-bremen.de/ietf/rohc/ace-033100-...</a><p>Whitepaper: <a href="http://w3.ualg.pt/~bamine/B3.pdf" rel="nofollow">http://w3.ualg.pt/~bamine/B3.pdf</a>
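Whatever the acronym stands for, the write-up describes the ACE body as zlib-compressed binary plists following a short fixed header. A minimal sketch of unpacking a captured dump under that assumption (the 4-byte header size, the single-plist framing, and the filename are guesses for illustration, not confirmed constants):

    import plistlib
    import zlib

    def unpack_ace_dump(path: str, header_bytes: int = 4):
        with open(path, "rb") as f:
            raw = f.read()
        # Skip the fixed ACE header, then inflate the remainder as one zlib stream.
        decompressor = zlib.decompressobj()
        payload = decompressor.decompress(raw[header_bytes:])
        # In practice the stream interleaves length-prefixed chunks; as a first pass
        # this just tries to parse the inflated buffer as a single binary plist.
        return plistlib.loads(payload)

    if __name__ == "__main__":
        print(unpack_ace_dump("siri_request.bin"))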
Looks like guzzoni.apple.com is named after Didier Guzzoni (<a href="http://www.ai.sri.com/~guzzoni/" rel="nofollow">http://www.ai.sri.com/~guzzoni/</a>), an employee at SRI.<p>He's also listed on an interesting Apple patent that was only filed a few weeks ago, "INTELLIGENT AUTOMATED ASSISTANT" (<a href="http://www.wipo.int/patentscope/search/en/WO2011088053" rel="nofollow">http://www.wipo.int/patentscope/search/en/WO2011088053</a>).<p>Some very interesting implementation details there.
The question that springs to my mind is not 'how can I play with this?' but 'Are Apple bringing Siri to the desktop?', seeing as it appears there's nothing specific to the 4S hardware in how this works.<p>I'd quite like to be able to add calendar entries or tweet without moving to another application.
I didn't see anything in this article that mentions that the natural language understanding is done in the cloud. Maybe I am missing something, but I don't understand why everyone is jumping to the conclusion that the NLU is also done in the cloud and downvoting others' comments that said so.<p>From what I've seen, Siri sends compressed audio to the cloud, which translates that to text. What happens to the text, and how does that translate to action? Where is this being handled? Is there any proof that this is done in the cloud?
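One way to answer that empirically is to look at what the server actually sends back: if replies carry only a transcript, the text-to-action step presumably happens on the device; if they carry structured command/view objects, the NLU is in the cloud. A rough sketch of walking a decoded reply plist and collecting object class names, assuming keys like "class" and "properties" (not confirmed field names):

    def collect_classes(node, found=None):
        # Recursively gather every "class" value found in a decoded reply plist.
        if found is None:
            found = []
        if isinstance(node, dict):
            if "class" in node:
                found.append(node["class"])
            for value in node.values():
                collect_classes(value, found)
        elif isinstance(node, list):
            for item in node:
                collect_classes(item, found)
        return found

    if __name__ == "__main__":
        # 'reply' would be the dict produced by parsing a captured server plist;
        # this stand-in structure is illustrative only.
        reply = {"class": "AceObject", "properties": {"views": [{"class": "Utterance"}]}}
        print(collect_classes(reply))  # ['AceObject', 'Utterance']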
It'd be interesting to see whether or not Apple has changed the Siri protocol since the acquisition. Was this originally how Siri worked when it was independent?<p>Because Siri has roots in government contracting (it's named after SRI International, and was originally funded by DARPA), I wonder if the roots of the obfuscation start there rather than at Apple.
I wonder if there are any characteristics of the microphone in Apple devices that the servers could check the audio against to prevent this sort of thing. There should be a way to somewhat distinguish the device used to record a stream, given Apple's control over the devices Siri runs on, and overcoming that would be hard enough that nobody would bother.
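Purely as a toy illustration of that idea, and nothing Apple is known to do: compute a crude spectral signature for a clip and compare it against a reference profile for the expected microphone. The band count and tolerance are arbitrary assumptions, and any real fingerprinting scheme would be far more sophisticated (and spoofable once characterized).

    import numpy as np

    def band_signature(samples: np.ndarray, bands: int = 8) -> np.ndarray:
        # Split the magnitude spectrum into equal-width bands and normalize the
        # per-band energy, so the signature is independent of overall volume.
        spectrum = np.abs(np.fft.rfft(samples))
        edges = np.linspace(0, len(spectrum), bands + 1, dtype=int)
        energy = np.array([spectrum[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
        return energy / energy.sum()

    def looks_like_reference(samples: np.ndarray, reference: np.ndarray,
                             tolerance: float = 0.15) -> bool:
        # Accept the clip if its band profile is close to the expected profile.
        return float(np.abs(band_signature(samples) - reference).sum()) < tolerance

    if __name__ == "__main__":
        rate = 16000
        t = np.arange(rate) / rate
        clip = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(rate)
        reference = band_signature(clip)
        print(looks_like_reference(clip, reference))  # True: a clip matches its own profile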
I would LOVE to reverse-engineer Siri's speech-analysis algorithms. Confidence scores help, but it doesn't look like any other modeling data is available?
Is it possible to craft a Siri server reply with malicious code? It shouldn't be too hard for the Applidium guys to attempt (maybe even using a fuzzer?).
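In that spirit, a minimal byte-flipping mutator over a captured reply might look like the sketch below. Actually delivering the mutated replies to a device (e.g. through a spoofed guzzoni.apple.com endpoint) is not shown, and the filenames and mutation rate are arbitrary choices for illustration.

    import random

    def mutate(data: bytes, flips: int = 8) -> bytes:
        # Flip some bits at a handful of random offsets in the captured reply.
        buf = bytearray(data)
        for _ in range(flips):
            pos = random.randrange(len(buf))
            buf[pos] ^= random.randrange(1, 256)
        return bytes(buf)

    if __name__ == "__main__":
        with open("captured_reply.bin", "rb") as f:
            original = f.read()
        for i in range(100):
            with open(f"fuzzed_reply_{i:03d}.bin", "wb") as out:
                out.write(mutate(original))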
"Seems like someone at Apple missed something!"<p>What did Apple miss?
(In other words: how could they have avoided this, assuming they wanted to prevent such a crack?)
No one is at all concerned that this is a hack?<p>I know it's interesting stuff, but I'm curious what "rights" Applidium have in publishing this information.<p>With this information (if I'm not wrong), it wouldn't take long to simply DDoS Siri...<p>Or port Siri to Android (effectively stealing IP).<p>(I have no bias either way, just pointing out: if someone figured out how to reverse engineer Dropbox so you could use their space without a Dropbox account, would we all be going "wow, this is so cool!" or would we be crying out "this is such an irresponsible hack!"?)