Historically, we've done a lot of OCR work in the .NET ecosystem with mixed results, and there were always tradeoffs. We were very excited when GPT-4V was announced, and quickly saw its potential for extracting data from images. There are a few pitfalls, though. Primarily, GPT-4V does not currently support tools (i.e. function calling), and you cannot specify in the API call that you expect a JSON result.<p>...setting aside the fact that good OpenAI libraries for .NET are hard to come by...<p>After looking at how function calling works in completions, we realized we could do the same basic thing against GPT-4V. Essentially, we annotate a simple class with a few attributes that tell the API what we want to accomplish. Then we convert the class into a JSON Schema specification and pass that in the prompt.<p>This has worked pretty well for us, and we've used it in a number of client projects. While I am sure OpenAI will eventually support all of this directly through the API, I hope some of our .NET brethren find it useful in the meantime!<p>The library is distributed via NuGet, and the GitHub repo includes the library code as well as a simple demo app for trying out the functionality locally.
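<p>For readers who want a feel for the schema-in-the-prompt idea described above, here is a rough sketch. It is written in Python only to keep it short and is not the library's C# API. Every name in it (`Receipt`, `to_json_schema`, `build_prompt`, `parse_reply`) is hypothetical, the field metadata merely stands in for the C# attributes, and the actual call to the vision model is left out.

```python
import json
from dataclasses import dataclass, field, fields


# Hypothetical target type. The per-field metadata plays the role that
# attributes play on the annotated C# class in the post.
@dataclass
class Receipt:
    merchant: str = field(metadata={"description": "Name of the store"})
    total: float = field(metadata={"description": "Grand total including tax"})
    item_count: int = field(metadata={"description": "Number of line items"})


# Map Python type names to JSON Schema primitive type names.
_JSON_TYPES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def to_json_schema(cls):
    """Build a flat JSON Schema object from a dataclass of primitive fields."""
    properties = {}
    for f in fields(cls):
        # f.type may be a type object or a string annotation; use its name either way.
        type_name = getattr(f.type, "__name__", f.type)
        properties[f.name] = {
            "type": _JSON_TYPES[type_name],
            "description": f.metadata.get("description", ""),
        }
    return {
        "type": "object",
        "properties": properties,
        "required": [f.name for f in fields(cls)],
    }


def build_prompt(cls):
    """Embed the schema in the text prompt sent alongside the image."""
    schema = json.dumps(to_json_schema(cls), indent=2)
    return (
        "Extract the data from the attached image. "
        "Reply with a single JSON object that matches this JSON Schema "
        "and nothing else:\n" + schema
    )


def parse_reply(cls, reply):
    """Pull the JSON object out of the reply text and build an instance.

    Assumption: the model may wrap the JSON in extra prose, so only the text
    between the first '{' and the last '}' is parsed.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return cls(**json.loads(reply[start:end + 1]))
```

<p>In use, `build_prompt(Receipt)` would be sent as the text part of the request together with the image, and the model's text reply would be handed to `parse_reply(Receipt, reply)`. Since nothing on the API side enforces the schema, validating the parsed values and retrying on failure is probably worth adding in real code.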