Tech Echo

Interesting. I have a slightly different use case; maybe someone can suggest a good framework or tool.

Our school publishes a PDF daily. Someone makes it by filling in a Microsoft Excel template and then printing it to PDF (or using Save As PDF). The Excel template is fairly simple: for each subject there is a block of key-value pairs laid out as a two-column table (with a fixed number of fields), and N such blocks are stacked one below the other, depending on how many subjects were covered that day.

The length of the PDF (whether the content fits on one page or spills onto two or three) and the scaling of the PDF print (how big or small the text appears) both vary a lot, because the manual steps are not followed consistently.

What would be a good way to automate text extraction from such a daily PDF feed? I want to load the extracted data into a simple flat table (say, in a SQLite database or DynamoDB) and use it to display the same content as a browsable, filterable webpage covering all PDFs to date.

I was hoping to get help from ChatGPT Code Interpreter to write a Python script that I can schedule on AWS Lambda. But if there is a known approach for this kind of document processing, please point me to it. Thanks!
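Because the page count and print scaling change from day to day, one option is to ignore layout coordinates and parse the extracted text flow by the template's fixed field labels. A minimal sketch, assuming the text has already been pulled out of the PDF (e.g. with a library such as pdfplumber or pypdf), that each block begins with a `Subject` label, and that every label starts its own line. The labels in `FIELDS`, the `entries` table and the helpers `parse_blocks` / `load` are hypothetical:

```python
from __future__ import annotations

import sqlite3

# HYPOTHETICAL field labels: replace with the fixed keys of the real template.
# The first label is assumed to start every subject block.
FIELDS = ["Subject", "Teacher", "Topic", "Homework"]


def parse_blocks(text: str) -> list[dict[str, str]]:
    """Split extracted PDF text into one dict per subject block.

    Assumes every label starts its own line, followed by its value.
    Lines that start with no known label are treated as wrapped text of
    the previous value; anything before the first block is ignored.
    """
    # Try longer labels first so a label that prefixes another can't steal it.
    labels = sorted(FIELDS, key=len, reverse=True)
    records: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    last_key: str | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key = next((k for k in labels if line.startswith(k)), None)
        if key == FIELDS[0]:
            current = {}  # a new subject block begins
            records.append(current)
        if key is not None and current is not None:
            current[key] = line[len(key):].strip(" :\t")
            last_key = key
        elif key is None and current is not None and last_key is not None:
            # Continuation of a value that wrapped onto the next line.
            current[last_key] += " " + line
    return records


def load(conn: sqlite3.Connection, pdf_date: str, records: list[dict[str, str]]) -> None:
    """Append one row per block to a flat `entries` table (hypothetical schema)."""
    cols = [f.lower() for f in FIELDS]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (pdf_date TEXT, "
        + ", ".join(f"{c} TEXT" for c in cols)
        + ")"
    )
    placeholders = ", ".join("?" * (len(cols) + 1))
    conn.executemany(
        f"INSERT INTO entries VALUES ({placeholders})",
        [(pdf_date, *(r.get(f, "") for f in FIELDS)) for r in records],
    )
    conn.commit()


if __name__ == "__main__":
    # Made-up sample text standing in for one day's extracted PDF.
    sample = """Daily Bulletin
Subject Maths
Teacher Ms. Rao
Topic Fractions
Homework Exercise 3.2, questions
1 to 5
Subject Science
Teacher Mr. Iyer
Topic Magnets
Homework None
"""
    conn = sqlite3.connect(":memory:")
    load(conn, "2024-01-15", parse_blocks(sample))
    for row in conn.execute("SELECT subject, homework FROM entries"):
        print(row)
```

Keying on labels rather than positions means the result should not depend on how many pages the PDF spans or how it was scaled, as long as the extractor keeps each table row on one line. A wrapped value line that happens to begin with a label word would be misread, so real label names may need tighter matching.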

Part of a set of free, web-based, point-and-click tools; no registration required. Your feedback is welcome.

PDF text extractor – in pages and regions you define
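Extracting text from user-defined pages and regions comes down to keeping only the positioned words whose bounding boxes fall inside a rectangle. A minimal sketch, assuming word dicts shaped like those returned by pdfplumber's `Page.extract_words()` (keys `text`, `x0`, `x1`, `top`, `bottom`, in PDF points). The `Region` class and `text_in_region` function are hypothetical illustrations, not the tool's actual implementation:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    """Rectangle given as fractions (0..1) of page width and height."""
    page: int  # zero-based page index
    left: float
    top: float
    right: float
    bottom: float


def text_in_region(pages: list[tuple[float, float, list[dict]]], region: Region) -> str:
    """Return the words that lie fully inside `region`, in reading order.

    Each page is (width, height, words); each word is a dict with 'text',
    'x0', 'x1', 'top' and 'bottom' in PDF points, 'top' measured from the
    top edge of the page.
    """
    width, height, words = pages[region.page]
    x0, x1 = region.left * width, region.right * width
    y0, y1 = region.top * height, region.bottom * height
    inside = [
        w for w in words
        if w["x0"] >= x0 and w["x1"] <= x1 and w["top"] >= y0 and w["bottom"] <= y1
    ]
    # Top-to-bottom, then left-to-right; rounding 'top' is a crude way to
    # keep words of the same line together.
    inside.sort(key=lambda w: (round(w["top"]), w["x0"]))
    return " ".join(w["text"] for w in inside)


if __name__ == "__main__":
    # Made-up words for one US Letter page (612 x 792 points).
    words = [
        {"text": "Maths", "x0": 100, "x1": 140, "top": 50, "bottom": 62},
        {"text": "Subject", "x0": 20, "x1": 70, "top": 50, "bottom": 62},
        {"text": "Footer", "x0": 20, "x1": 60, "top": 760, "bottom": 772},
    ]
    header = Region(page=0, left=0.0, top=0.0, right=0.5, bottom=0.2)
    print(text_in_region([(612, 792, words)], header))
```

Giving regions as page fractions tolerates different paper sizes, but a fixed rectangle can still miss content when the print scaling changes from day to day, as described in the question above.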
