How fascinating are mermaid and pirate stories? Some would spend their entire summer vacations reading every word of these tales, while others would rather be swimming in the sea. For those who prefer the latter, having a summary of the story is a great reason to sit in front of the computer and use Cortex!
The goal of this challenge is to create a complete pipeline in Snowflake for processing PDF files, extracting and chunking their text content, embedding the text for further analysis, and generating summaries of the text chunks.
Setup steps:
- Create the environment: database, schema, and warehouse.
- Define a Python function that, given the file_url as a parameter, returns a table with a single column, chunk, of type VARCHAR. (Hint: check here, very carefully xd.)
- Create a stage with a Directory Table. (You can do that directly from Snowsight.)
- Upload the three PDF files into the stage (you can find them here).
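Because the function must return a table, it is most naturally a Python table function (UDTF): a handler class whose process method yields one tuple per output row, registered with RETURNS TABLE (chunk VARCHAR). Below is a minimal sketch under stated assumptions, not the reference solution. The names split_text, extract_pdf_text and ChunkText, the 1000/100 character chunk size and overlap, and the use of PyPDF2 are all hypothetical choices. Whether SnowflakeFile.open accepts the URL you pass in (scoped URL versus stage file URL) is exactly the kind of detail the hint asks you to check carefully.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Collapse the line breaks and repeated spaces that PDF extraction leaves behind.
    text = " ".join(text.split())
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def extract_pdf_text(file_url: str) -> str:
    """Read a staged PDF and return its text (only works inside Snowflake).

    Assumption: PyPDF2 is available to the function. The imports live here so
    the rest of the module can run locally without Snowflake packages.
    """
    from io import BytesIO

    from PyPDF2 import PdfReader
    from snowflake.snowpark.files import SnowflakeFile

    with SnowflakeFile.open(file_url, "rb") as f:
        reader = PdfReader(BytesIO(f.read()))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


class ChunkText:
    """UDTF-style handler: process() yields one (chunk,) tuple per output row."""

    def __init__(self, extract_text: Optional[Callable[[str], str]] = None) -> None:
        # Snowflake constructs the handler without arguments, so the PDF reader
        # is the default; passing another callable makes local testing possible.
        self._extract_text = extract_text or extract_pdf_text

    def process(self, file_url: str) -> Iterator[Tuple[str]]:
        for chunk in split_text(self._extract_text(file_url)):
            yield (chunk,)
```

A plain fixed-width window keeps the sketch dependency-free; a library text splitter that breaks on sentence or paragraph boundaries would likely hand the summarizer cleaner chunks.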
The goal is to have a table STORIES_SUMMARY_TABLE with three rows, one per story (as shown in the figure below), where story_title holds the story's relative_path and the second column holds the summary of its text, produced by a specific Cortex function (guess which one? The challenge’s title could be useful).
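STORIES_SUMMARY_TABLE needs one row per story, while the chunking function from the setup steps returns many chunks per file, so chunks have to be combined per relative_path somewhere. One option, sketched below as map-reduce summarization, is to summarize each chunk and then summarize the joined partial summaries; another is to concatenate a file's chunks and summarize once, which is simpler but may exceed the model's input limit for long texts. In Snowflake this step would be SQL over the stage's directory table (for example DIRECTORY(@your_stage)) grouped by relative_path; the Python only mirrors the shape of the result. The function name build_story_summaries, the file names, and the summarize callable are hypothetical; summarize stands in for the Cortex function the challenge asks you to identify.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_story_summaries(
    rows: Iterable[Tuple[str, str]],
    summarize: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return one (story_title, summary) row per relative_path.

    rows: (relative_path, chunk) pairs, with each file's chunks in reading order.
    summarize: hypothetical stand-in for the Cortex summarization call.
    """
    # Group chunks by file; dicts keep the first-seen order of the stories.
    chunks_by_story: Dict[str, List[str]] = {}
    for relative_path, chunk in rows:
        chunks_by_story.setdefault(relative_path, []).append(chunk)

    table: List[Tuple[str, str]] = []
    for story_title, chunks in chunks_by_story.items():
        # Map step: summarize every chunk independently.
        partial = [summarize(chunk) for chunk in chunks]
        # Reduce step: summarize the joined partial summaries into one text.
        table.append((story_title, summarize("\n".join(partial))))
    return table


if __name__ == "__main__":
    # Toy data and a toy "summarizer" that keeps the first word, for illustration only.
    demo_rows = [
        ("story_a.pdf", "Once upon a time"),
        ("story_b.pdf", "Ahoy from the ship"),
        ("story_a.pdf", "Under the sea"),
    ]
    for title, summary in build_story_summaries(demo_rows, lambda t: t.split()[0]):
        print(title, summary)
```

Map-reduce costs one extra call per story but keeps every individual call small; for short stories, summarizing the concatenated text once is probably enough.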
Stories about mermaids and pirates are undeniably fascinating. But can we say that it is equally amazing to create summaries of them in this way? Once you solve this challenge, the sea awaits; maybe you will find treasure!
Remember, if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar and then going to ‘REGISTER’ (note: joining our mailing list does not give you a Frosty Friday account).
- Post your code to GitHub and make it publicly available (check out our guide here if you don’t know how).
- Post the URL in the comments of the challenge.