Unstructured Data and LLMs with Crag Wolfe and Matt Robinson

The majority of enterprise data exists in heterogeneous formats such as HTML, PDF, PNG, and PowerPoint. However, large language models do best when trained with clean, curated data. This presents a major data cleaning challenge.




Unstructured is focused on extracting and transforming complex data to prepare it for vector databases and LLM frameworks.




Crag Wolfe is Head of Engineering and Matt Robinson is Head of Product at Unstructured. They join the podcast to talk about data cleaning in the LLM age.


Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of the podcast Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.





Sponsorship inquiries: sponsor@softwareengineeringdaily.com




The post Unstructured Data and LLMs with Crag Wolfe and Matt Robinson appeared first on Software Engineering Daily.


Yesterday

DataStax with Ed Anuff

DataStax is a generative AI data company that provides tools and services to build AI and other data…
2024-06-20

It’s APIs All the Way Down with Marco Palladino

Kong is a software company that provides open-source platforms and cloud services for managing, moni…
2024-06-19

Bitwarden with Matt Bishop

Bitwarden is an open-source password management service that securely stores passwords, passkeys, we…
2024-06-18

Codecademy with Zoe Bachman

Codecademy is an online platform that offers classes on languages including Python, Go, JavaScript, …
2024-06-13

A Decentralized Compute Marketplace with Greg Osuri

Akash Network is a decentralized cloud computing platform that leverages unused compute capacity aro…
2024-06-12

Ruff and Next-Generation Python Tooling with Charlie Marsh

Linting is the process of checking source code for programmatic as well as stylistic errors. Ruff is…

Software Engineering Daily

Technical interviews about software topics.
