Build a private web application that:

  1. takes a spoken description of a book,
  2. transcribes the request,
  3. decides whether the description is specific enough to identify the intended book,
  4. asks a follow-up question when needed,
  5. retrieves the book’s text via an IRC channel, in a format that cannot be known in advance,
  6. decides at run time how to safely unpack or normalize the retrieved file(s), and
  7. reads the result aloud using a synthesized clone of the student’s own voice.
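Step 5 depends on speaking the IRC wire protocol. IRC (RFC 1459, updated by RFC 2812) is line-based: each message is a single CRLF-terminated line of the form `[:prefix] COMMAND [params] [:trailing]`, and a client must answer each server PING with a matching PONG or the server will eventually drop the connection. The following is a minimal parsing sketch, assuming plain IRC without IRCv3 message tags. `IrcMessage`, `parse_irc_line`, and `pong_reply` are hypothetical helper names, not part of any library, and file transfer (often done over DCC) is not covered.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IrcMessage:
    prefix: Optional[str]          # sender, e.g. "nick!user@host"; None if absent
    command: str                   # e.g. "PRIVMSG", "PING", or a numeric like "001"
    params: list[str] = field(default_factory=list)


def parse_irc_line(line: str) -> IrcMessage:
    """Parse one IRC protocol line (RFC 1459/2812 framing, no IRCv3 tags)."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # The prefix runs from just after the colon to the first space.
        prefix, _, line = line[1:].partition(" ")
    if " :" in line:
        # Everything after the first " :" is one trailing parameter (may contain spaces).
        head, _, trailing = line.partition(" :")
        parts = head.split() + [trailing]
    else:
        parts = line.split()
    return IrcMessage(prefix, parts[0].upper(), parts[1:])


def pong_reply(msg: IrcMessage) -> Optional[str]:
    """Return the PONG line to send back for a server PING, or None otherwise."""
    if msg.command == "PING":
        token = msg.params[-1] if msg.params else ""
        return f"PONG :{token}\r\n"
    return None


if __name__ == "__main__":
    msg = parse_irc_line(":alice!a@example.org PRIVMSG #books :hello there\r\n")
    print(msg.prefix, msg.command, msg.params)
    # alice!a@example.org PRIVMSG ['#books', 'hello there']
    print(repr(pong_reply(parse_irc_line("PING :irc.example.net"))))
    # 'PONG :irc.example.net\r\n'
```

Keeping parsing separate from the socket loop makes it easy to unit-test the protocol handling without a live server.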

Expected Components

  • Browser-based or web-app voice input
  • Speech-to-text
  • LLM or equivalent reasoning component for disambiguation
  • Clarification dialogue
  • Retrieval backend
  • IRC interaction
  • Safe extraction or normalization pipeline
  • Text-to-speech
  • Azure-backed deployment or infrastructure
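Because the retrieved file’s format cannot be known in advance, one reasonable approach is to identify it from its leading "magic" bytes rather than trusting the file name, then extract it with guards against path traversal ("zip slip") and decompression bombs. The sketch below assumes Python 3.9+ and only the standard library. It handles ZIP (EPUB files are ZIP containers too), gzip, and plain text, and merely labels RAR and PDF so they can be routed to separate handlers. The function names and size limits are illustrative assumptions, not requirements of the project.

```python
import gzip
import io
import zipfile
from pathlib import Path

MAX_TOTAL_BYTES = 200 * 1024 * 1024   # assumed cap on decompressed output
MAX_MEMBERS = 1000                    # assumed cap on number of archive entries

# Leading-byte signatures for a few formats a retrieval step might return.
SIGNATURES = {
    b"PK\x03\x04": "zip",      # ZIP; EPUB and DOCX are ZIP containers too
    b"\x1f\x8b": "gzip",
    b"Rar!\x1a\x07": "rar",    # RAR 4.x and 5.x share this prefix
    b"%PDF-": "pdf",
}


def sniff_format(data: bytes) -> str:
    """Guess a format from the leading bytes, ignoring any file name."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    # Heuristic: binary formats almost always contain NUL bytes early on.
    return "text" if b"\x00" not in data[:4096] else "unknown"


def safe_extract_zip(data: bytes, dest: Path) -> list:
    """Extract a ZIP archive, refusing path traversal and oversized output."""
    dest = dest.resolve()
    written, total = [], 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = zf.infolist()
        if len(members) > MAX_MEMBERS:
            raise ValueError("too many archive members")
        for info in members:
            target = (dest / info.filename).resolve()
            if not target.is_relative_to(dest):          # blocks "../" and absolute paths
                raise ValueError(f"unsafe path in archive: {info.filename}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                # Count bytes actually decompressed instead of trusting header sizes.
                while chunk := src.read(64 * 1024):
                    total += len(chunk)
                    if total > MAX_TOTAL_BYTES:
                        raise ValueError("archive expands beyond size limit")
                    out.write(chunk)
            written.append(target)
    return written


def safe_gunzip(data: bytes) -> bytes:
    """Decompress gzip data, stopping once it exceeds MAX_TOTAL_BYTES."""
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        while chunk := gz.read(64 * 1024):
            out += chunk
            if len(out) > MAX_TOTAL_BYTES:
                raise ValueError("gzip expands beyond size limit")
    return bytes(out)


def normalize_text(data: bytes) -> str:
    """Decode as UTF-8 (BOM-tolerant), else cp1252, and unify line endings."""
    try:
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = data.decode("cp1252", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The size check deliberately counts decompressed bytes as they are produced, since the sizes recorded in an archive’s headers can be forged.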

Evaluation Notes

This project is judged substantially on integration and infrastructure, not only on model demos. Common failure points are:

  • no fully agentic component that actually works
  • no cloned voice that works end to end
  • local-only implementations without a real web front end
  • incomplete retrieval or file-handling logic

Sample Test Prompts

Sample prompts include:

  • “Wolf Story by William McCleery”
  • “The Dolls’ House by Rumer Godden”
  • “Twig by Elizabeth Orton Jones”
  • “The Twenty-One Balloons by William Pène du Bois”
  • “Emil is sedated with laced chocolate and robbed on a train…”
  • “15th century Poland, alchemy, the Philosopher’s Stone…”

The system must handle both exact titles and vague natural-language descriptions.
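Several sample prompts follow a "Title by Author" pattern, while others are free-form plot or setting descriptions. A cheap first pass, assumed here rather than prescribed by the handout, is to detect the "Title by Author" shape with a regular expression and send everything else to the LLM for disambiguation and, if needed, a follow-up question. `route_request`, `looks_like_name`, and the name-particle list are hypothetical; an exact match should still be verified against a catalogue, because speech-to-text can garble titles and names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Greedy title: the LAST " by " splits title from author, so titles containing "by" survive.
TITLE_BY_AUTHOR = re.compile(r"^(?P<title>.+)\s+by\s+(?P<author>.+)$", re.IGNORECASE)
# Lower-case particles allowed inside an author name (assumed, not exhaustive).
NAME_PARTICLES = {"de", "du", "van", "von", "der", "le", "la", "di", "da"}


@dataclass
class Route:
    kind: str                      # "exact" or "description"
    text: str                      # whitespace-normalized transcript
    title: Optional[str] = None
    author: Optional[str] = None


def looks_like_name(words: list[str]) -> bool:
    """Heuristic: 1-5 words, each capitalized or a known name particle."""
    return 1 <= len(words) <= 5 and all(
        w[0].isupper() or w.lower() in NAME_PARTICLES for w in words
    )


def route_request(transcript: str) -> Route:
    """Route a transcript to exact-title lookup or to LLM disambiguation."""
    # Collapse whitespace and trim surrounding quotes and end punctuation.
    text = " ".join(transcript.split()).strip("\"'“”‘’.!?… ")
    m = TITLE_BY_AUTHOR.match(text)
    if m and looks_like_name(m["author"].split()):
        return Route("exact", text, m["title"].strip(" ,"), m["author"])
    return Route("description", text)


if __name__ == "__main__":
    for prompt in ["Twig by Elizabeth Orton Jones",
                   "15th century Poland, alchemy, the Philosopher’s Stone…"]:
        r = route_request(prompt)
        print(r.kind, r.title, r.author)
    # exact Twig Elizabeth Orton Jones
    # description None None
```

The capitalization test relies on the speech-to-text model capitalizing proper names. For a transcript such as "a boy raised by wolves", the lower-case tail fails the name check and the request is routed to the description path instead.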

Deployment

Students will deploy their pipelines using Azure.

Repository Copy

A Markdown copy of this handout is also kept in the repository under _starter_code/final-project/README.md.