Urdu Nastalique Optical Character Recognition System

Principal Investigator’s Organization (PIO):

Al-Khawarizmi Institute of Computer Science, UET, Lahore

Principal Investigator (PI):

Dr. Sarmad Hussain


The project developed an Optical Character Recognition (OCR) system to scan content and quickly convert it into an editable and searchable format for online publishing. A Nastalique OCR for Urdu provides the impetus needed to bring much-needed, culturally relevant indigenous content online. The OCR system also makes it possible to search existing scanned text that has been posted online. The technology has also opened published material to print-disabled Pakistanis (blind and illiterate), as book readers and similar tools are being developed by integrating this OCR with an Urdu Text-to-Speech system. The project also trained human resources in Human Language Technology (HLT), an emerging area worldwide that integrates research from the speech, script, and language processing domains for the benefit of people.

Start Date 01-Mar-2012

Duration 30 months

Budget PKR 29.14 million

Status Closed Project

Progress Report N/A

Publications N/A

Thematic Area Education

Project Website