Language in Iran

The main language spoken in Iran is Persian, known inside the country as Farsi and written in a modified Arabic script. Around 87 million people live in Iran, and Persian serves as the first or second language for the great majority. Persian belongs to the Indo-Iranian branch of the Indo-European language family and has been documented in written form for more than 2,500 years, from the Achaemenid royal inscriptions of the sixth century BCE through the classical poetry of Ferdowsi, Rumi, and Hafez to the modern administrative language of the Islamic Republic. The broader historical backdrop to this inscriptional record is covered in our profile of Tehran, Iran's capital, and its archaeological museums.

Iran also hosts around 30 other Iranian languages plus several non-Iranian languages, including Azerbaijani (a Turkic language), Arabic, and Armenian. This article tracks Persian from its Old Persian roots through the adoption of the Arabic script after the seventh-century Arab conquest, presents the percentages and regions of every major language spoken in Iran today, explains the names پارسی and فارسی and the four extra Persian letters that distinguish the script from Arabic, covers the Iranian Academy of Persian Language and the politeness convention known as tarof, and provides a small set of essential traveller phrases.

Languages of Iran: Percentages and Regions

Iran ranks among the most linguistically diverse states in the Middle East. The Persian language carries the official status set by Article 15 of the Constitution of the Islamic Republic of Iran, but the same article permits regional and tribal languages for press, mass media, and the teaching of their literature alongside Persian. The table below summarises the most-cited speaker estimates from Iranian census-derived analyses and from Encyclopaedia Iranica.

Language | Speakers (% of population) | Main regions
Persian (Farsi) | 50 to 63 percent (L1); near universal as L2 | National; Tehran, Isfahan, Shiraz, Mashhad, central plateau
Azerbaijani Turkic | 15 to 20 percent | Northwest: East Azerbaijan, West Azerbaijan, Ardabil, Zanjan
Kurdish (Kurmanji and Sorani) | 7 to 10 percent | West: Kurdistan, Kermanshah, Ilam, parts of West Azerbaijan
Luri (Bakhtiari and Boyer-Ahmadi included) | 6 to 9 percent | West: Lorestan, Kohgiluyeh, Chaharmahal-Bakhtiari
Mazandarani and Gilaki | 6 to 7 percent combined | Caspian coast: Mazandaran, Gilan
Arabic (Khuzestani variety) | 2 to 3 percent | Southwest: Khuzestan
Balochi | 2 to 3 percent | Southeast: Sistan-Baluchestan
Turkmen | around 1 percent | Northeast: Golestan
Armenian | around 0.1 percent | Tehran, Isfahan (New Julfa district)

Smaller speech communities include Assyrian Aramaic in the northwest, Talysh on the Caspian coast, Georgian in pockets of Fereydan, and Romani among Iranian Roma communities. Several Caspian and Caucasian languages are listed by UNESCO as endangered. The state treats Persian as the sole language of national education and government, while permitting regional broadcasting in major minority languages including Azerbaijani Turkic and Kurdish.
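The percentage ranges in the table can be turned into rough head counts. The sketch below is a minimal illustration only: it assumes the article's population figure of around 87 million and the table's percentage ranges, and the names POPULATION_MILLIONS, SHARE_PERCENT, and speakers_in_millions are invented for this example.

```python
# Population and percentage ranges are the article's own estimates,
# not census-exact values.
POPULATION_MILLIONS = 87

SHARE_PERCENT = {
    "Persian (L1)": (50, 63),
    "Azerbaijani Turkic": (15, 20),
    "Kurdish": (7, 10),
    "Luri": (6, 9),
    "Mazandarani and Gilaki": (6, 7),
    "Arabic": (2, 3),
    "Balochi": (2, 3),
}


def speakers_in_millions(low_pct, high_pct, population=POPULATION_MILLIONS):
    """Convert a percentage range into a (low, high) range in millions."""
    return population * low_pct / 100, population * high_pct / 100


for language, (low_pct, high_pct) in SHARE_PERCENT.items():
    low, high = speakers_in_millions(low_pct, high_pct)
    print(f"{language}: roughly {low:.1f} to {high:.1f} million")
```

Because the inputs are themselves estimates, the results are best read as orders of magnitude: for Azerbaijani Turkic, for example, 15 to 20 percent of 87 million works out to roughly 13 to 17 million speakers.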

The Native Name: From پارسی to فارسی

The Persian language calls itself پارسی Pārsī in its oldest preserved form, after the southern Iranian region of Pārs (the Greek Persis, modern Fars Province) that produced the Achaemenid dynasty. The shift to فارسی Fārsī followed the seventh-century Arab conquest: Arabic has no /p/ sound and its alphabet has no letter for one, so Arabic speakers rendered Pārsī as Fārsī, and the new pronunciation gradually became the native name as well. Both forms remain in use: Pārsī survives in poetic and academic registers, while Fārsī is the everyday name across Iran today.

The Persian alphabet consists of the 28 Arabic letters plus four additional letters, 32 in all, created for Persian-specific consonants that Arabic does not have:

Persian letter | Name | Sound | Arabic-script origin
پ | pe | p | Modified ب (b), three dots below
چ | che | ch as in chair | Modified ج (j), three dots below
ژ | že | zh as in measure | Modified ز (z), three dots above
گ | gāf | g as in goat | Modified ک (k), with overstroke
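For readers who handle Persian text in software, the four letters can be looked up by their Unicode code points. The sketch below is a hedged illustration: the dictionary and the function persian_extra_letters_in are hypothetical names, and finding one of these letters only suggests that a text is not Arabic, since other languages written in Arabic-derived scripts use some of the same characters.

```python
import unicodedata

# The four extra letters, keyed to the Arabic-script base letter each modifies.
# Escapes are used so the code points are unambiguous.
PERSIAN_EXTRA_LETTERS = {
    "\u067E": "\u0628",  # pe, modified from be
    "\u0686": "\u062C",  # che, modified from jim
    "\u0698": "\u0632",  # zhe, modified from ze
    "\u06AF": "\u06A9",  # gaf, modified from kaf
}


def persian_extra_letters_in(text):
    """Return the set of the four extra letters that occur in text."""
    return {ch for ch in text if ch in PERSIAN_EXTRA_LETTERS}


for letter, base in PERSIAN_EXTRA_LETTERS.items():
    print(f"U+{ord(letter):04X} {unicodedata.name(letter)} (from U+{ord(base):04X})")
```

Applied to the two spellings of the language name, the function returns the letter pe for Pārsī and an empty set for Fārsī, which contains none of the four letters.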

Persian also uses its own numerals, distinct from both Western Arabic numerals (the 0 to 9 used in English) and Eastern Arabic numerals used in the Middle East. The Persian set ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ differs in the glyphs for 4 (۴ vs ٤), 5 (۵ vs ٥), and 6 (۶ vs ٦), a visual difference that catches travellers out at Iranian bank machines and government forms.
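The same distinction exists in Unicode, where the Persian digits occupy U+06F0 to U+06F9 and the Eastern Arabic digits U+0660 to U+0669. As a minimal sketch, with helper names (to_ascii_digits, to_persian_digits) that are invented for this example, both sets can be converted to and from the Western digits with a translation table:

```python
ASCII_DIGITS = "0123456789"
# Build both digit sets from their code points so the strings are unambiguous.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
EASTERN_ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# Map either non-Western set to ASCII, and ASCII to the Persian set.
TO_ASCII = str.maketrans(PERSIAN_DIGITS + EASTERN_ARABIC_DIGITS, ASCII_DIGITS * 2)
TO_PERSIAN = str.maketrans(ASCII_DIGITS, PERSIAN_DIGITS)


def to_ascii_digits(text):
    """Replace Persian and Eastern Arabic digits with 0-9; leave other text alone."""
    return text.translate(TO_ASCII)


def to_persian_digits(text):
    """Replace 0-9 with the Persian digit glyphs; leave other text alone."""
    return text.translate(TO_PERSIAN)


print(to_persian_digits("1010"))
print(to_ascii_digits(PERSIAN_DIGITS), to_ascii_digits(EASTERN_ARABIC_DIGITS))
```

The translation table is mainly useful for normalising mixed text such as form input; for parsing a number on its own, Python's built-in int() already accepts Unicode decimal digits.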

Old Persian and the Achaemenid Inscriptions

Old Persian is the oldest attested form of any Iranian language, recorded in royal inscriptions from the Achaemenid Empire between around 600 and 300 BCE. The script used for these inscriptions was a simplified cuneiform with around 36 syllabic and consonantal signs, made for monumental display rather than daily writing. The longest and best-known of these texts is the Behistun inscription on the Kermanshah-to-Hamadan road, carved on a cliff face on the orders of Darius I, who reigned from 522 to 486 BCE. The inscription runs in three languages in parallel (Old Persian, Elamite, and Akkadian), and its decipherment by Henry Rawlinson in the 1830s and 1840s opened modern cuneiform studies.

The Cyrus Cylinder, dated to around 539 BCE and now held by the British Museum, records Cyrus the Great’s entry into Babylon. It is written in Akkadian rather than Old Persian, but it belongs to the same political and linguistic moment and is often cited as one of the earliest known charters of religious tolerance. Old Persian itself sits inside the Indo-Iranian branch of the Indo-European language family, which makes it a fairly close relative of Sanskrit on the Indian side and a more distant cousin of the Latin, Greek, Germanic, and Slavic languages of Europe. The everyday administrative language of the Achaemenid Empire was not Old Persian, however: imperial records were kept in Aramaic, which worked as a lingua franca across a territory that stretched from Egypt to the Indus.

Middle Persian Under the Sasanians

By the third century CE, the Sasanian dynasty had founded a new Persian empire and a new written form of the language now called Middle Persian or Pahlavi. The Pahlavi script developed from Aramaic letters and stayed in use for several hundred years, both for royal inscriptions and for the religious texts of the Zoroastrian priesthood. The Sasanian period produced a body of Middle Persian writing on history, law, ethics, and Zoroastrian theology, much of which survives only in later copies.

Avestan, the sacred language of the Zoroastrian Gathas attributed to the prophet Zarathustra, is the older Eastern Iranian cousin of Middle Persian and remains in liturgical use among Zoroastrian communities in Iran (centred on Yazd) and India (the Parsi community of Mumbai). Middle Persian also spread along the trade routes that linked Sasanian Iran with Central Asia, India, and the Mediterranean world. By the time the Arab conquest reached the Iranian plateau in the seventh century, Middle Persian was the working literary and administrative language of a population that spoke many related Iranian dialects in daily life. The Sasanian collapse in the 650s CE did not erase the language, although it broke the political and religious institutions that had supported its written form.

New Persian, the Arabic Script, and the Shahnameh

The form of Persian spoken and written today is New Persian, which began to take shape in the ninth and tenth centuries CE in what is now eastern Iran and Central Asia. Unlike its Middle Persian ancestor, New Persian was written in a modified version of the Arabic script with the four extra letters above. Persian also absorbed thousands of loanwords from Arabic during this period, above all in the religious, legal, and scientific domains, although its grammar remained Iranian rather than Semitic.

The early classical period of New Persian poetry produced figures whose work is still read in the original today. Ferdowsi’s Shahnameh, the Book of Kings, completed in 1010 CE after three decades of work, runs to around 60,000 couplets and is often described as the longest epic poem written by a single author. Ferdowsi is credited with preserving the Persian language during the height of Arab cultural influence by deliberately minimising Arabic loanwords in his epic, choosing Iranian roots where Arabic alternatives were available. Omar Khayyam, Rumi, Saadi, and Hafez followed in the centuries that came after, and the literary canon they produced remains part of the school curriculum in Iran, Afghanistan, and Tajikistan to this day. The poetry of these writers is recited at public events, weddings, and memorial gatherings across the Persian-speaking world.

From the eleventh century onward, New Persian served as a court and administrative language well beyond Iran itself, including in the Ghaznavid, Seljuk, and Mughal empires. Mughal India used Persian as its main official language for nearly three centuries until the British East India Company replaced it with English in 1837. The Ottoman court chancery used Persian alongside Ottoman Turkish, and Persian remained a prestigious second language across Muslim Central Asia and the Indian subcontinent through the colonial period.

The break to modern Persian poetry came with Nima Yushij (1897 to 1960), generally treated as the father of she’r-e no, the new poetry. Nima broke with the rigid classical metres and rhyme schemes that had governed Persian verse for a thousand years, opened the line length, and influenced a generation of modernist poets including Forough Farrokhzad, Ahmad Shamlou, and Sohrab Sepehri. Modern Persian prose and journalism developed alongside this poetic shift through the late Qajar and Pahlavi periods. Our companion article on the Persian (Farsi) language covers grammar and vocabulary basics for travellers and heritage learners.

Persian, Farsi, Dari, and Tajik Today

In its modern form, Persian is the official language of three countries: Iran, where it is called Farsi or Persian; Afghanistan, where the local form is called Dari; and Tajikistan, where Tajik Persian is written in a Cyrillic alphabet adopted during the Soviet period. The three forms remain mutually intelligible to a high degree, with vocabulary and pronunciation differences between them roughly comparable to those between Spanish, Portuguese, and Italian. The terminology issue around Farsi versus Persian is itself a recent argument.

Persian has been the standard English name for the language for centuries. Farsi is the Arabized form of the older Pārsī. Many speakers inside Iran use Farsi when speaking English, although academic and editorial style guides prefer Persian as the cover term that includes Dari and Tajik as well. Iran has around 87 million people, most of whom use Persian as a first or second language, while Afghanistan adds around 20 million Dari speakers and Tajikistan around 8 million Tajik speakers.

Farhangestān: The Iranian Academy of Persian Language

The فرهنگستان زبان و ادب فارسی Farhangestān-e Zabān va Adab-e Fārsī, the Academy of Persian Language and Literature, is the official state body that regulates standard Persian usage in Iran. The first Farhangestān was founded in 1935 under Reza Shah as part of the wider Pahlavi-era modernisation programme and operated through the late 1930s before being suspended during the Second World War. The current Academy is the third iteration, re-established in 1990 by the Supreme Council of the Cultural Revolution of the Islamic Republic and based in Tehran.

The Academy coins Persian replacement terms for foreign loanwords, especially the wave of English technical vocabulary that entered Iranian Persian through the twentieth century. Some of the Academy’s coined terms have entered everyday Iranian usage, including رایانه rāyāneh for computer, پیامک payāmak for SMS, خودرو khodro for automobile, and یارانه yārāneh for subsidy. Other proposed replacements have been mocked or ignored by Iranian speakers, and the Academy’s tug of war with everyday English-loanword usage runs in parallel to the work of comparable bodies in France (the Académie française) and Iceland (the Árni Magnússon Institute).

Farhangestān also publishes standard spelling guides, regulates the transliteration of foreign proper names into Persian script, and works with national broadcaster IRIB on the language of state media. Persian-speaking Afghanistan and Tajikistan maintain their own language academies which sometimes adopt and sometimes diverge from Tehran’s coinages, producing low-level vocabulary differences across the three national standards.

Tarof: The Persian Politeness Convention

تعارف tarof is the Persian system of ritualised politeness that governs almost every Iranian social interaction. Tarof works through a layered offer-and-refusal pattern: the speaker offers something, expects the listener to refuse the first one or two times out of politeness, and only treats the third or fourth offer as serious. Skipping the ritual is read as rude. Accepting the first offer is read as either naive or greedy depending on context. Foreign visitors who do not understand tarof end up either under-paying for taxis (because the driver waved off the fare out of politeness and never meant the refusal literally) or accidentally insulting hosts (by accepting more food than the offer was meant to provide).

Tarof shows up in specific verbal formulas. The phrase ghābel nadāre, “it is not worthy”, is the standard response when a shopkeeper or driver is asked the price; it is not a real waiver and the customer is expected to insist on paying. The invitation befarmāid, “please”, offered when sharing food or entering a room ahead of someone, expects a polite refusal before the gesture is accepted. The blessing nush-e jān, “may it nourish your soul”, is said by the host over a meal and triggers a small bow from the diner. The convention exists in lighter forms in Afghanistan and Tajikistan, but the dense Iranian version is the one that defines Persian social behaviour.

Tarof is not insincerity. It is a marked grammar of respect that builds room for the listener to refuse without losing face. Persian sociolinguistics treats it as a feature of the language rather than an optional add-on, and English-language guides to Iranian culture that skip tarof give travellers a fundamentally incomplete picture.

Other Iranian Languages of the Plateau

Persian is only one of around 30 Iranian languages spoken across a region that runs from eastern Turkey to western Pakistan. Kurdish is one of the most widely spoken members of the family, with dialect groups including Kurmanji in the north and Sorani in the south, written in Latin, Perso-Arabic, or Cyrillic alphabets depending on the country. Pashto is the main language of the Pashtun population of southern Afghanistan and north-western Pakistan and uses a Perso-Arabic script with extra letters for its own consonants. Dress traditions in the Kurdish and Iranian regions are covered in our piece on dress in Iran, which includes the regional variants tied to each linguistic area.

Balochi is spoken across south-eastern Iran, south-western Pakistan, and pockets of southern Afghanistan, with no single standard written form. Smaller Iranian languages include Luri in western Iran; Mazandarani, Gilaki, and Talysh along the Caspian coast in northern Iran; Ossetian in the Caucasus; and a long tail of endangered varieties in Central Asia and the Pamir mountains. Many of the smaller languages now face pressure from Persian, Pashto, or Russian as dominant regional tongues.

Finglish: Persian in the Internet Age

Finglish, also called Pinglish or Fingilish, is Persian written in the Latin alphabet. The form developed in the 1990s and 2000s when early SMS keypads, instant messaging clients, and early Internet forums lacked reliable Persian script support, and Iranian users invented an ad hoc Latin transliteration to keep communicating. Common conventions include the use of digits as substitutes for sounds that Latin letters do not fit, such as 4 for ch, 5 for kh, and 3 for ع.
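The digit conventions can be illustrated with a few lines of code. This is a rough sketch rather than a description of any real tool: Finglish has no fixed standard, the function name expand_finglish_digits is invented here, and rendering 3 (for ع) as an apostrophe is an assumption made for the example.

```python
import re

# Digit substitutions as listed in the article. Usage varies between writers.
# The apostrophe for "3" is an assumed romanisation of the letter ayn.
DIGIT_SOUNDS = {"4": "ch", "5": "kh", "3": "'"}

# Only treat a digit as a sound when it touches a Latin letter, so that
# free-standing numbers such as "room 4" are left untouched.
_DIGIT_IN_WORD = re.compile(r"(?<=[A-Za-z])[345]|[345](?=[A-Za-z])")


def expand_finglish_digits(text):
    """Replace digit shorthand inside words with the letters it stands for."""
    return _DIGIT_IN_WORD.sub(lambda match: DIGIT_SOUNDS[match.group()], text)


print(expand_finglish_digits("5eyli mamnoon"))
print(expand_finglish_digits("room 4"))
```

With these rules a shorthand spelling such as "5eyli" expands to "kheyli", while a digit standing on its own is kept as a number.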

Modern smartphones handle Persian script natively, and Iranian users now switch between Persian script and Finglish depending on context. Finglish remains popular in Iranian diaspora communication, in chat apps where typing speed matters, on social media usernames that need to work in Latin-only URL slugs, and in environments where a Persian keyboard layout is not available. The pattern resembles Arabizi (Arabic in Latin script with digits) used across the Arab world for similar reasons.

Travellers’ Persian: Essential Phrases

A small set of Persian phrases goes a long way in Iran. Pronunciation follows standard romanisation, with stress generally on the last syllable.

  • سلام salām – Hello (universal greeting, works any time of day)
  • خداحافظ khodā hāfez – Goodbye (literally “God protect you”)
  • متشکرم motashakkeram – Thank you (formal, full form)
  • مرسی merci – Thank you (casual, borrowed from French, very common)
  • خیلی ممنون kheyli mamnoon – Thank you very much
  • لطفاً lotfan – Please
  • بله bale – Yes (formal); آره āre – Yes (casual)
  • نه na – No
  • ببخشید bebakhshid – Excuse me or sorry
  • اسم من … است esm-e man … ast – My name is …
  • نوش جان nush-e jān – Enjoy the meal (host’s blessing)

Tarof-aware travellers should pay attention to the moment a driver, shopkeeper, or host says ghābel nadāre or befarmāid. The expected response is a polite insistence on paying or on declining the first offer of food. Iranians appreciate visitors who attempt even basic Persian, and the difference in warmth between a salām and an English hello is large enough to repay the small effort of memorising a phrase list.

Frequently Asked Questions

What language do they speak in Iran?

The main language spoken in Iran is Persian, called Farsi inside the country. Persian is the official language for government, education, and media. Other languages spoken in Iran include Azerbaijani Turkic in the north-west (15 to 20 percent of the population), Kurdish in the west (7 to 10 percent), Luri, Mazandarani, Gilaki, Balochi, Arabic in Khuzestan, and Armenian in Isfahan and Tehran.

Is Persian the same as Farsi?

Yes. Persian is the standard English name for the language. Farsi is the local name used inside Iran and is the Arabized form of the older word Pārsī, since Arabic has no p sound and rendered Pārsī as Fārsī after the seventh-century conquest. Both refer to the same language, although scholarly style prefers Persian as the cover term that also includes Dari in Afghanistan and Tajik in Tajikistan.

What script does Persian use?

Modern Persian in Iran and Afghanistan is written in a modified Arabic script with four extra letters for sounds that Arabic lacks: پ pe, چ che, ژ že, and گ gāf. Tajik Persian in Tajikistan is written in a Cyrillic alphabet adopted during the Soviet period. Persian also uses its own numerals (۰ to ۹) that differ slightly from the Eastern Arabic numerals used elsewhere in the Middle East.

How old is the Persian language?

Old Persian inscriptions date back to the sixth century BCE, including the trilingual Behistun inscription carved during the reign of Darius I (522 to 486 BCE). Middle Persian followed in the Sasanian era from the third to the seventh centuries CE, and New Persian has been in use from the ninth century to the present day. This makes Persian one of the longest continuously documented languages in the world.

What is tarof?

Tarof is the Persian system of ritualised politeness that governs Iranian social interaction. It works through a layered offer-and-refusal pattern in which an offer is expected to be politely refused once or twice before being accepted on the third or fourth time. Foreign visitors who do not understand tarof end up either under-paying (because the driver waved off the fare out of politeness and did not mean it literally) or accidentally insulting hosts. Tarof is built into specific verbal formulas including ghābel nadāre and befarmāid.

How many languages are spoken in Iran?

Iran is home to around 30 Iranian languages plus several non-Iranian languages including Azerbaijani Turkic, Arabic, Armenian, Turkmen, Assyrian Aramaic, Georgian, and Romani. Persian is the dominant national language, but the Azerbaijani-speaking population in the north-west alone numbers roughly 13 to 17 million people (15 to 20 percent of a population of around 87 million). Kurdish, Luri, Mazandarani, Gilaki, and Balochi each have millions of speakers across their respective regions.

What is Finglish?

Finglish, also called Pinglish, is Persian written in the Latin alphabet. The form developed in the 1990s and 2000s when SMS and early Internet platforms lacked reliable Persian script support. Iranian users adopted Latin transliteration with digit substitutes for sounds that Latin letters do not fit (4 for ch, 5 for kh). Finglish remains popular in Iranian diaspora communication and in chat environments where typing speed matters.

Sources and Further Reading

  • Encyclopaedia Iranica (Iranica Online), the online reference work hosted by Columbia University, covering Persian, Iranian languages, and Iranian cultural history.
  • Farhangestān-e Zabān va Adab-e Fārsī, the Academy of Persian Language and Literature, official institutional site with coined-vocabulary lists.
  • UNESCO Atlas of the World’s Languages in Danger, Iranian language entries.
  • Roland G. Kent, Old Persian: Grammar, Texts, Lexicon, American Oriental Society, second edition.
  • Ferdowsi, Shāhnāmeh, completed 1010 CE. Modern English translation by Dick Davis, Shahnameh: The Persian Book of Kings, Penguin Classics.
  • Statistical Centre of Iran, official population data.