The Árni Magnússon Institute's Natural Language Processing Platform provides an API through which basic text and natural language processing tools for Icelandic can be accessed.
Takes raw Icelandic text as input and returns it split into tokens, along with each token's PoS tag and, optionally, its lemma.
| Parameter | Type | Description |
|---|---|---|
| text | string | The text to be PoS-tagged and/or lemmatized. |
| lemma | boolean | When true, returns each token's lemma along with its PoS tag. The lemmatizer used is Nefnir. |
| expand_tag | boolean | When true, also returns, for each token, a human-readable version of the morphological information encoded in its PoS tag (the `expanded_tag` field of the response). |
Example request body:

{
  "text": "Hér er setning. Hér er önnur.\nSvo er hægt að nota línubil líka.",
  "lemma": true,
  "expand_tag": true
}
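A minimal Python sketch of calling the tagging service with the standard library only. This page does not state the endpoint URL or the HTTP method, so `TAGGER_URL` is a hypothetical placeholder and sending the parameters as a JSON body in a POST request is an assumption; the function names and the client-side defaults of `False` are illustrative, not part of the documented API.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not given on this page.
TAGGER_URL = "https://example.org/api/pos"


def build_tagging_payload(text: str, lemma: bool = False, expand_tag: bool = False) -> dict:
    """Build the request body from the documented parameters.

    The False defaults are client-side choices; the API's own defaults
    are not documented here, so both flags are always sent explicitly.
    """
    return {"text": text, "lemma": lemma, "expand_tag": expand_tag}


def tag_text(text: str, lemma: bool = False, expand_tag: bool = False,
             url: str = TAGGER_URL, timeout: float = 30.0) -> dict:
    """POST the payload as UTF-8 JSON (assumed transport) and decode the JSON reply."""
    body = json.dumps(
        build_tagging_payload(text, lemma, expand_tag),
        ensure_ascii=False,  # keep Icelandic characters such as "ð" and "þ" unescaped
    ).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real URL substituted, `tag_text("Hér er setning.", lemma=True, expand_tag=True)` would return a dictionary shaped like the example response below.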
Example response:

{
  "submitted": "string",
  "sentences": [
    [
      {
        "word": "Þetta",
        "tag": "fahen",
        "lemma": "þessi",
        "expanded_tag": {
          "fall": "nefnifall",
          "kyn": "hvorugkyn",
          "orðflokkur": "fornafn",
          "persóna": "3.persóna",
          "tala": "eintala",
          "undirflokkur": "ábendingarfornafn"
        }
      }
    ]
  ]
}
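The response nests tokens inside sentences. The Python sketch below flattens it into one record per token. Whether the `lemma` and `expanded_tag` keys are omitted or set to null when the corresponding request flags are false is not documented here, so the sketch assumes either may happen and reads them with `dict.get`; the `Token` and `iter_tokens` names are hypothetical.

```python
from typing import Any, Dict, Iterator, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: Optional[str]                      # expected only when "lemma" was true
    expanded_tag: Optional[Dict[str, str]]    # expected only when "expand_tag" was true


def iter_tokens(response: Dict[str, Any]) -> Iterator[Tuple[int, Token]]:
    """Yield (sentence_index, Token) pairs for every token in a tagging response."""
    for sent_index, sentence in enumerate(response.get("sentences", [])):
        for item in sentence:
            # Assumption: absent and null optional fields are treated the same way.
            yield sent_index, Token(
                word=item["word"],
                tag=item["tag"],
                lemma=item.get("lemma"),
                expanded_tag=item.get("expanded_tag"),
            )
```

Keeping the sentence index lets a caller regroup tokens by sentence, for example to print one tab-separated `word`, `tag`, `lemma` line per token with a blank line between sentences.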
| Parameter | Type | Description |
|---|---|---|
| text (required) | string | The text containing the words to be segmented. |
| hyphenation_mode (required) | string | The method used for segmenting words. Value: `pattern`. |
| hyphen_type (required) | string | How words are segmented, i.e. which characters are inserted. Enum: `soft`, `hard`, `custom`, `split`. |
| hyphen_character | string or null | Automatically |
Example request body:

{
  "text": "Þessi setning inniheldur orð sem á að skipta upp.",
  "hyphenation_mode": "pattern",
  "hyphen_type": "hard",
  "hyphen_character": "string"
}
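Because three of the four parameters are required and two take a fixed set of values, it can help to validate a request on the client before sending it. The Python sketch below checks only what the parameter table states; the function name is hypothetical, and since this page does not say how the server combines `hyphen_character` with each `hyphen_type`, the sketch passes it through unchanged (sending null when it is not given).

```python
from typing import Optional

# Allowed values as listed in the parameter table.
HYPHENATION_MODES = {"pattern"}
HYPHEN_TYPES = {"soft", "hard", "custom", "split"}


def build_hyphenation_payload(text: str, hyphen_type: str,
                              hyphenation_mode: str = "pattern",
                              hyphen_character: Optional[str] = None) -> dict:
    """Validate the documented parameters and return a JSON-ready request body."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if hyphenation_mode not in HYPHENATION_MODES:
        raise ValueError(f"unsupported hyphenation_mode: {hyphenation_mode!r}")
    if hyphen_type not in HYPHEN_TYPES:
        raise ValueError(f"unsupported hyphen_type: {hyphen_type!r}")
    if hyphen_character is not None and not isinstance(hyphen_character, str):
        raise TypeError("hyphen_character must be a string or None")
    # Assumption: the server decides how hyphen_character interacts with
    # hyphen_type; the client does not enforce any pairing rule.
    return {
        "text": text,
        "hyphenation_mode": hyphenation_mode,
        "hyphen_type": hyphen_type,
        "hyphen_character": hyphen_character,  # serialized as null when None
    }
```

The resulting dictionary can be serialized with `json.dumps(..., ensure_ascii=False)` and sent the same way as a tagging request.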
Example response:

{
  "sentences": [
    []
  ]
}