# This is a transcription of Hal Daume III's "Yet Another Haskell Tutorial" # The original is 198 PDF pages which is hard to read. # This copy is currently incomplete. # Original is available at http://www.isi.edu/~hdaume/htut/ = About This Report The goal of the /Yet Another Haskell Tutorial/ is to provide a complete intoduction to the Haskell programming language. It assumes no knowledge of the Haskell language or familiarity with functional programming in general. However, general familiarity with programming concepts (such as algorithms) will be helpful. This is not intended to be an introduction to programming in general; rather, to programming in Haskell. Sufficient familiarity with your operating system and a text editor is also necessary (this report only discusses installation on configuration on Windows and *Nix system; other operating systems may be supported -- consult the documentation of your chosen compiler for more information on installing on other platforms). == What is Haskell? Haskell is called a lazy, pure functional programming language. It is called /lazy/ because expressions which are not needed to determine the answer to a problem are not evaluated. The opposize of lazy is /strict/, which is the evaluation strategry of most common programming languages (C, C++, Java, even ML). A strict language is one in which every expression is evaluated, whether the result of its computation is important or not. (This is probably not entirely true as optimizing compilers for strict languages often do what's called "dead code elmination" -- this removes unused expressions from the program.) It is called /pure/ because it does not allow side effects (A side effect is something that affects the "state" of the world. For instance, a function that prints something to the screen is said to be side- effecting, as is a function which affects the value of a global variable.) -- of course, a programming language without side effects would be horribly useless; Haskell uses a system of /monads/ to isolate all impure computations from the rest of the program and perform them in the safe way (see Chapter 9 for a discussion of monads proper or Chapter 5 for how to do input/output in a pure language). Haskell is called a /functional/ language because the evaluation of a program is equivalent to evaluating a function in the pure mathematical sense. This also differs from standard languages (like C and Java) which evaluate a sequence of statements, one after the other (this is termed an /imperative/ langauge). The History of Haskell The history of Haskell is best described using the words of the authors. The following text is quoted from the published version of the Haskell 98 Report: .indent In September of 1987 a meeting was held at the conference on Functional Programming Languages and Computer Architecture (FPCA '87) in Portland, Oregon, to discuss an unfortunate situation in the functional programming community: there had come into being more than a dozen nonstrict, purely functional programming languages, all similar in expressive power and semantic underpinnings. There was a strong consensus at this meeting that more widespread use of this class of functional languages was being hampered by the lack of a common language. It was decided that a committee should be formed to design such a language, providing faster communication of new ideas, a stable foundation for real applications development, and a vehicle through which others would be encouraged to use functional languages. This document describes the result of that committee's efforts: a purely functional programming language called Haskell, named after the logician Haskell B. Curry whose work provides the logical basis for much of ours. The committee's primary goal was to design a language that satisfied these constraints: + It should be suitable for teaching, research, and applications, including building large systems. + It should be completely described via the publication of a formal syntax and semantics. + It should be freely available. Anyone should be permitted to implement the language and distribute it to whomever they please. + It should be based on ideas that enjoy a wide consensus. + It should reduce unnecessary diversity in functional programming languages. The committee intended that Haskell would serve as a basis for future research in language design, and hoped that extensions or variants of the language would appear, incorporating experimental features. Haskell has indeed evolved continuously since its original publication. By the middle of 1997, there had been four iterations of the language design (the latest at that point being Haskell 1.4). At the 1997 Haskell Workshop in Amsterdam, it was decided that a stable variant of Haskell was needed; this stable language is the subject of this Report, and is called "Haskell 98". Haskell 98 was conceived as a relatively minor tidy-up of Haskell 1.4, making some simplifications, and removing some pitfalls for the unwary. It is intended to be a "stable" language in sense the /implementors are committed to supporting Haskell 98 exactly as specified, for the foreseeable future/. The original Haskell Report covered only the language, together with a standard library called the Prelude. By the time Haskell 98 was stabilised, it had become clear that many programs need access to a larger set of library functions (notably concerning input/output and simple interaction with the operating system). If these program were to be portable, a set of libraries would have to be standardised too. A separate effort was therefore begun by a distinct (but overlapping) committee to fix the Haskell 98 Libraries. .indent. == Why Use Haskell? Clearly you're interested in Haskell since you're reading this tutorial. There are many motivations for using Haskell. My personal reason for using Haskell is that I have found that I write more bug-free code in less time using Haskell than any other language. I also find it very readable and extensible. Perhaps most importantly, however, I have consistently found the Haskell community to be incredibly helpful. The language is constantly evolving (that's not to say it's instable; rather that there are numerous extensions that have been added to some compilers which I find very useful) and user suggestions are often heeded when new extensions are to be implemented. == Why Not Use Haskell? My two biggest complaints, and the complaints of most Haskellers I know, are: (1) the generated code tends to be slower than equivalent programs written in a language like C; and (2) it tends to be difficult to debug. The second problem tends not be to a very big issue: most of the code I've written is not buggy, as most of the common sources of bugs in other languages simply don't exist in Haskell. The first issue certainly has come up a few times in my experience; however, CPU time is almost always cheaper than programmer time and if I have to wait a little longer for my results after having saved a few days programming and debugging. Of course, this isn't the case of all applications. Some people may find that the speed hit taken for using Haskell is unbearable. However, Haskell has a standardized /foreign-function interface/ which allow you to link in code written in other languages, for when you need to get the most speed out of your code. If you don't find this sufficient, I would suggest taking a look at the language O'Caml, which often /outperforms/ even C++, yet also has many of the benefits of Haskell. == Target Audience There have been many books and tutorials written about Haskell; for a (nearly) complete list, visit the http://haskell.org/bookshelf (Haskell Bookshelf) at the Haskell homepage. A brief survey of the tutorials available yields: * /A Gentle Introduction to Haskell/ is an introduction to Haskell, given that the reader is familiar with functional programming en large. * /Haskell Companion/ is a short reference of common concepts and definitions. * /Online Haskell Course/ is a short course (in German) for beginning with Haskell. * /Two Dozen Short Lessons in Haskell/ is the draft of an excellent textbook that emphasizes user involvement. * /Haskell Tutorial/ is based on a course given at the 3rd International Summer School on Advanced Functional Programming. * /Haskell for Miranda Programmers/ assumes knowledge of the language Miranda. * /PLEAC-Haskell/ is a tutorial in the style of the Perl Cookbook. Though all of these tutorials is excellent, they are on their own incomplete: The "Gentle Introduction" is far too advanced for beginning Haskellers and the others tend to end too early, or not cover everything. Haskell is full of pitfalls for new programmers and experienced non-functional programmers alike, as can be witnessed by reading through the archives of the Haskell mailing list. It became clear that there is a strong need for a tutorial which is introductory in the sense that it does not assume knowledge of functional programming, but which is advanced in the sense that it /does/ assume some background in programming. Moreover, none of the known tutorials introduce input/output and iteractivity soon enough (not even until the 248th page, as in the case of the Hudak book). This tutorial is not for beginning programmers; some experience and knowledge of programming and computers is assumed (though the appendix does contain some background information). The Haskell language underwent a standardization process and the result is called Haskell 98. The majority of this book will cover the Haskell 98 standard. Any deviations from the standard will be noted (for instance, many compilers offer certain extensions to the standard which are useful; some of these may be discussed). The goals of this tutorial are: * to be /practical/ above all else * to provide a comprehensive, free introduction to the Haskell language * to point out common pitfalls and their solutions * to provide a good sense of how Haskell can be used in the real world == Acknowledgements It would be inappropriate not to give credit also to the original designers of Haskell. Those are: Arvind, Lennart Augustsson, Dave Barton, Brian Boutel, Warren Burton, Jon Fairbairn, Joseph Fasel, Andy Gordon, Maria Guzman, Kevin Hammond, Ralf Hinze, Paul Hudak, John Hughes, Thomas Johnsson, Mark Jones, Dick Kieburtz, John Launchbury, Erik Meijer, Rishiyur Nikhil, John Peterson, Simon Peyton Jones, Mike Reeve, Alastair Reid, Colin Runciman, Philip Wadler, David Wise, Jonathan Young. Finally, I would like to specifically thank Simon Peyton Jones, Simon Marlow, John Hughes, Alastair Reid, Koen Classen, Manuel Chakravarty, Sigbjorn Finne and Sven Panne, all of whom have made my life learning Haskell all the more enjoyable by always being supportive. There were doubtless others who helped and are not listed, but these are those who come to mind. \- Hal Daumé III = Chapter 1 -- Introduction This tutorial contains a whole host of example code, all of which should have been included in its distribution. If not, please refer to the links off of the Haskell web site (`haskell.org`) to get it. This book is formatted to make example code stand out from the rest of the text. Code will look like this. Occasionally, we will refer to interaction betwen you and the operating system and/or the interactive shell (more on this in Section 2). Interaction will look like this. Strewn throughout the tutorial, we will often make additional notes to something written. These are often for making comparisons to other programming languages or adding helpful information. *NOTE* Notes will appear like this. If we're covering a difficult or confusing topic and there is something you should watch out for, we will place a warning. *WARNING* Warnings will appear like this. Finally, we will sometimes make reference to built-in functions (so- called Preludefunctions). This will look something like this: .math map :: (a->b)->[a]->[b] .math Within the body text, Haskell keywords will appear like this: *where*, identifiers as `map`, types as /*String*/ and classes as */Eq/*. = Chapter 2 -- Getting Started There are three well known Haskell system: Hugs, GHC and NHC. Hugs is exclusively an interpreter, meaning that you cannot compile stand-alone programs with it, but can test and debug programs in an interactive environment. GHC is both an interpreter (like Hugs) and a compiler which will produce stand-alone programs. NHC is exclusively a compiler. Which you use is entirely up to you. I've tried to make a list of some of the differences in the following list but of course this is far from exhaustive: - Hugs very fast; implements almost all of Haskell 98 (the standard) and most extensions; built-in support for module browsing; cannot create stand- alones; written in C; works on almost every platform; build in graphics library. - GHC interactive environment is slower than Hugs, but allows function definitions in the environment (in Hugs you have to put them in a file); implements all of Haskell 98 and extensions; good support for interfacing with other languages; in a sense the "de facto" standard. - NHC less used and no interactive environment, but produces smaller and often faster executables than does GHC; supports Haskell 98 and some extensions. I, personally, have all of them installed and use them for different purposes. I tend to use GHC to compile (primarily because I'm most familiar with it) and the Hugs interactive environment, since it is much faster. As such, this is what I would suggest. However, that is a fair amount to download an install, so if you had to go with just one, I'd get GHC, since it contains both a compiler and interactive environment. Following is a descrition of how to download and install each of this as of the time this tutorial was written. It may have changed -- see http://haskell.org (the Haskell website) for up-to-date information. == 2.1 Hugs Hugs supports almost all of the Haskell 98 standard (it lacks some of the libraries), as well as a number of advanced/experimental extensions, including: multi-parameter type classes, extensible records, rank-2 polymorphism, existentials, scoped type variables, and restricted type synonyms. === 2.1.1 Where to get it The official Hugs web page is at: http://haskell.org/hugs. If you go there, there is a link titled "downloading" which will send you to the download page. From that page, you can download the appropriate version of Hugs for your computer. === 2.1.2 Installation procedures Once you've downloaded Hugs, installation differs depending on your platform, however, installation for Hugs is more of less identical to installation for any program on your platform. - For Windows when you click on the "msi" file to download, simply choose "Run This Program" and the installation will begin automatically. From there, just follow the on-screen instructions. - For RPMs use whatever RPM installation program you know best. - For source first gunzip the file, then untar it. Presumably if you're using a system which isn't otherwise supported, you know enough about your system to be able to run configure scripts and make things by hand. == 2.1.3 How to run it On Unix machines, the Hugs interpreter is usually started with a command line of the form: hugs `[option - file]` ... On Windows , Hugs may be started by selecting it from the start menu or by double clicking on a file with the .hs or .lhs extension. (This manual assumes that Hugs has already been successfully installed on your system.) Hugs uses options to set system parameters. These options are distinguished by a leading + or - and are used to customize the behaviour of the interpreter. When Hugs starts, the interpreter performs the following tasks: * Options in the environment are processed. The variable HUGSFLAGS holds these options. On Windows 95/NT, the registry is also queried for Hugs option settings. * Command line options are processed. * Internal data structures are initialized. In particular, the heap is initialized, and its size is fixed at this point; if you want to run the interpreter with a heap size other than the default, then this must be specified using options on the command line, in the environment or in the registry. * The prelude file is loaded. The interpreter will look for the prelude file on the path specified by the -P option. If the prelude, located in the file Prelude.hs, cannot be found in one of the path directories or in the current directory, then Hugs will terminate; Hugs will not run without the prelude file. * Program files specified on the command line are loaded. The effect of a command hugs `f1 ... fn` is the same as starting up Hugs with the hugs command and then typing `:load f1 ... fn`. In particular, the interpreter will not terminate if a problem occurs while it is trying to load one of the specified files, but it will abort the attempted load command. The environment variables and command line options used by Hugs are described in the following sections. === 2.1.4 Program options To list all of the options would take too much space. The most important option at this point is "+98" or "-98". When you start hugs with "+98" it is in Haskell 98 mode, which turns off all extensions. When you start in "-98", you are in Hugs mode and all extensions are turned on. If you've downloaded someone elses code and you're having trouble loading it, first make sure you have the "98" flag set properly. Further information on the Hugs options is in the manual: http://cvs.haskell.org/Hugs/pages/hugsman/started.html. === 2.1.5 How to get help To get Hugs specific help, go to the Hugs web page. To get general Haskell help, go to the Haskell web page. == 2.2 Glasgow Haskell Compiler The Glasgow Haskell Compiler (GHC) is a robust, fully-featured, optimising compiler and interactive environment for Haskell 98; GHC compiles Haskell to either native code or C. It implements numerous experimental language extensions to Haskell 98; for example: concurrency, a foreign language interface, multi-parameter type classes, scoped type variables, existential and universal quantification, unboxed types, exceptions, weak pointers, and so on. GHC comes with a generational garbage collector, and a space and time profiler. === 2.2.1 Where to get it Go to the official GHC web page http://haskell.org/ghc (GHC) to download the latest release. The current version as of the writing of this tutorial is 5.04.2 and can be downloaded off of the GHC download page (follow the "Download" link). From that page, you can download the appropriate version of GHC for your computer. === 2.2.2 Installation procedures Once you've downloaded GHC, installation differs depending on your platform; however, installation for GHC is more of less identical to installation for any program on your platform. - For Windows when you click on the "msi" file to download, simply choose "Run This Program" and the installation will begin automatically. From there, just follow the on-screen instructions. - For RPMs use whatever RPM installation program you know best. - For source first gunzip the file, then untar it. Presumably if you're using a system which isn't otherwise supported, you know enough about your system to be able to run configure scripts and make things by hand. For a more detailed description of the installation procedure, look at the GHC users manual under "Installing GHC". === 2.2.3 How to run the compiler Running the compiler is fairly easy. Assuming that you have a program written with a mainfunction in a file called Main.hs, you can compile it simply by writing: % ghc --make Main.hs -o main The "make" option tells GHC that this is a program and not just a library and you want to build it and all modules it depends on. "Main.hs" stipulates the name of the file to compile; and the "-o main" means that you want to put the output in a file called "main". *NOTE* In Windows, you should say "-o main.exe" to tell Windows that this is an executable file. You can then run the program by simply typing "main" at the prompt. === 2.2.4 How to run the interpreter GHCi is invoked with the command "ghci" or "ghc -interactive". One or more modules or filenames can also be specified on the command line; this instructs GHCi to load the specified modules or filenames (and all the modules they depend on), just as if you had said :load modules at the GHCi prompt. === 2.2.5 Program options To list all of the options would take too much space. The most important option at this point is "-fglasgow-exts". When you start GHCi without "-fglasgow-exts" it is in Haskell 98 mode, which turns off all extensions. When you start with "-fglasgowexts", all extensions are turned on. If you've downloaded someone elses code and you're having trouble loading it, first make sure you have this flag set properly. Further information on the GHC and GHCi options are in the manual off of the GHC web page. === 2.2.6 How to get help To get GHC(i) specific help, go to the GHC web page. To get general Haskell help, go to the Haskell web page. == 2.3 NHC About NHC 3 ... === 2.3.1 Where to get it === 2.3.2 Installation procedures === 2.3.3 How to run it === 2.3.4 Program options === 2.3.5 How to get help == 2.4 Editors With good text editor, programming is fun. Of course, you can get along with simplistic editor capable of just cut-n-paste, but good editor is capable of doing most of the chores for you, letting you concentrate on what you are writing. With respect to programming in Haskell, good text editor should have as much as possible of the following features: * Syntax highlighting for source files * Indentation of source files * Interaction with Haskell interpreter (be it Hugs or GHCi) * Computer-aided code navigation * Code completion At the time of writing, several options were available: Emacs/XEmacs support Haskell via haskell-mode and accompanying Elist code (available from http://www.haskell.org/haskellmode), and 3 ... . What's else available? ... (X)Emacs seem to do the best job, having all the features listed above. Indentation is aware about Haskell's 2-dimensional layout rules (see Section 7.11, very smart and have to be seen in action to be believed. You can quickly jump to the definition of chosen function with the help of "Definitions" menu, and name of the currently edited function is always displayed in the modeline. = Chapter 3 -- Language Basics In this chapter we present the basic concepts of Haskell. In addition to familiarizing you with the interactive environments and showing you how to compile a basic program, we introduce the basic syntax of Haskell, which will probably be quite alien if you are used to languages like C and Java. However, before we talk about specifics of the language, we need to establish some general properties of Haskell. Most importantly, Haskell is a /lazy/ language, which lazy means that no computation takes place unless it is forced to take place when the result of that computation is used. This means, for instance, that you can define infinitely large data structures, provided that you never use the entire structure. For instance, using imperative-esque psuedo-code, we could create an infinite list containing the number 4in each position by doing something like: List makeList() { List current = new List(); current.value = 1; current.next = makeList(); return current; } By looking at this code, we can see what it's trying to do: it creates a new list, sets its value to 4and then recursively calls itself to make the rest of the list. Of course, if you actually wrote this code and called it, the program would never terminate, because makeListwould keep calling itself ad infinitum. This is because we assume this imperative-esque language is /strict/, the opposite of strict lazy. Strict languages are often referred to as "call by value," while lazy languages are referred to as "call by name." In the above psuedo-code, when we "run" makeList on the fifth line, we attempt to get a /value/ out of it. This leads to an infinite loop. The equivalent code in Haskell is: makeList = 1 : makeList This program reads: we're defining something called makeList(this is what goes on the left-hand side of the equals sign). On the right-hand side, we give the definition of makeList. In Haskell, the colon operator is used to create lists (we'll talk more about this soon). This right- hand side says that the value of makeListis the element 1stuck on to the beginning of the value of makeList. However, since Haskell is lazy (or "call by name"), we do not actually attempt to evaluate what makeListis at this point: we simply remember that if ever in the future we need the second element of makeList, we need to just look at makeList. Now, if you attempt to write makeListto a file, print it to the screen, or calculate the sum of its elements, the operation won't terminate because it would have to evaluate an infinitely long list. However, if you simply use a finite portion of the list (say the first 45 elements), the fact that the list is infinitely long doesn't matter. If you only use the first 4/5elements, only the first 4/5elements are ever calculated. This is laziness. Second, Haskell is case-sensitive. Many languages are, but Haskell actually uses case to give meaning. Haskell distinguishes between /values/ (for instance, numbers: 1, 2, 3, ...); strings: "abc", "hello", ... ; characters: 'a', 'b', ' ', ... ; even functions: for instance, the function squares a value, or the square-root function); and /types/ (the categories to which values belong). By itself, this is not unusual. Most languages have some system of types. What is unusual is that Haskell /requires/ that the names given to functions and values begin with a lower- case letter and that the names given to types begin with an upper-case letter. The moral is: if your otherwise correct program won't compile, be sure you haven't named your function Foo, or something else beginning with a capital letter. Being a functional language, Haskell eschews side effects. A side effect is essentially something that happens in the course of executing a function that is not related to the output produced by that function. For instance, in a language like C or Java, you are able to modify "global" variables from within a function. This is a side effect because the modification of this global variable is not related to the output produced by the function. Furthermore, modifying the state of the real world is considered a side effect: printing something to the screen, reading a file, etc., are all side effecting operations. Functions that do not have side effects are called /pure/. An easy test for whether or not a function is pure is to ask yourself a simple question: "Given the same arguments, will this function always produce the same result?". All of this means that if you're used to writing code in an imperative language (like C or Java), you're going to have to start thinking differently. Most importantly, if you have a value x, you must /not/ think of x as a register, a memory location or anything else of that nature. xis simply a name, just as "Hal" is my name. You cannot arbitrarily decide to store a different person in my name any more than you can arbitrarily decide to store a different value in x. This means that code that might look like the following C code is invalid (and has no counterpart) in Haskell: int x = 5; x = x + 1; A call like x = x + 1 is called /destructive update/ because we are destroying whatever was in x before and replacing it with a new value. Destructive update does not exist in Haskell. By not allowing destructive updates (or any other such side effecting operations), Haskell code is very easy to comprehend. That is, when we define a function `f`, and call that function with a particular argument `a` in the beginning of a program, and then, at the end of the program, again call `f` with the same argument `a`, we know we will get out the same result. This is because we know that `a` cannot have changed and because we know that `f` only depends on `a` (for instance, it didn't increment a global counter). This property is called /referential transparency/ and basically states that if two functions `f` and `g` produce the same values for the same arguments, then we may replace `f` with `g` (and vice-versa). *NOTE* There is no agreed-upon exact definition of referential transparency. The definition given above is the one I like best. They all carry the same interpretation; the differences lie in how they are formalized. == 3.1 Arithmetic Let's begin our foray into Haskell with simple arithmetic. Start up your favorite interactive shell (Hugs or GHCi; see Chapter 2 for installation instructions). The shell will output to the screen a few lines talking about itself and what it's doing and then should finish with the cursor on a line reading: Prelude> From here, you can begin to evaluate /expressions/. An expression is basically something that has a value. For instance, the number `5` is an expression (its value is 5). Val ues can be built up from other values; for instance, `5 + 6` is an expression (its value is 11). In fact, most simple arithmetic operations are supported by Haskell, including plus (+), minus (-), times (*), divided-by (/), exponentiation (^) and square- root (sqrt). You can experiment with these by asking the interactive shell to evaluate expressions and to give you their value. In this way, a Haskell shell can be used as a powerful calculator. Try some of the following: Prelude> 5*4+3 23 Prelude> 5^5-2 3123 Prelude> sqrt 2 1.4142135623730951 Prelude> 5*(4+3) 35 We can see that, in addition to the standard arithmetic operations, Haskell also allows grouping by parentheses, hence the difference between the values of *5*4+3* and *5*(4+3)*. The reason for this is that the "understood" grouping of the first expression is operator precedence *(5*4)+3*, due to /operator precedence/. Also note that parentheses aren't required around function arguments. For instance, we simply wrote `sqrt 2`, not `sqrt(2)`, as would be required in most other languages. You could write it with the parentheses, but in Haskell, since function application is so common, parentheses aren't required. .warning Even though parentheses are not always needed, sometimes it is better to leave them in anyway; other people will probably have to read your code, and if extra parentheses make the intent of the code clearer, use them. .warning. Now try entering *2.5000*. Does it work? .note If you're familiar with programming in other languages, you may find it odd that *sqrt 2* comes back with a decimal point (i.e., is a floating point number) even though the argument to the function seems to be an integer. This interchangability of numeric types is due to Haskell's system of /type classes/ and will be discussed in detail in Section 4.3). .note. .exercises - Exercise 3.1 We've seen that multiplication binds more tightly than division. Can you think of a way to determine whether function application binds more or less tightly than multiplication? .exercises. == 3.2 Pairs, Triples and More In addition to single values, we should also address multiple values. For instance, we may want to refer to a position by its @/Acoordinate, which would be a pair of integers. To make a pair of integers is simple: you enclose the pair in parenthesis and separate them with a comma. Try the following: Prelude> (5,3) (5,3)