LispE

Frere_de_la_Quote@alien.top · 3 years ago

LispE

drmeister@alien.top · 3 years ago

Does LispE need to be implemented as its own dialect of Lisp rather than as a library within Common Lisp? I’ve implemented a Common Lisp that interoperates with C++ (https://github.com/clasp-developers/clasp.git). I didn’t dare implement my own language for many reasons. I felt I had to implement my own Common Lisp because C++ interoperation is a complex problem that the common foreign function interface (CFFI) doesn’t solve. I’m pretty sure I made the right decision, but it’s been a long and hard road to achieve anything near a performant implementation. What’s your justification for LispE?

Frere_de_la_Quote@alien.top · 3 years ago

Hum… The answer might surprise you, but there is a real reason why I implemented LispE in the first place, and it was not to create a new Lisp dialect, this came later as a side-effect.

I’m researcher in Computational Linguistics, and I implemented a syntactic analyser during my PhD thesis, which was used as the main tool for research in linguistics in my laboratory for 17 years (from 1999 to 2016). If you look for XIP or Xerox Incremental Parser, you’ll find European projects, articles and patents, which all rely on it.

Let’s be honest, the advent of LLMs has made the whole endeavor quite moot.

Dealing with texts was quite complicated back then, and we had to deal with heterogeneous encodings all the time. Texts were often encoded in Latin, instead of UTF-8, and the Japanese team used a local encoding that was not UTF-8 neither… Big, big mess… I tried to use Python, but it sucked (and still suck) at handling multi-encoded strings, something that was quite common back then, as almost no dataset had been curated for encodings… A real nightmare, especially for languages such as French or Spanish, which use accented letters.

So I implemented my own language, which is called Tamgu now, to handle these problems (see https://github.com/naver/tamgu). There was a lot of work on string manipulation, with many of the main functions such as find or replace being implemented with SIMD instructions. For instance, encoding conversions for Latin, UTF-8, UTF-16 and UTF-32 rely on this SIMD for fast responses.

Tamgu is now used in production and has become a huge project. However, since the language implementation is now too large, I decided to create a very simple demonstrator in C++ to show how Tamgu actually works, to help other people contribute to it.

And Lisp is the easiest language to demonstrate how an AST interpreter works.

However, I also became very interested in Haskell and Array Languages (such as APL) . But since certain concepts eluded me, I used LispE as a platform to experiment with these concepts. For instance if you have a look on: https://github.com/naver/lispe/wiki/5.4-A-la-Haskell, you’ll see how I managed to replicate function composition.

I also experimented with many of the different APL operators such as scan, reduce, rho etc. which I implemented to also better understand their internal working.

Furthermore, LispE is also a Python library, which was a way to better understand Python API. Actually, LispE has been created in such a way that creating external libraries is really simple. Much much simpler than Python by the way. It basically consists of deriving a new class from LispE base class.

LispE also proved simple enough to experiment with Web Assembly.

Finally, LispE is also a very fast multithreaded language, which is almost lock free.

I love LispE, because I can tinker with it as much as I want. Adding a new instruction to the langage takes a couple of minutes. It has many features that are absent from Common Lisp, such as an array based implementation for lists, that makes it the perfect language for Advent of Code or Leetcode.

See: https://github.com/naver/lispe/wiki/6.20-Conway-Game-of-Life-in-LispE for an example of how LispE can help you now.

You can even used it as a terminal to handle Unix commands: https://github.com/naver/lispe/wiki/7.-Shell, especially that I developed my own terminal editor (see https://github.com/naver/lispe/wiki/1.2-Jag:-Terminal-Editor-With-Mouse-Support-and-Colour-Highlighting)

I have more than 30 years of professional experience in implementing parsers and languages in C++ and I thought that at the end of my career it will be fun to share this experience with others.

drmeister@alien.top · 3 years ago

I see - thank you.

arthurno1@alien.top · 3 years ago

Interesting.

The only case, when this structure is less efficient is when you need to insert an element at the head. In this case, we need to move all elements one step forward, which in a buffer is quite simple and efficient:

Have you considered using a gap-buffer instead of ordinary vector?

This solution certainly poses some problems and some traditional Lisp algorithms won’t work as expected.

Since you are using C++, isn’t it possible to implement your own vector with overloaded operators to adhere to Lisp list semantics? In other words, isn’t it possible to make your car/cdr & other access functions access elements from the top of the stack so that ordinary list semantics as in standard Lisps still work as expected? Just a curious question, I guess you have already tried.

Frere_de_la_Quote@alien.top · 3 years ago

Very interesting remark. Basically, a gap buffer is a linked list in disguise, which I wanted to avoid since most implementations become much slower when your data are no longer contiguous in memory.

And what you propose is basically what I have implemented. My list is implemented as two different objects: ITEM and LIST, where ITEM contains a buffer that can be extended at will and LIST contains a pointer to ITEM and an _offset_ value.

When I do a _cdr_, I build a new LIST object that shares the same ITEM pointer as the current list but with an offset incremented by 1.

The implementation is here: https://github.com/naver/lispe/blob/master/include/listes.h

arthurno1@alien.top · 3 years ago

a gap buffer is a linked list in disguise

I see a gap buffer more like a generalized std vector in regard to where you insert the items. I am not sure I understand why you think of it as a list, but perhaps we think about different things?

And what you propose is basically what I have implemented.

Mnjah; I am not sure we think of the same thing in that case. What I proposed, or what I asked why you didn’t do it that way, is to keep the “classical” Lisp semantics for lists so that you don’t need to rework many of the existing algorithms based on lists. No idea if that is important or not, but I don’t see a reason why you need to reverse those just because you use std::vector under the hood to store your lists. Where and how you push/pop could be kept just as an implementation detail.

Another thing that I was thinking of is the cost of traversing the list. By generating a new object for each cdr operation and updating reference counters, it becomes quite costly compared to just returning a pointer. Have you done any benchmarks to compare your idea with the “classical” one? Also, have you looked into cdr coding; which is another technique to make list elements contiguous in memory?

To note, perhaps a new Lisp could dispose with car/cdr operations, as long as you have some other alternative to access and traverse lists. After all cons/car/cdr/push/pop as we know them from classical lisp are defined as they are because of the implementation behind them. There is no definition of what a Lisp is, so nobody says a Lisp has to have those operations exactly as they are known in some older Lisps.

Frere_de_la_Quote@alien.top · 3 years ago

You are right, I implemented other methods to access elements in a list, which do not rely on cdr or car (see nth operator). Actually, I also implemented a parallel Linked Lists, where these functions make more sense. You can choose which structure is more adapted.

(list 1 2 3) ; array list

(llist 1 2 3) ; linked list

The “llist” behaves exactly as traditional lists.

I don’t use std::vector under the hood… :-)