Managed groups developing and deploying natural language components within research and product groups. Developed both back-end and front-end tools and technology for Natural Language applications. Has extensive experience in the area of document analysis for the purposes of summarization, knowledge extraction and search.
Architected a multi-year knowledge extraction project in cooperation with the Natural Language and Knowledge Representation groups at Xerox PARC that eventually spun out as the Wikipedia search startup Powerset. Was a member of the early (pre-Google) web-search team at IBM Almaden.
Co-designed one of the first big-data parsing projects (“Data-Oriented Parsing”) with Remko Scha.
Co-authored 24 granted patents. Additionally, 4 Intel patents filed.
As a formal semanticist with an interest in linguistics above the level of the sentence, I try to understand how the structure of discourse informs the hearer about the meaning encoded. Using the Linguistic Discourse Model (a theory of discourse structure developed by Livia Polanyi) and versions of dynamic logic (for discourse meaning), I try to understand how different anaphors find their antecedents in texts and dialogs, how they get their meaning once they have found them, and what that tells us about the way we encode information structure.
Other subjects I am interested in are the semantics of questions and answers, the encoding of information structure, and storing that information efficiently in a database for semantic document search and general knowledge representation.
Intel Corp., Santa Clara, CA — 2012–present
- Managed a small group developing language tools and technologies for the Dialog Express platform.
- Managed a small group developing language tools and technologies for Oakley’s Radar Pace: sunglasses with a built-in running and cycling coach.
- Evaluated a number of external companies for potential for acquisition or investment, leading to two acquisitions (at the request of Intel Capital).
- Member of the Security patent committee.
Senior Research SDE
Microsoft Corp. — SVC, Mountain View, CA — 2008–2012
- As part of Whole Page Relevance within Bing, maintained and further developed the core Natural Language Tools and Technologies brought in from Powerset, including Finite State technology, the syntactic parser (XLE) and the semantic processing component licensed from Xerox PARC.
- Developed summarization technology for use in the commerce portal within Bing.
- Improved front-end query classifiers for Bing.
- Helped integrate Finite State Machines in Bing query-classifiers and Cortana back-ends.
Powerset, San Francisco, CA — 2006–2008
- First regular employee. Helped set up the company starting in 2005.
- Wrote the first demos shown to the investors.
- Redesigned the document search and indexing technology developed at FXPAL into a Wikipedia search engine.
- As part of the semantics team, developed transfer rules to analyze the syntactic parser output and produce indexable facts encoding the semantic structure of the indexed texts.
- Developed Ruby libraries for index and query side pipelines to invoke syntactic and semantic processing components written in C, C++, Tcl and Prolog. Assisted in the port of the transfer component from Prolog to C++.
Research Scientist, Senior Research Scientist
FXPAL, Palo Alto, CA — 1998–2002, 2002–2006
- Technical lead, system architect and implementer of a number of experimental text-based natural language applications, including a spoken dialog system and a text-summarization system.
- Working closely with Xerox PARC, the team I led developed a general, extensible document storage and analysis system with a built-in search and question-answering system that eventually became the basis of the Powerset technology.
Postdoctoral Research Fellow
IBM, Almaden Research Center, San Jose, CA — 1997–1998
- With Soumen Chakrabarti and Byron Dom, developed "Focused Crawling" algorithms that improve on basic web-crawling by looking at the relevance of the content of documents during the web-crawl. The paper discussing this research won the award for Best Paper at the Eighth International World Wide Web Conference in 1999 and was also one of two papers to win the 1999 IBM Computer Science Best Paper Award.
Postdoctoral Research Fellow
IBM, Santa Teresa, San Jose, CA — 1996–1997
- Technical leader of a four-person design and implementation effort for a natural language interface for database queries (as part of DB2 development).
CSLI, Stanford University, Stanford, CA — 1995
- Continued research into the formal aspects of discourse structure and semantics.
Department of Computational Linguistics, University of Amsterdam, Amsterdam, NL — 1994–1996
- Taught Computational Linguistics and Advanced C Programming at the Computational Linguistics Department. Research in Dynamic Logic applied to quantifiers in discourse.
Department of Computational Linguistics, University of Amsterdam, Amsterdam, NL — 1989–1993
- Did research into the formal aspects of discourse structure and semantics leading to my PhD. Taught Topics in Computational Linguistics, Introduction to First Order Logic and Advanced C Programming.
- Member of the Ad Hoc ANSI Committee on Ontology Standards — 1996–1998
- Member of the Governing Board of the Department of Computational Linguistics, University of Amsterdam — 1991–1996
PhD — Computational Linguistics
University of Amsterdam, Amsterdam, the Netherlands — 1996
Thesis: Some Aspects of the Internal Structure of Discourse. The Dynamics of Nominal Anaphora. Supervisors: Professors Remko Scha and Johan van Benthem.
Undergraduate ("doctoraal" degree) - Theoretical Physics
University of Amsterdam, Amsterdam, the Netherlands — 1989
Thesis: Invariants of Transformations. Comparing Relativistic and Non-relativistic Space-Time Structures.