Harnessing The Data Life Cycle

I wrote this in 2003, before Gmail, before SharePoint, before OneNote, Evernote, or any of that. And I never did anything with it or showed it to anyone. I’m dumb.

INTRODUCTION

There are three distinct touch points to the clean flow of a piece of information’s life cycle:

1. Entry. This can happen in a number of ways, and implies both a source (email, web, manual entry, output generated by an application) and a repository (database, text file, file system, etc.).
2. Storage. This implies a structure and content. This can be in a text OR binary form.
3. Retrieval. This is (currently and unfortunately) very tightly coupled to the Storage mechanism.

Currently, all three steps are very labor-intensive for the user. The way the user gathers and stores data directly affects the way they can retrieve it. Conversely, the way a user expects to retrieve data affects how they store it (web, mail, text, Word docs, etc.).

I propose that adequate software could bring about the following changes, which would allow the focus to shift almost COMPLETELY to retrieval (3), instead of the current processes, which rely upon the user equally for 1, 2 and 3.

1) Entry. Regardless of HOW the data arrives, it should end up being simple data. This is addressed more in #2 below. Suffice it to say that the “Software Entity” should not be aware of how the data was provided, only that data is there to be processed.
2) Storage. There needs to be a set of fundamental properties assigned to a unit of data which is universal and completely descriptive. Of course, XML would be my preferred mode. No data gets dropped, and all attributes are indexed.
3) Retrieval. If storage is done correctly, then retrieval is simply a method of using the latest searching algorithms to gather data from the compiled indexes.
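
To make the "all attributes are indexed" idea concrete, here is a minimal sketch of a toy inverted index over a unit's attributes. The class name `AttributeIndex`, its methods, and the sample records are all hypothetical, invented for illustration; a real system would use a proper search engine rather than word splitting.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy inverted index: (attribute name, word) -> set of unit ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, unit_id, attributes):
        # Index every attribute, so nothing about the unit is dropped.
        for name, value in attributes.items():
            for word in str(value).lower().split():
                self._postings[(name, word)].add(unit_id)

    def search(self, **criteria):
        # Return the ids of units matching every attribute=word criterion.
        result = None
        for name, word in criteria.items():
            matches = self._postings.get((name, word.lower()), set())
            result = set(matches) if result is None else result & matches
        return result or set()


# Hypothetical sample data: the user never has to remember where each came from.
index = AttributeIndex()
index.add(1, {"author": "John", "source": "email", "content": "quarterly budget notes"})
index.add(2, {"author": "John", "source": "web", "content": "budget white paper"})
index.add(3, {"author": "Mary", "source": "email", "content": "lunch plans"})

hits = index.search(author="john", content="budget")
```

The point of the sketch is that the query names attributes of the data, not a storage location, which is what lets retrieval become the only step the user has to think about.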

ENTRY

Currently, data lives in a number of locations (the web, email, text files, files that are proprietary or specific to one or more applications, etc.). The onus is on the user to remember the locations and the storage structure (or structures, as many users organize their file system one way and their email repository another) and to know WHERE to search for data should they need it. (“Was that info on a web page I saved in my favorites or an email from John? In that white paper I saved or that RSS feed I read?”) The single greatest benefit in this area would be a DATA aggregator, which would be not only a repository for files but ALSO a mail proxy, a browser proxy, a disk indexer, a freeform tree structure for manual entry, a word processor, a graphics editor, a mail CLIENT, a web CLIENT, an RSS aggregator, a weblog PROVIDER, etc. If this is done right, it would simply become the desktop through which all data was retrieved, saved or reviewed.

To that end, for retrieval, there are some things that must be done. First, all data must be massaged, or interpreted, into a common form for processing upon entry into the system (manual entry, receipt of mail, browsing to a page, etc.).

When data arrives (is created) on a system, it has some fundamental properties:

– Creation Date/Time
– Last Modified Date/Time
– Author (person or agent who captured the data)
– Size
– Content Type (text/binary)
– Source or origin (possibly the author OR the site or location the author got the data from, e.g. a web site URL, a newsgroup, etc.)
– Locale
– Content Name (this would indicate, possibly, how to handle the “grey matter” area below)
– Content

I would propose that permission and authorization data may be too far out of scope and therefore, if necessary, should be included in the content-specific data, not the universal attributes.
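
The fundamental properties listed above could be sketched as a single record type that serializes to XML. This is only an illustration under stated assumptions: the `DataUnit` class, its field names, the XML element names, and the `extra` dictionary for content-specific data (such as permissions) are all hypothetical, and content is treated as text for simplicity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


@dataclass
class DataUnit:
    """One unit of data carrying the universal attributes (names are assumed)."""
    created: datetime
    modified: datetime
    author: str
    content_type: str   # "text" or "binary"
    source: str
    locale: str
    content_name: str
    content: str        # assumption: binary content would need encoding first
    extra: dict = field(default_factory=dict)  # content-specific data

    @property
    def size(self):
        # Size is derived from the content rather than stored separately.
        return len(self.content.encode("utf-8"))

    def to_xml(self):
        root = ET.Element("unit")
        universal = {
            "created": self.created.isoformat(),
            "modified": self.modified.isoformat(),
            "author": self.author,
            "size": str(self.size),
            "content-type": self.content_type,
            "source": self.source,
            "locale": self.locale,
            "content-name": self.content_name,
            "content": self.content,
        }
        for tag, text in universal.items():
            ET.SubElement(root, tag).text = text
        # Permissions and similar data stay out of the universal attributes.
        extra = ET.SubElement(root, "extra")
        for key, value in self.extra.items():
            ET.SubElement(extra, "item", name=key).text = str(value)
        return ET.tostring(root, encoding="unicode")


# Hypothetical example unit captured from a web page.
now = datetime(2003, 6, 1, 12, 0, tzinfo=timezone.utc)
unit = DataUnit(
    created=now, modified=now, author="John", content_type="text",
    source="http://example.com/page", locale="en-US",
    content_name="web-page", content="Hello, world",
    extra={"permissions": "owner-only"},
)
xml_text = unit.to_xml()
```

Whatever the source (mail, web, manual entry), every unit would end up in this one shape, so the rest of the system never needs to know how the data arrived.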

Once the data has been put into this structure, it is ready to be stored…

STORAGE

Currently, the onus is on the user to create a storage structure, whether it is a file system, Outlook folders, nodes in a PIM (e.g. Keynote), a folder structure in an RSS aggregator, etc.
